Composite evaluator descriptions #3619

Draft · wants to merge 2 commits into main
@@ -0,0 +1,7 @@
| | |
| -- | -- |
| Score range | Integer [0-7], where 0 is the least harmful and 7 the most harmful. A text label is also provided. |
| What is this metric? | Comprehensively measures the severity of content harm in a response across four categories: violence, sexual content, self-harm, and hate and unfairness. |
| How does it work? | The Content Safety evaluator runs the AI-assisted evaluators `ViolenceEvaluator`, `SexualEvaluator`, `SelfHarmEvaluator`, and `HateUnfairnessEvaluator`, each using a language model as a judge on the response to a user query. See the [definitions and severity scale](https://learn.microsoft.com/azure/ai-studio/concepts/evaluation-metrics-built-in?tabs=severity#risk-and-safety-evaluators) for these AI-assisted evaluators. |
| When to use it? | Use it when assessing your model's generated responses for potential content harms in real-world applications. |
| What does it need as input? | Query, Response |
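
For illustration, below is a minimal sketch of calling this composite evaluator through the `azure-ai-evaluation` Python package, assuming `ContentSafetyEvaluator` is constructed with a credential and an Azure AI project dictionary; all identifiers are placeholders, not values from this PR.

```python
# Hypothetical usage sketch; the project identifiers below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.ai.evaluation import ContentSafetyEvaluator

# Points the evaluator at an Azure AI project, which hosts the safety backend.
azure_ai_project = {
    "subscription_id": "<subscription-id>",
    "resource_group_name": "<resource-group>",
    "project_name": "<project-name>",
}

content_safety = ContentSafetyEvaluator(
    credential=DefaultAzureCredential(),
    azure_ai_project=azure_ai_project,
)

# Evaluates one query/response pair across the four harm categories;
# the result carries a severity score and a text label per category.
result = content_safety(
    query="What is the capital of France?",
    response="Paris is the capital of France.",
)
print(result)
```

Per the table above, only `query` and `response` are needed at call time; the project and credential are used to reach the safety evaluation service.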
@@ -0,0 +1,7 @@
| | |
| -- | -- |
| Score range | Float [0-1] for the F1 score evaluator: the higher the score, the more similar the response is to the ground truth. Integer [1-5] for the AI-assisted quality evaluators in question-and-answering (QA) scenarios, where 1 is the lowest quality and 5 the highest. |
| What is this metric? | Comprehensively measures the groundedness, relevance, coherence, and fluency of a response in QA scenarios, as well as the textual similarity between the response and its ground truth. |
| How does it work? | The QA evaluator runs prompt-based, AI-assisted evaluators that use a language model as a judge on the response to a user query: `GroundednessEvaluator` (requires input `context`), `RelevanceEvaluator`, `CoherenceEvaluator`, `FluencyEvaluator`, and `SimilarityEvaluator` (requires input `ground_truth`). It also includes the Natural Language Processing (NLP) metric `F1ScoreEvaluator`, which computes the F1 score over tokens shared between the response and its ground truth. See the [definitions and scoring rubrics](https://learn.microsoft.com/azure/ai-studio/concepts/evaluation-metrics-built-in?tabs=warning#generation-quality-metrics) for these AI-assisted evaluators and the F1 score evaluator. |
| When to use it? | Use it when assessing the overall quality of your model's generated responses to questions in real-world applications. |
| What does it need as input? | Query, Response, Context, Ground Truth |
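
Likewise, a minimal sketch of invoking this composite evaluator, assuming `QAEvaluator` is constructed with an Azure OpenAI model configuration for the judge model; the endpoint, key, and deployment values are placeholders.

```python
# Hypothetical usage sketch; the model configuration values are placeholders
# for the Azure OpenAI deployment that serves as the LLM judge.
from azure.ai.evaluation import QAEvaluator

model_config = {
    "azure_endpoint": "<azure-openai-endpoint>",
    "api_key": "<api-key>",
    "azure_deployment": "<deployment-name>",
}

qa_evaluator = QAEvaluator(model_config=model_config)

# Fans out to the AI-assisted sub-evaluators (groundedness, relevance,
# coherence, fluency, similarity) and the NLP F1 score evaluator,
# returning one score per sub-evaluator.
result = qa_evaluator(
    query="What is the capital of France?",
    response="Paris is the capital of France.",
    context="France's capital city is Paris.",
    ground_truth="Paris",
)
print(result)
```

Per the table above, `context` feeds only the groundedness sub-evaluator and `ground_truth` only the similarity and F1 score sub-evaluators.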