{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "2OvkPji9O-qX"
   },
   "source": [
    "# Tutorial: Evaluating RAG Pipelines\n",
    "\n",
    "- **Level**: Intermediate\n",
    "- **Time to complete**: 15 minutes\n",
    "- **Components Used**: `InMemoryDocumentStore`, `InMemoryEmbeddingRetriever`, `ChatPromptBuilder`, `OpenAIChatGenerator`, `DocumentMRREvaluator`, `FaithfulnessEvaluator`, `SASEvaluator`\n",
    "- **Prerequisites**: You must have an API key from an active OpenAI account because this tutorial uses OpenAI's gpt-4o-mini model: https://platform.openai.com/api-keys\n",
    "- **Goal**: After completing this tutorial, you'll have learned how to evaluate your RAG pipelines with both model-based and statistical metrics available in the Haystack evaluation offering. You'll also see which other evaluation frameworks are integrated with Haystack."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "LFqHcXYPO-qZ"
   },
   "source": [
    "## Overview\n",
    "\n",
    "In this tutorial, you will learn how to evaluate Haystack pipelines, in particular Retrieval-Augmented Generation ([RAG](https://www.deepset.ai/blog/llms-retrieval-augmentation)) pipelines.\n",
    "1. You will first build a pipeline that answers medical questions based on PubMed data.\n",
    "2. You will build an evaluation pipeline that uses metrics such as Document MRR and Answer Faithfulness.\n",
    "3. You will run your RAG pipeline and evaluate its output with your evaluation pipeline.\n",
    "\n",
    "Haystack provides a wide range of [`Evaluators`](https://docs.haystack.deepset.ai/docs/evaluators) that support two types of evaluation:\n",
    "- [Model-Based evaluation](https://docs.haystack.deepset.ai/docs/model-based-evaluation)\n",
    "- [Statistical evaluation](https://docs.haystack.deepset.ai/docs/statistical-evaluation)\n",
    "\n",
    "We will use some of these evaluation techniques in this tutorial to evaluate a RAG pipeline that is designed to answer questions on PubMed data.\n",
    "\n",
    ">🧑‍🍳 In addition to Haystack's own evaluation metrics, you can integrate with a number of evaluation frameworks. See the integrations and examples below 👇\n",
    "> - [Evaluate with DeepEval](https://haystack.deepset.ai/cookbook/rag_eval_deep_eval)\n",
    "> - [Evaluate with RAGAS](https://haystack.deepset.ai/cookbook/rag_eval_ragas)\n",
    "\n",
    "### Evaluating RAG Pipelines\n",
    "RAG pipelines ultimately consist of at least 2 steps:\n",
    "- Retrieval\n",
    "- Generation\n",
    "\n",
    "To evaluate a full RAG pipeline, we have to evaluate each of these steps in isolation, as well as the pipeline as a whole. Retrieval can often be evaluated with statistical metrics, provided we have labels such as the ground truth documents for each question. Doing the same for the generation step is not straightforward, because many differently worded answers can be equally correct. Instead, we often rely on model-based metrics to evaluate the generation step, where an LLM is used as the 'evaluator'.\n",
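    "\n",
    "To make the idea of an LLM 'evaluator' more concrete, here is a minimal sketch of how a faithfulness-style score can be aggregated. It assumes that a judge LLM has already split each generated answer into statements and labeled each statement as supported (`1`) or not supported (`0`) by the context. The verdicts are made up, and this is an illustration, not Haystack's internal implementation.\n",
    "\n",
    "```python\n",
    "# Hypothetical judge output: one list of 0/1 verdicts per generated answer,\n",
    "# with one verdict per statement extracted from that answer.\n",
    "verdicts_per_answer = [\n",
    "    [1, 1, 1],  # every statement is supported by the context\n",
    "    [1, 0],     # one of two statements is unsupported\n",
    "    [0],        # the only statement is unsupported\n",
    "]\n",
    "\n",
    "\n",
    "def answer_score(verdicts):\n",
    "    # Fraction of an answer's statements that the judge marked as supported;\n",
    "    # an answer with no extracted statements scores 0.0 (an assumption).\n",
    "    return sum(verdicts) / len(verdicts) if verdicts else 0.0\n",
    "\n",
    "\n",
    "individual_scores = [answer_score(v) for v in verdicts_per_answer]\n",
    "overall_score = sum(individual_scores) / len(individual_scores)\n",
    "print(individual_scores)  # [1.0, 0.5, 0.0]\n",
    "print(overall_score)  # 0.5\n",
    "```\n",
    "\n",
    "The arithmetic is trivial; the quality of such a metric depends on how reliably the judge LLM extracts and labels the statements.\n",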
    "\n",
    "![Steps of RAG](https://raw.githubusercontent.com/deepset-ai/haystack-tutorials/main/tutorials/img/tutorial35_rag.png)\n",
    "\n",
    "#### 📺 Code Along\n",
    "\n",
    "<iframe width=\"560\" height=\"315\" src=\"https://www.youtube.com/embed/5PrzXaZ0-qk?si=lgBSfHatbV2i59J-\" title=\"YouTube video player\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen></iframe>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Kww5B_vXO-qZ"
   },
   "source": [
    "## Installing Haystack\n",
    "\n",
    "Install Haystack, [datasets](https://pypi.org/project/datasets/), and [sentence-transformers](https://pypi.org/project/sentence-transformers/) with `pip`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "UQbU8GUfO-qZ",
    "outputId": "80fe52ea-108b-4bb4-cb1d-fe79373c86f3"
   },
   "outputs": [],
   "source": [
    "%%bash\n",
    "\n",
    "pip install haystack-ai\n",
    "pip install \"datasets>=2.6.1\"\n",
    "pip install \"sentence-transformers>=4.1.0\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "_lvfew16O-qa"
   },
   "source": [
    "## Create the RAG Pipeline to Evaluate\n",
    "\n",
    "To evaluate a RAG pipeline, we first need a RAG pipeline, so let's start by creating a question-answering pipeline.\n",
    "\n",
    "> 💡 For a complete tutorial on creating Retrieval-Augmented Generation pipelines, check out the [Creating Your First QA Pipeline with Retrieval-Augmentation Tutorial](https://haystack.deepset.ai/tutorials/27_first_rag_pipeline)\n",
    "\n",
    "For this tutorial, we will be using [a labeled PubMed dataset](https://huggingface.co/datasets/vblagoje/PubMedQA_instruction/viewer/default/train?row=0) with questions, contexts, and answers. This way, we can use the contexts as Documents, and we also have the labeled data we need for some of the evaluation metrics we will be using.\n",
    "\n",
    "First, let's fetch the prepared dataset and extract `all_documents`, `all_questions` and `all_ground_truth_answers`:\n",
    "\n",
    "> ℹ️ The dataset is quite large, so we're using only the first 1000 rows in this example, but you can increase this number if you like.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 177,
     "referenced_widgets": [
      "f4dcb30b5589434f9faa18eb9563c738",
      "aaeaf649a13e456ea5f1197bf0683215",
      "aa6c86293fca4e51a90afaf95adcc1b5",
      "f13f78f9ba514240b2824f6d493a18b1",
      "2cf0fa53349c4ebeb866ccc001ed55f2",
      "bd80af5c3f6c425f8eb17be695fc31b6",
      "6f09fb2fe0564015b7be3dddd7ba9c09",
      "bfe502d4375a4c6c86c11c13581ece86",
      "ebf73536fc434ce7828ba454d716972a",
      "44a62894ef8748c8a950e6eafe0c8c80",
      "55c57ef4204e4150b5db017fce037cc7",
      "f7f86d8d1e5e403fae730c43cd99ab1d",
      "85522c23863e481695a7519752ffded7",
      "f84873763e1e498886769c8fcaebaecb",
      "bca672d423a348b9b6c7b10eeae3bc5a",
      "ed57a82da84b40a4a46c2ebd13e72564",
      "cb0e76f52ae946c3857e24ec3876b9d9",
      "467bbb3f34a74c29927774b423b5b022",
      "269507a4b22349abaa1fe561792fed6a",
      "7eccdeb84b15449d88c65315fb8302cd",
      "3f82b5fabc51471392d05307a9b57fd3",
      "fe15c1b1cb064b3aa3dff9be99bb113f",
      "b6978ebb1d574a658ba65d8d5dfa4342",
      "08c6ddb11c304ba1891b057c3782a8fe",
      "0fad933052f942d186eaf78ff4b21eb6",
      "8690edf1be09457f87bce485f4415e27",
      "19888ae0471c48589d690402c2d4d187",
      "44c50557fb574ba3bd9a2831b430f0d2",
      "e5ad5510e1b64eacbeec675e4156cf5e",
      "5e98763346f54576997fd02cdddaa743",
      "06a8c2979b094580a3f5206817f8ec95",
      "d7612556acd6421bb535d56974e046ac",
      "f12ced9b05bd4629b05caa844c8d7b6a",
      "5889178c30ee4d36b67059f3b3f406a1",
      "da88c9c350d9499ba782d742ef409f72",
      "1c348c5727b54b36823b46c8f9f5d275",
      "3c5d73805c0347988d4ded3aac52bba0",
      "deff0f4b393e4a59bc3c806830d46047",
      "e0596b6e841a4d5daa05e63cde4413fa",
      "8c1c5620be1a473babe93f8607054e21",
      "08128381d8d14a28acb5f4a67a2d4d0a",
      "3e3dad1ef0d64d2eab743ee0554e1391",
      "9aa747d77bdd4d04a4103f57e76ed8ee",
      "748d8bcefe244afc8dbfc76c76e38110",
      "18b53d4a7a404ec2aca100308c4c8036",
      "0f90a778d5ab4047bd9fbbef7fc9fc4d",
      "a4dcc594b24c4ed090a0710eb3ef33d7",
      "1dc5ee24204a4d19b274d0813b66fc76",
      "deccd3bbd18e41fab9ee0e9d9654f8e7",
      "7e20a41f44d2446a802d7e7fb0cb1f5a",
      "0ce9912d0f434369b023dd45d5ffe466",
      "062fbac1212144f4b73d49411bf11a68",
      "eac531efe9294f45807fd1871556e14b",
      "ba55d493e75e4a51980269072fcd2a80",
      "6b00d1bcb9c948fab4585f8db999a082"
     ]
    },
    "id": "CbVN-s5LO-qa",
    "outputId": "199392b0-f51d-4148-a486-5e797c049d9f"
   },
   "outputs": [],
   "source": [
    "from datasets import load_dataset\n",
    "from haystack import Document\n",
    "\n",
    "dataset = load_dataset(\"vblagoje/PubMedQA_instruction\", split=\"train\")\n",
    "dataset = dataset.select(range(1000))\n",
    "all_documents = [Document(content=doc[\"context\"]) for doc in dataset]\n",
    "all_questions = [doc[\"instruction\"] for doc in dataset]\n",
    "all_ground_truth_answers = [doc[\"response\"] for doc in dataset]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "yL8nuJdWO-qa"
   },
   "source": [
    "Next, let's build a simple indexing pipeline that embeds `all_documents` and writes them into a DocumentStore. Here, we're using the `InMemoryDocumentStore`.\n",
    "\n",
    "> `InMemoryDocumentStore` is the simplest DocumentStore to get started with. It requires no external dependencies, and it's a good option for smaller projects and debugging. However, it doesn't scale well to larger Document collections, so it's not a good choice for production systems. To learn more about the different types of external databases that Haystack supports, see [DocumentStore Integrations](https://haystack.deepset.ai/integrations?type=Document+Store)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 418,
     "referenced_widgets": [
      "c372271358e844fbacd24f184ffc821a",
      "24e0fe35e3d1472abe735912f337cb0c",
      "07310a57f3c746c894c6de631856b5af",
      "f828befbd603451e9e1ac2d749e8b382",
      "72adc44b42b24881a179cd853eda4c68",
      "b6d1c043de3d444b946c314e45578253",
      "5b329e3fee71489fb26ba686a3cfdaa8",
      "f6696a92d1ef4bb590f42fa06c368bcd",
      "afb5f22bf34a41e5bd2b12bf406d9bed",
      "87d438acb0b54262a05b08f72389f921",
      "2ffbc50205ca43e3a7f6ff25ccd39023",
      "b6726d348126452a88b5b70c444198b3",
      "ce6b615c8bfe4b8fae7726cdb92e5456",
      "8553ffa45af84d36a0d2cfd37b3aa4f8",
      "660a90f1abe04a39b5c3115054357a2c",
      "97380e0115ef4d64901985d42e28148f",
      "dee3dddd597148debb6e7315ff14c2fc",
      "96afa72c49c74f6393cac39d78a009f2",
      "b895c08dff334febbd1803fbe08b2bad",
      "16547c5439cd497297800eeaa204fb3b",
      "09b4f0d2ca4548b3a79708faa36247ce",
      "8198295080aa4a4580157a8fd91b9bdf",
      "7235dd65564a4af883bbfa28ab83f692",
      "4c3f34eeb88141c187f6c496dc73d7f2",
      "dcb4007d37bf4b44a13324eab2da2917",
      "09d9005a2ca0478faa9c3ea0e8b60320",
      "ebcc53ae2acc4fcfb5ff6ab7b0be65b7",
      "1535dc12e13a4dbea0b4d602e24c45d0",
      "d4e421647b124f709956f572fbd648e6",
      "2f7a53a0267741108dc74e191ebd11ce",
      "1e3dbf0e95d840ecb0cd96570db53477",
      "86ce2e80c3884c21858206ee50f635a8",
      "46d089baa10b49b5acd9db359d7c7a4a",
      "cd36e63787aa45879d7dbc5b3467e829",
      "baedf218445944c39859459b9599603c",
      "df7d969eeaeb4407a51bd500f2f89a2e",
      "ecf2aa010e454b918fe6bf27690e9db0",
      "838006db1c974f85acc23319b4c36363",
      "6de0c1bc24fb4420bf1931f46724cd1b",
      "2f487a06972e4e0891be3d20eda3fae8",
      "2195cd7f105a4af589026c67eb56845b",
      "8c13c7cc462347319d0d37268b8b6352",
      "9648f5984db74343934be31923aceac1",
      "7a828843a0f7421cbe93f630ae24952a",
      "18668f5ad4484fd6bbb2ba9a5b86325e",
      "9cc55e10477748f680da4fb401728ca1",
      "47a6e1c82c7846b38764009beecdcec4",
      "20a9b689a5394e0ebb842b151064a973",
      "8e670cd06c8a45a688410f979624ddd1",
      "5898c5904ef144caa8cb2534e81bb2de",
      "516be54e07204e7880ec84aafc879360",
      "e71ecc39157140a498da057789a89c72",
      "64d683fef7f14c9eb15d4e0fb3b1cc2b",
      "b7ef9bb5e19f453780035bebad8383e8",
      "58f7ab3443044e0daddcb96f261ad246",
      "bcca4a99540f4139917a03ba5d96ef47",
      "6abe5bd72e9b43778ac85555b6fc1a9d",
      "3a1716e5345e411fadd4cd2036bec942",
      "33ac75266d6844d6b177bf932d420546",
      "1172938ed8544f24bb750e2e9cfff245",
      "e8f8244893804eadbf00e780fb69cf51",
      "7554a4b24fa642e59aa673ac9504f50e",
      "914841c149fd464dae02508bb4596af9",
      "f451e00195e044dcbd9bee76980ac3b3",
      "e4d8040a736e4f5d93dc2fa849744238",
      "671da0695248442b8f8f91be852490a1",
      "7f709f5b00b946458f41ad705294c4d6",
      "8d34d02fdb764448a0a5fd7a958cd24e",
      "3b06182176974ced996758b08ac7d849",
      "fdd243da5f0141b583cc82aeca138b43",
      "9cd3fe9420bd4188960dd3c98dd94d2a",
      "7ded2d6c7a144c379d0fe851ab6655ee",
      "101c17397a654f5ea0b3a45a8317fc58",
      "dd47625723844d81bacc47cee1fd7999",
      "3022f9cfb2a94227881bc91915b19e57",
      "f7ad3932d52c4524b5f1ff2ea88ea118",
      "0a3668d20dad4842b142c41daaf6ced6",
      "d6d30ce04ed1492f90134c0c129631b4",
      "b3cbe2f5ceb64eae84cdba57180b5bd9",
      "f53a303831124a79b60f335f4690660a",
      "1e4f7ba44dd6460e8294d97ec9e9c921",
      "a28d155292ca42a3b9ed582751c6d8f2",
      "22eb3e09874646cca3083edbd4bed35b",
      "28a0084ec544441bb0539c936766a597",
      "da977a5fa8614559a23d0380a0d38b1b",
      "cc2f333289814c8d9eb8355e815b2916",
      "550e5121853540f39fd04d44c7252cfa",
      "f3d8efa9b06f414fa9519f10455c7847",
      "7a9cec9e5b124c0d82c7dcb87f2b0582",
      "d2ba6ee415684119a5be4cc7c04c32c0",
      "19ef824de98245e597f2c279fc8071c8",
      "031258ad21b24ecc8702bb367330e43a",
      "aa3fbba26e604b259e2d44fe8e488e1d",
      "8afcbf393b824b7d93b30edcdb428651",
      "bb0dd6fd4dfc4c7abcb4f4a233c759a2",
      "560236d7a4f74d4bb5d36c74bbf4c24e",
      "0bd15fc40ad14a098905447df0899415",
      "459a50cff4d144ed899862a1405c823f",
      "07ac0236b16747488a4c7a101514f756",
      "84518ee50615449d8bf617eab7e70ee2",
      "e8b059dc3b5642018e864171ad70ea9e",
      "24026956d46748a7b9708ed92817f0a4",
      "b6fc5f9913af460eb8e6dd702c53720f",
      "2230a0bc10364aa8b1d19a54e7f4409d",
      "16e706ff3a494d37a1ad9b46550d439d",
      "2f407b528f5b46239538914fbfeef9e7",
      "5d56a0819e764c65b6ff6f765bee170e",
      "2abd38eec36946ccb1eb203856b2f588",
      "af4c92a9e44f455e91afbb02dea7b1f3",
      "808fce33f4c741da9d7e57f2e5dfe198",
      "82b49fa6b2fb4ddba29f7b10feab6b2a",
      "6a7fb0f89d8841d9b716a7cb260483f4",
      "0c1eb77b68a84279bd156e27c2029450",
      "61a25c191929406d9382964a79eb1047",
      "8a490c587bcb4b1ebe003259052c4ed5",
      "b449fb21a6b74db69559de14231acc8c",
      "3fdcaeadd92d41ba9ea5ff2466fd94b1",
      "f78c318a2b3840a58579e7a920e6df57",
      "3cb9aa92e9864cd98554eb5b0aa49481",
      "8fca7108daaf455b803e42940f24a4b8",
      "ed6918c17648450fb8fe6217991a1c28",
      "54cba43df68b4b22bd77a259a2b730a1",
      "834c24bcd7e94cc1b1e4625b3d3ffe15",
      "7af10cc9d9e04c74a8ebf40b92695edf",
      "5153e235482a4579995de1aef9dd17b4",
      "3d3ac88bacc74f29b69ff2f1b513ab2c",
      "5df6e93529e3406bad33422a2cdcedfb",
      "c92d7ab1558644c09f4c67ec227c0c02",
      "ff758ccdad25463b8f4c21e6ea747f9d",
      "1cc88246a0cf477bbefe5246282db7b2",
      "105a0502e43547abb0f1c5931ac274db",
      "6fd0434539b94fd1a4ac02c70ce92682"
     ]
    },
    "id": "JfY_zgQ15dVq",
    "outputId": "f66883c5-3d09-4610-f9b3-0f5495799ad5"
   },
   "outputs": [],
   "source": [
    "from haystack import Pipeline\n",
    "from haystack.components.embedders import SentenceTransformersDocumentEmbedder\n",
    "from haystack.components.writers import DocumentWriter\n",
    "from haystack.document_stores.in_memory import InMemoryDocumentStore\n",
    "from haystack.document_stores.types import DuplicatePolicy\n",
    "\n",
    "document_store = InMemoryDocumentStore()\n",
    "\n",
    "document_embedder = SentenceTransformersDocumentEmbedder(model=\"sentence-transformers/all-MiniLM-L6-v2\")\n",
    "document_writer = DocumentWriter(document_store=document_store, policy=DuplicatePolicy.SKIP)\n",
    "\n",
    "indexing = Pipeline()\n",
    "indexing.add_component(instance=document_embedder, name=\"document_embedder\")\n",
    "indexing.add_component(instance=document_writer, name=\"document_writer\")\n",
    "\n",
    "indexing.connect(\"document_embedder.documents\", \"document_writer.documents\")\n",
    "\n",
    "indexing.run({\"document_embedder\": {\"documents\": all_documents}})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "XvLVaFHTO-qb"
   },
   "source": [
    "Now that we have our data ready, we can create a simple RAG pipeline.\n",
    "\n",
    "In this example, we'll be using:\n",
    "- [`InMemoryEmbeddingRetriever`](https://docs.haystack.deepset.ai/docs/inmemoryembeddingretriever) to retrieve the documents most relevant to the query.\n",
    "- [`OpenAIChatGenerator`](https://docs.haystack.deepset.ai/docs/openaichatgenerator) to generate answers to queries. You can replace `OpenAIChatGenerator` in your pipeline with another `ChatGenerator`. Check out the full list of generators [here](https://docs.haystack.deepset.ai/docs/generators)."
   ]
  },
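  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Conceptually, an embedding retriever compares the query embedding with every stored document embedding and keeps the `top_k` best matches. The sketch below illustrates that idea in plain Python with tiny, made-up 3-dimensional vectors and a dot-product score (one of the similarity functions `InMemoryDocumentStore` supports). The function `top_k_by_dot_product` is hypothetical and is not how `InMemoryEmbeddingRetriever` is implemented internally.\n",
    "\n",
    "```python\n",
    "# Made-up document embeddings; real all-MiniLM-L6-v2 embeddings have 384 dimensions\n",
    "doc_embeddings = {\n",
    "    'doc_a': [0.9, 0.1, 0.0],\n",
    "    'doc_b': [0.1, 0.8, 0.1],\n",
    "    'doc_c': [0.7, 0.2, 0.1],\n",
    "}\n",
    "\n",
    "\n",
    "def dot(u, v):\n",
    "    # Dot-product similarity between two vectors of equal length\n",
    "    return sum(a * b for a, b in zip(u, v))\n",
    "\n",
    "\n",
    "def top_k_by_dot_product(query_embedding, embeddings, top_k=3):\n",
    "    # Score every document, then keep the top_k highest-scoring ones\n",
    "    scored = [(doc_id, dot(query_embedding, emb)) for doc_id, emb in embeddings.items()]\n",
    "    scored.sort(key=lambda pair: pair[1], reverse=True)\n",
    "    return scored[:top_k]\n",
    "\n",
    "\n",
    "query_embedding = [1.0, 0.0, 0.0]\n",
    "results = top_k_by_dot_product(query_embedding, doc_embeddings, top_k=2)\n",
    "print(results)  # [('doc_a', 0.9), ('doc_c', 0.7)]\n",
    "```\n",
    "\n",
    "The `top_k=3` argument passed to `InMemoryEmbeddingRetriever` in the next cell plays the same role."
   ]
  },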
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "-uo-6fjiO-qb",
    "outputId": "12f00bd6-fc05-40fd-db96-429658039c32"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "from getpass import getpass\n",
    "from haystack.components.builders import AnswerBuilder, ChatPromptBuilder\n",
    "from haystack.dataclasses import ChatMessage\n",
    "from haystack.components.embedders import SentenceTransformersTextEmbedder\n",
    "from haystack.components.generators.chat import OpenAIChatGenerator\n",
    "from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever\n",
    "\n",
    "if \"OPENAI_API_KEY\" not in os.environ:\n",
    "    os.environ[\"OPENAI_API_KEY\"] = getpass(\"Enter OpenAI API key:\")\n",
    "\n",
    "template = [\n",
    "    ChatMessage.from_user(\n",
    "        \"\"\"\n",
    "        You have to answer the following question based on the given context information only.\n",
    "\n",
    "        Context:\n",
    "        {% for document in documents %}\n",
    "            {{ document.content }}\n",
    "        {% endfor %}\n",
    "\n",
    "        Question: {{question}}\n",
    "        Answer:\n",
    "        \"\"\"\n",
    "    )\n",
    "]\n",
    "\n",
    "rag_pipeline = Pipeline()\n",
    "rag_pipeline.add_component(\n",
    "    \"query_embedder\", SentenceTransformersTextEmbedder(model=\"sentence-transformers/all-MiniLM-L6-v2\")\n",
    ")\n",
    "rag_pipeline.add_component(\"retriever\", InMemoryEmbeddingRetriever(document_store, top_k=3))\n",
    "rag_pipeline.add_component(\"prompt_builder\", ChatPromptBuilder(template=template))\n",
    "rag_pipeline.add_component(\"generator\", OpenAIChatGenerator(model=\"gpt-4o-mini\"))\n",
    "rag_pipeline.add_component(\"answer_builder\", AnswerBuilder())\n",
    "\n",
    "rag_pipeline.connect(\"query_embedder\", \"retriever.query_embedding\")\n",
    "rag_pipeline.connect(\"retriever\", \"prompt_builder.documents\")\n",
    "rag_pipeline.connect(\"prompt_builder.prompt\", \"generator.messages\")\n",
    "rag_pipeline.connect(\"generator.replies\", \"answer_builder.replies\")\n",
    "rag_pipeline.connect(\"retriever\", \"answer_builder.documents\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "DBAyF5tVO-qc"
   },
   "source": [
    "### Asking a Question\n",
    "\n",
    "When asking a question, use the `run()` method of the pipeline. Make sure to provide the question to all components that require it as input. In this case, these are the `query_embedder`, the `prompt_builder`, and the `answer_builder`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 86,
     "referenced_widgets": [
      "f92db6040c414987acb2e855b5efe29a",
      "a5e5e17fa90941c8820424bcb8a64461",
      "f42b858a75334567b6e21d4491d8bf9a",
      "2451c140c32f46f68fa59d949b198c81",
      "927fd891c86f4098bb4b32717231d547",
      "997e1beb4a714368b583b812698db5bd",
      "e8419341d2c94b2c879671184da63da1",
      "5f56cc1fdb13480fae8838f613b79d9e",
      "d2b53a8ababf4744beca8bae4abea6ca",
      "076a7c8e3a7747ec928a0d5853e92e88",
      "298f01846fd4442cbcea6149a74c979d"
     ]
    },
    "id": "Vnt283M5O-qc",
    "outputId": "4c8f1c3e-d8c0-4d1c-d336-09df4b70544d"
   },
   "outputs": [],
   "source": [
    "question = \"Do high levels of procalcitonin in the early phase after pediatric liver transplantation indicate poor postoperative outcome?\"\n",
    "\n",
    "response = rag_pipeline.run(\n",
    "    {\n",
    "        \"query_embedder\": {\"text\": question},\n",
    "        \"prompt_builder\": {\"question\": question},\n",
    "        \"answer_builder\": {\"query\": question},\n",
    "    }\n",
    ")\n",
    "print(response[\"answer_builder\"][\"answers\"][0].data)"
   ]
  },
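  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since the same question has to reach three components, it can help to build the `run()` input in one place. The helper below is hypothetical (it is not part of Haystack); it simply assembles the same dictionary that the cell above passes to `rag_pipeline.run()`, using the component and input names defined in the pipeline.\n",
    "\n",
    "```python\n",
    "def build_rag_inputs(question):\n",
    "    # Hypothetical helper: route one question to every component that needs it\n",
    "    return {\n",
    "        'query_embedder': {'text': question},\n",
    "        'prompt_builder': {'question': question},\n",
    "        'answer_builder': {'query': question},\n",
    "    }\n",
    "\n",
    "\n",
    "inputs = build_rag_inputs('Is procalcitonin a marker of poor postoperative outcome?')\n",
    "print(sorted(inputs))  # ['answer_builder', 'prompt_builder', 'query_embedder']\n",
    "\n",
    "# With the pipeline defined above, this would be equivalent to the previous call:\n",
    "# response = rag_pipeline.run(build_rag_inputs(question))\n",
    "```"
   ]
  },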
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Bl6Q0PyoDV-E"
   },
   "source": [
    "## Evaluate the Pipeline\n",
    "\n",
    "For this tutorial, let's evaluate the pipeline with the following metrics:\n",
    "\n",
    "- [Document Mean Reciprocal Rank](https://docs.haystack.deepset.ai/docs/documentmrrevaluator): Evaluates retrieved documents using ground truth labels. It checks at what rank ground truth documents appear in the list of retrieved documents.\n",
    "- [Semantic Answer Similarity](https://docs.haystack.deepset.ai/docs/sasevaluator): Evaluates predicted answers using ground truth labels. It checks the semantic similarity of a predicted answer and the ground truth answer using a fine-tuned language model.\n",
    "- [Faithfulness](https://docs.haystack.deepset.ai/docs/faithfulnessevaluator): Uses an LLM to evaluate whether a generated answer can be inferred from the provided contexts. Does not require ground truth labels.\n",
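    "\n",
    "To see what Document MRR measures, here is a minimal plain-Python sketch of mean reciprocal rank. It assumes that documents are matched by their text content and that only the first matching document counts for each question; the strings are toy placeholders, and `DocumentMRREvaluator` may differ in its details.\n",
    "\n",
    "```python\n",
    "def reciprocal_rank(ground_truth, retrieved):\n",
    "    # 1 / rank of the first retrieved document that matches a ground truth document\n",
    "    relevant = set(ground_truth)\n",
    "    for rank, doc in enumerate(retrieved, start=1):\n",
    "        if doc in relevant:\n",
    "            return 1.0 / rank\n",
    "    return 0.0  # no relevant document was retrieved\n",
    "\n",
    "\n",
    "def mean_reciprocal_rank(all_ground_truth, all_retrieved):\n",
    "    # Average the per-question reciprocal ranks\n",
    "    scores = [reciprocal_rank(gt, ret) for gt, ret in zip(all_ground_truth, all_retrieved)]\n",
    "    return sum(scores) / len(scores), scores\n",
    "\n",
    "\n",
    "ground_truth_documents = [['context A'], ['context B'], ['context C']]\n",
    "retrieved_documents = [\n",
    "    ['context A', 'context X', 'context Y'],  # hit at rank 1 -> 1.0\n",
    "    ['context X', 'context B', 'context Y'],  # hit at rank 2 -> 0.5\n",
    "    ['context X', 'context Y', 'context Z'],  # miss -> 0.0\n",
    "]\n",
    "score, individual_scores = mean_reciprocal_rank(ground_truth_documents, retrieved_documents)\n",
    "print(individual_scores, score)  # [1.0, 0.5, 0.0] 0.5\n",
    "```\n",
    "\n",
    "With the `top_k=3` retriever used in this tutorial, each question's reciprocal rank can only be 1.0, 0.5, about 0.33, or 0.0.\n",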
    "\n",
    "\n",
    "First, let's run our RAG pipeline on a set of questions, making sure we have the ground truth labels (both answers and documents) for these questions. Let's start with 25 random questions and their labels 👇\n",
    "\n",
    "> 📝 **Some Notes:**\n",
    "> 1. For a full list of available metrics, check out the [Haystack Evaluators](https://docs.haystack.deepset.ai/docs/evaluators).\n",
    "> 2. In our dataset, each question has a single ground truth document as its label. In other scenarios, more than one ground truth document may be provided as labels, which is why we provide a list of `ground_truth_documents` for each question."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "id": "qgOwnuQLMKLk"
   },
   "outputs": [],
   "source": [
    "import random\n",
    "\n",
    "questions, ground_truth_answers, ground_truth_docs = zip(\n",
    "    *random.sample(list(zip(all_questions, all_ground_truth_answers, all_documents)), 25)\n",
    ")"
   ]
  },
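  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The one-liner above zips the three lists into `(question, answer, document)` triples, samples 25 triples so that each question stays aligned with its labels, and then uses `zip(*...)` to split the sample back into three tuples. The toy sketch below shows the same pattern; the `random.seed()` call is an addition (the tutorial does not seed) that makes the sample reproducible between runs.\n",
    "\n",
    "```python\n",
    "import random\n",
    "\n",
    "qs = ['q1', 'q2', 'q3', 'q4']\n",
    "answers = ['a1', 'a2', 'a3', 'a4']\n",
    "docs = ['d1', 'd2', 'd3', 'd4']\n",
    "\n",
    "random.seed(42)  # optional: reruns now pick the same rows\n",
    "sampled_qs, sampled_answers, sampled_docs = zip(*random.sample(list(zip(qs, answers, docs)), 2))\n",
    "\n",
    "# Every position still refers to the same original row (for example, q3 with a3 and d3)\n",
    "for q, a, d in zip(sampled_qs, sampled_answers, sampled_docs):\n",
    "    assert q[1:] == a[1:] == d[1:]\n",
    "print(sampled_qs, sampled_answers, sampled_docs)\n",
    "```"
   ]
  },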
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "6SO8oX7mMaGC"
   },
   "source": [
    "Next, let's run our pipeline and make sure to track what our pipeline returns as answers, and which documents it retrieves:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 1000,
     "referenced_widgets": [
      "baef63eb04e8453bb5b935e953cb38c3",
      "e8579020a4d943fea4e14db850568d5e",
      "8a5541de336e4b22a6deaab0e9584dd9",
      "296b3b2a52e940a3affd84442123c593",
      "aed4a80296c44616b6b2e0c62dabdc1a",
      "a046126fc7ba44bbb4da9cffc1fce3cd",
      "4eb8aa248bc4470793b299a666715a6c",
      "99349bdb437a44119ee014d128699b67",
      "ca85b2e9196e4c07ae78b4064ab74020",
      "0a776dd9df294ee2ab86d97083379359",
      "d758563558b1443fb518a927aa2b5987",
      "78fe7cd499bb461088be730ba13fc50d",
      "430165cb3eb04e239fa6a4b30bc3ad43",
      "6bbecbb19efd458ba594a02b5b97acd1",
      "479845d964264ea5a6db8290adce412e",
      "7ad98ef2d0fb4d2f9eef1e0805d47e91",
      "57ebb5eff6254aba9e69d3c231484fdb",
      "7469ae3fe8694b309f0314e4293c308b",
      "93b377dbcd86485ca1512164dda83462",
      "e6824fb24e29447cae1f8e31b74f2e9e",
      "56bc5ddc248b4db782829657cea9f665",
      "8455c81dc7364dab885a178f59054e67",
      "f44587fa7ba24a98bfd3c321ee63a931",
      "a9eafb1df2b54af29547d22d74a8bc58",
      "53474ffa2ae44868807c8ffbad7d23d0",
      "7b68a4e12020490989e11c03d5a26bc8",
      "e4059787ca0344d0b85b2159c94f3d6b",
      "f216b75c5243465a9b702d159cea76bd",
      "495d42da32074cd384839da287824beb",
      "58e8d9cfe5944321a772aa78fc84bafa",
      "9242bb1decfd4503a4b0568039df9769",
      "245f816048c744ea8e8b631cede03dd3",
      "1006284a46bc4cc98c56cd339b8c0284",
      "c94e26dd87544b5f960fe1eb06621616",
      "3bac442ea3be41c78e718382706aa7b2",
      "28d03c76413940e986b2b443e5915444",
      "f0e9d09827cb4458a83394ab6c4c6c71",
      "838f2e4c240d4037b528a29caea32852",
      "df2fe7969c6543e9974e7e010c073962",
      "e32c8934dd2f4a4ba0874809f3278ff9",
      "422e6f1776f0485fa42666cbc023e389",
      "1685d1e6304d4e2c90c06a6c90651d17",
      "545346eb5bfa46f4bc430460db80998f",
      "2e93390ecaa14e998f5fa4e804a883f6",
      "474e5bcffcb84de78c4ec36fc7ba16f2",
      "19249b91d76a412c9c0e3cf288eea043",
      "a84f51200eec4b3c81849e925bb13fd0",
      "e0cdc1b88470469f9a57069d6930fe3c",
      "f1e8895226a544d6b20980df2c03215e",
      "0db7023aaf6e4f6bae109dabbde02390",
      "4b1dd13df2fb41b4b5c31e4f776592dd",
      "61fea60bb72d44fb9090432c9697c86f",
      "69338a12833a4a8db0d5667288811720",
      "ddb17e8fb7d34aa3bf13b0d3af446c4e",
      "42acba2e3d8546fc9882f292f50eb52b",
      "d2165726ba2c4900bc76a2ee93dd9acb",
      "bec9485398c14bda90dec041d177d4c8",
      "d4c261229a1444909fb4b8cf39d0331a",
      "0bdd2e0fb3b741c3bb76d4f594e850ad",
      "79cb2112b78f4c1aaf44777ff94d0eb8",
      "620ab6fb70c248d5ad196e40cdb9eaa1",
      "0fd37ec9db0d4d85bea661341f9ace7b",
      "2e331cd71ff54a2b8088908017db891c",
      "f3b88654c10446b5b694840bab8f6e13",
      "da720a683cc24d2e8fbb0993367feb76",
      "7a3a0f2fbd854e97a53f95f32477d454",
      "9d4ab8f056d04a17880d6ddfcb8836c4",
      "bc66d35c5f454d829b8e78402e7c3489",
      "1965743a8fa344278313fe8a2c313284",
      "14883a8fae204514a947c8474c1ac8f5",
      "e83429f7fb2b4a71bf5018d61830a9af",
      "74a099ea11e44f64b390e32ef06b5246",
      "c3623a83b8ca431d8af7a405a1b4ebb2",
      "20793f1d11974dfc8b217677ad41c693",
      "2936a55c2458436788f0b7204ba342d6",
      "fb32eb9ae7f048678e8492f05731d5a5",
      "ab28501c63e8435c8e7d5b0d410b48f1",
      "a3de895bf14f4161b084e1546477b4eb",
      "65603d7d1fc745ab92a8fe3ed995c45c",
      "729c3dcf3b7e46938c16c2bd44b133e7",
      "d9c11d8775434dd895ea08443511e97f",
      "4c2ea3f993664cc2bbf48a1f6334611a",
      "921d1e2ab03e44ca8a21ccb816bc2c3b",
      "fb3f8f987431449d8c16e6daec93e5e1",
      "178af9ca4ce749869f350fa20bb3cf0e",
      "faf7a64ee72347a3a78766d78b7f2c3a",
      "10f2832a99b74dfabdc3fecb65665886",
      "bb746fec56ea42709140cef5373e3e69",
      "b07451ecd6d945469905b9b73a1b931f",
      "b50fbd86f25e42d299cee17c02a289ad",
      "e2c8600e4ea2498f9336248f2b9dc61f",
      "e4537c3b585d4843a0faac6b10f78b5a",
      "c7b225a2c5f84fe7920c4a00b59bf285",
      "f53bfb35d9d7481f9833298db5f10e51",
      "affa93e57aef4c34800b9793200ed9a5",
      "863225868c5440d0b1114c4f545ff995",
      "990f18fa75064930b98912346bc3c43b",
      "4fef896081a44806aa49cdb573007072",
      "9d540bd25b2d4db1be3ae84b17389143",
      "87610ad009084c1191c57b028f6bdd6a",
      "e6376e35ae3b4e248e9054da78910a3b",
      "f7ce6380f0a54a06808a12bcf8c2619f",
      "f89b7808492b40268c52d3a010897214",
      "37ce4f09062248348a27d7d420a8e2a7",
      "f15c2cfe46414616b904ec6db74b66cb",
      "88f4f39867d647bc9142ef28116ece30",
      "f11a295a5c6d411fa76859da35cf9b1e",
      "03d24ff73fa64fc294f7ab9f0a120d14",
      "c059f35e313a4e36ad3f40aade661984",
      "7b9f2fb316d3442fb9fe3f1945575e66",
      "0d947864ef3a46ea8b95a89dbfe1244a",
      "f70801f5fb9746c2a94d109cd1edeae9",
      "18cba57e9aa94d15b75a0d5d5408a745",
      "bd219243e4054f588266f55d304e339c",
      "8d4d4c5bb6d34ebeb751a11dc274252f",
      "7898ed6cbd1b4ee5a437e445073cbc4d",
      "4eec0bc861334e7baaa91c11fe6fb540",
      "c741aa5d9e92480a921ae64c34fd4d6f",
      "ee52661a595d49a0a11d60f2b47318d3",
      "819616d9a2d6448dac78126ea6f59dc9",
      "2806d4622f6e453bae43dc6c4efc080e",
      "ab4bff71c4224b20b78d0bd20648e723",
      "ebb4cd0480a64678a6f87a1196e8fdb0",
      "d6a6d6fb9e2c47b8bf12826e439ff420",
      "aabd92b0b1f2487d8bf34092acf246b6",
      "3c0f1448036241028655b73502f281d1",
      "88545f223e80400eace5980396e22ea2",
      "73ccc7b0c25149adb98c68e13c69689f",
      "e968c7b47ff5471a9edba395a8c20a72",
      "dc2d285e40a74a25bc01d35287acd16c",
      "c229366676ad44d48d5cce216415da23",
      "622ff1ac3c8544fa912a5be163ade88d",
      "7cee12908a374d0d8de02d8cf4954d61",
      "0d704ae495034e64a7b9a0436062d480",
      "4ec136c5948b40fbb143f90d76619f09",
      "4827161f865e470ba4a6edee96467a28",
      "4b2765ae8a554fd896b24be3f45f3199",
      "bbf8b4cd086745f096cd587be3c62dae",
      "de4aba756a57412c8764c2d4fa1f1add",
      "ac99872fb57b4341be5f7201ff76d41f",
      "62c2dd34556f4f42b0feb4b4e906b287",
      "4f4fd7960edf4d94a20ec8c070a60913",
      "b1998e4989f84222b488c7f6e7fc60f6",
      "2df0f6a05652407caedd1be9d3091db3",
      "d4b8388433e1416895a614689d286286",
      "875e5d977f194852aa40f97ac5146728",
      "bc36b14706754f59a83ad0829e057d77",
      "dcd1eb326e7b491681dc5dc0c10b6899",
      "ec91994b1d3345b7b4a3207e82c09911",
      "a890577f68b7442c9c8de5d91efba57b",
      "e29d2da07f98488eb188428b70bbc1f0",
      "82abad199e9343a4b199eaeeeafa82fc",
      "83c2c4b1bb2b40b0af22ec3caf8ac9e6",
      "3c34a99aee044bc381f5a4d40457930a",
      "b87f7df83e974a109bb3ac5e638063b6",
      "a23ac915d7624ff89dbc978da109f14f",
      "f851ee98a90a434bba1c77182db795e6",
      "1c98903437d8404ba7e810cad8155484",
      "469a09afe2e243e181203deaeec9a2e6",
      "e15e48964be04515a46d9dcb7bdf66d2",
      "4d4b3e1d2a9a4df2803ba04ec07ac680",
      "05f826fb77b847b788e31b1a01327825",
      "9be0879c1755459cb2b6dfd5325f296d",
      "c117dff6505846c88638739791ae7891",
      "c89ec4586049486c9b7747b9a8deb610",
      "c3311fa8553b4e8592e0a52cac8f7ca4",
      "2f1716e86bcf4e86bfb0ba6829b2d1c8",
      "e532a00bf56d4c62b401a157bdf36ce2",
      "fee5102fa2094605be76dac0d9f0a280",
      "ab956c8ccb5b478e96ba82549a840303",
      "1dcc491b740440c3b2c6a35c27244ad0",
      "f7705bbe8e814f619953599b37d79383",
      "4b9f297358b94400abc62260fae17c5f",
      "9f3f5df401c340b69474bb2b62fcc7c4",
      "bf6cf1d170be461f9d1d2de83ae97f26",
      "25b91847e6c34bf8b9bdc2fff836d1c9",
      "17da0b1a782f4499b2f8e3e8e35133f3",
      "69799dca845d43c8bfcb779a7c7729ca",
      "7a2ad154686f486b9048aa809ba63de8",
      "5f156579130b4f21aa3579d4921e3a1a",
      "651e641315d5457daafb9a25635a8687",
      "c4675fc8ab6a4829b1a08670bdb1cf5a",
      "51552796653a474dbaa852cb417e3c1a",
      "462d28fd4c4145eaae47c2d40394c865",
      "a3bb49dda9624b3d8a42c3fd6b96fdfc",
      "f1ba2ebd6fdb4691964cfe148e0c2d5b",
      "779c1c002bbc46609f6d0f658a7bc762",
      "660a0c8430f24839ab363446150dcd7b",
      "f5dab6e2c0bd409e83665baa2911771a",
      "66c94f5f89fe4904820d526481df87a5",
      "7d24d5a719034c9db3f7bd39d7c3995a",
      "e8a0b1b713324dfa870199003b03a729",
      "2ed10f82f28041b1afed2d945592e317",
      "6928093c16eb4aa881079be40f44e2ba",
      "5f0aeababb8d4a34bf5f23cc11e4471b",
      "53759ccb997e4c6483e733f5795fadba",
      "783316906906422f8fe822604b3e723e",
      "f3e5d9b12254417eb16ba9d4ac1a8cf4",
      "b5d3a873bed04e8cbd1852444fa05030",
      "0471cf2581284ece894d97804f5ea9bb",
      "806b23e6aff84c21a38c95121d25fc5b",
      "539e26abdfc0480b97a2e9c79978fa9d",
      "c1e19a7966154663878f45cdcf304065",
      "28ddf214a114428eaf400f4a6ba01d57",
      "f50d18e8a4c84c9181185c9bfcab663b",
      "e09a869990294297b872f8d815dafd31",
      "ab659e2bfb924fd281a7027bf27f80ca",
      "f91bd363ba76454e907dbbc57149866d",
      "7cdcf16d0f2544ccb0755dde35a7fa12",
      "4ecbd5b99db641488e4b9b6001c2b1ec",
      "5e3bf4459e184ec889db5c63f7902c63",
      "17af3efa57cf4b5b939658da3041dfba",
      "b4296dc078c741f281662cd8abb4e7a8",
      "c51b87fef25d4827b5919da47622f48c",
      "52cfc047b96b43e1a2346b09c114d469",
      "b68ad80263f64c2fad81a61a6c8d46c2",
      "1cef82cca6964f9694583a9fc5a2dc45",
      "8ccf0a115d9e424d81e20e3934976745",
      "75e3cf8b7d7343d2adab1cff4123ee53",
      "78ea2db30fbf4ccc9b9fe48b99694b28",
      "ad2e9edf0dc3472d9dc0639b07328638",
      "10c42beece104b6ea6517ae4f3142a07",
      "1fd5340df5744db9ba1b78f5d5422f5e",
      "70f4a828b4034796b8a5ad0317a668b8",
      "a09f5b5cd66e45fc9e3d5ce7d2c83588",
      "f9cf7fde2d934193b27b3e59ea2e35be",
      "da16b1c28b664c0db8915ea99b06b736",
      "bc3372c7031a4fa5b1860161fca20c54",
      "2d341eb144bd467b938e1b5843bd0b10",
      "58f1edb7a87c4bb5b1c04bed26a2b5ea",
      "0eacc908491f4e37bf0c67e0089f3124",
      "62da147141b84d56a593db7cf8472831",
      "2ed099c50fe14bec907fc2b8bd3c9da9",
      "2864f60673e941e58d951239bfb19a56",
      "94bfff2d2db0402394be089fee8b9ef8",
      "e2c43ba2caae4a7ab922d17c75bc7d29",
      "f75e25a5ea3b47c0a24cccba0ed727ac",
      "4e2438c533f1485ab21c8687c884e962",
      "4965b8d324d541a6afa91e1859158a69",
      "effaa092e8634774b5ff9599d64d6899",
      "d13c45b5582c404bb7fb53928d7f1703",
      "ef99e8897a2c4e68a33fe44946ecfb6a",
      "29f9b903c4b547edb70ea5e68b845ff0",
      "d73c12f665fe4293a74df23ce3773beb",
      "5452d62429074c2f95b230cb2263f470",
      "97b3ffcf78c5446d8e86889b8f09982f",
      "b6fe618589824a588550961212161e1d",
      "db8288588eca4bb684e1c9936199c82a",
      "585e47dfd8d243c5b0eadbb4ba467751",
      "ef790ec2140c41d2a3d2bdc4133e8f07",
      "554bac9d33864aa48c058a8f6af8f8ca",
      "eaba6362be114e2bba0f41c77f518d67",
      "6e1effb30848410db27a76cfe84dd10e",
      "b3e515210b6b4369aaeb267c20ffe456",
      "5cb20c9cc48d432487dff75e1bb80509",
      "b18fd11ada8d411d89aacf3a00f70d0e",
      "59fd997682f04a9dbff8903ba62a4468",
      "062e65adab0947b78eeacb3612df7e48",
      "4b27469056ea4519a210ae26cbbf1a49",
      "c1dece3da6ed41d5a63d417951e39236",
      "b865a1dc71604c15b6af463ded813df8",
      "30419733e78145e1a4840eb62da59c2e",
      "90325d083a6b4785a29eb1448a7b4bdc",
      "94ed64f2570542e5a98829ad330d366e",
      "14ab52345e1b4db1a442ebda71c61e91",
      "26a27f41511b4e47b5d7cc62d9deaab7",
      "38b9bf9c0cf04c60872bc71526964ca8",
      "e655bb7681cc4e4cbd576030ae2bbe6a",
      "a36e94ca468f4023bac51658e4e18921",
      "6895c13a356c4f0a9a36b25502d1c4e6",
      "cc871b3b464745948723728877f21d2b",
      "95f5805b2a0f4e0e85c8ca9191ee1eeb",
      "9cb3f6ce1ad04df2b08d958428f840a7",
      "5c550cd4b0494581bba13ee0f1cb60a4",
      "58a04a2db7644033babb02e6dd83b7bf"
     ]
    },
    "id": "SknPWiKQMZpy",
    "outputId": "8b06916b-bffe-498b-9d03-e4831d97aa40"
   },
   "outputs": [],
   "source": [
    "rag_answers = []\n",
    "retrieved_docs = []\n",
    "\n",
    "for question in list(questions):\n",
    "    # Each component that needs the question receives it under its own input name\n",
    "    response = rag_pipeline.run(\n",
    "        {\n",
    "            \"query_embedder\": {\"text\": question},\n",
    "            \"prompt_builder\": {\"question\": question},\n",
    "            \"answer_builder\": {\"query\": question},\n",
    "        }\n",
    "    )\n",
    "    # The answer builder returns a list of answers; keep the first one\n",
    "    answer = response[\"answer_builder\"][\"answers\"][0]\n",
    "    print(f\"Question: {question}\")\n",
    "    print(\"Answer from pipeline:\")\n",
    "    print(answer.data)\n",
    "    print(\"\\n-----------------------------------\\n\")\n",
    "\n",
    "    # Store the answer text and the retrieved documents attached to it for evaluation\n",
    "    rag_answers.append(answer.data)\n",
    "    retrieved_docs.append(answer.documents)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "dZSvAl9fZAQ1"
   },
   "source": [
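    "Each evaluator is a component that can be run on its own in Haystack, but evaluators can also be added to a pipeline. This way, we can build a single `eval_pipeline` that includes an evaluator for every metric we want to evaluate our RAG pipeline on. Because the evaluators don't depend on each other, they don't need to be connected: each one simply receives its own inputs when the pipeline runs."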
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "5Z4to-q3OxSV"
   },
   "outputs": [],
   "source": [
    "from haystack import Pipeline\n",
    "from haystack.components.evaluators.document_mrr import DocumentMRREvaluator\n",
    "from haystack.components.evaluators.faithfulness import FaithfulnessEvaluator\n",
    "from haystack.components.evaluators.sas_evaluator import SASEvaluator\n",
    "\n",
    "# The evaluators are independent, so they are added without any connections\n",
    "eval_pipeline = Pipeline()\n",
    "eval_pipeline.add_component(\"doc_mrr_evaluator\", DocumentMRREvaluator())\n",
    "# LLM-based evaluator; uses the OpenAI API by default\n",
    "eval_pipeline.add_component(\"faithfulness\", FaithfulnessEvaluator())\n",
    "# Compares predicted and ground-truth answers using the given sentence-transformers model\n",
    "eval_pipeline.add_component(\"sas_evaluator\", SASEvaluator(model=\"sentence-transformers/all-MiniLM-L6-v2\"))\n",
    "\n",
    "results = eval_pipeline.run(\n",
    "    {\n",
    "        # Compare the retrieved documents against the single ground-truth document per question\n",
    "        \"doc_mrr_evaluator\": {\n",
    "            \"ground_truth_documents\": [[d] for d in ground_truth_docs],\n",
    "            \"retrieved_documents\": retrieved_docs,\n",
    "        },\n",
    "        # Check whether each predicted answer is supported by the given context\n",
    "        \"faithfulness\": {\n",
    "            \"questions\": list(questions),\n",
    "            \"contexts\": [[d.content] for d in ground_truth_docs],\n",
    "            \"predicted_answers\": rag_answers,\n",
    "        },\n",
    "        # Score the semantic similarity between predicted and ground-truth answers\n",
    "        \"sas_evaluator\": {\"predicted_answers\": rag_answers, \"ground_truth_answers\": list(ground_truth_answers)},\n",
    "    }\n",
    ")"
   ]
  },
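  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see what the `doc_mrr_evaluator` score measures, below is a minimal, framework-free sketch of mean reciprocal rank (MRR) following its standard definition: for each question, take 1/rank of the first retrieved document that matches a ground-truth document (0 if none does), then average over all questions. This is an illustration rather than Haystack's `DocumentMRREvaluator` implementation; the function names are hypothetical, and plain strings stand in for `Document` objects.\n",
    "\n",
    "```python\n",
    "from typing import List, Sequence\n",
    "\n",
    "\n",
    "def reciprocal_rank(ground_truth: Sequence[str], retrieved: Sequence[str]) -> float:\n",
    "    # 1/rank of the first retrieved item that is relevant, or 0.0 if none is\n",
    "    relevant = set(ground_truth)\n",
    "    for rank, item in enumerate(retrieved, start=1):\n",
    "        if item in relevant:\n",
    "            return 1.0 / rank\n",
    "    return 0.0\n",
    "\n",
    "\n",
    "def mean_reciprocal_rank(ground_truths: List[Sequence[str]], retrieved: List[Sequence[str]]) -> float:\n",
    "    # Average the per-question reciprocal ranks\n",
    "    scores = [reciprocal_rank(gt, ret) for gt, ret in zip(ground_truths, retrieved)]\n",
    "    return sum(scores) / len(scores)\n",
    "\n",
    "\n",
    "# Hypothetical toy data: one ground-truth document per question, as in this tutorial\n",
    "ground_truths = [['doc_a'], ['doc_b']]\n",
    "retrieved = [['doc_a', 'doc_c'], ['doc_c', 'doc_b']]\n",
    "print(mean_reciprocal_rank(ground_truths, retrieved))  # (1/1 + 1/2) / 2 = 0.75\n",
    "```\n",
    "\n",
    "Because each question here has exactly one ground-truth document, a per-question score of 1.0 means the retriever ranked that document first."
   ]
  },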
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "0rIL855wZa6C"
   },
   "source": [
    "### Constructing an Evaluation Report\n",
    "\n",
    "Once we've run our evaluation pipeline, we can also create a full evaluation report. Haystack provides an `EvaluationRunResult` class, which we can use to display an aggregated report with one overall score per metric via `aggregated_report()` 👇"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 143
    },
    "id": "ZO5jzX7uQi0i",
    "outputId": "cb3f6554-47f9-47f9-ce2d-a7b6aeda2236"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'metrics': ['doc_mrr_evaluator', 'faithfulness', 'sas_evaluator'],\n",
       " 'score': [1.0, np.float64(1.0), np.float64(0.731985309123993)]}"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from haystack.evaluation.eval_run_result import EvaluationRunResult\n",
    "\n",
    "# Per-sample inputs: each key becomes a column in the detailed report\n",
    "inputs = {\n",
    "    \"question\": list(questions),\n",
    "    \"contexts\": [[d.content] for d in ground_truth_docs],\n",
    "    \"answer\": list(ground_truth_answers),\n",
    "    \"predicted_answer\": rag_answers,\n",
    "}\n",
    "\n",
    "evaluation_result = EvaluationRunResult(run_name=\"pubmed_rag_pipeline\", inputs=inputs, results=results)\n",
    "# One overall score per metric\n",
    "evaluation_result.aggregated_report()"
   ]
  },
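  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The aggregated scores are consistent with a simple mean of the per-sample scores in the detailed report below: `doc_mrr_evaluator` and `faithfulness` are 1.0 for every sample, and averaging the 25 `sas_evaluator` values reproduces the aggregated 0.731985. The sketch below checks this with the standard library only; treat it as a sanity check on this particular run, not as a description of how `EvaluationRunResult` computes its report.\n",
    "\n",
    "```python\n",
    "from statistics import mean\n",
    "\n",
    "# Per-sample sas_evaluator scores from the detailed report (rounded to 6 decimals there)\n",
    "sas_scores = [\n",
    "    0.737820, 0.597597, 0.843995, 0.747795, 0.838117,\n",
    "    0.821029, 0.909551, 0.749711, 0.728780, 0.904209,\n",
    "    0.836559, 0.731234, 0.848075, 0.413406, 0.713987,\n",
    "    0.673704, 0.864765, 0.604916, 0.755639, 0.584665,\n",
    "    0.761145, 0.471996, 0.713652, 0.806365, 0.640922,\n",
    "]\n",
    "\n",
    "# Each sample contributes equally to the overall score\n",
    "print(round(mean(sas_scores), 6))  # 0.731985\n",
    "```\n",
    "\n",
    "A mean hides the spread: the lowest per-sample SAS score in this run is 0.413406, so it is worth looking at the individual rows as well."
   ]
  },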
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "q8rvbj5rZsW9"
   },
   "source": [
    "#### Extra: Viewing a Detailed Report per Sample\n",
    "\n",
    "You can also view a detailed report with the scores for each sample in your dataset. Here, we choose a pandas DataFrame as the output format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 1000
    },
    "id": "P0hxWyTMTsbq",
    "outputId": "3e5693dc-10a3-4ad5-a630-f2da0085db7d"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>question</th>\n",
       "      <th>contexts</th>\n",
       "      <th>answer</th>\n",
       "      <th>predicted_answer</th>\n",
       "      <th>doc_mrr_evaluator</th>\n",
       "      <th>faithfulness</th>\n",
       "      <th>sas_evaluator</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Does increased Syk phosphorylation lead to ove...</td>\n",
       "      <td>[Activation of B cells is a hallmark of system...</td>\n",
       "      <td>Our results suggest that the activated Syk-med...</td>\n",
       "      <td>Yes, increased Syk phosphorylation is associat...</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.737820</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Do rotavirus vaccines contribute towards unive...</td>\n",
       "      <td>[To evaluate rotavirus vaccination in Malaysia...</td>\n",
       "      <td>We propose that universal vaccination compleme...</td>\n",
       "      <td>Yes, rotavirus vaccines contribute towards uni...</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.597597</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Are hospitalisation costs for infant bronchiol...</td>\n",
       "      <td>[Up to 3% of infants with bronchiolitis under ...</td>\n",
       "      <td>The hospitalisation costs of infants treated i...</td>\n",
       "      <td>Yes, the hospitalisation costs for infant bron...</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.843995</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Does severe nutritional risk predict decreased...</td>\n",
       "      <td>[Weight loss and malnutrition are poorly toler...</td>\n",
       "      <td>Severe nutritional risk can be a useful predic...</td>\n",
       "      <td>Yes, severe nutritional risk predicts decrease...</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.747795</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Is cognitive reserve a determinant of health-r...</td>\n",
       "      <td>[Covert hepatic encephalopathy (CHE) is associ...</td>\n",
       "      <td>A higher cognitive reserve is associated with ...</td>\n",
       "      <td>Yes, cognitive reserve is a significant determ...</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.838117</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Does the CC-genotype of the cyclooxygenase-2 g...</td>\n",
       "      <td>[The cyclooxygenase-2 (cox-2) pathway is now r...</td>\n",
       "      <td>We conclude that the CC-genotype and C allele ...</td>\n",
       "      <td>Yes, the CC-genotype of the cyclooxygenase-2 g...</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.821029</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Does preoperative anemia increase postoperativ...</td>\n",
       "      <td>[Preoperative anemia may affect postoperative ...</td>\n",
       "      <td>Preoperative anemia in elective cranial neuros...</td>\n",
       "      <td>Yes, preoperative anemia increases postoperati...</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.909551</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Does ketamine inhibit proliferation of neural ...</td>\n",
       "      <td>[Ketamine is a widely used anesthetic in obste...</td>\n",
       "      <td>Ketamine inhibited proliferation of NSCs from ...</td>\n",
       "      <td>Yes, ketamine significantly inhibited the prol...</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.749711</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Does metoclopramide unmask potentially mislead...</td>\n",
       "      <td>[As metoclopramide stimulates aldosterone secr...</td>\n",
       "      <td>Metoclopramide does not enhance lateralization...</td>\n",
       "      <td>Yes, metoclopramide can unmask potentially mis...</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.728780</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Is persistent arm pain distinct from persisten...</td>\n",
       "      <td>[Persistent pain following breast cancer surge...</td>\n",
       "      <td>For persistent breast and arm pain, changes in...</td>\n",
       "      <td>Yes, persistent arm pain is distinct from pers...</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.904209</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Are alcohol mixed with energy drinks robustly ...</td>\n",
       "      <td>[Young adults are a population at great risk f...</td>\n",
       "      <td>AmED consumption in the past month is robustly...</td>\n",
       "      <td>Yes, alcohol mixed with energy drinks (AmED) c...</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.836559</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>Is fibromyalgia associated with coronary heart...</td>\n",
       "      <td>[We examined whether patients with a diagnosis...</td>\n",
       "      <td>An association between fibromyalgia and CHD ap...</td>\n",
       "      <td>Yes, the study indicates that patients with fi...</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.731234</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>Do patient-oncologist alliance and psychosocia...</td>\n",
       "      <td>[Patient-oncologist alliance and psychosocial ...</td>\n",
       "      <td>Stronger patient-oncologist alliance may foste...</td>\n",
       "      <td>Yes, patient-oncologist alliance and psychosoc...</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.848075</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>Does global cortical thinning in acute anorexi...</td>\n",
       "      <td>[Anorexia nervosa (AN) is a serious eating dis...</td>\n",
       "      <td>Structural brain anomalies in AN as expressed ...</td>\n",
       "      <td>Yes, global cortical thinning in acute anorexi...</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.413406</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>Does dual-specificity phosphatase 6 predict th...</td>\n",
       "      <td>[We previously found that Dual-specificity pho...</td>\n",
       "      <td>Dusp6 could be a predicting marker for decidin...</td>\n",
       "      <td>Yes, dual-specificity phosphatase 6 (Dusp6) ex...</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.713987</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>Does miR-219-5p modulate cell growth of papill...</td>\n",
       "      <td>[Papillary thyroid carcinoma (PTC) is the most...</td>\n",
       "      <td>Our investigation identified miR-219-5p as a n...</td>\n",
       "      <td>Yes, miR-219-5p modulates cell growth of papil...</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.673704</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>Is hospital readmission associated with poor s...</td>\n",
       "      <td>[Hospital readmissions are costly and associat...</td>\n",
       "      <td>Hospital readmissions after esophagectomy for ...</td>\n",
       "      <td>Yes, hospital readmission is associated with p...</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.864765</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>Do a critical analysis of secondary overtriage...</td>\n",
       "      <td>[Trauma centers often receive transfers from l...</td>\n",
       "      <td>A significant number of patients transferred t...</td>\n",
       "      <td>The analysis of secondary overtriage to a Leve...</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.604916</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>Does rAMP1 suppress mucosal injury from dextra...</td>\n",
       "      <td>[Calcitonin gene-related peptide (CGRP) is tho...</td>\n",
       "      <td>The findings of this study suggest that RAMP1 ...</td>\n",
       "      <td>The provided context does not contain any info...</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.755639</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>Does early-life stress selectively affect gast...</td>\n",
       "      <td>[Early-life stress and a genetic predispositio...</td>\n",
       "      <td>Our data suggest that early-life stress, on th...</td>\n",
       "      <td>Yes, early-life stress selectively affects gas...</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.584665</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>Does housing temperature influence the pattern...</td>\n",
       "      <td>[Researchers studying the murine response to s...</td>\n",
       "      <td>Taken together, these data show that housing t...</td>\n",
       "      <td>Yes, housing temperature influences the patter...</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.761145</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>Does ginsenoside Rg3 improve erectile function...</td>\n",
       "      <td>[Ginsenoside Rg3 is one of the active ingredie...</td>\n",
       "      <td>Oral gavage with Rg3 appears to both prevent d...</td>\n",
       "      <td>Yes, ginsenoside Rg3 improves erectile functio...</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.471996</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>Are hair Cortisol Concentrations in Adolescent...</td>\n",
       "      <td>[In anorexia nervosa (AN) hypercortisolism has...</td>\n",
       "      <td>We find lower HCC in AN, compared to HC and PC...</td>\n",
       "      <td>Yes, hair cortisol concentrations (HCC) in ado...</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.713652</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>Is radiofrequency ablation a thyroid function-...</td>\n",
       "      <td>[To evaluate the efficacy and safety of radiof...</td>\n",
       "      <td>RF ablation improves cosmetic problems and sym...</td>\n",
       "      <td>Yes, radiofrequency ablation is a thyroid func...</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.806365</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>Does damage Patterns at the Head-Stem Taper Ju...</td>\n",
       "      <td>[Material loss at the taper junction of metal-...</td>\n",
       "      <td>These material loss maps allow us to suggest d...</td>\n",
       "      <td>Yes, the damage patterns at the head-stem tape...</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.640922</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                             question  \\\n",
       "0   Does increased Syk phosphorylation lead to ove...   \n",
       "1   Do rotavirus vaccines contribute towards unive...   \n",
       "2   Are hospitalisation costs for infant bronchiol...   \n",
       "3   Does severe nutritional risk predict decreased...   \n",
       "4   Is cognitive reserve a determinant of health-r...   \n",
       "5   Does the CC-genotype of the cyclooxygenase-2 g...   \n",
       "6   Does preoperative anemia increase postoperativ...   \n",
       "7   Does ketamine inhibit proliferation of neural ...   \n",
       "8   Does metoclopramide unmask potentially mislead...   \n",
       "9   Is persistent arm pain distinct from persisten...   \n",
       "10  Are alcohol mixed with energy drinks robustly ...   \n",
       "11  Is fibromyalgia associated with coronary heart...   \n",
       "12  Do patient-oncologist alliance and psychosocia...   \n",
       "13  Does global cortical thinning in acute anorexi...   \n",
       "14  Does dual-specificity phosphatase 6 predict th...   \n",
       "15  Does miR-219-5p modulate cell growth of papill...   \n",
       "16  Is hospital readmission associated with poor s...   \n",
       "17  Do a critical analysis of secondary overtriage...   \n",
       "18  Does rAMP1 suppress mucosal injury from dextra...   \n",
       "19  Does early-life stress selectively affect gast...   \n",
       "20  Does housing temperature influence the pattern...   \n",
       "21  Does ginsenoside Rg3 improve erectile function...   \n",
       "22  Are hair Cortisol Concentrations in Adolescent...   \n",
       "23  Is radiofrequency ablation a thyroid function-...   \n",
       "24  Does damage Patterns at the Head-Stem Taper Ju...   \n",
       "\n",
       "                                             contexts  \\\n",
       "0   [Activation of B cells is a hallmark of system...   \n",
       "1   [To evaluate rotavirus vaccination in Malaysia...   \n",
       "2   [Up to 3% of infants with bronchiolitis under ...   \n",
       "3   [Weight loss and malnutrition are poorly toler...   \n",
       "4   [Covert hepatic encephalopathy (CHE) is associ...   \n",
       "5   [The cyclooxygenase-2 (cox-2) pathway is now r...   \n",
       "6   [Preoperative anemia may affect postoperative ...   \n",
       "7   [Ketamine is a widely used anesthetic in obste...   \n",
       "8   [As metoclopramide stimulates aldosterone secr...   \n",
       "9   [Persistent pain following breast cancer surge...   \n",
       "10  [Young adults are a population at great risk f...   \n",
       "11  [We examined whether patients with a diagnosis...   \n",
       "12  [Patient-oncologist alliance and psychosocial ...   \n",
       "13  [Anorexia nervosa (AN) is a serious eating dis...   \n",
       "14  [We previously found that Dual-specificity pho...   \n",
       "15  [Papillary thyroid carcinoma (PTC) is the most...   \n",
       "16  [Hospital readmissions are costly and associat...   \n",
       "17  [Trauma centers often receive transfers from l...   \n",
       "18  [Calcitonin gene-related peptide (CGRP) is tho...   \n",
       "19  [Early-life stress and a genetic predispositio...   \n",
       "20  [Researchers studying the murine response to s...   \n",
       "21  [Ginsenoside Rg3 is one of the active ingredie...   \n",
       "22  [In anorexia nervosa (AN) hypercortisolism has...   \n",
       "23  [To evaluate the efficacy and safety of radiof...   \n",
       "24  [Material loss at the taper junction of metal-...   \n",
       "\n",
       "                                               answer  \\\n",
       "0   Our results suggest that the activated Syk-med...   \n",
       "1   We propose that universal vaccination compleme...   \n",
       "2   The hospitalisation costs of infants treated i...   \n",
       "3   Severe nutritional risk can be a useful predic...   \n",
       "4   A higher cognitive reserve is associated with ...   \n",
       "5   We conclude that the CC-genotype and C allele ...   \n",
       "6   Preoperative anemia in elective cranial neuros...   \n",
       "7   Ketamine inhibited proliferation of NSCs from ...   \n",
       "8   Metoclopramide does not enhance lateralization...   \n",
       "9   For persistent breast and arm pain, changes in...   \n",
       "10  AmED consumption in the past month is robustly...   \n",
       "11  An association between fibromyalgia and CHD ap...   \n",
       "12  Stronger patient-oncologist alliance may foste...   \n",
       "13  Structural brain anomalies in AN as expressed ...   \n",
       "14  Dusp6 could be a predicting marker for decidin...   \n",
       "15  Our investigation identified miR-219-5p as a n...   \n",
       "16  Hospital readmissions after esophagectomy for ...   \n",
       "17  A significant number of patients transferred t...   \n",
       "18  The findings of this study suggest that RAMP1 ...   \n",
       "19  Our data suggest that early-life stress, on th...   \n",
       "20  Taken together, these data show that housing t...   \n",
       "21  Oral gavage with Rg3 appears to both prevent d...   \n",
       "22  We find lower HCC in AN, compared to HC and PC...   \n",
       "23  RF ablation improves cosmetic problems and sym...   \n",
       "24  These material loss maps allow us to suggest d...   \n",
       "\n",
       "                                     predicted_answer  doc_mrr_evaluator  \\\n",
       "0   Yes, increased Syk phosphorylation is associat...                1.0   \n",
       "1   Yes, rotavirus vaccines contribute towards uni...                1.0   \n",
       "2   Yes, the hospitalisation costs for infant bron...                1.0   \n",
       "3   Yes, severe nutritional risk predicts decrease...                1.0   \n",
       "4   Yes, cognitive reserve is a significant determ...                1.0   \n",
       "5   Yes, the CC-genotype of the cyclooxygenase-2 g...                1.0   \n",
       "6   Yes, preoperative anemia increases postoperati...                1.0   \n",
       "7   Yes, ketamine significantly inhibited the prol...                1.0   \n",
       "8   Yes, metoclopramide can unmask potentially mis...                1.0   \n",
       "9   Yes, persistent arm pain is distinct from pers...                1.0   \n",
       "10  Yes, alcohol mixed with energy drinks (AmED) c...                1.0   \n",
       "11  Yes, the study indicates that patients with fi...                1.0   \n",
       "12  Yes, patient-oncologist alliance and psychosoc...                1.0   \n",
       "13  Yes, global cortical thinning in acute anorexi...                1.0   \n",
       "14  Yes, dual-specificity phosphatase 6 (Dusp6) ex...                1.0   \n",
       "15  Yes, miR-219-5p modulates cell growth of papil...                1.0   \n",
       "16  Yes, hospital readmission is associated with p...                1.0   \n",
       "17  The analysis of secondary overtriage to a Leve...                1.0   \n",
       "18  The provided context does not contain any info...                1.0   \n",
       "19  Yes, early-life stress selectively affects gas...                1.0   \n",
       "20  Yes, housing temperature influences the patter...                1.0   \n",
       "21  Yes, ginsenoside Rg3 improves erectile functio...                1.0   \n",
       "22  Yes, hair cortisol concentrations (HCC) in ado...                1.0   \n",
       "23  Yes, radiofrequency ablation is a thyroid func...                1.0   \n",
       "24  Yes, the damage patterns at the head-stem tape...                1.0   \n",
       "\n",
       "    faithfulness  sas_evaluator  \n",
       "0            1.0       0.737820  \n",
       "1            1.0       0.597597  \n",
       "2            1.0       0.843995  \n",
       "3            1.0       0.747795  \n",
       "4            1.0       0.838117  \n",
       "5            1.0       0.821029  \n",
       "6            1.0       0.909551  \n",
       "7            1.0       0.749711  \n",
       "8            1.0       0.728780  \n",
       "9            1.0       0.904209  \n",
       "10           1.0       0.836559  \n",
       "11           1.0       0.731234  \n",
       "12           1.0       0.848075  \n",
       "13           1.0       0.413406  \n",
       "14           1.0       0.713987  \n",
       "15           1.0       0.673704  \n",
       "16           1.0       0.864765  \n",
       "17           1.0       0.604916  \n",
       "18           1.0       0.755639  \n",
       "19           1.0       0.584665  \n",
       "20           1.0       0.761145  \n",
       "21           1.0       0.471996  \n",
       "22           1.0       0.713652  \n",
       "23           1.0       0.806365  \n",
       "24           1.0       0.640922  "
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "results_df = evaluation_result.detailed_report(output_format=\"df\")\n",
    "results_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "TidpbS5NwuIA"
   },
   "source": [
    "Having the evaluation results in a DataFrame makes further analysis easy. For example, below we use pandas to select the 3 samples with the highest semantic answer similarity (`sas_evaluator`) scores, as well as the 3 with the lowest 👇\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 341
    },
    "id": "d6PuFgcnwt1i",
    "outputId": "0dbe0769-f2d9-43e5-8cbb-a0ceea6c8d55"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>question</th>\n",
       "      <th>contexts</th>\n",
       "      <th>answer</th>\n",
       "      <th>predicted_answer</th>\n",
       "      <th>doc_mrr_evaluator</th>\n",
       "      <th>faithfulness</th>\n",
       "      <th>sas_evaluator</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>Does nAIF1 inhibit gastric cancer cells migrat...</td>\n",
       "      <td>[Nuclear apoptosis-inducing factor 1 (NAIF1) c...</td>\n",
       "      <td>Our study revealed that NAIF1 plays a role in ...</td>\n",
       "      <td>Yes, NAIF1 inhibits gastric cancer cells migra...</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.867813</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Are [ Higher nitric oxide levels associated wi...</td>\n",
       "      <td>[Oxidative stress generated within inflammator...</td>\n",
       "      <td>There are increased levels of nitric oxide in ...</td>\n",
       "      <td>Yes, higher nitric oxide levels are associated...</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.853290</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>Is loss of N-cadherin associated with loss of ...</td>\n",
       "      <td>[Our previous study suggested that N-cadherin ...</td>\n",
       "      <td>Loss of N-cadherin was positively correlated w...</td>\n",
       "      <td>Yes, loss of N-cadherin is associated with los...</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.844145</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>Is malonate as a ROS product associated with p...</td>\n",
       "      <td>[The role of anaplerotic nutrient entry into t...</td>\n",
       "      <td>This study extends the interest in the PC acti...</td>\n",
       "      <td>Yes, malonate is associated with pyruvate carb...</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.472158</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Is depression a consistent syndrome : An inves...</td>\n",
       "      <td>[The DSM-5 encompasses a wide range of symptom...</td>\n",
       "      <td>Symptoms were dichotomized to construct sympto...</td>\n",
       "      <td>Depression is not a consistent syndrome, as th...</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.534966</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>Does brivaracetam differentially affect voltag...</td>\n",
       "      <td>[Brivaracetam (BRV) is an antiepileptic drug i...</td>\n",
       "      <td>The lack of effect of BRV on SRF in neurons su...</td>\n",
       "      <td>Yes, brivaracetam differentially affects volta...</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.551872</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                             question  \\\n",
       "21  Does nAIF1 inhibit gastric cancer cells migrat...   \n",
       "5   Are [ Higher nitric oxide levels associated wi...   \n",
       "15  Is loss of N-cadherin associated with loss of ...   \n",
       "19  Is malonate as a ROS product associated with p...   \n",
       "10  Is depression a consistent syndrome : An inves...   \n",
       "24  Does brivaracetam differentially affect voltag...   \n",
       "\n",
       "                                             contexts  \\\n",
       "21  [Nuclear apoptosis-inducing factor 1 (NAIF1) c...   \n",
       "5   [Oxidative stress generated within inflammator...   \n",
       "15  [Our previous study suggested that N-cadherin ...   \n",
       "19  [The role of anaplerotic nutrient entry into t...   \n",
       "10  [The DSM-5 encompasses a wide range of symptom...   \n",
       "24  [Brivaracetam (BRV) is an antiepileptic drug i...   \n",
       "\n",
       "                                               answer  \\\n",
       "21  Our study revealed that NAIF1 plays a role in ...   \n",
       "5   There are increased levels of nitric oxide in ...   \n",
       "15  Loss of N-cadherin was positively correlated w...   \n",
       "19  This study extends the interest in the PC acti...   \n",
       "10  Symptoms were dichotomized to construct sympto...   \n",
       "24  The lack of effect of BRV on SRF in neurons su...   \n",
       "\n",
       "                                     predicted_answer  doc_mrr_evaluator  \\\n",
       "21  Yes, NAIF1 inhibits gastric cancer cells migra...                1.0   \n",
       "5   Yes, higher nitric oxide levels are associated...                1.0   \n",
       "15  Yes, loss of N-cadherin is associated with los...                1.0   \n",
       "19  Yes, malonate is associated with pyruvate carb...                1.0   \n",
       "10  Depression is not a consistent syndrome, as th...                1.0   \n",
       "24  Yes, brivaracetam differentially affects volta...                1.0   \n",
       "\n",
       "    faithfulness  sas_evaluator  \n",
       "21           1.0       0.867813  \n",
       "5            1.0       0.853290  \n",
       "15           1.0       0.844145  \n",
       "19           1.0       0.472158  \n",
       "10           1.0       0.534966  \n",
       "24           1.0       0.551872  "
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
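     "# Show the three predicted answers most and least similar to the ground-truth answers, ranked by SAS score\n",
     "top_3 = results_df.nlargest(3, \"sas_evaluator\")\n",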
    "bottom_3 = results_df.nsmallest(3, \"sas_evaluator\")\n",
    "pd.concat([top_3, bottom_3])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "XueCK3y4O-qc"
   },
   "source": [
    "## What's next\n",
    "\n",
     "🎉 Congratulations! You've learned how to evaluate a RAG pipeline with both model-based and statistical evaluation metrics.\n",
    "\n",
    "If you liked this tutorial, you may also enjoy:\n",
    "- [Serializing Haystack Pipelines](https://haystack.deepset.ai/tutorials/29_serializing_pipelines)\n",
     "- [Creating Your First QA Pipeline with Retrieval-Augmentation](https://haystack.deepset.ai/tutorials/27_first_rag_pipeline)\n",
    "\n",
    "To stay up to date on the latest Haystack developments, you can [sign up for our newsletter](https://landing.deepset.ai/haystack-community-updates). Thanks for reading!"
   ]
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "gpuType": "T4",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.6"
  },
  "vscode": {
   "interpreter": {
    "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
