{
  "cells": [
    {
      "cell_type": "markdown",
      "id": "e4ff7ed5-b796-4739-8db4-57ff8c59dd95",
      "metadata": {
        "id": "e4ff7ed5-b796-4739-8db4-57ff8c59dd95"
      },
      "source": [
        "# Haystack RAG Pipeline with Self-Deployed AI models using NVIDIA NIMs\n",
        "\n",
        "In this notebook, we will build a Haystack Retrieval Augmented Generation (RAG) Pipeline using self-hosted AI models with NVIDIA Inference Microservices or NIMs.\n",
        "\n",
        "The notebook is associated with a technical blog demonstrating the steps to deploy NVIDIA NIMs with Haystack into production.\n",
        "\n",
        "The code examples expect the LLM Generator and Retrieval Embedding AI models already deployed using NIMs microservices following the procedure described in the technical blog.\n",
        "\n",
        "You can also substitute the calls to NVIDIA NIMs with the same AI models hosted by NVIDIA on [ai.nvidia.com](https://ai.nvidia.com).\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "a96de91c-8701-4740-8f1f-f7a4522bb5c7",
      "metadata": {
        "id": "a96de91c-8701-4740-8f1f-f7a4522bb5c7",
        "tags": []
      },
      "outputs": [],
      "source": [
        "# install the relevant libraries\n",
        "!pip install haystack-ai\n",
        "!pip install nvidia-haystack\n",
        "!pip install --upgrade setuptools==67.0\n",
        "!pip install pip install pydantic==1.9.0\n",
        "!pip install pypdf\n",
        "!pip install hayhooks\n",
        "!pip install qdrant-haystack"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "01bed90b",
      "metadata": {
        "id": "01bed90b"
      },
      "source": [
        "\n",
        "For the Haystack RAG pipeline, we will use the Qdrant Vector Database and the self-hosted [*meta-llama3-8b-instruct*](https://build.nvidia.com/explore/discover#llama3-70b) for the LLM Generator and [*NV-Embed-QA*](https://build.nvidia.com/nvidia/embed-qa-4) for the embedder.\n",
        "\n",
        "In the next cell, We will set the domain names and URLs of the self-deployed NVIDIA NIMs as well as the QdrantDocumentStore URL. Adjust these according to your setup.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "7deffa36-67ef-4388-a54a-d436249e0b9a",
      "metadata": {
        "id": "7deffa36-67ef-4388-a54a-d436249e0b9a",
        "tags": []
      },
      "outputs": [],
      "source": [
        "# Global Variables\n",
        "import os\n",
        "\n",
        "# LLM NIM\n",
        "llm_nim_model_name = \"meta-llama3-8b-instruct\"\n",
        "llm_nim_base_url = \"http://nims.example.com/llm\"\n",
        "\n",
        "# Embedding NIM\n",
        "embedding_nim_model = \"NV-Embed-QA\"\n",
        "embedding_nim_api_url = \"http://nims.example.com/embedding\"\n",
        "\n",
        "# Qdrant Vector Database\n",
        "qdrant_endpoint = \"http://vectordb.example.com:30030\"\n"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "8ca10672-60ee-460f-b8d8-697d8463cd5c",
      "metadata": {
        "id": "8ca10672-60ee-460f-b8d8-697d8463cd5c",
        "tags": []
      },
      "source": [
        "## 1. Check Deployments\n",
        "\n",
        "Let's first check the Vector database and the self-deployed models with NVIDIA NIMs in our environment. Have a look at the technical blog for the steps NIM deployment.\n",
        "\n",
        "We can check the AI models deployed with NIMs and the Qdrant database using simple *curl* commands."
      ]
    },
    {
      "cell_type": "markdown",
      "id": "8f37420a-47ac-43d6-b662-71ec7f1ae462",
      "metadata": {
        "id": "8f37420a-47ac-43d6-b662-71ec7f1ae462"
      },
      "source": [
        "### 1.1 Check the LLM Generator NIM"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "7be5097b-6f85-4d70-917e-63705af4474b",
      "metadata": {
        "id": "7be5097b-6f85-4d70-917e-63705af4474b",
        "outputId": "6c5a38d6-a6b1-4f03-fc04-2bdf22683573",
        "tags": []
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "{\"object\":\"list\",\"data\":[{\"id\":\"meta-llama3-8b-instruct\",\"object\":\"model\",\"created\":1716465695,\"owned_by\":\"system\",\"root\":\"meta-llama3-8b-instruct\",\"parent\":null,\"permission\":[{\"id\":\"modelperm-6f996d60554743beab7b476f09356c6e\",\"object\":\"model_permission\",\"created\":1716465695,\"allow_create_engine\":false,\"allow_sampling\":true,\"allow_logprobs\":true,\"allow_search_indices\":false,\"allow_view\":true,\"allow_fine_tuning\":false,\"organization\":\"*\",\"group\":null,\"is_blocking\":false}]}]}"
          ]
        }
      ],
      "source": [
        "! curl '{llm_nim_base_url}/v1/models' -H 'Accept: application/json'"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "91e50b63-8a6d-433c-9152-54050c2027d6",
      "metadata": {
        "id": "91e50b63-8a6d-433c-9152-54050c2027d6"
      },
      "source": [
        "### 1.2 Check the Retreival Embedding NIM"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "c5c7d782-8214-40a9-839f-a2c9c0530bdc",
      "metadata": {
        "id": "c5c7d782-8214-40a9-839f-a2c9c0530bdc",
        "outputId": "a21b2cc1-f288-4c50-e7be-84e7040481b2",
        "tags": []
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "{\"object\":\"list\",\"data\":[{\"id\":\"NV-Embed-QA\",\"created\":0,\"object\":\"model\",\"owned_by\":\"organization-owner\"}]}"
          ]
        }
      ],
      "source": [
        "! curl '{embedding_nim_api_url}/v1/models' -H 'Accept: application/json'"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "0b3aefd2-7801-44a2-a8d9-58dd4d1e8dd7",
      "metadata": {
        "id": "0b3aefd2-7801-44a2-a8d9-58dd4d1e8dd7",
        "tags": []
      },
      "source": [
        "### 1.3 Check the Qdrant Database"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "2908943b-b0eb-4648-a6ee-ce861f8f7901",
      "metadata": {
        "id": "2908943b-b0eb-4648-a6ee-ce861f8f7901",
        "outputId": "6b57e567-06cd-4579-de36-a91d2054338b",
        "tags": []
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "{\"title\":\"qdrant - vector search engine\",\"version\":\"1.9.1\",\"commit\":\"97c107f21b8dbd1cb7190ecc732ff38f7cdd248f\"}"
          ]
        }
      ],
      "source": [
        "! curl '{qdrant_endpoint}' -H 'Accept: application/json'"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "92fecd94-8241-48b9-9450-17f41c6a4bf1",
      "metadata": {
        "id": "92fecd94-8241-48b9-9450-17f41c6a4bf1",
        "tags": []
      },
      "outputs": [],
      "source": [
        "# A workaround\n",
        "os.environ[\"NVIDIA_API_KEY\"] = \"\""
      ]
    },
    {
      "cell_type": "markdown",
      "id": "eea8780a-8227-4dc9-b658-d9fdd2f1fe08",
      "metadata": {
        "id": "eea8780a-8227-4dc9-b658-d9fdd2f1fe08"
      },
      "source": [
        "## 2. Perform Indexing\n",
        "\n",
        "Let's first initialize the Qdrant vector database, create the Haystack indexing pipeline and upload pdf examples. We will use the self-deployed embedder AI model with NIM."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "d3a8ed26-1e75-492c-8994-8033ca970b8e",
      "metadata": {
        "id": "d3a8ed26-1e75-492c-8994-8033ca970b8e",
        "tags": []
      },
      "outputs": [],
      "source": [
        "# Import the relevant libraries\n",
        "from haystack import Pipeline\n",
        "from haystack.components.converters import PyPDFToDocument\n",
        "from haystack.components.writers import DocumentWriter\n",
        "from haystack.components.preprocessors import DocumentCleaner, DocumentSplitter\n",
        "from haystack_integrations.document_stores.qdrant import QdrantDocumentStore\n",
        "from haystack_integrations.components.embedders.nvidia import NvidiaDocumentEmbedder"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "cea0f80f-9e04-46c3-97d2-42fdf3d462cd",
      "metadata": {
        "id": "cea0f80f-9e04-46c3-97d2-42fdf3d462cd",
        "tags": []
      },
      "outputs": [],
      "source": [
        "# initialize document store\n",
        "document_store = QdrantDocumentStore(embedding_dim=1024, url=qdrant_endpoint)\n",
        "\n",
        "# initialize NvidiaDocumentEmbedder with the self-hosted Embedding NIM URL\n",
        "embedder = NvidiaDocumentEmbedder(\n",
        "    model=embedding_nim_model,\n",
        "    api_url=f\"{embedding_nim_api_url}/v1\"\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "94814597-3541-4f7c-b2ad-665e7041d289",
      "metadata": {
        "id": "94814597-3541-4f7c-b2ad-665e7041d289",
        "tags": []
      },
      "outputs": [],
      "source": [
        "converter = PyPDFToDocument()\n",
        "cleaner = DocumentCleaner()\n",
        "splitter = DocumentSplitter(split_by='word', split_length=100)\n",
        "writer = DocumentWriter(document_store)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "985aa203-c093-40d9-863e-c166cce46b3a",
      "metadata": {
        "id": "985aa203-c093-40d9-863e-c166cce46b3a",
        "outputId": "dabde258-c5d8-4b0c-89bd-081dc5a9ee8c"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "<haystack.core.pipeline.pipeline.Pipeline object at 0x7f267972b340>\n",
              "🚅 Components\n",
              "  - converter: PyPDFToDocument\n",
              "  - cleaner: DocumentCleaner\n",
              "  - splitter: DocumentSplitter\n",
              "  - embedder: NvidiaDocumentEmbedder\n",
              "  - writer: DocumentWriter\n",
              "🛤️ Connections\n",
              "  - converter.documents -> cleaner.documents (List[Document])\n",
              "  - cleaner.documents -> splitter.documents (List[Document])\n",
              "  - splitter.documents -> embedder.documents (List[Document])\n",
              "  - embedder.documents -> writer.documents (List[Document])"
            ]
          },
          "execution_count": 11,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "# Create the Indexing Pipeline\n",
        "indexing = Pipeline()\n",
        "indexing.add_component(\"converter\", converter)\n",
        "indexing.add_component(\"cleaner\", cleaner)\n",
        "indexing.add_component(\"splitter\", splitter)\n",
        "indexing.add_component(\"embedder\", embedder)\n",
        "indexing.add_component(\"writer\", writer)\n",
        "\n",
        "indexing.connect(\"converter\", \"cleaner\")\n",
        "indexing.connect(\"cleaner\", \"splitter\")\n",
        "indexing.connect(\"splitter\", \"embedder\")\n",
        "indexing.connect(\"embedder\", \"writer\")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "oeTA9H8GN49D",
      "metadata": {
        "id": "oeTA9H8GN49D"
      },
      "source": [
        "We will upload in the vector database a PDF research paper about **ChipNeMo** from NVIDIA, a domain specific LLM for Chip design.\n",
        "The paper is available [here](https://raw.githubusercontent.com/deepset-ai/haystack-cookbook/main/data/rag-with-nims/ChipNeMo.pdf)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "8317d18c-7ce0-4315-ab03-d97871fafe47",
      "metadata": {
        "id": "8317d18c-7ce0-4315-ab03-d97871fafe47",
        "outputId": "c8444bae-fa82-44ef-a9d9-b723b428de3b"
      },
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "Calculating embeddings: 100%|██████████| 4/4 [00:00<00:00, 15.85it/s]\n",
            "200it [00:00, 1293.90it/s]                        \n"
          ]
        },
        {
          "data": {
            "text/plain": [
              "{'embedder': {'meta': {'usage': {'prompt_tokens': 0, 'total_tokens': 0}}},\n",
              " 'writer': {'documents_written': 108}}"
            ]
          },
          "execution_count": 12,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "# Document sources to index for embeddings\n",
        "document_sources = [\"./data/ChipNeMo.pdf\"]\n",
        "\n",
        "# Create embeddings\n",
        "indexing.run({\"converter\": {\"sources\": document_sources}})"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "a094e729-c39f-490f-a44a-0973e11511db",
      "metadata": {
        "id": "a094e729-c39f-490f-a44a-0973e11511db"
      },
      "source": [
        "It is possible to check the Qdrant database deployments through the Web UI. We can check the embeddings stored on the dashboard available [qdrant_endpoint/dashboard](http://vectordb.example.com:30030/dashboard\n",
        ")\n",
        "\n",
        "![](/images/embeddings-1.png)\n",
        "\n",
        "<center><img src=\"https://raw.githubusercontent.com/deepset-ai/haystack-cookbook/main/data/rag-with-nims/embeddings-1.png\"></center>\n",
        "<center><img src=\"https://raw.githubusercontent.com/deepset-ai/haystack-cookbook/main/data/rag-with-nims/embeddings-2.png\"></center>"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "0ca7c3d0-c83c-4bca-a893-7f7941a3db1b",
      "metadata": {
        "id": "0ca7c3d0-c83c-4bca-a893-7f7941a3db1b"
      },
      "source": [
        "## 3. Create the RAG Pipeline\n",
        "\n",
        "Let's now create the Haystack RAG pipeline. We will initialize the LLM generator with the self-deployed LLM with NIM."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "cc452a81-537d-4bc9-8fa4-56ebb0fa216f",
      "metadata": {
        "id": "cc452a81-537d-4bc9-8fa4-56ebb0fa216f",
        "tags": []
      },
      "outputs": [],
      "source": [
        "# import the relevant libraries\n",
        "from haystack import Pipeline\n",
        "from haystack.utils.auth import Secret\n",
        "from haystack.components.builders import PromptBuilder\n",
        "from haystack_integrations.components.embedders.nvidia import NvidiaTextEmbedder\n",
        "from haystack_integrations.components.generators.nvidia import NvidiaGenerator\n",
        "from haystack_integrations.components.retrievers.qdrant import QdrantEmbeddingRetriever"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "2202533e-1668-4221-bc39-eadc133350be",
      "metadata": {
        "id": "2202533e-1668-4221-bc39-eadc133350be"
      },
      "outputs": [],
      "source": [
        "# initialize NvidiaTextEmbedder with the self-hosted Embedding NIM URL\n",
        "embedder = NvidiaTextEmbedder(\n",
        "    model=embedding_nim_model,\n",
        "    api_url=f\"{embedding_nim_api_url}/v1\"\n",
        ")\n",
        "\n",
        "# initialize NvidiaGenerator with the self-hosted LLM NIM URL\n",
        "generator = NvidiaGenerator(\n",
        "    model=llm_nim_model_name,\n",
        "    api_url=f\"{llm_nim_base_url}/v1\",\n",
        "    model_arguments={\n",
        "        \"temperature\": 0.5,\n",
        "        \"top_p\": 0.7,\n",
        "        \"max_tokens\": 2048,\n",
        "    },\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "0096c24e-ec66-4fc0-b032-7660a7c5c947",
      "metadata": {
        "id": "0096c24e-ec66-4fc0-b032-7660a7c5c947"
      },
      "outputs": [],
      "source": [
        "retriever = QdrantEmbeddingRetriever(document_store=document_store)\n",
        "\n",
        "prompt = \"\"\"Answer the question given the context.\n",
        "Question: {{ query }}\n",
        "Context:\n",
        "{% for document in documents %}\n",
        "    {{ document.content }}\n",
        "{% endfor %}\n",
        "Answer:\"\"\"\n",
        "prompt_builder = PromptBuilder(template=prompt)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "7f1d8554-bcd1-43df-9718-5718f077e6cd",
      "metadata": {
        "id": "7f1d8554-bcd1-43df-9718-5718f077e6cd",
        "outputId": "2cef839a-408b-48c3-bc3e-52c81acca9bc",
        "tags": []
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "<haystack.core.pipeline.pipeline.Pipeline object at 0x7f267956e130>\n",
              "🚅 Components\n",
              "  - embedder: NvidiaTextEmbedder\n",
              "  - retriever: QdrantEmbeddingRetriever\n",
              "  - prompt: PromptBuilder\n",
              "  - generator: NvidiaGenerator\n",
              "🛤️ Connections\n",
              "  - embedder.embedding -> retriever.query_embedding (List[float])\n",
              "  - retriever.documents -> prompt.documents (List[Document])\n",
              "  - prompt.prompt -> generator.prompt (str)"
            ]
          },
          "execution_count": 16,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "# Create the RAG Pipeline\n",
        "rag = Pipeline()\n",
        "rag.add_component(\"embedder\", embedder)\n",
        "rag.add_component(\"retriever\", retriever)\n",
        "rag.add_component(\"prompt\", prompt_builder)\n",
        "rag.add_component(\"generator\", generator)\n",
        "\n",
        "rag.connect(\"embedder.embedding\", \"retriever.query_embedding\")\n",
        "rag.connect(\"retriever.documents\", \"prompt.documents\")\n",
        "rag.connect(\"prompt\", \"generator\")"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "aFPGWw2sQ_2I",
      "metadata": {
        "id": "aFPGWw2sQ_2I"
      },
      "source": [
        "Let's now request the RAG pipeline asking a question about the ChipNemo model."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "a5363e9b-4ba9-4ac4-a304-cd13fe99884f",
      "metadata": {
        "id": "a5363e9b-4ba9-4ac4-a304-cd13fe99884f",
        "outputId": "769711a8-a814-4099-a30a-94aec13f2c35",
        "tags": []
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "ChipNeMo is a domain-adapted large language model (LLM) designed for chip design, which aims to explore the applications of LLMs for industrial chip design. It is developed by reusing public training data from other language models, with the intention of preserving general knowledge and natural language capabilities during domain adaptation. The model is trained using a combination of natural language and code datasets, including Wikipedia data and GitHub data, and is evaluated on various benchmarks, including multiple-choice questions, code generation, and human evaluation. ChipNeMo implements multiple domain adaptation techniques, including pre-training, domain adaptation, and fine-tuning, to adapt the LLM to the chip design domain. The model is capable of understanding internal HW designs and explaining complex design topics, generating EDA scripts, and summarizing and analyzing bugs.\n"
          ]
        }
      ],
      "source": [
        "# Request the RAG pipeline\n",
        "question = \"Describe chipnemo in detail?\"\n",
        "result = rag.run(\n",
        "    {\n",
        "        \"embedder\": {\"text\": question},\n",
        "        \"prompt\": {\"query\": question},\n",
        "    }, include_outputs_from=[\"prompt\"]\n",
        ")\n",
        "print(result[\"generator\"][\"replies\"][0])"
      ]
    },
    {
      "cell_type": "markdown",
      "id": "-JD7pabyRgmZ",
      "metadata": {
        "id": "-JD7pabyRgmZ"
      },
      "source": [
        "This notebook shows how to build a Haystack RAG pipeline using self-deployed generative AI models with NVIDIA Inference Microservices (NIMs).\n",
        "\n",
        "Please check the documentation on how to deploy NVIDIA NIMs in your own environment.\n",
        "\n",
        "For experimentation purpose, it is also possible to substitute the self-deployed models with NIMs hosted by NVIDIA at [ai.nvidia.com](https://ai.nvidia.com).\n"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3 (ipykernel)",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.13"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}