{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Hybrid RAG Pipeline with Breakpoints\n",
    "\n",
    "This notebook demonstrates how to setup breakpoints in a Haystack pipeline. In this case, we will set up break points in a hybrid retrieval-augmented generation (RAG) pipeline. The pipeline combines BM25 and embedding-based retrieval methods, then uses a transformer-based reranker and an LLM to generate answers."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Install packages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "\n",
    "pip install haystack-ai>=2.16.0\n",
    "pip install \"transformers[torch,sentencepiece]\"\n",
    "pip install \"sentence-transformers>=3.0.0\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup OpenAI API keys"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from getpass import getpass\n",
    "\n",
    "if \"OPENAI_API_KEY\" not in os.environ:\n",
    "    os.environ[\"OPENAI_API_KEY\"] = getpass(\"Enter OpenAI API key:\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Import Required Libraries\n",
    "\n",
    "First, let's import all the necessary components from Haystack."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from haystack import Document, Pipeline\n",
    "from haystack.components.builders import AnswerBuilder, ChatPromptBuilder\n",
    "from haystack.components.embedders import SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder\n",
    "from haystack.components.generators.chat import OpenAIChatGenerator\n",
    "from haystack.components.joiners import DocumentJoiner\n",
    "from haystack.components.rankers import TransformersSimilarityRanker\n",
    "from haystack.components.retrievers.in_memory import InMemoryBM25Retriever, InMemoryEmbeddingRetriever\n",
    "from haystack.components.writers import DocumentWriter\n",
    "from haystack.dataclasses import ChatMessage\n",
    "from haystack.document_stores.in_memory import InMemoryDocumentStore\n",
    "from haystack.document_stores.types import DuplicatePolicy"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Document Store Initializations\n",
    "\n",
    "Let's create a simple document store with some sample documents and their embeddings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "def indexing():\n",
    "    \"\"\"\n",
    "    Indexing documents in a DocumentStore.\n",
    "    \"\"\"\n",
    "\n",
    "    print(\"Indexing documents...\")\n",
    "\n",
    "    # Create sample documents\n",
    "    documents = [\n",
    "        Document(content=\"My name is Jean and I live in Paris. The weather today is 25°C.\"),\n",
    "        Document(content=\"My name is Mark and I live in Berlin. The weather today is 15°C.\"),\n",
    "        Document(content=\"My name is Giorgio and I live in Rome. The weather today is 30°C.\"),\n",
    "    ]\n",
    "\n",
    "    # Initialize document store and components\n",
    "    document_store = InMemoryDocumentStore()\n",
    "    doc_writer = DocumentWriter(document_store=document_store, policy=DuplicatePolicy.SKIP)\n",
    "    doc_embedder = SentenceTransformersDocumentEmbedder(model=\"intfloat/e5-base-v2\", progress_bar=False)\n",
    "\n",
    "    # Build and run the ingestion pipeline\n",
    "    ingestion_pipe = Pipeline()\n",
    "    ingestion_pipe.add_component(instance=doc_embedder, name=\"doc_embedder\")\n",
    "    ingestion_pipe.add_component(instance=doc_writer, name=\"doc_writer\")\n",
    "\n",
    "    ingestion_pipe.connect(\"doc_embedder.documents\", \"doc_writer.documents\")\n",
    "    ingestion_pipe.run({\"doc_embedder\": {\"documents\": documents}})\n",
    "\n",
    "    return document_store"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## A Hybrid Retrieval Pipeline\n",
    "\n",
    "Now let's build a hybrid RAG pipeline."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "def hybrid_retrieval(doc_store):\n",
    "    \"\"\"\n",
    "    A simple pipeline for hybrid retrieval using BM25 and embeddings.\n",
    "    \"\"\"\n",
    "\n",
    "    # Initialize query embedder\n",
    "    query_embedder = SentenceTransformersTextEmbedder(model=\"intfloat/e5-base-v2\", progress_bar=False)\n",
    "\n",
    "    # Define the prompt template for the LLM\n",
    "    template = [\n",
    "        ChatMessage.from_system(\n",
    "            \"You are a helpful AI assistant. Answer the following question based on the given context information only. If the context is empty or just a '\\n' answer with None, example: 'None'.\"\n",
    "        ),\n",
    "        ChatMessage.from_user(\n",
    "            \"\"\"\n",
    "            Context:\n",
    "            {% for document in documents %}\n",
    "                {{ document.content }}\n",
    "            {% endfor %}\n",
    "    \n",
    "            Question: {{question}}\n",
    "            \"\"\"\n",
    "        )\n",
    "    ]\n",
    "\n",
    "    \n",
    "    # Build the RAG pipeline\n",
    "    rag_pipeline = Pipeline()\n",
    "    \n",
    "    # Add components to the pipeline\n",
    "    rag_pipeline.add_component(instance=InMemoryBM25Retriever(document_store=doc_store), name=\"bm25_retriever\")\n",
    "    rag_pipeline.add_component(instance=query_embedder, name=\"query_embedder\")\n",
    "    rag_pipeline.add_component(instance=InMemoryEmbeddingRetriever(document_store=doc_store), name=\"embedding_retriever\")\n",
    "    rag_pipeline.add_component(instance=DocumentJoiner(sort_by_score=False), name=\"doc_joiner\")\n",
    "    rag_pipeline.add_component(instance=TransformersSimilarityRanker(model=\"intfloat/simlm-msmarco-reranker\", top_k=5), name=\"ranker\")    \n",
    "    rag_pipeline.add_component(instance=ChatPromptBuilder(template=template, required_variables=[\"question\", \"documents\"]), name=\"prompt_builder\", )    \n",
    "    rag_pipeline.add_component(instance=OpenAIChatGenerator(), name=\"llm\")\n",
    "    rag_pipeline.add_component(instance=AnswerBuilder(), name=\"answer_builder\")\n",
    "\n",
    "    # Connect the components\n",
    "    rag_pipeline.connect(\"query_embedder\", \"embedding_retriever.query_embedding\")\n",
    "    rag_pipeline.connect(\"embedding_retriever\", \"doc_joiner.documents\")\n",
    "    rag_pipeline.connect(\"bm25_retriever\", \"doc_joiner.documents\")\n",
    "    rag_pipeline.connect(\"doc_joiner\", \"ranker.documents\")\n",
    "    rag_pipeline.connect(\"ranker\", \"prompt_builder.documents\")\n",
    "    rag_pipeline.connect(\"prompt_builder\", \"llm\")\n",
    "    rag_pipeline.connect(\"llm.replies\", \"answer_builder.replies\")    \n",
    "    rag_pipeline.connect(\"doc_joiner\", \"answer_builder.documents\")\n",
    "\n",
    "    return rag_pipeline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Running the pipeline with breakpoints\n",
    "\n",
    "Now we demonstrate how to set breakpoints in a Haystack pipeline to inspect and debug the pipeline execution at specific points. Breakpoints allow you to pause execution, save the current state of pipeline, and later resume from where you left off.\n",
    "\n",
    "We'll run the pipeline with a breakpoint set at the `query_embedder` component. This will save the pipeline state before executing the `query_embedder` and raise `PipelineBreakpointException` to stop execution.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "from haystack.dataclasses.breakpoints import Breakpoint\n",
    "\n",
    "break_point = Breakpoint(component_name=\"query_embedder\", visit_count=0, snapshot_file_path=\"snapshots/\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Indexing documents...\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "TransformersSimilarityRanker is considered legacy and will no longer receive updates. It may be deprecated in a future release, with removal following after a deprecation period. Consider using SentenceTransformersSimilarityRanker instead, which provides the same functionality along with additional features.\n"
     ]
    },
    {
     "ename": "BreakpointException",
     "evalue": "Breaking at component query_embedder at visit count 0",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mBreakpointException\u001b[0m                       Traceback (most recent call last)",
      "Cell \u001b[0;32mIn[6], line 15\u001b[0m\n\u001b[1;32m      6\u001b[0m question \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mWhere does Mark live?\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m      7\u001b[0m data \u001b[38;5;241m=\u001b[39m {\n\u001b[1;32m      8\u001b[0m     \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mquery_embedder\u001b[39m\u001b[38;5;124m\"\u001b[39m: {\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mtext\u001b[39m\u001b[38;5;124m\"\u001b[39m: question},\n\u001b[1;32m      9\u001b[0m     \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mbm25_retriever\u001b[39m\u001b[38;5;124m\"\u001b[39m: {\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mquery\u001b[39m\u001b[38;5;124m\"\u001b[39m: question},\n\u001b[0;32m   (...)\u001b[0m\n\u001b[1;32m     12\u001b[0m     \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124manswer_builder\u001b[39m\u001b[38;5;124m\"\u001b[39m: {\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mquery\u001b[39m\u001b[38;5;124m\"\u001b[39m: question},\n\u001b[1;32m     13\u001b[0m }\n\u001b[0;32m---> 15\u001b[0m \u001b[43mpipeline\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mrun\u001b[49m\u001b[43m(\u001b[49m\u001b[43mdata\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mbreak_point\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mbreak_point\u001b[49m\u001b[43m)\u001b[49m\n",
      "File \u001b[0;32m~/haystack-cookbook/.venv/lib/python3.12/site-packages/haystack/core/pipeline/pipeline.py:378\u001b[0m, in \u001b[0;36mPipeline.run\u001b[0;34m(self, data, include_outputs_from, break_point, pipeline_snapshot)\u001b[0m\n\u001b[1;32m    376\u001b[0m         \u001b[38;5;66;03m# trigger the breakpoint if needed\u001b[39;00m\n\u001b[1;32m    377\u001b[0m         \u001b[38;5;28;01mif\u001b[39;00m should_trigger_breakpoint:\n\u001b[0;32m--> 378\u001b[0m             \u001b[43m_trigger_break_point\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m    379\u001b[0m \u001b[43m                \u001b[49m\u001b[43mpipeline_snapshot\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mnew_pipeline_snapshot\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mpipeline_outputs\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mpipeline_outputs\u001b[49m\n\u001b[1;32m    380\u001b[0m \u001b[43m            \u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m    382\u001b[0m component_outputs \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_run_component(\n\u001b[1;32m    383\u001b[0m     component_name\u001b[38;5;241m=\u001b[39mcomponent_name,\n\u001b[1;32m    384\u001b[0m     component\u001b[38;5;241m=\u001b[39mcomponent,\n\u001b[0;32m   (...)\u001b[0m\n\u001b[1;32m    387\u001b[0m     parent_span\u001b[38;5;241m=\u001b[39mspan,\n\u001b[1;32m    388\u001b[0m )\n\u001b[1;32m    390\u001b[0m \u001b[38;5;66;03m# Updates global input state with component outputs and returns outputs that should go to\u001b[39;00m\n\u001b[1;32m    391\u001b[0m \u001b[38;5;66;03m# pipeline outputs.\u001b[39;00m\n",
      "File \u001b[0;32m~/haystack-cookbook/.venv/lib/python3.12/site-packages/haystack/core/pipeline/breakpoint.py:299\u001b[0m, in \u001b[0;36m_trigger_break_point\u001b[0;34m(pipeline_snapshot, pipeline_outputs)\u001b[0m\n\u001b[1;32m    297\u001b[0m component_visits \u001b[38;5;241m=\u001b[39m pipeline_snapshot\u001b[38;5;241m.\u001b[39mpipeline_state\u001b[38;5;241m.\u001b[39mcomponent_visits\n\u001b[1;32m    298\u001b[0m msg \u001b[38;5;241m=\u001b[39m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mBreaking at component \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mcomponent_name\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m at visit count \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mcomponent_visits[component_name]\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m\n\u001b[0;32m--> 299\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m BreakpointException(\n\u001b[1;32m    300\u001b[0m     message\u001b[38;5;241m=\u001b[39mmsg, component\u001b[38;5;241m=\u001b[39mcomponent_name, inputs\u001b[38;5;241m=\u001b[39mpipeline_snapshot\u001b[38;5;241m.\u001b[39mpipeline_state\u001b[38;5;241m.\u001b[39minputs, results\u001b[38;5;241m=\u001b[39mpipeline_outputs\n\u001b[1;32m    301\u001b[0m )\n",
      "\u001b[0;31mBreakpointException\u001b[0m: Breaking at component query_embedder at visit count 0"
     ]
    }
   ],
   "source": [
    "# Initialize document store and pipeline\n",
    "doc_store = indexing()\n",
    "pipeline = hybrid_retrieval(doc_store)\n",
    "\n",
    "# Define the query\n",
    "question = \"Where does Mark live?\"\n",
    "data = {\n",
    "    \"query_embedder\": {\"text\": question},\n",
    "    \"bm25_retriever\": {\"query\": question},\n",
    "    \"ranker\": {\"query\": question, \"top_k\": 10},\n",
    "    \"prompt_builder\": {\"question\": question},\n",
    "    \"answer_builder\": {\"query\": question},\n",
    "}\n",
    "\n",
    "pipeline.run(data, break_point=break_point)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This run should be interruped with a `BreakpointException: Breaking at component query_embedder visit count 0` - and this will generate a JSON file in the \"snapshots\" directory containing a snapshot of the  before running the component `query_embedder`.\n",
    "\n",
    "The snapshot files, named after the component associated with the breakpoint, can be inspected and edited, and later injected into a pipeline and resume the execution from the point where the breakpoint was triggered. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!ls snapshots/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Resuming from a break point\n",
    "\n",
    "We can then resume a pipeline from its saved `pipeline_snapshot` by passing it to the Pipeline.run() method. This will run the pipeline to the end."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Mark lives in Berlin.\n",
      "{'model': 'gpt-4o-mini-2024-07-18', 'index': 0, 'finish_reason': 'stop', 'usage': {'completion_tokens': 5, 'prompt_tokens': 124, 'total_tokens': 129, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'all_messages': [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text='Mark lives in Berlin.')], _name=None, _meta={'model': 'gpt-4o-mini-2024-07-18', 'index': 0, 'finish_reason': 'stop', 'usage': {'completion_tokens': 5, 'prompt_tokens': 124, 'total_tokens': 129, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}})]}\n"
     ]
    }
   ],
   "source": [
    " # Load the pipeline_snapshot and continue execution\n",
    "from haystack.core.pipeline.breakpoint import load_pipeline_snapshot\n",
    "\n",
    "snapshot = load_pipeline_snapshot(\"snapshots/query_embedder_2025_07_26_12_58_26.json\")\n",
    "result = pipeline.run(data={}, pipeline_snapshot=snapshot)\n",
    "    \n",
    "# Print the results\n",
    "print(result['answer_builder']['answers'][0].data)\n",
    "print(result['answer_builder']['answers'][0].meta)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Advanced Use Cases for Pipeline Breakpoints\n",
    "\n",
    "Here are some advanced scenarios where pipeline breakpoints can be particularly valuable:\n",
    "1. Set a breakpoint at the LLM to try results of different prompts and iterate in real time.\n",
    "\n",
    "2. Place a breakpoint after the document retriever to examine and modify retrieved documents.\n",
    "\n",
    "3. Set a breakpoint before a component to inject gold-standard inputs and isolate whether issues stem from input quality or downstream logic."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To demonstrate the use case stated in point 1, we reuse the same query pipeline with a new question. First, we run the pipeline with the prompt that we originally passed to the prompt_builder. Then, we define a breakpoint at the prompt_builder to try an alternative prompt. This allows us to compare the results generated by different prompts without running the whole pipeline again."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "TransformersSimilarityRanker is considered legacy and will no longer receive updates. It may be deprecated in a future release, with removal following after a deprecation period. Consider using SentenceTransformersSimilarityRanker instead, which provides the same functionality along with additional features.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Indexing documents...\n"
     ]
    },
    {
     "ename": "BreakpointException",
     "evalue": "Breaking at component prompt_builder at visit count 0",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mBreakpointException\u001b[0m                       Traceback (most recent call last)",
      "Cell \u001b[0;32mIn[11], line 18\u001b[0m\n\u001b[1;32m      7\u001b[0m data \u001b[38;5;241m=\u001b[39m {\n\u001b[1;32m      8\u001b[0m     \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mquery_embedder\u001b[39m\u001b[38;5;124m\"\u001b[39m: {\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mtext\u001b[39m\u001b[38;5;124m\"\u001b[39m: question},\n\u001b[1;32m      9\u001b[0m     \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mbm25_retriever\u001b[39m\u001b[38;5;124m\"\u001b[39m: {\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mquery\u001b[39m\u001b[38;5;124m\"\u001b[39m: question},\n\u001b[0;32m   (...)\u001b[0m\n\u001b[1;32m     12\u001b[0m     \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124manswer_builder\u001b[39m\u001b[38;5;124m\"\u001b[39m: {\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mquery\u001b[39m\u001b[38;5;124m\"\u001b[39m: question},\n\u001b[1;32m     13\u001b[0m }\n\u001b[1;32m     16\u001b[0m break_point \u001b[38;5;241m=\u001b[39m Breakpoint(component_name\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mprompt_builder\u001b[39m\u001b[38;5;124m\"\u001b[39m, visit_count\u001b[38;5;241m=\u001b[39m\u001b[38;5;241m0\u001b[39m, snapshot_file_path\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124msnapshots/\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[0;32m---> 18\u001b[0m \u001b[43mpipeline\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mrun\u001b[49m\u001b[43m(\u001b[49m\u001b[43mdata\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mbreak_point\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mbreak_point\u001b[49m\u001b[43m)\u001b[49m\n",
      "File \u001b[0;32m~/haystack-cookbook/.venv/lib/python3.12/site-packages/haystack/core/pipeline/pipeline.py:378\u001b[0m, in \u001b[0;36mPipeline.run\u001b[0;34m(self, data, include_outputs_from, break_point, pipeline_snapshot)\u001b[0m\n\u001b[1;32m    376\u001b[0m         \u001b[38;5;66;03m# trigger the breakpoint if needed\u001b[39;00m\n\u001b[1;32m    377\u001b[0m         \u001b[38;5;28;01mif\u001b[39;00m should_trigger_breakpoint:\n\u001b[0;32m--> 378\u001b[0m             \u001b[43m_trigger_break_point\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m    379\u001b[0m \u001b[43m                \u001b[49m\u001b[43mpipeline_snapshot\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mnew_pipeline_snapshot\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mpipeline_outputs\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mpipeline_outputs\u001b[49m\n\u001b[1;32m    380\u001b[0m \u001b[43m            \u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m    382\u001b[0m component_outputs \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_run_component(\n\u001b[1;32m    383\u001b[0m     component_name\u001b[38;5;241m=\u001b[39mcomponent_name,\n\u001b[1;32m    384\u001b[0m     component\u001b[38;5;241m=\u001b[39mcomponent,\n\u001b[0;32m   (...)\u001b[0m\n\u001b[1;32m    387\u001b[0m     parent_span\u001b[38;5;241m=\u001b[39mspan,\n\u001b[1;32m    388\u001b[0m )\n\u001b[1;32m    390\u001b[0m \u001b[38;5;66;03m# Updates global input state with component outputs and returns outputs that should go to\u001b[39;00m\n\u001b[1;32m    391\u001b[0m \u001b[38;5;66;03m# pipeline outputs.\u001b[39;00m\n",
      "File \u001b[0;32m~/haystack-cookbook/.venv/lib/python3.12/site-packages/haystack/core/pipeline/breakpoint.py:299\u001b[0m, in \u001b[0;36m_trigger_break_point\u001b[0;34m(pipeline_snapshot, pipeline_outputs)\u001b[0m\n\u001b[1;32m    297\u001b[0m component_visits \u001b[38;5;241m=\u001b[39m pipeline_snapshot\u001b[38;5;241m.\u001b[39mpipeline_state\u001b[38;5;241m.\u001b[39mcomponent_visits\n\u001b[1;32m    298\u001b[0m msg \u001b[38;5;241m=\u001b[39m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mBreaking at component \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mcomponent_name\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m at visit count \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mcomponent_visits[component_name]\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m\n\u001b[0;32m--> 299\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m BreakpointException(\n\u001b[1;32m    300\u001b[0m     message\u001b[38;5;241m=\u001b[39mmsg, component\u001b[38;5;241m=\u001b[39mcomponent_name, inputs\u001b[38;5;241m=\u001b[39mpipeline_snapshot\u001b[38;5;241m.\u001b[39mpipeline_state\u001b[38;5;241m.\u001b[39minputs, results\u001b[38;5;241m=\u001b[39mpipeline_outputs\n\u001b[1;32m    301\u001b[0m )\n",
      "\u001b[0;31mBreakpointException\u001b[0m: Breaking at component prompt_builder at visit count 0"
     ]
    }
   ],
   "source": [
    "# Initialize document store and pipeline\n",
    "doc_store = indexing()\n",
    "pipeline = hybrid_retrieval(doc_store)\n",
    "\n",
    "# Define the query\n",
    "question = \"What's the temperature difference between the warmest and coldest city?\"\n",
    "data = {\n",
    "    \"query_embedder\": {\"text\": question},\n",
    "    \"bm25_retriever\": {\"query\": question},\n",
    "    \"ranker\": {\"query\": question, \"top_k\": 10},\n",
    "    \"prompt_builder\": {\"question\": question},\n",
    "    \"answer_builder\": {\"query\": question},\n",
    "}\n",
    "\n",
    "\n",
    "break_point = Breakpoint(component_name=\"prompt_builder\", visit_count=0, snapshot_file_path=\"snapshots/\")\n",
    "\n",
    "pipeline.run(data, break_point=break_point)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can manually insert a different template into the `prompt_builder` and inspect the results. To do this, we update the template input within the `prompt_builder` component in the state file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "template = ChatMessage.from_system(\n",
    "    \"\"\"You are a mathematical analysis assistant. Follow these steps:\n",
    "    1. Identify all temperatures mentioned\n",
    "    2. Find the maximum and minimum values\n",
    "    3. Calculate their difference\n",
    "    4. Format response as: 'The temperature difference is X°C (max Y°C in [city] - min Z°C in [city])'\n",
    "    Use ONLY the information provided in the context.\"\"\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we just load the snapshot file and resume the pipeline with the updated snapshot."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "snapshots/prompt_builder_2025_07_26_13_01_23.json\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
      "To disable this warning, you can either:\n",
      "\t- Avoid using `tokenizers` before the fork if possible\n",
      "\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n"
     ]
    }
   ],
   "source": [
    "!ls snapshots/prompt_builder*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The temperature in Rome is 30°C and in Berlin is 15°C. The temperature difference between the warmest (Rome) and the coldest (Berlin) city is 30°C - 15°C = 15°C.\n"
     ]
    }
   ],
   "source": [
    "snapshot = load_pipeline_snapshot(\"snapshots/prompt_builder_2025_07_26_13_01_23.json\")\n",
    "result = pipeline.run(data={}, pipeline_snapshot=snapshot)\n",
    "print(result['answer_builder']['answers'][0].data)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}