如何让你的 RAG 应用返回引文来源

在问答应用中，向用户展示用于生成答案的来源通常很重要。最简单的方法是让链在每次生成时返回检索到的 Documents。

我们将基于在 Lilian Weng 的 LLM 驱动的自主代理人博客文章中构建的问答应用，该文章见 RAG 教程。

我们将介绍两种方法：

使用 RAG 教程第一部分中介绍的基础 RAG 链；
使用教程第二部分中介绍的对话式 RAG 链。

我们还将展示如何将来源构建到模型响应中，以便模型可以报告它在生成答案时使用了哪些特定的来源。

设置

依赖项

我们将使用以下包：

%pip install --upgrade --quiet langchain langchain-community langchainhub beautifulsoup4

LangSmith

您使用 LangChain 构建的许多应用程序将包含多个步骤和多次 LLM 调用。随着应用程序的日益复杂，能够检查链或代理内部的具体运行情况变得至关重要。完成此目的的最佳方法是使用 LangSmith。

请注意，LangSmith 不是必需的，但它很有帮助。如果您确实想使用 LangSmith，在上述链接注册后，请确保设置您的环境变量以开始记录跟踪：

os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = getpass.getpass()

组件

我们需要从 LangChain 的集成套件中选择三个组件。

一个聊天模型：

Select chat model:

pip install -qU "langchain[google-genai]"

import getpass
import os

if not os.environ.get("GOOGLE_API_KEY"):
  os.environ["GOOGLE_API_KEY"] = getpass.getpass("Enter API key for Google Gemini: ")

from langchain.chat_models import init_chat_model

llm = init_chat_model("gemini-2.0-flash", model_provider="google_genai")

一个嵌入模型：

Select embeddings model:

pip install -qU langchain-openai

import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
  os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

以及一个向量存储：

Select vector store:

pip install -qU langchain-core

from langchain_core.vectorstores import InMemoryVectorStore

vector_store = InMemoryVectorStore(embeddings)

RAG 应用

让我们重新构建我们在 Lilian Weng 的博客文章 LLM 驱动的自主代理中构建的、包含来源的问答应用，具体请参考 RAG 教程。

首先，我们索引我们的文档：

import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from typing_extensions import List, TypedDict

# Load and chunk contents of the blog
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)

API Reference:hub | WebBaseLoader | Document | RecursiveCharacterTextSplitter

# Index chunks
_ = vector_store.add_documents(documents=all_splits)

接下来，我们构建应用程序：

from langchain import hub
from langchain_core.documents import Document
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict

# Define prompt for question-answering
prompt = hub.pull("rlm/rag-prompt")


# Define state for application
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str


# Define application steps
def retrieve(state: State):
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}


def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}


# Compile application and test
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()

API Reference:hub | Document | StateGraph

from IPython.display import Image, display

display(Image(graph.get_graph().draw_mermaid_png()))

因为我们在应用程序的状态中跟踪检索到的上下文，所以可以在调用该应用程序后访问它：

result = graph.invoke({"question": "What is Task Decomposition?"})

print(f'Context: {result["context"]}\n\n')
print(f'Answer: {result["answer"]}')

Context: [Document(id='c8471b37-07d8-4d51-856e-4b2c22bca88d', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.'), Document(id='acb7eb6f-f252-4353-aec2-f459135354ba', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\nTask decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.'), Document(id='4fae6668-7fec-4237-9b2d-78132f4f3f3f', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Resources:\n1. Internet access for searches and information gathering.\n2. Long Term memory management.\n3. GPT-3.5 powered Agents for delegation of simple tasks.\n4. File output.\n\nPerformance Evaluation:\n1. Continuously review and analyze your actions to ensure you are performing to the best of your abilities.\n2. Constructively self-criticize your big-picture behavior constantly.\n3. Reflect on past decisions and strategies to refine your approach.\n4. Every command has a cost, so be smart and efficient. Aim to complete tasks in the least number of steps.'), Document(id='3c79dd86-595e-42e8-b64d-404780f9e2d9', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content="(3) Task execution: Expert models execute on the specific tasks and log results.\nInstruction:\n\nWith the input and the inference results, the AI assistant needs to describe the process and results. The previous stages can be formed as - User Input: {{ User Input }}, Task Planning: {{ Tasks }}, Model Selection: {{ Model Assignment }}, Task Execution: {{ Predictions }}. You must first answer the user's request in a straightforward manner. Then describe the task process and show your analysis and model inference results to the user in the first person. If inference results contain a file path, must tell the user the complete file path.")]

Answer: Task Decomposition is the process of breaking down a complex task into smaller, manageable steps to facilitate execution. This can be achieved through techniques like Chain of Thought, which encourages step-by-step reasoning, or Tree of Thoughts, which explores multiple reasoning paths for each step. It can be implemented using simple prompts, specific instructions, or human input to effectively tackle the original task.

此处，"context" 包含 LLM 在生成 "answer" 中的响应时使用的来源。

在模型响应中组织来源

到目前为止，我们只是将检索步骤返回的文档直接传递给最终响应。但这可能无法说明模型在生成答案时依赖了什么信息子集。下面，我们将展示如何将来源组织到模型响应中，让模型能够报告它依赖了哪些具体上下文来生成答案。

扩展上述 LangGraph 实现非常简单。下面我们做了一个简单的更改：我们使用模型的工具调用功能来生成结构化输出，该输出包括答案和来源列表。响应的模式在下面的 AnswerWithSources TypedDict 中表示。

from typing import List

from typing_extensions import Annotated, TypedDict


# Desired schema for response
class AnswerWithSources(TypedDict):
    """An answer to the question, with sources."""

    answer: str
    sources: Annotated[
        List[str],
        ...,
        "List of sources (author + year) used to answer the question",
    ]


class State(TypedDict):
    question: str
    context: List[Document]
    answer: AnswerWithSources


def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    structured_llm = llm.with_structured_output(AnswerWithSources)
    response = structured_llm.invoke(messages)
    return {"answer": response}


graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()

import json

result = graph.invoke({"question": "What is Chain of Thought?"})
print(json.dumps(result["answer"], indent=2))

{
  "answer": "Chain of Thought (CoT) is a prompting technique that enhances model performance by instructing it to think step by step, allowing the decomposition of complex tasks into smaller, manageable steps. This method not only aids in task execution but also provides insights into the model's reasoning process. CoT has become a standard approach in improving how language models handle intricate problem-solving tasks.",
  "sources": [
    "Wei et al. 2022"
  ]
}

tip

查看 LangSmith trace。

对话式 RAG

第二部分是 RAG 系列教程中的一个实现，它采用了不同的架构，其中 RAG 流程中的各个步骤通过一系列消息对象来表示。这利用了聊天模型额外的工具调用功能，并且更能自然地适应“来回”的对话用户体验。

在该教程（以及下文）中，我们将检索到的文档作为制品传递给工具消息。这样可以轻松提取出检索到的文档。在下面，我们为了方便，将它们添加为状态中的一个额外键。

请注意，我们将工具的响应格式定义为 "content_and_artifact"：

from langchain_core.tools import tool


@tool(response_format="content_and_artifact")
def retrieve(query: str):
    """Retrieve information related to a query."""
    retrieved_docs = vector_store.similarity_search(query, k=2)
    serialized = "\n\n".join(
        (f"Source: {doc.metadata}\n" f"Content: {doc.page_content}")
        for doc in retrieved_docs
    )
    return serialized, retrieved_docs

API Reference:tool

现在，我们可以如同在 RAG 系列教程的第二部分中一样，构建和编译完全相同的应用程序，只需进行两处更改：

我们为 state 添加了一个 context 键，用于存储检索到的文档；
在 generate 步骤中，我们提取检索到的文档并将其填充到 state 中。

这些更改在下方有突出显示。

from langchain_core.messages import SystemMessage
from langgraph.graph import END, MessagesState, StateGraph
from langgraph.prebuilt import ToolNode, tools_condition


class State(MessagesState):
    context: List[Document]


# Step 1: Generate an AIMessage that may include a tool-call to be sent.
def query_or_respond(state: State):
    """Generate tool call for retrieval or respond."""
    llm_with_tools = llm.bind_tools([retrieve])
    response = llm_with_tools.invoke(state["messages"])
    # MessagesState appends messages to state instead of overwriting
    return {"messages": [response]}


# Step 2: Execute the retrieval.
tools = ToolNode([retrieve])


# Step 3: Generate a response using the retrieved content.
def generate(state: MessagesState):
    """Generate answer."""
    # Get generated ToolMessages
    recent_tool_messages = []
    for message in reversed(state["messages"]):
        if message.type == "tool":
            recent_tool_messages.append(message)
        else:
            break
    tool_messages = recent_tool_messages[::-1]

    # Format into prompt
    docs_content = "\n\n".join(doc.content for doc in tool_messages)
    system_message_content = (
        "You are an assistant for question-answering tasks. "
        "Use the following pieces of retrieved context to answer "
        "the question. If you don't know the answer, say that you "
        "don't know. Use three sentences maximum and keep the "
        "answer concise."
        "\n\n"
        f"{docs_content}"
    )
    conversation_messages = [
        message
        for message in state["messages"]
        if message.type in ("human", "system")
        or (message.type == "ai" and not message.tool_calls)
    ]
    prompt = [SystemMessage(system_message_content)] + conversation_messages

    # Run
    response = llm.invoke(prompt)
    context = []
    for tool_message in tool_messages:
        context.extend(tool_message.artifact)
    return {"messages": [response], "context": context}

API Reference:SystemMessage | StateGraph | ToolNode | tools_condition

我们可以像之前一样编译应用程序：

graph_builder = StateGraph(MessagesState)

graph_builder.add_node(query_or_respond)
graph_builder.add_node(tools)
graph_builder.add_node(generate)

graph_builder.set_entry_point("query_or_respond")
graph_builder.add_conditional_edges(
    "query_or_respond",
    tools_condition,
    {END: END, "tools": "tools"},
)
graph_builder.add_edge("tools", "generate")
graph_builder.add_edge("generate", END)

graph = graph_builder.compile()

display(Image(graph.get_graph().draw_mermaid_png()))

调用我们的应用程序，我们看到检索到的 Document 对象可以从应用程序状态中访问。

input_message = "What is Task Decomposition?"

for step in graph.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()

================================[1m Human Message [0m=================================

What is Task Decomposition?
==================================[1m Ai Message [0m==================================
Tool Calls:
  retrieve (call_oA0XZ5hF70X0oW4ccNUFCFxX)
 Call ID: call_oA0XZ5hF70X0oW4ccNUFCFxX
  Args:
    query: Task Decomposition
=================================[1m Tool Message [0m=================================
Name: retrieve

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
Content: Fig. 1. Overview of a LLM-powered autonomous agent system.
Component One: Planning#
A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.
Task Decomposition#
Chain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
Content: Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.
==================================[1m Ai Message [0m==================================

Task Decomposition is the process of breaking down a complicated task into smaller, manageable steps. It often utilizes techniques like Chain of Thought (CoT) prompting, which encourages models to think step by step, enhancing performance on complex tasks. This approach helps clarify the model's reasoning and makes it easier to tackle difficult problems.

step["context"]

[Document(id='c8471b37-07d8-4d51-856e-4b2c22bca88d', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.'),
 Document(id='acb7eb6f-f252-4353-aec2-f459135354ba', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\nTask decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.')]

tip

查看 LangSmith 追踪。

设置​

依赖项​

LangSmith​

组件​

RAG 应用​

在模型响应中组织来源​

对话式 RAG​

设置

依赖项

LangSmith

组件

RAG 应用

在模型响应中组织来源

对话式 RAG