YDB

YDB 是一款通用的开源分布式 SQL 数据库，结合了高可用性、可扩展性与强一致性和 ACID 事务。它可以同时处理事务型（OLTP）、分析型（OLAP）和流式工作负载。

本笔记本展示了如何使用与 YDB 向量存储相关的功能。

设置

首先，使用 Docker 设置本地 YDB：

! docker run -d -p 2136:2136 --name ydb-langchain -e YDB_USE_IN_MEMORY_PDISKS=true -h localhost ydbplatform/local-ydb:trunk

您需要安装 langchain-ydb 才能使用此集成

! pip install -qU langchain-ydb

凭证

此笔记本无需凭证，只需确保已安装上述软件包。

如果您想获得一流的自动化模型调用跟踪功能，也可以通过取消注释下方的内容来设置您的 LangSmith API 密钥：

# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"

初始化

Select embeddings model:

pip install -qU langchain-openai

import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
  os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

from langchain_ydb.vectorstores import YDB, YDBSearchStrategy, YDBSettings

settings = YDBSettings(
    table="ydb_example",
    strategy=YDBSearchStrategy.COSINE_SIMILARITY,
)
vector_store = YDB(embeddings, config=settings)

管理向量存储

创建向量存储后，可以通过添加和删除不同项来与之交互。

向向量存储添加项

准备要处理的文档：

from uuid import uuid4

from langchain_core.documents import Document

document_1 = Document(
    page_content="I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
    metadata={"source": "tweet"},
)

document_2 = Document(
    page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
    metadata={"source": "news"},
)

document_3 = Document(
    page_content="Building an exciting new project with LangChain - come check it out!",
    metadata={"source": "tweet"},
)

document_4 = Document(
    page_content="Robbers broke into the city bank and stole $1 million in cash.",
    metadata={"source": "news"},
)

document_5 = Document(
    page_content="Wow! That was an amazing movie. I can't wait to see it again.",
    metadata={"source": "tweet"},
)

document_6 = Document(
    page_content="Is the new iPhone worth the price? Read this review to find out.",
    metadata={"source": "website"},
)

document_7 = Document(
    page_content="The top 10 soccer players in the world right now.",
    metadata={"source": "website"},
)

document_8 = Document(
    page_content="LangGraph is the best framework for building stateful, agentic applications!",
    metadata={"source": "tweet"},
)

document_9 = Document(
    page_content="The stock market is down 500 points today due to fears of a recession.",
    metadata={"source": "news"},
)

document_10 = Document(
    page_content="I have a bad feeling I am going to get deleted :(",
    metadata={"source": "tweet"},
)

documents = [
    document_1,
    document_2,
    document_3,
    document_4,
    document_5,
    document_6,
    document_7,
    document_8,
    document_9,
    document_10,
]
uuids = [str(uuid4()) for _ in range(len(documents))]

API Reference:Document

您可以使用 add_documents 函数将文档添加到您的 vector store 中。

vector_store.add_documents(documents=documents, ids=uuids)

Inserting data...: 100%|██████████| 10/10 [00:00<00:00, 14.67it/s]

['947be6aa-d489-44c5-910e-62e4d58d2ffb',
 '7a62904d-9db3-412b-83b6-f01b34dd7de3',
 'e5a49c64-c985-4ed7-ac58-5ffa31ade699',
 '99cf4104-36ab-4bd5-b0da-e210d260e512',
 '5810bcd0-b46e-443e-a663-e888c9e028d1',
 '190c193d-844e-4dbb-9a4b-b8f5f16cfae6',
 'f8912944-f80a-4178-954e-4595bf59e341',
 '34fc7b09-6000-42c9-95f7-7d49f430b904',
 '0f6b6783-f300-4a4d-bb04-8025c4dfd409',
 '46c37ba9-7cf2-4ac8-9bd1-d84e2cb1155c']

从向量存储中删除项目

您可以使用 delete 函数通过 ID 从向量存储中删除项目。

vector_store.delete(ids=[uuids[-1]])

True

查询向量存储

一旦你的向量存储创建完成并且已添加了相关文档，你很可能希望在执行链或代理的过程中查询它。

直接查询

相似性搜索

简单的相似性搜索可以按如下方式执行：

results = vector_store.similarity_search(
    "LangChain provides abstractions to make working with LLMs easy", k=2
)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")

* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet'}]
* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'tweet'}]

相似性搜索与评分

您也可以执行带评分的搜索：

results = vector_store.similarity_search_with_score("Will it be hot tomorrow?", k=3)
for res, score in results:
    print(f"* [SIM={score:.3f}] {res.page_content} [{res.metadata}]")

* [SIM=0.595] The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees. [{'source': 'news'}]
* [SIM=0.212] I had chocolate chip pancakes and scrambled eggs for breakfast this morning. [{'source': 'tweet'}]
* [SIM=0.118] Wow! That was an amazing movie. I can't wait to see it again. [{'source': 'tweet'}]

筛选

您可以按如下方式使用过滤器进行搜索：

results = vector_store.similarity_search_with_score(
    "What did I eat for breakfast?",
    k=4,
    filter={"source": "tweet"},
)
for res, _ in results:
    print(f"* {res.page_content} [{res.metadata}]")

* I had chocolate chip pancakes and scrambled eggs for breakfast this morning. [{'source': 'tweet'}]
* Wow! That was an amazing movie. I can't wait to see it again. [{'source': 'tweet'}]
* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet'}]
* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'tweet'}]

查询并将其转换为检索器

你也可以将向量存储转换为检索器，以便在你的链中更方便地使用。

下面将介绍如何将向量存储转换为检索器，然后使用一个简单的查询和过滤器来调用该检索器。

retriever = vector_store.as_retriever(
    search_kwargs={"k": 2},
)
results = retriever.invoke(
    "Stealing from the bank is a crime", filter={"source": "news"}
)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")

* Robbers broke into the city bank and stole $1 million in cash. [{'source': 'news'}]
* The stock market is down 500 points today due to fears of a recession. [{'source': 'news'}]

用于检索增强生成的使用方法

有关如何将此向量存储用于检索增强生成 (RAG) 的指南，请参阅以下章节：

API 参考

有关所有 YDB 功能和配置的详细文档，请访问 API 参考：https://python.langchain.com/api_reference/community/vectorstores/langchain_community.vectorstores.ydb.YDB.html

Vector store conceptual guide
Vector store how-to guides

设置​

凭证​

初始化​

管理向量存储​

向向量存储添加项​

从向量存储中删除项目​

查询向量存储​

直接查询​

相似性搜索​

相似性搜索与评分​

筛选​

查询并将其转换为检索器​

用于检索增强生成的使用方法​

API 参考​

Related​

设置

凭证

初始化