Deep Lake
Deep Lake is a multimodal database for building AI applications. It stores vectors, images, texts, videos, and more, and can be used with LLMs/LangChain. Store, query, version, and visualize any AI data, and stream it in real time to PyTorch/TensorFlow.
In this notebook, we'll demo the SelfQueryRetriever wrapped around a Deep Lake vector store.
Creating a Deep Lake vector store
First we'll want to create a Deep Lake vector store and seed it with some data. We've created a small demo set of documents that contain summaries of movies.
Note: The self-query retriever requires you to have lark installed (pip install lark). We also need the deeplake package.
%pip install --upgrade --quiet lark
# If some queries fail, consider installing libdeeplake manually
%pip install --upgrade --quiet libdeeplake
We want to use OpenAIEmbeddings, so we have to get the OpenAI API Key.
import getpass
import os
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

if "ACTIVELOOP_TOKEN" not in os.environ:
    os.environ["ACTIVELOOP_TOKEN"] = getpass.getpass("Activeloop token:")
from langchain_community.vectorstores import DeepLake
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()
docs = [
    Document(
        page_content="A bunch of scientists bring back dinosaurs and mayhem breaks loose",
        metadata={"year": 1993, "rating": 7.7, "genre": "science fiction"},
    ),
    Document(
        page_content="Leo DiCaprio gets lost in a dream within a dream within a dream within a ...",
        metadata={"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
    ),
    Document(
        page_content="A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea",
        metadata={"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
    ),
    Document(
        page_content="A bunch of normal-sized women are supremely wholesome and some men pine after them",
        metadata={"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
    ),
    Document(
        page_content="Toys come alive and have a blast doing so",
        metadata={"year": 1995, "genre": "animated"},
    ),
    Document(
        page_content="Three men walk into the Zone, three men walk out of the Zone",
        metadata={
            "year": 1979,
            "director": "Andrei Tarkovsky",
            "genre": "science fiction",
            "rating": 9.9,
        },
    ),
]
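Before wiring these documents into a retriever, it helps to see what metadata filtering means in isolation. The sketch below is plain Python (no Deep Lake, no LLM) and mirrors the metadata above; the `matches` helper and its parameters are hypothetical names used purely to illustrate the kind of structured filter a self-query retriever would derive from a question like "a science fiction film rated above 8.5".

```python
# Plain-Python illustration of metadata filtering (no vector store involved).
# The dicts mirror the Document metadata defined above.
movies = [
    {"year": 1993, "rating": 7.7, "genre": "science fiction"},
    {"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
    {"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
    {"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
    {"year": 1995, "genre": "animated"},
    {"year": 1979, "director": "Andrei Tarkovsky", "genre": "science fiction", "rating": 9.9},
]


def matches(meta, genre=None, min_rating=None):
    """AND together simple comparisons; documents missing a field don't match it."""
    if genre is not None and meta.get("genre") != genre:
        return False
    if min_rating is not None and meta.get("rating", 0) < min_rating:
        return False
    return True


# "a science fiction film rated above 8.5" -> genre equality + rating comparison
hits = [m for m in movies if matches(m, genre="science fiction", min_rating=8.5)]
# Only the 1979 Tarkovsky entry satisfies both conditions.
```

A self-query retriever does the same thing at a higher level: an LLM turns the natural-language question into a structured filter like this, and the vector store applies it alongside the semantic search.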
username_or_org = "<USERNAME_OR_ORG>"
vectorstore = DeepLake.from_documents(
    docs,
    embeddings,
    dataset_path=f"hub://{username_or_org}/self_queery",
    overwrite=True,
)
Your Deep Lake dataset has been successfully created!

```output
Dataset(path='hub://adilkhan/self_queery', tensors=['embedding', 'id', 'metadata', 'text'])

  tensor      htype       shape      dtype  compression
 -------     -------     -------    -------  -------
 embedding  embedding   (6, 1536)  float32    None
    id        text       (6, 1)      str      None
 metadata     json       (6, 1)      str      None
   text       text       (6, 1)      str      None
```
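With the vector store populated, the next step this notebook builds toward is the SelfQueryRetriever itself. The sketch below prepares the field descriptions as plain data so it runs anywhere; the commented lines show how, with langchain and langchain-openai installed and the API keys set, those descriptions would feed `SelfQueryRetriever.from_llm` via `AttributeInfo`. The specific descriptions and the `temperature=0` choice are illustrative assumptions, not taken from the source.

```python
# Describe the document content and each metadata field so an LLM can
# translate natural-language questions into structured filters.
document_content_description = "Brief summary of a movie"

metadata_field_info = [
    {"name": "genre", "type": "string", "description": "The genre of the movie"},
    {"name": "year", "type": "integer", "description": "The year the movie was released"},
    {"name": "director", "type": "string", "description": "The name of the movie director"},
    {"name": "rating", "type": "float", "description": "A 1-10 rating for the movie"},
]

# With the packages installed, the retriever would be built roughly like this:
#
# from langchain.chains.query_constructor.base import AttributeInfo
# from langchain.retrievers.self_query.base import SelfQueryRetriever
# from langchain_openai import OpenAI
#
# llm = OpenAI(temperature=0)
# retriever = SelfQueryRetriever.from_llm(
#     llm,
#     vectorstore,
#     document_content_description,
#     [AttributeInfo(**f) for f in metadata_field_info],
# )
# retriever.invoke("What are some movies about dinosaurs")
```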