GreenNodeEmbeddings
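GreenNode is a global AI solutions provider and an NVIDIA Preferred Partner. It delivers full-stack AI capabilities, from infrastructure to application, to enterprises across the US, MENA, and APAC regions. GreenNode operates on world-class infrastructure (LEED Gold, TIA‑942, Uptime Tier III) and serves enterprises, startups, and researchers with a comprehensive suite of AI services.
This notebook shows how to get started with GreenNodeEmbeddings. GreenNodeEmbeddings generates high-quality vector representations of text. You can use these vectors for semantic document search over a variety of built-in connectors or your own custom data sources.
Integration details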
| Provider | Package |
|---|---|
| GreenNode | langchain-greennode |
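Setup
To access GreenNode embedding models, you need to create a GreenNode account, obtain an API key, and install the langchain-greennode integration package.
Credentials
GreenNode authenticates requests with an API key. You can pass the key as the api_key parameter when you initialize the class, or set it in the GREENNODE_API_KEY environment variable. To get an API key, register an account on GreenNode Serverless AI.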
import getpass
import os
if not os.getenv("GREENNODE_API_KEY"):
    os.environ["GREENNODE_API_KEY"] = getpass.getpass("Enter your GreenNode API key: ")
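The lookup described above can be sketched as a small helper: use an explicit api_key argument if one is given, otherwise fall back to the GREENNODE_API_KEY environment variable. resolve_api_key is hypothetical. The precedence order (argument first, then environment variable) is also an assumption for illustration and was not taken from the langchain-greennode source.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Hypothetical helper: prefer an explicit key, else read the env var.

    ASSUMPTION: the precedence (argument first, then GREENNODE_API_KEY) is
    illustrative and not confirmed against langchain-greennode itself.
    """
    if api_key:
        # An explicitly passed key wins.
        return api_key
    env_key = os.environ.get("GREENNODE_API_KEY")
    if env_key:
        # Fall back to the environment variable set earlier in the notebook.
        return env_key
    # Neither source is available, so fail loudly instead of sending an empty key.
    raise ValueError("No API key found: pass api_key=... or set GREENNODE_API_KEY.")
```

For example, resolve_api_key("YOUR_API_KEY") returns the argument unchanged. resolve_api_key() returns the environment variable and raises ValueError when neither source is set.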
If you want automated tracing of your model calls, you can also set your LangSmith API key by uncommenting the lines below:
# os.environ["LANGSMITH_TRACING"] = "true"
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
Installation
The LangChain GreenNode integration lives in the langchain-greennode package:
%pip install -qU langchain-greennode
Instantiation
You can instantiate the GreenNodeEmbeddings class with an optional API key and an optional model name:
from langchain_greennode import GreenNodeEmbeddings
# Initialize the embeddings model
embeddings = GreenNodeEmbeddings(
    # api_key="YOUR_API_KEY",  # You can pass the API key directly
    model="BAAI/bge-m3",  # The default embedding model
)
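Indexing and Retrieval
Embedding models play a key role in retrieval-augmented generation (RAG) workflows, where they are used both to index content and to retrieve it efficiently.
Below, we show how to index and retrieve data with the embeddings object initialized above. In this example, we index a sample document in an InMemoryVectorStore and then retrieve it.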
# Create a vector store with a sample text
from langchain_core.vectorstores import InMemoryVectorStore
text = "LangChain is the framework for building context-aware reasoning applications"
vectorstore = InMemoryVectorStore.from_texts(
    [text],
    embedding=embeddings,
)
# Use the vectorstore as a retriever
retriever = vectorstore.as_retriever()
# Retrieve the most similar text
retrieved_documents = retriever.invoke("What is LangChain?")
# show the retrieved document's content
retrieved_documents[0].page_content
'LangChain is the framework for building context-aware reasoning applications'
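Conceptually, the retriever embeds the query, compares it with the stored vectors, and returns the closest texts. The stdlib-only sketch below mimics that flow. It replaces the real model with a hypothetical bag-of-words toy_embed function and ranks results by cosine similarity. This is a simplified illustration of the idea, not how InMemoryVectorStore is actually implemented.

```python
import math
from collections import Counter
from typing import List, Tuple


def _normalize(word: str) -> str:
    # Lowercase and strip simple punctuation so "LangChain?" matches "langchain".
    return word.strip("?.,").lower()


def toy_embed(text: str, vocab: List[str]) -> List[float]:
    """Hypothetical stand-in for a real model: word counts over a fixed vocabulary."""
    counts = Counter(_normalize(w) for w in text.split())
    return [float(counts[w]) for w in vocab]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty vector is treated as dissimilar to everything.
    return dot / (norm_a * norm_b)


def retrieve(query: str, texts: List[str], k: int = 1) -> List[Tuple[str, float]]:
    """Index the texts, embed the query, and return the top-k (text, score) pairs."""
    vocab = sorted({_normalize(w) for t in texts for w in t.split()})
    index = [(t, toy_embed(t, vocab)) for t in texts]  # indexing step
    query_vec = toy_embed(query, vocab)  # the query uses the same "model"
    scored = [(t, cosine_similarity(query_vec, v)) for t, v in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    corpus = [
        "LangChain is the framework for building context-aware reasoning applications",
        "Paris is the capital of France",
    ]
    print(retrieve("What is LangChain?", corpus))
```

A real embedding model such as BAAI/bge-m3 captures meaning rather than exact word overlap. The indexing, query-embedding, and ranking steps stay the same.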
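Direct Usage
You can use the GreenNodeEmbeddings class on its own to generate text embeddings, without a vector store. This is useful for tasks such as similarity scoring, clustering, or custom processing pipelines.
Embed single texts
You can embed a single text or document with embed_query: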
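The following sketch illustrates the clustering use case. It groups vectors whose pairwise cosine similarity meets a threshold, so that linked pairs end up in the same cluster (single-linkage grouping with union-find). threshold_clusters is a hypothetical helper, and the 2-D toy vectors and 0.8 threshold are made up for illustration. In practice you would pass the lists returned by embed_documents, which have far more dimensions.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def threshold_clusters(vectors: List[List[float]], threshold: float) -> List[List[int]]:
    """Hypothetical helper: link every pair with similarity >= threshold and
    return the connected components as lists of vector indices."""
    parent = list(range(len(vectors)))

    def find(i: int) -> int:
        # Walk up to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine_similarity(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    clusters: Dict[int, List[int]] = {}
    for i in range(len(vectors)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


if __name__ == "__main__":
    toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(threshold_clusters(toy_vectors, threshold=0.8))
```

Single-linkage grouping is simple but can chain loosely related items together. For larger collections, a dedicated clustering library is usually a better fit.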
single_vector = embeddings.embed_query(text)
print(str(single_vector)[:100]) # Show the first 100 characters of the vector
[-0.01104736328125, -0.0281982421875, 0.0035858154296875, -0.0311279296875, -0.0106201171875, -0.039
Embed multiple texts
You can embed multiple texts with embed_documents:
text2 = (
    "LangGraph is a library for building stateful, multi-actor applications with LLMs"
)
two_vectors = embeddings.embed_documents([text, text2])
for vector in two_vectors:
    print(str(vector)[:100])  # Show the first 100 characters of the vector
[-0.01104736328125, -0.0281982421875, 0.0035858154296875, -0.0311279296875, -0.0106201171875, -0.039
[-0.07177734375, -0.00017452239990234375, -0.002044677734375, -0.0299072265625, -0.0184326171875, -0
Async Support
GreenNodeEmbeddings supports asynchronous operations through aembed_query and aembed_documents:
import asyncio
async def generate_embeddings_async():
    # Embed a single query
    query_result = await embeddings.aembed_query("What is the capital of France?")
    print(f"Async query embedding dimension: {len(query_result)}")
    # Embed multiple documents
    docs = [
        "Paris is the capital of France",
        "Berlin is the capital of Germany",
        "Rome is the capital of Italy",
    ]
    docs_result = await embeddings.aembed_documents(docs)
    print(f"Async document embeddings count: {len(docs_result)}")


await generate_embeddings_async()
Async query embedding dimension: 1024
Async document embeddings count: 3
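The top-level await above works in Jupyter, which already runs an event loop. In a plain Python script, you start the loop with asyncio.run() instead. You can also use asyncio.gather() to issue several requests concurrently. Note that asyncio.run() raises a RuntimeError if an event loop is already running in the same thread, as it is in Jupyter. The sketch below uses a hypothetical FakeAsyncEmbeddings stand-in with the same method names, so it runs without an API key. Pass the real embeddings object instead to call GreenNode.

```python
import asyncio
from typing import List


class FakeAsyncEmbeddings:
    """Hypothetical stand-in exposing aembed_query / aembed_documents so the
    sketch runs offline; its "vectors" are just [length, space count]."""

    async def aembed_query(self, text: str) -> List[float]:
        await asyncio.sleep(0.01)  # simulate network latency
        return [float(len(text)), float(text.count(" "))]

    async def aembed_documents(self, texts: List[str]) -> List[List[float]]:
        # Sequential on purpose; the real client decides how it batches requests.
        return [await self.aembed_query(t) for t in texts]


async def embed_queries_concurrently(embedder, queries: List[str]) -> List[List[float]]:
    # gather() runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(embedder.aembed_query(q) for q in queries))


if __name__ == "__main__":
    queries = ["What is the capital of France?", "What is the capital of Italy?"]
    vectors = asyncio.run(embed_queries_concurrently(FakeAsyncEmbeddings(), queries))
    print(len(vectors))
```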
Document Similarity Example
import numpy as np
from scipy.spatial.distance import cosine
# Create some documents
documents = [
"Machine learning algorithms build mathematical models based on sample data",
"Deep learning uses neural networks with many layers",
"Climate change is a major global environmental challenge",
"Neural networks are inspired by the human brain's structure",
]
# Embed the documents
embeddings_list = embeddings.embed_documents(documents)
# Function to calculate similarity
def calculate_similarity(embedding1, embedding2):
    # scipy's cosine() returns cosine *distance*, so 1 - distance is cosine similarity
    return 1 - cosine(embedding1, embedding2)
# Print similarity matrix
print("Document Similarity Matrix:")
for i, emb_i in enumerate(embeddings_list):
    similarities = []
    for j, emb_j in enumerate(embeddings_list):
        similarity = calculate_similarity(emb_i, emb_j)
        similarities.append(f"{similarity:.4f}")
    print(f"Document {i + 1}: {similarities}")
Document Similarity Matrix:
Document 1: ['1.0000', '0.6005', '0.3542', '0.5788']
Document 2: ['0.6005', '1.0000', '0.4154', '0.6170']
Document 3: ['0.3542', '0.4154', '1.0000', '0.3528']
Document 4: ['0.5788', '0.6170', '0.3528', '1.0000']
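The matrix is easier to read when you reduce each row to its closest *other* document, ignoring the 1.0000 self-similarity on the diagonal. The sketch below does this with the values printed above. nearest_neighbors is a hypothetical helper and is not part of langchain-greennode.

```python
from typing import List, Tuple

# Values copied from the similarity matrix printed above (rounded to 4 decimals).
similarity_matrix = [
    [1.0000, 0.6005, 0.3542, 0.5788],
    [0.6005, 1.0000, 0.4154, 0.6170],
    [0.3542, 0.4154, 1.0000, 0.3528],
    [0.5788, 0.6170, 0.3528, 1.0000],
]


def nearest_neighbors(matrix: List[List[float]]) -> List[Tuple[int, float]]:
    """For each row, return (index, score) of the most similar other document."""
    result = []
    for i, row in enumerate(matrix):
        # Skip the diagonal so a document is never its own nearest neighbor.
        best_j = max((j for j in range(len(row)) if j != i), key=lambda j: row[j])
        result.append((best_j, row[best_j]))
    return result


for i, (j, score) in enumerate(nearest_neighbors(similarity_matrix)):
    print(f"Document {i + 1} is closest to Document {j + 1} ({score:.4f})")
```

With these values, Documents 1, 3, and 4 are closest to Document 2, and Document 2 is closest to Document 4. The climate-change text (Document 3) has the weakest best match at 0.4154, which is consistent with it being the off-topic document.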
API Reference
For more details on the GreenNode Serverless AI API, see the GreenNode Serverless AI documentation.