
ChatGoogleGenerativeAI

Access Google's generative AI models, including the Gemini family, directly via the Gemini API, or experiment quickly in Google AI Studio. The langchain-google-genai package provides the LangChain integration for these models. This is often the best starting point for individual developers.

For information on the latest models, their features, context windows, and more, head to the Google AI docs. All examples use the gemini-2.0-flash model. Gemini 2.5 Pro and 2.5 Flash are available via the gemini-2.5-pro-preview-03-25 and gemini-2.5-flash-preview-04-17 model IDs. All model IDs can be found in the Gemini API docs.
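For example, to target one of the 2.5 preview models, pass its ID at instantiation. A minimal sketch (the class itself is introduced below; any ID from the Gemini API docs works here):

from langchain_google_genai import ChatGoogleGenerativeAI

# Swap in any model ID from the Gemini API docs, e.g. a 2.5 preview model.
llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash-preview-04-17")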

Integration details

| Class | Package | Local | Serializable | JS support | Package downloads | Latest package version |
| :--- | :--- | :---: | :---: | :---: | :---: | :---: |
| ChatGoogleGenerativeAI | langchain-google-genai | ❌ | beta | ✅ | PyPI - Downloads | PyPI - Version |

Model features

Tool calling | Structured output | JSON mode | Image input | Audio input | Video input | Token-level streaming | Native async | Token usage tracking | Logprobs

Setup

To access Google AI models you'll need to create a Google account, get a Google AI API key, and install the langchain-google-genai integration package.

1. Installation:

%pip install -U langchain-google-genai

2. Credentials:

Head to https://ai.google.dev/gemini-api/docs/api-key (or via Google AI Studio) to generate a Google AI API key.

Chat models

Use the ChatGoogleGenerativeAI class to interact with Google's chat models. See the API reference for full details.

import getpass
import os

if "GOOGLE_API_KEY" not in os.environ:
os.environ["GOOGLE_API_KEY"] = getpass.getpass("Enter your Google AI API key: ")

To enable automated tracing of model calls, set your LangSmith API key:

# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"

Instantiation

Now we can instantiate our model object and generate chat completions:

from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
    # other params...
)

Invocation

messages = [
    (
        "system",
        "You are a helpful assistant that translates English to French. Translate the user sentence.",
    ),
    ("human", "I love programming."),
]
ai_msg = llm.invoke(messages)
ai_msg
AIMessage(content="J'adore la programmation.", additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.0-flash', 'safety_ratings': []}, id='run-3b28d4b8-8a62-4e6c-ad4e-b53e6e825749-0', usage_metadata={'input_tokens': 20, 'output_tokens': 7, 'total_tokens': 27, 'input_token_details': {'cache_read': 0}})
print(ai_msg.content)
J'adore la programmation.
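Token-level streaming works through the standard LangChain interface as well; a minimal sketch:

for chunk in llm.stream("Write a haiku about the ocean."):
    # Each chunk carries an incremental piece of the response text.
    print(chunk.content, end="", flush=True)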

Chaining

We can chain our model with a prompt template like so:

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant that translates {input_language} to {output_language}.",
        ),
        ("human", "{input}"),
    ]
)

chain = prompt | llm
chain.invoke(
    {
        "input_language": "English",
        "output_language": "German",
        "input": "I love programming.",
    }
)
API Reference: ChatPromptTemplate
AIMessage(content='Ich liebe Programmieren.', additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.0-flash', 'safety_ratings': []}, id='run-e5561c6b-2beb-4411-9210-4796b576a7cd-0', usage_metadata={'input_tokens': 15, 'output_tokens': 7, 'total_tokens': 22, 'input_token_details': {'cache_read': 0}})

Multimodal Usage

Gemini models can accept multimodal inputs (text, images, audio, video), and some models can generate multimodal outputs as well.

Image Input

Provide image inputs along with text using a HumanMessage whose content is a list of blocks. The gemini-2.0-flash model can handle images.

import base64

from langchain_core.messages import HumanMessage
from langchain_google_genai import ChatGoogleGenerativeAI

# Example using a public URL (remains the same)
message_url = HumanMessage(
    content=[
        {
            "type": "text",
            "text": "Describe the image at the URL.",
        },
        {"type": "image_url", "image_url": "https://picsum.photos/seed/picsum/200/300"},
    ]
)
result_url = llm.invoke([message_url])
print(f"Response for URL image: {result_url.content}")

# Example using a local image file encoded in base64
image_file_path = "/Users/philschmid/projects/google-gemini/langchain/docs/static/img/agents_vs_chains.png"

with open(image_file_path, "rb") as image_file:
    encoded_image = base64.b64encode(image_file.read()).decode("utf-8")

message_local = HumanMessage(
    content=[
        {"type": "text", "text": "Describe the local image."},
        {"type": "image_url", "image_url": f"data:image/png;base64,{encoded_image}"},
    ]
)
result_local = llm.invoke([message_local])
print(f"Response for local image: {result_local.content}")

Other supported image_url formats (both are sketched below):

  • A Google Cloud Storage URI (gs://...). Ensure the service account has access.
  • A PIL Image object (the library handles the encoding).
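A minimal sketch of both formats; the bucket path and local file name here are hypothetical placeholders:

from PIL import Image as PILImage

from langchain_core.messages import HumanMessage

# GCS URI (hypothetical bucket/object; the service account needs read access)
gcs_message = HumanMessage(
    content=[
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": "gs://my-bucket/photos/cat.png"},
    ]
)

# PIL Image object (hypothetical local file); the library encodes it for you
pil_message = HumanMessage(
    content=[
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": PILImage.open("local_photo.png")},
    ]
)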

Audio Input

Provide audio file inputs along with text. Use a model like gemini-2.0-flash.

import base64

from langchain_core.messages import HumanMessage

# Ensure you have an audio file named 'example_audio.mp3' or provide the correct path.
audio_file_path = "example_audio.mp3"
audio_mime_type = "audio/mpeg"


with open(audio_file_path, "rb") as audio_file:
    encoded_audio = base64.b64encode(audio_file.read()).decode("utf-8")

message = HumanMessage(
    content=[
        {"type": "text", "text": "Transcribe the audio."},
        {
            "type": "media",
            "data": encoded_audio,  # Use base64 string directly
            "mime_type": audio_mime_type,
        },
    ]
)
response = llm.invoke([message])
print(f"Response for audio: {response.content}")
API Reference: HumanMessage

Video Input

Provide video file inputs along with text. Use a model like gemini-2.0-flash.

import base64

from langchain_core.messages import HumanMessage
from langchain_google_genai import ChatGoogleGenerativeAI

# Ensure you have a video file named 'example_video.mp4' or provide the correct path.
video_file_path = "example_video.mp4"
video_mime_type = "video/mp4"


with open(video_file_path, "rb") as video_file:
    encoded_video = base64.b64encode(video_file.read()).decode("utf-8")

message = HumanMessage(
    content=[
        {"type": "text", "text": "Describe the first few frames of the video."},
        {
            "type": "media",
            "data": encoded_video,  # Use base64 string directly
            "mime_type": video_mime_type,
        },
    ]
)
response = llm.invoke([message])
print(f"Response for video: {response.content}")

Image Generation (Multimodal Output)

The gemini-2.0-flash model can generate text and images inline (image generation is experimental). You need to specify the desired response_modalities.

import base64

from IPython.display import Image, display
from langchain_core.messages import AIMessage
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="models/gemini-2.0-flash-preview-image-generation")

message = {
    "role": "user",
    "content": "Generate a photorealistic image of a cuddly cat wearing a hat.",
}

response = llm.invoke(
    [message],
    generation_config=dict(response_modalities=["TEXT", "IMAGE"]),
)


def _get_image_base64(response: AIMessage) -> str:
    image_block = next(
        block
        for block in response.content
        if isinstance(block, dict) and block.get("image_url")
    )
    return image_block["image_url"].get("url").split(",")[-1]


image_base64 = _get_image_base64(response)
display(Image(data=base64.b64decode(image_base64), width=300))

Image and Text to Image

You can iterate on an image in a multi-turn conversation, as shown here:

next_message = {
    "role": "user",
    "content": "Can you take the same image and make the cat black?",
}

response = llm.invoke(
    [message, response, next_message],
    generation_config=dict(response_modalities=["TEXT", "IMAGE"]),
)

image_base64 = _get_image_base64(response)
display(Image(data=base64.b64decode(image_base64), width=300))

You can also represent an input image and query in a single message by encoding the base64 data in the data URI scheme:

message = {
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": "Can you make this cat orange?",
        },
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{image_base64}"},
        },
    ],
}

response = llm.invoke(
    [message],
    generation_config=dict(response_modalities=["TEXT", "IMAGE"]),
)

image_base64 = _get_image_base64(response)
display(Image(data=base64.b64decode(image_base64), width=300))

You can also use LangGraph to manage the conversation history for you, as in this tutorial: LangGraph conversation history management tutorial.
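A minimal sketch of that pattern, assuming langgraph is installed (pip install langgraph); the thread_id value is arbitrary:

from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph

workflow = StateGraph(state_schema=MessagesState)


def call_model(state: MessagesState):
    # The checkpointer replays prior turns into state["messages"].
    return {"messages": llm.invoke(state["messages"])}


workflow.add_node("model", call_model)
workflow.add_edge(START, "model")
app = workflow.compile(checkpointer=MemorySaver())

# Each thread_id keeps its own conversation history across invocations.
config = {"configurable": {"thread_id": "1"}}
app.invoke({"messages": [("user", "Hi, I'm Bob.")]}, config)
app.invoke({"messages": [("user", "What's my name?")]}, config)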

Tool Calling

You can equip the model with tools to call.

from langchain_core.tools import tool
from langchain_google_genai import ChatGoogleGenerativeAI


# Define the tool
@tool(description="Get the current weather in a given location")
def get_weather(location: str) -> str:
    return "It's sunny."


# Initialize the model and bind the tool
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")
llm_with_tools = llm.bind_tools([get_weather])

# Invoke the model with a query that should trigger the tool
query = "What's the weather in San Francisco?"
ai_msg = llm_with_tools.invoke(query)

# Check the tool calls in the response
print(ai_msg.tool_calls)

# Example tool call message would be needed here if you were actually running the tool
from langchain_core.messages import ToolMessage

tool_message = ToolMessage(
    content=get_weather.invoke(ai_msg.tool_calls[0]["args"]),
    tool_call_id=ai_msg.tool_calls[0]["id"],
)
llm_with_tools.invoke([ai_msg, tool_message])  # Example of passing tool result back
[{'name': 'get_weather', 'args': {'location': 'San Francisco'}, 'id': 'a6248087-74c5-4b7c-9250-f335e642927c', 'type': 'tool_call'}]
AIMessage(content="OK. It's sunny in San Francisco.", additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.0-flash', 'safety_ratings': []}, id='run-ac5bb52c-e244-4c72-9fbc-fb2a9cd7a72e-0', usage_metadata={'input_tokens': 29, 'output_tokens': 11, 'total_tokens': 40, 'input_token_details': {'cache_read': 0}})

Structured Output

Force the model to respond with a specific structure using Pydantic models.

from pydantic import BaseModel, Field
from langchain_google_genai import ChatGoogleGenerativeAI


# Define the desired structure
class Person(BaseModel):
    """Information about a person."""

    name: str = Field(..., description="The person's name")
    height_m: float = Field(..., description="The person's height in meters")


# Initialize the model
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash", temperature=0)
structured_llm = llm.with_structured_output(Person)

# Invoke the model with a query asking for structured information
result = structured_llm.invoke(
    "Who was the 16th president of the USA, and how tall was he in meters?"
)
print(result)
name='Abraham Lincoln' height_m=1.93

Token Usage Tracking

Access token usage information from the response metadata.

from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")

result = llm.invoke("Explain the concept of prompt engineering in one sentence.")

print(result.content)
print("\nUsage Metadata:")
print(result.usage_metadata)
Prompt engineering is the art and science of crafting effective text prompts to elicit desired and accurate responses from large language models.

Usage Metadata:
{'input_tokens': 10, 'output_tokens': 24, 'total_tokens': 34, 'input_token_details': {'cache_read': 0}}

Built-in Tools

Google Gemini supports a variety of built-in tools (Google Search, code execution), which can be bound to the model in the usual way.

from google.ai.generativelanguage_v1beta.types import Tool as GenAITool

resp = llm.invoke(
    "When is the next total solar eclipse in US?",
    tools=[GenAITool(google_search={})],
)

print(resp.content)
The next total solar eclipse visible in the United States will occur on August 23, 2044. However, the path of totality will only pass through Montana, North Dakota, and South Dakota.

For a total solar eclipse that crosses a significant portion of the continental U.S., you'll have to wait until August 12, 2045. This eclipse will start in California and end in Florida.
from google.ai.generativelanguage_v1beta.types import Tool as GenAITool

resp = llm.invoke(
    "What is 2*2, use python",
    tools=[GenAITool(code_execution={})],
)

for c in resp.content:
    if isinstance(c, dict):
        if c["type"] == "code_execution_result":
            print(f"Code execution result: {c['code_execution_result']}")
        elif c["type"] == "executable_code":
            print(f"Executable code: {c['executable_code']}")
    else:
        print(c)
Executable code: print(2*2)

Code execution result: 4

2*2 is 4.
/Users/philschmid/projects/google-gemini/langchain/.venv/lib/python3.9/site-packages/langchain_google_genai/chat_models.py:580: UserWarning:
⚠️ Warning: Output may vary each run.
  - 'executable_code': Always present.
  - 'execution_result' & 'image_url': May be absent for some queries.

Validate before using in production.

Native Async

Use asynchronous methods for non-blocking calls.

from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")


async def run_async_calls():
    # Async invoke
    result_ainvoke = await llm.ainvoke("Why is the sky blue?")
    print("Async Invoke Result:", result_ainvoke.content[:50] + "...")

    # Async stream
    print("\nAsync Stream Result:")
    async for chunk in llm.astream(
        "Write a short poem about asynchronous programming."
    ):
        print(chunk.content, end="", flush=True)
    print("\n")

    # Async batch
    results_abatch = await llm.abatch(["What is 1+1?", "What is 2+2?"])
    print("Async Batch Results:", [res.content for res in results_abatch])


await run_async_calls()
Async Invoke Result: The sky is blue due to a phenomenon called **Rayle...

Async Stream Result:
The thread is free, it does not wait,
For answers slow, or tasks of fate.
A promise made, a future bright,
It moves ahead, with all its might.

A callback waits, a signal sent,
When data's read, or job is spent.
Non-blocking code, a graceful dance,
Responsive apps, a fleeting glance.

Async Batch Results: ['1 + 1 = 2', '2 + 2 = 4']

Safety Settings

Gemini models have default safety settings that can be overridden. If you are receiving lots of "Safety Warnings" from your models, you can try tweaking the model's safety_settings attribute. For example, to turn off safety blocking for dangerous content, you can construct your LLM as follows:

from langchain_google_genai import (
ChatGoogleGenerativeAI,
HarmBlockThreshold,
HarmCategory,
)

llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-pro",
    safety_settings={
        HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
    },
)

For an enumeration of the available categories and thresholds, see Google's safety setting types.
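Several categories can be adjusted at once. A hedged sketch; the category and threshold names below are standard members of these enums, but consult the reference above for the authoritative list:

from langchain_google_genai import (
    ChatGoogleGenerativeAI,
    HarmBlockThreshold,
    HarmCategory,
)

llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    safety_settings={
        # Only block high-probability harassment and hate speech
        HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_ONLY_HIGH,
        HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_ONLY_HIGH,
        # Block medium-and-above sexually explicit content
        HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    },
)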

API Reference

For detailed documentation of all ChatGoogleGenerativeAI features and configurations, head to the API reference: https://python.langchain.com/api_reference/google_genai/chat_models/langchain_google_genai.chat_models.ChatGoogleGenerativeAI.html