将文本分类为标签

标记是指为文档打上诸如以下之类的类别标签：

情绪
语言
风格（例如，正式、非正式）
涵盖的主题
政治倾向

Image description

概述

标记包含几个组件：

function：与提取类似，标记也使用函数来指定模型应如何标记文档
schema：定义我们希望如何标记文档

入门

让我们通过一个非常简单的示例，了解如何在 LangChain 中使用 OpenAI 工具调用进行标记。我们将使用 OpenAI 模型支持的 with_structured_output 方法。

pip install --upgrade --quiet langchain-core

我们需要加载一个 chat model：

Select chat model:

pip install -qU "langchain[google-genai]"

import getpass
import os

if not os.environ.get("GOOGLE_API_KEY"):
  os.environ["GOOGLE_API_KEY"] = getpass.getpass("Enter API key for Google Gemini: ")

from langchain.chat_models import init_chat_model

llm = init_chat_model("gemini-2.0-flash", model_provider="google_genai")

让我们用 Pydantic 模型指定具有几个属性及其在 schema 中的预期类型。

from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

tagging_prompt = ChatPromptTemplate.from_template(
    """
Extract the desired information from the following passage.

Only extract the properties mentioned in the 'Classification' function.

Passage:
{input}
"""
)


class Classification(BaseModel):
    sentiment: str = Field(description="The sentiment of the text")
    aggressiveness: int = Field(
        description="How aggressive the text is on a scale from 1 to 10"
    )
    language: str = Field(description="The language the text is written in")


# Structured LLM
structured_llm = llm.with_structured_output(Classification)

API Reference:ChatPromptTemplate | ChatOpenAI

inp = "Estoy increiblemente contento de haberte conocido! Creo que seremos muy buenos amigos!"
prompt = tagging_prompt.invoke({"input": inp})
response = structured_llm.invoke(prompt)

response

Classification(sentiment='positive', aggressiveness=1, language='Spanish')

如果我们想要字典输出，可以直接调用 .model_dump()

inp = "Estoy muy enojado con vos! Te voy a dar tu merecido!"
prompt = tagging_prompt.invoke({"input": inp})
response = structured_llm.invoke(prompt)

response.model_dump()

{'sentiment': 'enojado', 'aggressiveness': 8, 'language': 'es'}

正如我们在示例中看到的，它正确地解释了我们的意图。

结果各不相同，因此我们可能会得到不同语言的情绪（例如，“积极”、“enojado”等）。

我们将在下一节中了解如何控制这些结果。

更精细化的控制

精心的模式定义让我们能够更精细化地控制模型的输出。

具体来说，我们可以定义：

每个属性的可能值
描述，以确保模型理解该属性
模型必须返回的必需属性

让我们重新声明我们的 Pydantic 模型，以 enum 的方式控制前面提到的各个方面：

class Classification(BaseModel):
    sentiment: str = Field(..., enum=["happy", "neutral", "sad"])
    aggressiveness: int = Field(
        ...,
        description="describes how aggressive the statement is, the higher the number the more aggressive",
        enum=[1, 2, 3, 4, 5],
    )
    language: str = Field(
        ..., enum=["spanish", "english", "french", "german", "italian"]
    )

tagging_prompt = ChatPromptTemplate.from_template(
    """
Extract the desired information from the following passage.

Only extract the properties mentioned in the 'Classification' function.

Passage:
{input}
"""
)

llm = ChatOpenAI(temperature=0, model="gpt-4o-mini").with_structured_output(
    Classification
)

现在答案将以我们期望的方式受到限制！

inp = "Estoy increiblemente contento de haberte conocido! Creo que seremos muy buenos amigos!"
prompt = tagging_prompt.invoke({"input": inp})
llm.invoke(prompt)

Classification(sentiment='positive', aggressiveness=1, language='Spanish')

inp = "Estoy muy enojado con vos! Te voy a dar tu merecido!"
prompt = tagging_prompt.invoke({"input": inp})
llm.invoke(prompt)

Classification(sentiment='enojado', aggressiveness=8, language='es')

inp = "Weather is ok here, I can go outside without much more than a coat"
prompt = tagging_prompt.invoke({"input": inp})
llm.invoke(prompt)

Classification(sentiment='neutral', aggressiveness=1, language='English')

LangSmith 的追踪让我们得以窥探其内部运作：

Image description

深入了解

您可以使用 metadata tagger 文档转换器从 LangChain Document 中提取元数据。
这涵盖了与 tagging chain 基本相同的功能，只是应用于 LangChain Document。

概述​

入门​

更精细化的控制​

深入了解​

概述

入门

更精细化的控制

深入了解