LangChain

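LangChain is a framework for developing applications powered by large language models (LLMs).

Official documentation: https://python.langchain.com/docs/introduction/

LangChain simplifies every stage of the LLM application lifecycle:

Development: Build your applications with LangChain's open-source building blocks and components, and get started quickly with third-party integrations and templates.
Productionization: Use LangSmith to inspect, monitor, and evaluate your chains, so that you can continuously optimize and deploy with confidence.
Deployment: Turn any chain into an API with LangServe.

Concretely, the framework consists of the following open-source libraries:

langchain-core: Base abstractions and the LangChain Expression Language (LCEL).
langchain-community: Third-party integrations.
Partner packages (e.g. langchain-openai, langchain-anthropic, etc.): Some integrations have been further split into their own lightweight packages that depend only on langchain-core.
langchain: Chains, agents, and retrieval strategies that make up an application's cognitive architecture.
langgraph: Build robust and stateful multi-actor applications with LLMs by modeling steps as edges and nodes in a graph.
langserve: Deploy LangChain chains as REST APIs.
LangSmith: A developer platform that lets you debug, test, evaluate, and monitor LLM applications.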

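Core components of LangChain

Model I/O wrappers
- LLMs: large language models
- Chat Models: usually built on top of LLMs, but re-wrapped around a conversational (message-based) structure
- PromptTemplate: prompt templates
- OutputParser: parses model output
Data connection wrappers
- Document Loaders: loaders for files in various formats
- Document Transformers: common operations on documents, such as split, filter, translate, extract metadata, etc.
- Text Embedding Models: vector representations of text, used for retrieval and similar operations
- Vectorstores: (retrieval-oriented) storage for vectors
- Retrievers: vector retrieval
Chat history management
- Storing, loading, and trimming conversation history
Architecture wrappers
- Chain: implements a single function or a sequential composition of functions
- Agent: plans the execution steps automatically from the user's input, picks the tool needed at each step, and finally completes the task the user asked for
  - Tools: functions that call external capabilities, e.g. Google search, file I/O, a Linux shell, etc.
  - Toolkits: a set of tools for operating a particular piece of software, e.g. a database, Gmail, etc.
Callbacks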

LLMs

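Language models that take a string as input and return a string. These are typically older models (newer models are generally Chat Models, see above). Although the underlying models are string in, string out, the LangChain wrappers also allow these models to take messages as input. This makes them interchangeable with ChatModels. When messages are passed in as input, they are formatted into a string internally before being passed to the underlying model. LangChain does not host any LLMs itself; it relies on third-party integrations.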

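Messages

Some language models take a list of messages as input and return a message. There are a few different types of messages. All messages have a role, content, and response_metadata property. The role describes who sent the message.

LangChain has different message classes for different roles. The content property describes the content of the message. This can be a few different things:

A string (most models handle this type of content)
A list of dictionaries (used for multimodal input, where each dictionary contains information about the input type and the input location)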

HumanMessage

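This represents a message from the user.

AIMessage

This represents a message from the model. In addition to the content property, these messages also have:

response_metadata: additional metadata about the response. The data here is often specific to each model provider; this is where information such as log-probs and token usage may be stored.
tool_calls: the language model's decisions to call tools. They are included as part of an AIMessage output and can be accessed via the .tool_calls property, which returns a list of dictionaries. Each dictionary has the following keys:

name: the name of the tool that should be called.
args: the arguments to that tool.
id: the id of that tool call.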

SystemMessage

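This represents a system message, which tells the model how to behave. Not every model provider supports this.

FunctionMessage

This represents the result of a function call. In addition to role and content, this message has a name parameter that conveys the name of the function that was called to produce this result.

ToolMessage

This represents the result of a tool call. It is distinct from FunctionMessage in order to match OpenAI's separate function and tool message types. In addition to role and content, this message has a tool_call_id parameter that conveys the id of the tool call that produced this result.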

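Prompt templates

Prompt templates help translate user input and parameters into instructions for a language model. They can be used to guide a model's response, helping it understand the context and generate relevant and coherent language-based output. Prompt templates take a dictionary as input, where each key represents a variable in the prompt template to fill in. Prompt templates output a PromptValue. This PromptValue can be passed to an LLM or a ChatModel, and can also be cast to a string or a list of messages. PromptValue exists to make it easy to switch between strings and messages. There are a few different types of prompt templates.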

Model I/O wrappers

Different models are wrapped behind one unified interface, so you can switch models without refactoring your code.

Model API: LLM vs ChatModel

```shell
pip install --upgrade langchain
pip install --upgrade langchain-openai
pip install --upgrade langchain-community
```

OpenAI model wrapper

```python
from langchain_openai import ChatOpenAI

# Make sure OPENAI_API_KEY and OPENAI_BASE_URL are set as operating-system environment variables
llm = ChatOpenAI(model="gpt-4o-mini")
response = llm.invoke("你是谁")  # "Who are you?"
print(response.content)
```

That is all it takes to call a model.

Multi-turn conversation (session) wrapper

```python
from langchain_core.messages import (
    AIMessage,      # equivalent to the assistant role in the OpenAI API
    HumanMessage,   # equivalent to the user role in the OpenAI API
    SystemMessage,  # equivalent to the system role in the OpenAI API
)

messages = [
    SystemMessage(content="你是助理"),  # "You are an assistant"
    HumanMessage(content="我是老板"),   # "I am the boss"
    AIMessage(content="欢迎"),          # "Welcome"
    HumanMessage(content="我是谁"),     # "Who am I?"
]
ret = llm.invoke(messages)
print(ret.content)
```

Model input and output

Prompt template wrappers

PromptTemplate lets you define custom variables in a template

```python
from langchain_core.prompts import PromptTemplate

template = PromptTemplate.from_template("给我讲个关于{subject}的笑话")  # "Tell me a joke about {subject}"
print(template.format(subject='小明'))  # 小明 (Xiaoming) is a placeholder name
```

ChatPromptTemplate: a conversation context expressed as templates

```python
from langchain_core.prompts import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)
from langchain_openai import ChatOpenAI

template = ChatPromptTemplate.from_messages(
    [
        # "You are the customer-service assistant for {product}. Your name is {name}"
        SystemMessagePromptTemplate.from_template("你是{product}的客服助手。你的名字叫{name}"),
        HumanMessagePromptTemplate.from_template("{query}"),
    ]
)

llm = ChatOpenAI(model="gpt-4o-mini")
prompt = template.format_messages(
    product="AI研究院",  # "AI Research Institute"
    name="大吉",         # "Daji"
    query="你是谁"       # "Who are you?"
)

print(prompt)

ret = llm.invoke(prompt)

print(ret.content)
```

MessagesPlaceholder: turning a multi-turn conversation into a template

```python
from langchain_core.prompts import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    MessagesPlaceholder,
)

human_prompt = "Translate your answer to {language}."
human_message_template = HumanMessagePromptTemplate.from_template(human_prompt)

chat_prompt = ChatPromptTemplate.from_messages(
    # "history" is the placeholder's variable_name in the template;
    # it is the key used when filling in values
    [MessagesPlaceholder("history"), human_message_template]
)
```

```python
from langchain_core.messages import AIMessage, HumanMessage

human_message = HumanMessage(content="Who is Elon Musk?")
ai_message = AIMessage(
    content="Elon Musk is a billionaire entrepreneur, inventor, and industrial designer"
)

messages = chat_prompt.format_prompt(
    # Fill in both "history" and "language"
    history=[human_message, ai_message], language="中文"  # "Chinese"
)

print(messages.to_messages())
```

```python
result = llm.invoke(messages)
print(result.content)
```

Loading a prompt template from a file

```python
from langchain_core.prompts import PromptTemplate

template = PromptTemplate.from_file("example_prompt_template.txt")
print("===Template===")
print(template)
print("===Prompt===")
print(template.format(topic='黑色幽默'))  # "dark humor"
```

Structured output

Outputting a Pydantic object directly

```python
from pydantic import BaseModel, Field

# Define your output object
class Date(BaseModel):
    year: int = Field(description="Year")
    month: int = Field(description="Month")
    day: int = Field(description="Day")
    era: str = Field(description="BC or AD")
```

```python
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Bind the output schema (the Date model defined above) to the model
structured_llm = llm.with_structured_output(Date)

# "Extract the date from the user input.\nUser input:\n{query}"
template = """提取用户输入中的日期。
用户输入:
{query}"""

prompt = PromptTemplate.from_template(template)
query = "2024年十二月23日天气晴..."  # "December 23, 2024, sunny..."
input_prompt = prompt.format_prompt(query=query)

# Returns a Date instance rather than an AIMessage
structured_llm.invoke(input_prompt)
```

Outputting JSON in a specified format

```python
json_schema = {
    "title": "Date",
    "description": "Formatted date expression",
    "type": "object",
    "properties": {
        "year": {
            "type": "integer",
            "description": "year, YYYY",
        },
        "month": {
            "type": "integer",
            "description": "month, MM",
        },
        "day": {
            "type": "integer",
            "description": "day, DD",
        },
        "era": {
            "type": "string",
            "description": "BC or AD",
        },
    },
}
# With a dict schema, the result is a plain dict rather than a Pydantic object
structured_llm = llm.with_structured_output(json_schema)

structured_llm.invoke(input_prompt)
```

Using an OutputParser

An OutputParser parses the model's output according to a specified format

```python
from langchain_core.output_parsers import JsonOutputParser

parser = JsonOutputParser(pydantic_object=Date)

prompt = PromptTemplate(
    # "Extract the date from the user input.\nUser input:{query}\n{format_instructions}"
    template="提取用户输入中的日期。\n用户输入:{query}\n{format_instructions}",
    input_variables=["query"],
    # The parser generates instructions describing the expected JSON format
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

input_prompt = prompt.format_prompt(query=query)
output = llm.invoke(input_prompt)
print("Raw output:\n"+output.content)

print("\nParsed:")
parser.invoke(output)
```

You can also use PydanticOutputParser

```python
from langchain_core.output_parsers import PydanticOutputParser

parser = PydanticOutputParser(pydantic_object=Date)

# Reuses the prompt from the previous block; its format instructions describe the same Date schema
input_prompt = prompt.format_prompt(query=query)
output = llm.invoke(input_prompt)
print("Raw output:\n"+output.content)

print("\nParsed:")
parser.invoke(output)
```

Raw output:

```json
{
  "year": 2024,
  "month": 12,
  "day": 23,
  "era": "AD"
}
```

Parsed:
Date(year=2024, month=12, day=23, era='AD')

OutputFixingParser: using an LLM to fix formatting errors automatically

```python
from langchain.output_parsers import OutputFixingParser

# Wrap the PydanticOutputParser above; on a parse failure, gpt-4o is asked to repair the output
new_parser = OutputFixingParser.from_llm(parser=parser, llm=ChatOpenAI(model="gpt-4o"))

# Corrupt the output on purpose: "2024" becomes "202四"
bad_output = output.content.replace("4","四")
print("PydanticOutputParser:")
try:
    parser.invoke(bad_output)
except Exception as e:
    print(e)

print("OutputFixingParser:")
new_parser.invoke(bad_output)
```

PydanticOutputParser:
Invalid json output: ```json { "year": 202四, "month": 12, "day": 23, "era": "AD" }
For troubleshooting, visit: https://python.langchain.com/docs/troubleshooting/errors/OUTPUT_PARSING_FAILURE
OutputFixingParser:
Date(year=2024, month=12, day=23, era='AD')

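Passing multimodal data

Here we demonstrate how to pass multimodal input directly to models. We currently expect all input to be passed in the same format that OpenAI expects. For other model providers that support multimodal input, logic has been added inside the model class to convert to the expected format.

In this example, we will ask a model to describe an image.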

```python
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
```

The most commonly supported way to pass in an image is as a base64-encoded byte string. This should work for most model integrations.

```python
import base64

import httpx
from langchain_core.messages import HumanMessage

# Download the image and base64-encode it
image_data = base64.b64encode(httpx.get(image_url).content).decode("utf-8")

message = HumanMessage(
    content=[
        {"type": "text", "text": "用中文描述这张图片中的天气"},  # "Describe the weather in this picture, in Chinese"
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
        },
    ],
)
# llm: a vision-capable chat model, e.g. the gpt-4o-mini ChatOpenAI defined earlier
response = llm.invoke([message])
print(response.content)
```

We can also provide the image URL directly in a content block of type "image_url". Note that only some model providers support this.

```python
message = HumanMessage(
    content=[
        {"type": "text", "text": "用中文描述这张图片中的天气"},  # "Describe the weather in this picture, in Chinese"
        {"type": "image_url", "image_url": {"url": image_url}},
    ],
)
response = llm.invoke([message])
print(response.content)
```

We can also pass in multiple images.

```python
message = HumanMessage(
    content=[
        {"type": "text", "text": "这两张图片是一样的吗？"},  # "Are these two images the same?"
        {"type": "image_url", "image_url": {"url": image_url}},
        {"type": "image_url", "image_url": {"url": image_url}},
    ],
)
response = llm.invoke([message])
print(response.content)
```
