ADK 第三篇 Agents (LlmAgent)

Agents

在智能体开发套件（ADK）中，智能体（Agent）是一个独立的执行单元，旨在自主行动以实现特定目标。智能体能够执行任务、与用户交互、使用外部工具，并与其他智能体协同工作。

在ADK中，所有智能体的基础都是BaseAgent类，它充当着核心蓝图的作用。开发者通常通过以下三种主要方式扩展BaseAgent，以满足不同需求——从智能推理到结构化流程控制，从而创建出功能完备的智能体。

核心智能体类型

ADK 提供多种核心智能体类型，用于构建复杂应用场景：

大语言模型智能体（LlmAgent/Agent）：以大型语言模型（LLM）为核心引擎，具备自然语言理解、逻辑推理、任务规划、内容生成等能力，并能动态决策执行路径与工具调用，特别适合需要灵活语言处理的任务。

工作流智能体（SequentialAgent/ParallelAgent/LoopAgent）：通过预定义模式（顺序/并行/循环）精确控制其他智能体的执行流程，其流程调度机制不依赖LLM，适用于需要确定性执行的结构化流程。

自定义智能体 (Custom Agents)：通过直接扩展BaseAgent实现，可开发具有独特业务逻辑、定制化控制流或特殊集成的智能体，满足高度定制化需求。

选择适合的智能体类型

下表提供了高层次对比，帮助区分不同智能体类型。随着您在后续章节深入了解每种类型，这些差异将更加清晰。

功能对比	LLM Agent (`LlmAgent`)	Workflow Agent	Custom Agent (`BaseAgent` subclass)
核心功能	推理/生成/工具调用	控制智能体执行流程	实现独特逻辑与集成
驱动引擎	大型语言模型(LLM)	预定义逻辑(顺序/并行/循环)	自定义Python代码
确定性	非确定性(灵活响应)	确定性(可预测执行)	可自定义(取决于实现)
典型场景	语言任务/动态决策	结构化流程/任务编排	定制化需求/特定工作流

LlmAgent

LlmAgent（通常简称为Agent）是ADK中的核心组件，充当应用程序的"大脑"。它利用大型语言模型（LLM）的强大能力，实现推理、自然语言理解、决策制定、内容生成以及工具调用等功能。

与遵循预定义执行路径的确定性工作流智能体不同，LlmAgent具有非确定性特征。它依托LLM解析指令和上下文，动态决策后续操作（包括工具调用选择、是否移交控制权等），实现灵活的任务处理。

构建高效的LlmAgent需要明确定义其身份标识，通过指令精准引导行为，并配置必要的工具与能力集。

创建智能体

from google.adk.agents import LlmAgentagent = LlmAgent(name="",model="",description="",# instruction and tools will be added next
)

参数说明：

name（必填）：每个智能体需具备唯一字符串标识符。该名称在内部运维中至关重要，尤其涉及多智能体系统中的任务互调时。应选择体现功能特征的描述性名称（如customer_support_router、billing_inquiry_agent），避免使用user等保留名称。

description（可选，多智能体场景推荐）：提供智能体能力的简明概述。该描述主要用于其他LLM智能体判断是否路由任务至本智能体。需具备足够特异性以区分同类（例如"处理当前账单查询"，而非笼统的"账单智能体"）。

model（必填）：指定驱动智能体推理的底层LLM模型。采用字符串标识符如"gemini-2.0-flash"。模型选择直接影响智能体能力、成本及性能表现。可选模型及考量因素详见模型列表。

instruction 参数说明

引导智能体行为：instruction参数是塑造LlmAgent行为最关键的核心配置。该参数接受字符串或字符串生成函数，用于向智能体明确以下行为准则：

其核心任务与目标：明确智能体需要完成的主要工作及成功标准。
角色设定与人格特征：例如："你是一个乐于助人的助手"、"你扮演幽默的海盗角色"，通过人格模板塑造交互风格。
行为约束规范：限定操作范围（如"仅回答X相关问题"），设置禁忌条款（如"严禁透露Y信息"）。
工具调用策略：说明每个工具的设计用途及调用条件，需补充工具自身的描述不足处，包含触发阈值、参数规范等工程细节。
输出格式要求：结构化输出（如"以JSON格式响应"），呈现形式规范（如"使用项目符号列表"），包含数据类型、字段说明等约束。

设计要诀：

清晰明确性原则：规避歧义，精确声明预期行为与输出标准。
采用Markdown结构化：运用标题/列表等元素提升复杂指令可读性
少样本示例集成：针对复杂任务或特定输出格式，应在指令中直接内嵌范例
工具调用引导规范：超越工具枚举，明确调用时机与决策逻辑

可在字符串模板中使用动态变量

指令作为字符串模板，支持通过{var}语法插入动态变量值。
{var} 用于插入名为 var 的状态变量值
{artifact.var} 用于插入名为 var 的工件文本内容
若状态变量或工件不存在，智能体将抛出错误。如需忽略错误，可在变量名后添加 ?，如 {var?}。

# Example: Adding instructions
capital_agent = LlmAgent(model="gemini-2.0-flash",name="capital_agent",description="解答用户关于各国首都的查询",instruction="""您正在使用【首都查询智能体】当用户查询某国首都时，请按以下标准流程响应：1. 从用户查询中识别国家名称2. 调用 get_capital_city 工具获取首都数据3. 向用户清晰反馈首都信息示例查询："法国的首都是哪里？"示例回复：“法国的首都是巴黎。”""",# tools will be added next
)

（注：对于适用于系统中所有智能体的指令，可考虑在根智能体上配置 global_instruction 参数，具体用法详见《多智能体系统》章节。）

配置智能体工具Tools

工具集赋予LlmAgent超越LLM内置知识与推理的能力，使其能够：

与外部系统交互
执行精准计算
获取实时数据流
触发特定操作

tools（可选）：配置智能体可使用的工具列表。列表中的每个工具可以是以下任意一种形式：

Python函数（将自动封装为FunctionTool）
继承自 BaseTool 的类实例
其他智能体的实例（通过AgentTool实现智能体间任务委托 - 详见《多智能体系统》）

大语言模型（LLM）会根据函数/工具的名称、描述（来自文档字符串或描述字段）以及参数模式，结合当前对话内容和自身指令，来决定调用哪个工具。

# Define a tool function
def get_capital_city(country: str) -> str:"""Retrieves the capital city for a given country."""# Replace with actual logic (e.g., API call, database lookup)capitals = {"france": "Paris", "japan": "Tokyo", "canada": "Ottawa"}return capitals.get(country.lower(), f"Sorry, I don't know the capital of {country}.")# Add the tool to the agent
capital_agent = LlmAgent(model="gemini-2.0-flash",name="capital_agent",description="Answers user questions about the capital city of a given country.",instruction="""You are an agent that provides the capital city of a country... (previous instruction text)""",tools=[get_capital_city] # Provide the function directly
)

高级配置与控制

精细化调控 LLM 生成（generate_content_config）

您可以通过 generate_content_config 参数深度调整底层大语言模型（LLM）的响应生成方式，具体支持以下维度的精细化控制：

temperature（随机性控制）：

取值范围 0.0 ~ 2.0，默认值通常为 0.9
低值（如 0.2）：输出更确定、保守，适用于事实性回答
高值（如 1.0）：增强创造性，适合创意生成或开放式对话

max_output_tokens（响应长度限制）：

设定生成内容的最大 token 数量（如 300），避免冗长响应

top_p & top_k（候选词筛选）：

top_p（0.0 ~ 1.0）：动态截断概率分布（如 0.8 保留前 80% 可能词）
top_k（整数）：限制每步采样候选词数量（如 40 仅考虑前 40 个最佳词）

safety_settings（内容安全过滤）：

配置敏感内容拦截等级（如 BLOCK_LOW/BLOCK_MEDIUM/BLOCK_HIGH）
支持按类别过滤（如 HARM_CATEGORY_HATE_SPEECH 仇恨言论检测）

from google.genai import typesagent = LlmAgent(# ... other paramsgenerate_content_config=types.GenerateContentConfig(temperature=0.2, # More deterministic outputmax_output_tokens=250)
)

结构化数据控制

（input_schema / output_schema / output_key）

在需要结构化数据交互的场景中，您可以通过 Pydantic 模型 实现严格的输入/输出控制，确保数据格式的规范性和类型安全。

input_schema（可选参数）

通过定义 Pydantic 的 BaseModel 类，严格规范输入数据的结构。启用后，所有传入该 Agent 的用户消息内容必须是符合此模型的 JSON 字符串，系统会自动执行校验与转换。

output_schema（可选）

定义一个表示预期输出结构的 Pydantic BaseModel 类。如果设置此项，智能体的最终响应必须是符合此模式的 JSON 字符串。使用 output_schema 会启用大语言模型（LLM）的受控生成功能，但同时会禁用智能体调用工具或将控制权转移给其他智能体的能力。您需要通过指令明确引导 LLM 直接生成符合该模式的 JSON。

output_key（可选参数）

当设置此参数时，Agent 的最终文本响应会自动保存到会话状态字典中（session.state[output_key]），实现跨 Agent 或工作流步骤的数据传递。

from pydantic import BaseModel, Fieldclass CapitalOutput(BaseModel):capital: str = Field(description="The capital of the country.")structured_capital_agent = LlmAgent(# ... name, model, descriptioninstruction="""若输入国家为"中国"，则严格按 {"capital": "北京"} JSON格式返回，不包含任何额外文本或解		释。""",output_schema=CapitalOutput, # Enforce JSON outputoutput_key="found_capital"  # Store result in state['found_capital']# Cannot use tools=[get_capital_city] effectively here
)

上下文管理（include_contents）

控制智能体是否接收历史对话记录

include_contents（可选，默认值：'default'）：控制是否将对话历史内容传递给大语言模型（LLM）。

'default'（默认模式）：智能体会接收到相关的对话历史，使其能够基于上下文进行连贯的多轮交互（例如，理解指代或延续之前的任务）。
'none'（无历史模式）：智能体不会接收任何先前的对话内容，仅根据当前指令和本轮输入生成响应（适用于无状态任务或强制限定上下文场景）。

stateless_agent = LlmAgent(# ... other paramsinclude_contents='none'
)

案例：完整代码

# 获取国家首都的示例代码
# --- 以下是演示 LlmAgent 使用工具（Tools）与输出模式（Output Schema）对比 的完整示例代码及说明 ---
import asyncio
import json # Needed for pretty printing dictsfrom google.adk.agents import LlmAgent
from google.adk.models.lite_llm import LiteLlm
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types
from pydantic import BaseModel, Field# --- 1. 定义常量 ---
APP_NAME = "agent_comparison_app"
USER_ID = "test_user_456"
SESSION_ID_TOOL_AGENT = "session_tool_agent_xyz"
SESSION_ID_SCHEMA_AGENT = "session_schema_agent_xyz"
MODEL_NAME = "gemini-2.0-flash"# --- 2. 使用LiteLLM调用在线模型 ---
model_client = LiteLlm(model="deepseek/deepseek-chat",api_base="https://api.deepseek.com",api_key="sk-xxxxxx",
)# --- 3. 定义数据模型 ---
# Input schema used by both agents
class CountryInput(BaseModel):country: str = Field(description="要获取相关信息的国家。")# Output schema ONLY for the second agent
class CapitalInfoOutput(BaseModel):capital: str = Field(description="该国家的首都城市。")# Note: 人口数据为示意值；由于设定了输出格式（output_schema），# 大语言模型（LLM）将自行推断或估算该数值（此时无法调用外部工具获取真实数据）。population_estimate: str = Field(description="该首都城市的估计人口数量。")# --- 4. 定义工具 ---
def get_capital_city(country: str) -> str:"""获取指定国家的首都城市名称。"""print(f"\n-- Tool Call: get_capital_city(country='{country}') --")country_capitals = {"美国": "华盛顿哥伦比亚特区","加拿大": "渥太华","法国": "巴黎","日本": "东京",}result = country_capitals.get(country.lower(), f"Sorry, I couldn't find the capital for {country}.")print(f"-- Tool Result: '{result}' --")return result# --- 5. 配置 Agents ---# Agent 1: Uses a tool and output_key
capital_agent_with_tool = LlmAgent(model=model_client,name="capital_agent_tool",description="调用指定工具获取国家首都城市信息",instruction="""您是一个智能助手，专门通过工具查询国家首都信息。工作流程：1、接收用户输入的JSON格式数据：{"country": "国家名称"}2、自动提取country字段值3、调用get_capital_city工具查询首都4、以清晰语句向用户返回查询结果示例：用户输入：{"country": "法国"}  助手响应：根据查询结果，法国的首都是巴黎。""",tools=[get_capital_city],input_schema=CountryInput,output_key="capital_tool_result", # Store final text response
)# Agent 2: Uses output_schema (NO tools possible)
structured_info_agent_schema = LlmAgent(model=model_client,name="structured_info_agent_schema",description="提供以特定JSON格式标注的首都及预估人口数据。",instruction=f"""你是一个提供国家信息的智能体用户将以JSON格式提供国家名称，如{{“country”：“country_name”}}。仅使用与此确切模式匹配的JSON对象进行响应：EXAMPLE JSON OUTPUT:{{"capital": "日本","population_estimate": "1万"}}用你已有知识判断其首都并估算人口。不要使用任何工具。""",# *** NO tools parameter here - using output_schema prevents tool use ***input_schema=CountryInput,output_schema=CapitalInfoOutput, # Enforce JSON output structureoutput_key="structured_info_result", # Store final JSON response
)# --- 6. 设置会话管理器Session与运行器 ---
session_service = InMemorySessionService()# 为清晰起见创建独立会话（若上下文管理得当则非必需）
session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID_TOOL_AGENT)
session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID_SCHEMA_AGENT)# 为每个智能体创建独立的运行器
capital_runner = Runner(agent=capital_agent_with_tool,app_name=APP_NAME,session_service=session_service
)
structured_runner = Runner(agent=structured_info_agent_schema,app_name=APP_NAME,session_service=session_service
)# --- 7. 定义智能体交互逻辑 ---
async def call_agent_and_print(runner_instance: Runner,agent_instance: LlmAgent,session_id: str,query_json: str
):"""向指定的 Agent/Runner 发送查询并打印结果。"""print(f"\n>>> Calling Agent: '{agent_instance.name}' | Query: {query_json}")user_content = types.Content(role='user', parts=[types.Part(text=query_json)])final_response_content = "No final response received."async for event in runner_instance.run_async(user_id=USER_ID, session_id=session_id, new_message=user_content):# print(f"Event: {event.type}, Author: {event.author}") # Uncomment for detailed loggingif event.is_final_response() and event.content and event.content.parts:# For output_schema, the content is the JSON string itselffinal_response_content = event.content.parts[0].textprint(f"<<< Agent '{agent_instance.name}' Response: {final_response_content}")current_session = session_service.get_session(app_name=APP_NAME,user_id=USER_ID,session_id=session_id)stored_output = current_session.state.get(agent_instance.output_key)# 如果存储的输出类似 JSON（可能来自 output_schema），则进行格式化美化打印。print(f"--- Session State ['{agent_instance.output_key}']: ", end="")try:# 若内容为 JSON 格式，则尝试解析并美化输出parsed_output = json.loads(stored_output)print(json.dumps(parsed_output, indent=2))except (json.JSONDecodeError, TypeError):# Otherwise, print as stringprint(stored_output)print("-" * 30)# --- 7. Run Interactions ---
async def main():print("--- Testing Agent with Tool ---")await call_agent_and_print(capital_runner, capital_agent_with_tool, SESSION_ID_TOOL_AGENT, '{"country": "日本"}')#await call_agent_and_print(capital_runner, capital_agent_with_tool, SESSION_ID_TOOL_AGENT, '{"country": "加拿大"}')print("\n\n--- Testing Agent with Output Schema (No Tool Use) ---")await call_agent_and_print(structured_runner, structured_info_agent_schema, SESSION_ID_SCHEMA_AGENT, '{"country": "日本"}')#await call_agent_and_print(structured_runner, structured_info_agent_schema, SESSION_ID_SCHEMA_AGENT, '{"country": "日本"}')if __name__ == "__main__":asyncio.run(main())

运行结果：

--- Testing Agent with Tool --->>> Calling Agent: 'capital_agent_tool' | Query: {"country": "日本"}-- Tool Call: get_capital_city(country='日本') --
-- Tool Result: '东京' --
<<< Agent 'capital_agent_tool' Response: 根据查询结果，日本的首都是东京。
--- Session State ['capital_tool_result']: 根据查询结果，日本的首都是东京。
--------------------------------- Testing Agent with Output Schema (No Tool Use) --->>> Calling Agent: 'structured_info_agent_schema' | Query: {"country": "日本"}
<<< Agent 'structured_info_agent_schema' Response: {"capital": "东京","population_estimate": "1.26亿"
}
--- Session State ['structured_info_result']: {'capital': '东京', 'population_estimate': '1.26亿'}
------------------------------Process finished with exit code 0