Building an AI Agent from Scratch with Gemini 3

news · 2025/11/25 9:53:58 · Source: https://www.cnblogs.com/bigben0123/p/19266974

This article shows how to build an AI agent from scratch on top of Gemini 3. The core of an agent is surprisingly simple: an LLM, executable tools, context/memory, and a continuously running loop. The article walks step by step from basic text generation to a fully functional CLI agent.

Practical Guide on how to build an Agent from scratch with Gemini 3

It seems complicated. When you watch an AI agent edit multiple files, run commands, handle errors, and iteratively solve a problem, it feels like magic. But it isn't. The secret to building an agent is that there is no secret.

The core of an Agent is surprisingly simple: It is a Large Language Model (LLM) running in a loop, equipped with tools it can choose to use.

If you can write a loop in Python, you can build an agent. This guide will walk you through the process, from a simple API call to a functioning CLI Agent.

What actually is an Agent?

Traditional software workflows are prescriptive and follow predefined paths (Step A -> Step B -> Step C). An agent, by contrast, is a system that uses an LLM to dynamically decide the control flow of an application in order to achieve a user goal.

An agent generally consists of these core components:

  1. The Model (Brain): The reasoning engine, in our case a Gemini model. It reasons through ambiguity, plans steps, and decides when it needs outside help.
  2. Tools (Hands and Eyes): Functions the agent can execute to interact with the outside world/environment (e.g., searching the web, reading a file, calling an API).
  3. Context/Memory (Workspace): The information the agent has access to at any moment. Managing this effectively is known as Context Engineering.
  4. The Loop (Life): A while loop that allows the model to: Observe → Think → Act → Observe again, until the task is complete.


"The Loop" of nearly every agent is an iterative process:

  1. Define Tools: You describe your available tools (e.g., get_weather) to the model using a structured JSON format.
  2. Call the LLM: You send the user's prompt and the tool definitions to the model.
  3. Model Decision: The model analyzes the request. If a tool is needed, it returns a structured tool use containing the tool name and arguments.
  4. Execute Tool (Client Responsibility): The client/application code intercepts this tool use, executes the actual code or API call, and captures the result.
  5. Respond and Iterate: You send the result (the tool response) back to the model. The model uses this new information to decide the next step, either calling another tool or generating the final response.
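Stripped to its essence, the loop above is only a few lines of code. Here is a minimal, self-contained sketch with the model stubbed out; `fake_model` is a stand-in for a real Gemini call, and the tool names are illustrative:

```python
# A minimal sketch of the agent loop. In a real agent, `fake_model`
# is replaced by an API call to the LLM.

def get_weather(city: str) -> str:
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def fake_model(history: list) -> dict:
    # Stand-in for the LLM: requests a tool on the first turn,
    # answers in text once a tool result is in the history.
    if not any(msg["role"] == "tool" for msg in history):
        return {"tool_call": {"name": "get_weather", "args": {"city": "Berlin"}}}
    return {"text": "It is sunny in Berlin."}

def agent_loop(prompt: str) -> str:
    history = [{"role": "user", "text": prompt}]        # 1.–2. prompt + tools go in
    while True:
        decision = fake_model(history)                  # 3. model decides
        if "tool_call" in decision:
            call = decision["tool_call"]
            result = TOOLS[call["name"]](**call["args"])  # 4. client executes the tool
            history.append({"role": "tool", "name": call["name"], "result": result})  # 5. feed result back
        else:
            return decision["text"]                     # final text answer ends the loop

print(agent_loop("What's the weather in Berlin?"))
```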

Building an Agent

Let's build an agent step-by-step, progressing from basic text generation to a functional CLI agent using Gemini 3 Pro and the Python SDK.

Prerequisites: Install the SDK (pip install google-genai) and set your GEMINI_API_KEY environment variable (Get it in AI Studio).

Step 1: Basic Text Generation and Abstraction

The first step is to create a baseline interaction with the LLM, in our case Gemini 3 Pro. We are going to create a simple Agent class abstraction to structure our code, which we will extend throughout this guide. We start with a simple chatbot that maintains a conversation history.

```python
from google import genai
from google.genai import types


class Agent:
    def __init__(self, model: str):
        self.model = model
        self.client = genai.Client()
        self.contents = []

    def run(self, contents: str):
        self.contents.append({"role": "user", "parts": [{"text": contents}]})

        response = self.client.models.generate_content(model=self.model, contents=self.contents)
        self.contents.append(response.candidates[0].content)

        return response


agent = Agent(model="gemini-3-pro-preview")
response1 = agent.run(
    contents="Hello, What are top 3 cities in Germany to visit? Only return the names of the cities."
)

print(f"Model: {response1.text}")
# Output: Berlin, Munich, Cologne

response2 = agent.run(
    contents="Tell me something about the second city."
)

print(f"Model: {response2.text}")
# Output: Munich is the capital of Bavaria and is known for its Oktoberfest.
```

This is not an agent yet. It is a standard chatbot. It maintains state but cannot take action, has no "hands or eyes".

Step 2: Giving it Hands & Eyes (Tool Use)

To start turning this into an agent, we need Tool Use, also called Function Calling. We provide the agent with tools. This requires defining both the implementation (the Python code) and the definition (the schema the LLM sees). If the LLM believes a tool will help solve the user's prompt, it returns a structured request to call that function instead of just text.

We are going to create 3 tools: read_file, write_file, and list_dir. A tool definition is a JSON schema that defines the name, description, and parameters of the tool.

Best Practice: Use the description fields to explain when and how each tool should be used; the model relies heavily on them. Be explicit and clear.

```python
import os
import json

read_file_definition = {
    "name": "read_file",
    "description": "Reads a file and returns its contents.",
    "parameters": {
        "type": "object",
        "properties": {
            "file_path": {
                "type": "string",
                "description": "Path to the file to read.",
            }
        },
        "required": ["file_path"],
    },
}

list_dir_definition = {
    "name": "list_dir",
    "description": "Lists the contents of a directory.",
    "parameters": {
        "type": "object",
        "properties": {
            "directory_path": {
                "type": "string",
                "description": "Path to the directory to list.",
            }
        },
        "required": ["directory_path"],
    },
}

write_file_definition = {
    "name": "write_file",
    "description": "Writes a file with the given contents.",
    "parameters": {
        "type": "object",
        "properties": {
            "file_path": {
                "type": "string",
                "description": "Path to the file to write.",
            },
            "contents": {
                "type": "string",
                "description": "Contents to write to the file.",
            },
        },
        "required": ["file_path", "contents"],
    },
}

def read_file(file_path: str) -> str:
    """Reads a file and returns its contents."""
    with open(file_path, "r") as f:
        return f.read()

def write_file(file_path: str, contents: str) -> bool:
    """Writes a file with the given contents."""
    with open(file_path, "w") as f:
        f.write(contents)
    return True

def list_dir(directory_path: str) -> list[str]:
    """Lists the contents of a directory."""
    full_path = os.path.expanduser(directory_path)
    return os.listdir(full_path)

file_tools = {
    "read_file": {"definition": read_file_definition, "function": read_file},
    "write_file": {"definition": write_file_definition, "function": write_file},
    "list_dir": {"definition": list_dir_definition, "function": list_dir},
}
```

Now we integrate the tools and function calls into our Agent class.

```python
from google import genai
from google.genai import types


class Agent:
    def __init__(self, model: str, tools: dict):
        self.model = model
        self.client = genai.Client()
        self.contents = []
        self.tools = tools

    def run(self, contents: str):
        self.contents.append({"role": "user", "parts": [{"text": contents}]})

        config = types.GenerateContentConfig(
            tools=[types.Tool(function_declarations=[tool["definition"] for tool in self.tools.values()])],
        )

        response = self.client.models.generate_content(model=self.model, contents=self.contents, config=config)
        self.contents.append(response.candidates[0].content)

        return response


agent = Agent(model="gemini-3-pro-preview", tools=file_tools)

response = agent.run(
    contents="Can you list my files in the current directory?"
)
print(response.function_calls)
# Output: [FunctionCall(name='list_dir', arguments={'directory_path': '.'})]
```

Great! The model has successfully called the tool. Now we need to add the tool-execution logic to our Agent class and loop the result back to the model.

Step 3: Closing the Loop (The Agent)

An agent isn't about generating a single tool call. It generates a series of tool calls, returns the results to the model, generates the next tool call, and so on until the task is complete.

The Agent class handles the core loop: intercepting the FunctionCall, executing the tool on the client side, and sending back the FunctionResponse. We also add a system_instruction to guide the model's behavior.

Note: Gemini 3 uses thought signatures to maintain reasoning context across API calls. You must return these signatures to the model in your request exactly as they were received. Appending the full response.candidates[0].content back into the history, as the code below does, takes care of this automatically.

```python
# ... Code for the tools and tool definitions from Step 2 should be here ...

from google import genai
from google.genai import types


class Agent:
    def __init__(self, model: str, tools: dict, system_instruction: str = "You are a helpful assistant."):
        self.model = model
        self.client = genai.Client()
        self.contents = []
        self.tools = tools
        self.system_instruction = system_instruction

    def run(self, contents: str | list[dict[str, str]]):
        if isinstance(contents, list):
            self.contents.append({"role": "user", "parts": contents})
        else:
            self.contents.append({"role": "user", "parts": [{"text": contents}]})

        config = types.GenerateContentConfig(
            system_instruction=self.system_instruction,
            tools=[types.Tool(function_declarations=[tool["definition"] for tool in self.tools.values()])],
        )

        response = self.client.models.generate_content(model=self.model, contents=self.contents, config=config)
        self.contents.append(response.candidates[0].content)

        if response.function_calls:
            functions_response_parts = []
            for tool_call in response.function_calls:
                print(f"[Function Call] {tool_call}")

                if tool_call.name in self.tools:
                    result = {"result": self.tools[tool_call.name]["function"](**tool_call.args)}
                else:
                    result = {"error": "Tool not found"}

                print(f"[Function Response] {result}")
                functions_response_parts.append({"functionResponse": {"name": tool_call.name, "response": result}})

            return self.run(functions_response_parts)

        return response


agent = Agent(
    model="gemini-3-pro-preview",
    tools=file_tools,
    system_instruction="You are a helpful Coding Assistant. Respond like you are Linus Torvalds.",
)

response = agent.run(
    contents="Can you list my files in the current directory?"
)
print(response.text)
# Output: [Function Call] id=None args={'directory_path': '.'} name='list_dir'
# [Function Response] {'result': ['.venv', ... ]}
# There. Your current directory contains: `LICENSE`,
```

Congratulations. You just built your first functioning agent.

Step 4: Multi-turn CLI Agent

Now we can run our agent in a simple CLI loop. It takes surprisingly little code to create highly capable behavior.

```python
# ... Code for the Agent, tools and tool definitions from Step 3 should be here ...

agent = Agent(
    model="gemini-3-pro-preview",
    tools=file_tools,
    system_instruction="You are a helpful Coding Assistant. Respond like you are Linus Torvalds.",
)

print("Agent ready. Ask it to check files in this directory.")
while True:
    user_input = input("You: ")
    if user_input.lower() in ['exit', 'quit']:
        break

    response = agent.run(user_input)
    print(f"Linus: {response.text}\n")
```

Best Practices for Engineering Agents

Building the loop is easy; making it reliable, transparent, and controllable is hard. Here are key engineering principles derived from top industry practices, grouped by functional area.

1. Tool Definition & Ergonomics

Your tools are the interface for the model. Don't just wrap your existing internal APIs. If a tool is confusing to a human, it's confusing to the model:

  • Clear Naming: Use obvious names like search_customer_database rather than cust_db_v2_query.
  • Precise Descriptions: Gemini reads the function docstrings to understand when and how to use a tool. Spend time writing these carefully, it is essentially "prompt engineering" for tools.
  • Return Meaningful Errors: Don't return a 50-line Java stack trace. If a tool fails, return a clear string like Error: File not found. Did you mean 'data.csv'?. This allows the agent to self-correct.
  • Tolerate Fuzzy Inputs: If a model frequently guesses file paths wrong, update your tool to handle relative paths or fuzzy inputs rather than just erroring out.
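The last two points can be combined in practice. As a sketch (not from the original article), the read_file tool from Step 2 can be made self-correcting by returning an actionable error with a fuzzy suggestion, using Python's difflib:

```python
import difflib
import os

def read_file(file_path: str) -> str:
    """Read a file; on failure, return an error message the model can act on."""
    try:
        with open(file_path, "r") as f:
            return f.read()
    except FileNotFoundError:
        # Instead of raising a stack trace, suggest the closest existing name
        # so the agent can retry with a corrected path.
        directory = os.path.dirname(file_path) or "."
        try:
            candidates = os.listdir(directory)
        except OSError:
            candidates = []
        close = difflib.get_close_matches(os.path.basename(file_path), candidates, n=1)
        hint = f" Did you mean '{close[0]}'?" if close else ""
        return f"Error: File '{file_path}' not found.{hint}"
```

Because the error is a plain, informative string rather than an exception, it flows back through the normal FunctionResponse path and the model can self-correct on the next turn.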

2. Context Engineering

Models have a finite "attention budget." Managing what information enters the context is crucial for performance and cost.

  • Don't "Dump" Data: Don't have a tool that returns an entire 10MB database table. Instead of get_all_users(), create search_users(query: str).
  • Just-in-time Loading: Instead of pre-loading all data (traditional RAG), use just-in-time strategies. The agent should maintain lightweight identifiers (file paths, IDs) and use tools to dynamically load content only when needed.
  • Compression: For very long-running agents, summarize the history, remove old context, or start a new session.
  • Agentic Memory: Allow the agent to maintain notes or a scratchpad persisted outside the context window, pulling them back in only when relevant.
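One simple compression strategy, sketched here as an illustration rather than taken from the article: keep the first message (which usually frames the task) and the most recent turns, replacing the dropped middle with a placeholder note the model can see:

```python
def compact_history(contents: list, max_turns: int = 20) -> list:
    """Trim a conversation history to at most `max_turns` entries.

    Keeps the first message (task framing) and the most recent turns,
    replacing the middle with a marker. Assumes max_turns >= 3.
    """
    if len(contents) <= max_turns:
        return contents
    dropped = len(contents) - max_turns
    marker = {"role": "user", "parts": [{"text": f"[{dropped} earlier messages omitted to save context]"}]}
    return [contents[0], marker] + contents[-(max_turns - 2):]
```

In the Agent class from Step 3, this could run on self.contents before each generate_content call; a more sophisticated variant would summarize the dropped turns with a separate LLM call instead of discarding them.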

3. Don't over engineer

It's tempting to build complex multi-agent systems. Don't.

  • Maximize a Single Agent First: Don't immediately build complex multi-agent systems. Gemini is highly capable of handling dozens of tools in a single prompt.
  • Escape Hatches: Ensure loops can be stopped, e.g. with a max_iterations break (say, 15 turns).
  • Guardrails and System Instructions: Use the system_instruction to give the model hard rules (e.g., "You are strictly forbidden from offering refunds greater than $50"), or use an external classifier.
  • Human-in-the-loop: For sensitive actions (like send_email or execute_code), pause the loop and require user confirmation before the tool is actually executed.
  • Prioritize Transparency and Debugging: Log tool calls and parameters. Analyzing the model's reasoning helps you identify issues and improve the agent over time.
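The human-in-the-loop point fits naturally into the tool-execution step of the loop from Step 3. A sketch, with illustrative tool names and an assumed tools dict shaped like file_tools:

```python
# Tool names listed here are illustrative; adjust to your own tool set.
SENSITIVE_TOOLS = {"write_file", "send_email", "execute_code"}
MAX_ITERATIONS = 15  # escape hatch: cap the loop at a fixed number of turns

def confirm(tool_name: str, args: dict) -> bool:
    """Ask the user before running a sensitive tool."""
    answer = input(f"Allow {tool_name}({args})? [y/N] ")
    return answer.strip().lower() == "y"

def guarded_execute(tools: dict, name: str, args: dict) -> dict:
    """Execute a tool call, pausing for confirmation on sensitive actions.

    `tools` maps tool names to {"definition": ..., "function": ...} entries,
    matching the file_tools structure from Step 2.
    """
    if name not in tools:
        return {"error": "Tool not found"}
    if name in SENSITIVE_TOOLS and not confirm(name, args):
        return {"error": "User declined this action."}
    return {"result": tools[name]["function"](**args)}
```

In the Agent.run method, the inline execution branch would be replaced by a call to guarded_execute, and the recursive self.run call would carry an iteration counter checked against MAX_ITERATIONS.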

Conclusion

Building an agent is no longer magic; it is a practical engineering task. As we've shown, you can build a working prototype in under 100 lines of code. While understanding these fundamentals is key, don't get bogged down re-engineering the same pattern over and over. The AI community has created fantastic open-source libraries that can help you build more complex and robust agents faster.
