

[未來趨勢] AI Agent:經驗篇 — 從 Google 專家的眼中,了解更多 AI Agent / [Future Trends] AI Agents: The Experience — Understanding AI Agents Through the Eyes of a Google Expert

Author: Simon Liu Published: Source: https://medium.com/@simon3458/ai-agent-google-expert-whitepaper-937e155cb2ab Fetched: 2026-06-07T02:07:12.924742



揭露資訊:部分內文經由 AI 工具彙整與撰寫,並由我進行修正,如有錯誤,歡迎留言告知我,讓文章更加完善!

Disclosure: Part of the content was compiled and drafted with AI tools and revised by me. If you spot any errors, feel free to leave a comment so I can improve the article!


Google 在去年九月時,在 Kaggle 平台公佈了一篇由 Julia Wiesinger、Patrick Marlow 和 Vladimir Vuskovic 撰寫的 Agent 主題白皮書。本次文章,我將進行彙整,讓大家更理解 Agent 的概念。

Last September, Google published a whitepaper on the topic of Agents, authored by Julia Wiesinger, Patrick Marlow, and Vladimir Vuskovic, on the Kaggle platform. In this article, I will compile its contents to help everyone better understand the concept of Agents.

[Agents (whitepaper), by Julia Wiesinger, Patrick Marlow and Vladimir Vuskovic, on Kaggle](https://www.kaggle.com/whitepaper-agents?source=post_page-----937e155cb2ab---------------------------------------)

I. 什麼是 AI Agent? / I. What Is an AI Agent?

AI Agent 是一種應用程式,其核心功能是透過觀察環境並運用工具採取行動,以實現特定目標。這些 Agent 具有以下幾個關鍵特徵:

An AI Agent is an application whose core function is to take action by observing its environment and using tools, in order to achieve specific goals. These Agents have the following key characteristics:

自主性(Autonomy): AI Agents 能夠自主運作,不需要人類的直接干預。當賦予明確目標或任務後,它們能獨立完成相關操作。

Autonomy(自主性): AI Agents can operate autonomously without direct human intervention. Once given a clear goal or task, they can independently carry out the related operations.

目標導向(Goal-Oriented): 它們的行動與決策設計是為了實現特定目標,展現出目標驅動的特性。

Goal-Oriented(目標導向): Their actions and decisions are designed to achieve specific goals, exhibiting goal-driven behavior.

主動性(Proactiveness): 即使缺乏人類明確指令,AI Agents 也能推理與分析,決定下一步行動,努力完成最終目標。

Proactiveness(主動性): Even without explicit human instructions, AI Agents can reason and analyze to decide the next action, striving to accomplish the final goal.

應用場景廣泛: 雖然 AI Agents 的概念十分通用且強大,但本文聚焦於生成式 AI 模型所構建的特定類型 Agents。

Broad Application Scenarios: Although the concept of AI Agents is highly general and powerful, this article focuses on a specific type of Agent built on generative AI models.

以下是作者提出的一個基礎 Agent 架構:

Below is a basic Agent architecture proposed by the authors:

[圖:作者提出的基礎 Agent 架構 / Figure: the basic Agent architecture proposed by the authors]

II. 生成式 AI Agents 的核心組成 / II. The Core Components of Generative AI Agents

在現今的人工智慧領域中,生成式 AI Agents 正迅速成為不可或缺的核心技術。這些不僅僅是語言模型,而是能夠與外部世界互動、完成複雜任務的系統。要理解生成式 AI Agents 的運作,必須從其三大核心組成部分理解:模型 (The Model)、工具 (The Tools) 與指揮層 (The Orchestration Layer)。

In today's field of artificial intelligence, generative AI Agents are rapidly becoming an indispensable core technology. These are not merely language models, but systems capable of interacting with the external world and completing complex tasks. To understand how generative AI Agents operate, one must understand their three core components: The Model(模型), The Tools(工具), and The Orchestration Layer(指揮層).

模型 (The Model):AI Agents 的智慧核心 / The Model(模型): The Intelligent Core of AI Agents

模型是 AI Agents 的「大腦」,負責決策與推理,具備理解指令和邏輯推理的能力。它通常是由一個或多個大型語言模型(如 GPT-3 或類似技術)構成,並採用不同的推理框架,例如 ReAct、Chain-of-Thought (CoT) 或 Tree-of-Thoughts (ToT),來幫助模型深入理解問題並提供合理的解決方案。

The model is the "brain" of an AI Agent, responsible for decision-making and reasoning, with the ability to understand instructions and perform logical reasoning. It is usually composed of one or more large language models (such as GPT-3 or similar technologies) and adopts different reasoning frameworks, such as ReAct, Chain-of-Thought (CoT), or Tree-of-Thoughts (ToT), to help the model deeply understand the problem and provide reasonable solutions.

根據需求,模型可以是通用型、多模態型,或經過特定調整以應對特殊任務。雖然模型本身並不包含 Agent 的具體配置,但可以透過資料進行 Fine-Tuning,以確保最佳效能。簡而言之,模型是 Agent 思考的核心,決定了它的判斷與推理能力。

Depending on the requirements, the model can be general-purpose, multimodal, or specifically tuned to handle particular tasks. Although the model itself does not contain the Agent's specific configuration, it can be fine-tuned with data to ensure optimal performance. In short, the model is the core of the Agent's thinking and determines its judgment and reasoning capabilities.

工具 (The Tools):連接內外世界的橋樑 / The Tools(工具): The Bridge Connecting the Internal and External Worlds

工具是 AI Agents 與外部世界互動的關鍵因子,幫助 Agent 克服僅靠語言模型無法直接處理外部系統或資料的限制。這些工具以多種形式存在,例如 CRUD 方法等。

Tools are the key factor enabling AI Agents to interact with the external world, helping the Agent overcome the limitation that a language model alone cannot directly handle external systems or data. These tools exist in many forms, such as CRUD methods, etc.

工具大幅擴展了 Agent 的能力,使其能存取與處理真實世界資訊,例如查詢天氣或更新資料庫。目前常見的工具類型包括:

Tools greatly expand the Agent's capabilities, enabling it to access and process real-world information, such as checking the weather or updating a database. Currently common tool types include:

  • 擴展 (Extensions):標準化地連接 API 與 Agent,讓 Agent 無縫執行操作。
  • 函數 (Functions):由模型輸出函數與參數,實際 API 呼叫由客戶端執行,為開發者提供更精細的控制。
  • 資料儲存 (Data Stores):以向量資料庫的形式,儲存並提供 Agent 存取動態更新的資訊。

  • Extensions(擴展): Connect APIs and the Agent in a standardized way, allowing the Agent to execute operations seamlessly.
  • Functions(函數): The model outputs the function and its parameters, while the actual API call is executed on the client side, providing developers with more fine-grained control.
  • Data Stores(資料儲存): In the form of a vector database, store and provide the Agent access to dynamically updated information.

工具的存在讓 AI Agents 能真正與外部世界產生互動,而非僅侷限於語言模型本身。

The existence of tools allows AI Agents to genuinely interact with the external world, rather than being confined to the language model itself.

指揮層 (The Orchestration Layer):Agent 的指揮中心 / The Orchestration Layer(指揮層): The Command Center of the Agent

指揮層是 AI Agent 模型與工具的總指揮,決定了它如何接收資訊、進行內部推理,並採取行動。這是一個循環的過程,直到 Agent 完成目標或到達停止點。

The orchestration layer is the overall commander of the AI Agent's model and tools, determining how it receives information, performs internal reasoning, and takes action. This is a cyclical process that continues until the Agent achieves its goal or reaches a stopping point.

指揮層的複雜性不一,可以是簡單的邏輯計算,也可以是高度複雜的規劃與推理。它負責維護記憶、狀態、推理與規劃,並透過提示工程(Prompt Engineering)框架來引導推理與行動。例如:

The complexity of the orchestration layer varies; it can be simple logical computation or highly complex planning and reasoning. It is responsible for maintaining memory, state, reasoning, and planning, and guides reasoning and action through Prompt Engineering(提示工程)frameworks. For example:

  • ReAct:讓模型邊推理邊行動。
  • Chain-of-Thought (CoT):透過中間步驟啟用多步推理。
  • Tree-of-Thoughts (ToT):適用於需要探索或策略性預測的任務。

  • ReAct: Lets the model reason while acting.
  • Chain-of-Thought (CoT): Enables multi-step reasoning through intermediate steps.
  • Tree-of-Thoughts (ToT): Suitable for tasks that require exploration or strategic anticipation.

簡而言之,指揮層確保 AI Agent 的每一步都按計劃進行,像是「執行大腦」。

In short, the orchestration layer ensures that every step of the AI Agent proceeds according to plan, acting like an "execution brain."

三者協同運作:模型、工具與指揮層的協作之美 / The Three Working in Concert: The Beauty of Collaboration Among the Model, Tools, and Orchestration Layer

這三個組成部分相互協作,使得 AI Agent 能有效地完成任務。模型負責「思考」,工具負責「執行」,而指揮層則負責「規劃與控制」。這種協作就像一位廚師在廚房中工作:

These three components collaborate with one another, enabling the AI Agent to complete tasks effectively. The model is responsible for "thinking," the tools for "execution," and the orchestration layer for "planning and control." This collaboration is much like a chef working in a kitchen:

  • 廚師先收集資訊(食材與指令)。
  • 然後根據資訊進行推理(決定菜色與烹飪方式)。
  • 接著執行動作(切菜、烹煮)。
  • 最後根據結果調整步驟(品嘗與改良)。

  • The chef first gathers information (ingredients and instructions).
  • Then reasons based on that information (deciding the dish and cooking method).
  • Next executes actions (chopping, cooking).
  • Finally adjusts the steps based on the results (tasting and refining).

透過這樣的方式,AI Agent 展現出強大的能力與適應能力,能自主完成複雜任務。

In this way, the AI Agent demonstrates powerful capability and adaptability, able to autonomously complete complex tasks.

III. Model 和 Agent 的差異 / III. The Difference Between a Model and an Agent

以下提供一張作者所撰寫的差異表格:

Below is a comparison table written by the authors:

[表:模型與 Agent 的差異 / Table: differences between a model and an Agent]

從表格中可以看到,模型的知識來源僅限於其訓練數據,無法管理多輪對話或持續上下文,也缺乏內建工具和邏輯層支援。而 Agents 則透過介接外部工具擴展知識,具備管理多輪會話的能力,並內建推理框架,能夠執行更複雜的任務。這使得 Agents 在處理動態和複雜任務時更具優勢。

As the table shows, a model's knowledge source is limited to its training data; it cannot manage multi-turn conversations or maintain ongoing context, and it lacks built-in tools and a logic layer. Agents, on the other hand, extend their knowledge by interfacing with external tools, possess the ability to manage multi-turn sessions, and have built-in reasoning frameworks, enabling them to perform more complex tasks. This gives Agents an advantage when handling dynamic and complex tasks.

IV. Agent 運作的核心概念 / IV. The Core Concepts of How an Agent Operates

如同廚師在繁忙的廚房中,Agent 使用認知架構來達成其最終目標。透過迭代處理資訊、做出明智的決策,並根據先前的輸出調整下一步的行動,Agent 得以高效運作。

Like a chef in a busy kitchen, an Agent uses a cognitive architecture to achieve its final goal. By iteratively processing information, making informed decisions, and adjusting the next action based on previous outputs, the Agent is able to operate efficiently.

Agent 的核心在於指揮層,負責維護記憶、狀態、推理和規劃。指揮層利用快速發展的提示工程技術與相關框架來引導推理和規劃,使 Agent 能更有效地完成任務並與環境互動。

At the heart of the Agent is the orchestration layer, responsible for maintaining memory, state, reasoning, and planning. The orchestration layer leverages rapidly evolving prompt engineering techniques and related frameworks to guide reasoning and planning, enabling the Agent to complete tasks and interact with its environment more effectively.

Agent 使用認知架構的運作步驟 / The Operating Steps of an Agent Using a Cognitive Architecture

以下是 Agent 如何使用認知架構來執行任務的步驟範例:

Below is an example of the steps by which an Agent uses a cognitive architecture to execute tasks:

  1. 資訊收集:Agent 收集資訊,例如使用者的查詢或可用的工具和資源。
  2. 內部推理:根據收集到的資訊進行推理,考慮可用的選項和行動。
  3. 採取行動:根據推理結果採取行動,例如使用工具、檢索資訊或產生回應。
  4. 調整:在每個階段,Agent 會根據需要進行調整,利用先前的結果完善計劃,並確定下一步行動。

  1. Information Gathering: The Agent gathers information, such as the user's query or the available tools and resources.
  2. Internal Reasoning: Reasons based on the gathered information, considering the available options and actions.
  3. Taking Action: Acts based on the reasoning results, such as using a tool, retrieving information, or generating a response.
  4. Adjustment: At each stage, the Agent adjusts as needed, using previous results to refine the plan and determine the next action.
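
The four steps above can be sketched as a small loop. This is a minimal illustration, not the whitepaper's implementation: `AgentState`, `run_agent`, `toy_reason`, and the `lookup_weather` tool are hypothetical names, and a rule-based function stands in for the language model.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentState:
    """Working memory carried between iterations of the loop."""
    goal: str
    observations: list[str] = field(default_factory=list)
    done: bool = False


def run_agent(
    goal: str,
    reason: Callable[[AgentState], tuple[str, str]],
    tools: dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> AgentState:
    state = AgentState(goal=goal)
    for _ in range(max_steps):
        # Steps 1-2: reason over the goal plus everything observed so far.
        action, argument = reason(state)
        if action == "finish":
            state.observations.append(argument)
            state.done = True
            break
        # Step 3: take the chosen action by calling a tool.
        result = tools[action](argument)
        # Step 4: record the result so the next iteration can adjust the plan.
        state.observations.append(result)
    return state


def toy_reason(state: AgentState) -> tuple[str, str]:
    # Hypothetical stand-in for the model: look something up once, then finish.
    if not state.observations:
        return "lookup_weather", "Taipei"
    return "finish", f"Answer based on: {state.observations[-1]}"


TOOLS = {"lookup_weather": lambda city: f"{city}: sunny, 25C"}

final = run_agent("What is the weather in Taipei?", toy_reason, TOOLS)
print(final.observations[-1])
```

The `max_steps` cap plays the role of the "stopping point" mentioned earlier: the loop ends either when the reasoning step decides it is finished or when the step budget runs out.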

推理技術與框架 / Reasoning Techniques and Frameworks

Agent 可採用多種推理技術與框架,以選擇針對使用者請求的最佳行動。以下是常見的框架與技術:

An Agent can adopt a variety of reasoning techniques and frameworks to select the best action for a user's request. Below are common frameworks and techniques:

  • ReAct(Reasoning and Acting): 提供語言模型一種思考過程策略,結合上下文提示進行推理與行動。
  • 鏈式思考(Chain of Thought, CoT): 透過中間步驟實現推理能力。子技術包括:自我一致性、主動提示、多模態 CoT,針對不同應用場景有其優缺點。
  • 思維樹(Tree of Thought, ToT): 適合探索或戰略性前瞻任務。擴展鏈式思考,允許模型探索多種解決問題的中間步驟。

  • ReAct(Reasoning and Acting): Provides the language model with a thought-process strategy that combines contextual prompting with reasoning and acting.
  • Chain of Thought (CoT,鏈式思考): Achieves reasoning capability through intermediate steps. Sub-techniques include self-consistency, active prompting, and multimodal CoT, each with its own pros and cons for different application scenarios.
  • Tree of Thought (ToT,思維樹): Suitable for exploratory or strategic look-ahead tasks. It extends Chain of Thought, allowing the model to explore multiple intermediate steps for solving a problem.
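
To make one of the CoT sub-techniques concrete, here is a minimal sketch of self-consistency: sample several reasoning chains and keep the most common final answer. The sampler is a hypothetical stand-in that cycles through canned chains; a real setup would sample a language model at a non-zero temperature.

```python
import itertools
from collections import Counter
from typing import Callable

# A sampler returns (reasoning_chain, final_answer) for a question.
Sampler = Callable[[str], tuple[str, str]]


def self_consistent_answer(sample_chain: Sampler, question: str, n_samples: int = 5) -> str:
    """Sample several chains of thought and majority-vote on the final answer."""
    answers = [sample_chain(question)[1] for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]


# Hypothetical canned chains: two are correct, one contains an arithmetic slip.
_canned = itertools.cycle([
    ("17 + 20 = 37, then 37 + 5 = 42", "42"),
    ("10 + 20 = 30, 7 + 5 = 12, 30 + 12 = 42", "42"),
    ("17 + 25 = 32", "32"),
])


def fake_sampler(question: str) -> tuple[str, str]:
    return next(_canned)


print(self_consistent_answer(fake_sampler, "What is 17 + 25?", n_samples=5))
```

The vote lets the occasional faulty chain be outvoted by the chains that agree, at the cost of running the model several times per question.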

指揮層運作範例:ReAct 框架 / An Example of the Orchestration Layer in Action: The ReAct Framework

以下是指揮層如何利用 ReAct 框架來引導推理和規劃的過程範例:

Below is an example of how the orchestration layer uses the ReAct framework to guide reasoning and planning:

  1. 查詢:使用者向 Agent 發送查詢。
  2. 啟動 ReAct 序列:Agent 向模型提供提示,啟動 ReAct 框架步驟。
  3. 執行步驟:
     • 問題:從使用者查詢中提取的具體問題。
     • 想法:模型關於下一步行動的構想。
     • 行動:模型決定採取的行動,例如選擇工具或檢索資訊。
     • 行動輸入:模型決定提供給工具的輸入內容。
     • 觀察:根據行動結果進行反饋。
     • 最終答案:為原始查詢生成的最終回應。
  4. 結束循環:將最終答案返回給使用者。

  1. Query: The user sends a query to the Agent.
  2. Initiate the ReAct Sequence: The Agent provides a prompt to the model, initiating the ReAct framework steps.
  3. Execution Steps:
     • Question: The specific question extracted from the user's query.
     • Thought: The model's idea about the next action.
     • Action: The action the model decides to take, such as selecting a tool or retrieving information.
     • Action Input: The input the model decides to provide to the tool.
     • Observation: Feedback based on the result of the action.
     • Final Answer: The final response generated for the original query.
  4. End the Loop: Return the final answer to the user.


在指揮層中使用 ReAct 推理的 Agent 範例

An example of an Agent using ReAct reasoning within the orchestration layer.
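
The Question / Thought / Action / Action Input / Observation / Final Answer sequence can be sketched as a short loop. This is an illustration under stated assumptions: `react`, `scripted_model`, and the `flights` tool are hypothetical, the text format is one plausible convention rather than a fixed standard, and the scripted function stands in for a real model.

```python
import re
from typing import Callable

ACTION_RE = re.compile(r"Action:\s*(.+)\nAction Input:\s*(.+)")
FINAL_RE = re.compile(r"Final Answer:\s*(.+)")


def react(
    question: str,
    model: Callable[[str], str],
    tools: dict[str, Callable[[str], str]],
    max_turns: int = 5,
) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_turns):
        step = model(transcript)  # the model emits a Thought plus an Action or a Final Answer
        transcript += step + "\n"
        final = FINAL_RE.search(step)
        if final:
            return final.group(1).strip()  # end the loop
        action = ACTION_RE.search(step)
        if action is None:
            raise ValueError("model output contained neither an action nor a final answer")
        tool_name, tool_input = action.group(1).strip(), action.group(2).strip()
        observation = tools[tool_name](tool_input)
        transcript += f"Observation: {observation}\n"  # fed back on the next turn
    raise RuntimeError("no final answer within max_turns")


def scripted_model(transcript: str) -> str:
    # Hypothetical stand-in for an LLM: act once, then answer.
    if "Observation:" not in transcript:
        return (
            "Thought: I need flight data.\n"
            "Action: flights\n"
            "Action Input: Austin to Zurich"
        )
    return (
        "Thought: I have what I need.\n"
        "Final Answer: There is one flight from Austin to Zurich."
    )


TOOLS = {"flights": lambda query: "1 flight found: AUS to ZRH"}

print(react("Are there flights from Austin to Zurich?", scripted_model, TOOLS))
```

Because each observation is appended to the transcript, the next call to the model sees the full history, which is how the loop lets earlier results shape later reasoning.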

V. Tools:通往外面世界的鑰匙 / V. Tools: The Key to the Outside World

工具是我們可以讓 LLM 通往外部世界的鑰匙。雖然語言模型擅長處理資訊,但它們缺乏直接感知和影響現實世界的能力。這限制了它們在需要與外部系統或資料互動的情況下的用處。這意味著,在某種意義上,語言模型的好壞取決於它從訓練資料中學到的東西。但是,無論我們向模型投入多少資料,它們仍然缺乏與外部世界互動的基本能力。

Tools are the key that allows us to give an LLM access to the external world. Although language models excel at processing information, they lack the ability to directly perceive and affect the real world. This limits their usefulness in situations that require interaction with external systems or data. It means that, in a sense, a language model is only as good as what it learned from its training data. But no matter how much data we feed the model, it still lacks the basic ability to interact with the external world.

為了讓模型能夠與外部系統進行即時、上下文感知的互動,可以使用函數 (Functions)、擴展 (Extensions) 和資料儲存 (Data Stores) 等工具來提供這種關鍵能力。這些工具建立了基礎模型和外部世界之間的連結,使 Agent 能夠執行更廣泛的任務,並且更加準確和可靠。

To enable the model to interact with external systems in real time and in a context-aware manner, tools such as Functions(函數), Extensions(擴展), and Data Stores(資料儲存) can provide this crucial capability. These tools establish a link between the foundation model and the external world, allowing the Agent to perform a broader range of tasks more accurately and reliably.

擴展 (Extensions) / Extensions(擴展)

擴展可以被認為是以標準化方式橋接 API 和 Agent 之間的差距,允許 Agent 無縫執行 API,而不管它們的底層實作如何。擴展透過以下方式橋接 Agent 和 API 之間的差距:

Extensions can be thought of as bridging the gap between an API and the Agent in a standardized way, allowing the Agent to execute APIs seamlessly regardless of their underlying implementation. Extensions bridge the gap between the Agent and the API in the following ways:

  1. 使用範例教導 Agent 如何使用 API 端點。
  2. 教導 Agent 成功調用 API 端點所需的引數或參數。

Agent 可以使用模型和範例來動態選擇最適合解決使用者查詢的擴展。

  1. Using examples to teach the Agent how to use the API endpoint.
  2. Teaching the Agent the arguments or parameters required to successfully call the API endpoint.

The Agent can use the model and examples to dynamically select the extension best suited to resolving the user's query.


範例:Agents-Extension-API 關係圖

Example: Agents–Extension–API relationship diagram.
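
As a rough sketch of the idea, an extension can be modelled as a record that bundles examples, required parameters, and the agent-side call. Everything here is hypothetical (`Extension`, `select_extension`, `run_extension`, the stubbed APIs), and simple word overlap stands in for the model's own selection; it is not the Vertex AI Extensions API.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Extension:
    name: str
    examples: list[str]         # example queries that teach when to use the endpoint
    required_params: list[str]  # arguments the endpoint needs
    call: Callable[..., str]    # agent-side API invocation (stubbed here)


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def select_extension(query: str, extensions: list[Extension]) -> Extension:
    """Pick the extension whose examples share the most words with the query."""
    def score(ext: Extension) -> int:
        return max(len(_words(query) & _words(example)) for example in ext.examples)
    return max(extensions, key=score)


def run_extension(ext: Extension, **params: str) -> str:
    missing = [p for p in ext.required_params if p not in params]
    if missing:
        raise ValueError(f"missing parameters: {missing}")
    return ext.call(**params)  # the agent side makes the call itself


EXTENSIONS = [
    Extension(
        name="flights",
        examples=["Find flights from Austin to Zurich", "Book a flight to Tokyo"],
        required_params=["origin", "destination"],
        call=lambda origin, destination: f"Flights from {origin} to {destination}: 2 found",
    ),
    Extension(
        name="weather",
        examples=["What is the weather in Paris", "Will it rain in Taipei tomorrow"],
        required_params=["city"],
        call=lambda city: f"{city}: sunny",
    ),
]

chosen = select_extension("Are there flights from Taipei to Tokyo?", EXTENSIONS)
print(chosen.name, "->", run_extension(chosen, origin="Taipei", destination="Tokyo"))
```

The point of the sketch is where the call happens: the agent selects the extension and invokes the API itself, in contrast to the function-calling pattern described next.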

函數 (Functions) / Functions(函數)

這裡的函數 (Functions) 與軟體工程中的函數類似,是完成特定任務且可根據需要重複使用的獨立程式碼模組。模型可以利用一組已知的函數,並根據其規格決定何時使用每個函數以及函數需要哪些引數。函數與擴展的不同之處在於:

Similar to functions in software engineering, these are self-contained code modules that accomplish a specific task and can be reused as needed. The model can leverage a set of known functions and, based on their specifications, decide when to use each function and what arguments the function requires. Functions differ from extensions in that:

  • 模型輸出函數及其引數,但不進行即時 API 調用。

  • The model outputs the function and its arguments but does not make a live API call.


範例:Agents-Function-API 關係圖

Example: Agents–Function–API relationship diagram.

  • 函數在客戶端執行,而擴展在 Agent 端執行。函數的呼叫邏輯和執行從 Agent 端轉移到客戶端應用程式,為開發人員提供對應用程式中資料流的更精細控制。

  • Functions execute on the client side, whereas extensions execute on the Agent side. The call logic and execution of functions are moved from the Agent side to the client-side application, providing developers with more fine-grained control over the data flow within the application.


Delineating client vs. agent side control for extensions and function calling
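
A minimal sketch of the function-calling split: the model only names a function and its arguments, and the client application decides whether and how to execute it. `fake_model`, `get_weather`, and the JSON shape are assumptions made for illustration, not a specific vendor's schema.

```python
import json
from typing import Callable


def fake_model(user_query: str) -> str:
    # Hypothetical stand-in for the model: it returns a function name and
    # arguments as JSON and makes no API call of its own.
    return json.dumps({"function": "get_weather", "args": {"city": "Taipei"}})


def get_weather(city: str) -> str:
    # Client-side implementation; a real application would call its own API here.
    return f"{city}: sunny"


# The client keeps an explicit allow-list of functions it is willing to run.
CLIENT_FUNCTIONS: dict[str, Callable[..., str]] = {"get_weather": get_weather}


def handle(user_query: str) -> str:
    call = json.loads(fake_model(user_query))
    func = CLIENT_FUNCTIONS[call["function"]]  # KeyError for anything not allowed
    return func(**call["args"])                # execution happens on the client


print(handle("What is the weather in Taipei?"))
```

Keeping execution on the client is what gives developers the finer control over data flow: they can validate arguments, add authentication, or decline the call before anything reaches an external system.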

資料儲存 (Data Stores) / Data Stores(資料儲存)

資料儲存透過提供對更動態和最新的資訊的訪問來解決模型的靜態知識限制。資料儲存允許開發人員以其原始格式向 Agent 提供額外資料,而無需耗時的資料轉換、模型重新訓練或微調。資料儲存通常實作為向量資料庫,Agent 可以在運行時訪問該資料庫。資料儲存允許 Agent 訪問各種格式的資料,例如:

Data Stores address the model's static knowledge limitation by providing access to more dynamic and up-to-date information. Data Stores allow developers to provide additional data to the Agent in its original format, without time-consuming data transformation, model retraining, or fine-tuning. Data Stores are typically implemented as vector databases that the Agent can access at runtime. Data Stores allow the Agent to access data in various formats, such as:

  • 網站內容
  • 結構化資料,例如 PDF、Word 文件、CSV、試算表等
  • 非結構化資料,例如 HTML、PDF、TXT 等

  • Website content
  • Structured data, such as PDFs, Word documents, CSVs, spreadsheets, etc.
  • Unstructured data, such as HTML, PDF, TXT, etc.


範例:Agents-data store-resource 關係圖

Example: Agents–data store–resource relationship diagram.

那如果你有設計和儲存了一個 RAG-based 知識庫,你也可以參考以下的方式,建立起 AI agent 的生命週期:

If you have designed and stored a RAG-based knowledge base, you can also refer to the following approach to build the lifecycle of an AI agent:

  1. 使用者查詢:使用者向 Agent 發出查詢。
  2. 查詢嵌入 (Query Embedding):使用嵌入模型將使用者查詢轉換為 Embedding Vector。
  3. 向量資料庫匹配:將 Embedding Vector 與向量資料庫的內容進行匹配。
  4. 內容檢索:從向量資料庫中檢索匹配的內容,以文字格式輸出結果。
  5. Agent 處理:Agent 接收使用者查詢和檢索到的內容。
  6. 生成回應或採取行動:Agent 根據使用者查詢和檢索到的內容制定回應或決定下一步的行動。
  7. 回覆結果:向使用者傳送最終回覆。

  1. User Query: The user issues a query to the Agent.
  2. Query Embedding(查詢嵌入): An embedding model converts the user query into an embedding vector.
  3. Vector Database Matching: The embedding vector is matched against the contents of the vector database.
  4. Content Retrieval: The matching content is retrieved from the vector database and output in text format.
  5. Agent Processing: The Agent receives the user query and the retrieved content.
  6. Generate a Response or Take Action: Based on the user query and the retrieved content, the Agent formulates a response or decides the next action.
  7. Return the Result: The final response is sent to the user.


RAG-based AI agent 的生命週期架構圖

Lifecycle architecture diagram of a RAG-based AI agent.
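
The lifecycle above can be sketched end to end with a toy setup. The assumptions are loud ones: a bag-of-words counter stands in for a real embedding model, a Python list stands in for the vector database, and `answer` only formats the retrieved text where a real agent would pass it to the model.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts instead of a learned dense vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


DOCUMENTS = [
    "Refunds are processed within 5 business days.",
    "Our support line is open from 9am to 6pm on weekdays.",
    "Shipping to Taiwan takes 3 to 7 days.",
]
# Stand-in for the vector database: each document stored next to its embedding.
INDEX = [(doc, embed(doc)) for doc in DOCUMENTS]


def retrieve(query: str, k: int = 1) -> list[str]:
    query_vector = embed(query)                                    # step 2
    ranked = sorted(INDEX, key=lambda item: cosine(query_vector, item[1]), reverse=True)  # step 3
    return [doc for doc, _ in ranked[:k]]                          # step 4


def answer(query: str) -> str:
    context = retrieve(query)                                      # step 5
    # Step 6: a real agent would send the query and context to the model here.
    return f"Based on: {context[0]}"                               # step 7


print(answer("How long do refunds take?"))
```

Swapping `embed` for a real embedding model and `INDEX` for a vector database changes the quality of the match, but not the shape of the flow.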

作者在文章中,也很貼心的準備了 Tools recap,整理如下,也更加深這三個執行方式之間的差異性:

In the article, the authors also thoughtfully prepared a Tools recap, summarized below, which further clarifies the differences among these three execution methods:

[表:Tools recap(擴展、函數與資料儲存的比較)/ Table: tools recap comparing Extensions, Functions, and Data Stores]

VI. 模型能力的增強 / VI. Enhancing Model Capabilities

針對模型效能的增強,目標式學習 (targeted learning) 是一種關鍵方法,可以提升模型在特定任務中的表現,尤其是在需要超出訓練資料範圍的知識時。這種方法類似於從基本烹飪技巧進階到精通特定菜系,需要針對性的學習以獲得更細緻的結果。來源中提到了以下幾種方法來幫助模型獲得這類特定的知識:

For enhancing model performance, targeted learning(目標式學習) is a key approach that can improve a model's performance on specific tasks, especially when knowledge beyond the scope of the training data is required. This approach is similar to advancing from basic cooking skills to mastering a particular cuisine, requiring targeted learning to achieve more refined results. The source mentions the following methods to help the model acquire such specific knowledge:

情境學習 (In-context learning) / In-context Learning(情境學習)

  • 這種方法在推論時 (inference time) 提供通用模型提示 (prompt)、工具 (tools) 和少量範例 (few-shot examples),使其能夠「即時」學習如何以及何時使用這些工具來完成特定任務。
  • ReAct 框架是這種方法的一個例子。它利用自然語言的提示,使模型在接收到使用者查詢時能夠有效地推理並採取行動。
  • 就像廚師收到特定食譜(提示)、一些關鍵食材(相關工具)和一些示例菜餚(少量示例),然後根據有限的資訊和一般的烹飪知識,即時找出如何準備最符合食譜和客戶偏好的菜餚。

  • This method provides a general-purpose model with a prompt(提示), tools(工具), and few-shot examples(少量範例) at inference time(推論時), enabling it to learn "on the fly" how and when to use these tools to complete a specific task.
  • The ReAct framework is an example of this approach. It uses natural-language prompts to enable the model to reason effectively and take action upon receiving a user query.
  • It is like a chef receiving a particular recipe (prompt), some key ingredients (relevant tools), and a few example dishes (few-shot examples), then figuring out on the spot — based on limited information and general culinary knowledge — how to prepare a dish that best matches the recipe and the customer's preferences.
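
In practice, in-context learning comes down to how the prompt is assembled. The sketch below shows one plausible layout; `build_prompt`, the section labels, and the tool names are hypothetical, and real frameworks use their own templates.

```python
def build_prompt(
    instructions: str,
    tools: dict[str, str],
    examples: list[tuple[str, str]],
    query: str,
) -> str:
    """Assemble instructions, tool descriptions, and few-shot examples into one prompt."""
    tool_lines = "\n".join(f"- {name}: {description}" for name, description in tools.items())
    example_lines = "\n\n".join(f"User: {q}\nAssistant: {a}" for q, a in examples)
    return (
        f"{instructions}\n\n"
        f"Available tools:\n{tool_lines}\n\n"
        f"Examples:\n{example_lines}\n\n"
        f"User: {query}\nAssistant:"
    )


prompt = build_prompt(
    instructions="Decide which tool to use for each request.",
    tools={"flights": "search for flights", "weather": "look up a forecast"},
    examples=[
        ("Book a flight to Tokyo", "Action: flights"),
        ("Will it rain in Paris?", "Action: weather"),
    ],
    query="Is there a flight to Zurich tomorrow?",
)
print(prompt)
```

Nothing about the model changes here: the recipe, ingredients, and sample dishes from the chef analogy all arrive through the prompt at inference time.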

檢索式情境學習 (Retrieval-based in-context learning) / Retrieval-based In-context Learning(檢索式情境學習)

  • 這種技術透過從外部記憶中檢索最相關的資訊、工具和相關範例,動態地填充模型提示。
  • Vertex AI 擴展中的「範例儲存」(Example Store) 或之前提到的基於 RAG 架構的資料儲存就是這種方法的例子。
  • 這就像廚師在廚房裡有一個儲藏豐富的食品儲藏室(外部資料儲存),裡面裝滿了各種食材和食譜(範例和工具)。廚師可以動態地從食品儲藏室中選擇食材和食譜,以便更好地符合客戶的食譜和偏好。

  • This technique dynamically populates the model's prompt by retrieving the most relevant information, tools, and examples from external memory.
  • The "Example Store" in Vertex AI Extensions or the previously mentioned RAG-based Data Store are examples of this approach.
  • It is like a chef having a well-stocked pantry (external data store) in the kitchen, filled with various ingredients and recipes (examples and tools). The chef can dynamically select ingredients and recipes from the pantry to better match the customer's recipe and preferences.
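
A small sketch of the retrieval step: rather than hard-coding few-shot examples, pick the stored examples most similar to the incoming query and place only those in the prompt. `EXAMPLE_STORE` and `select_examples` are hypothetical, and Jaccard word overlap stands in for the embedding-based search a real example store would likely use.

```python
import re

# Hypothetical stored (query, expected behaviour) pairs.
EXAMPLE_STORE = [
    ("Book a flight to Tokyo", "Action: flights"),
    ("What is the weather in Paris?", "Action: weather"),
    ("Find a flight from Austin to Zurich", "Action: flights"),
]


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def select_examples(query: str, k: int = 2) -> list[tuple[str, str]]:
    """Return the k stored examples whose wording overlaps most with the query."""
    query_words = _words(query)

    def jaccard(example: tuple[str, str]) -> float:
        example_words = _words(example[0])
        return len(query_words & example_words) / len(query_words | example_words)

    return sorted(EXAMPLE_STORE, key=jaccard, reverse=True)[:k]


selected = select_examples("Is there a flight to Zurich tomorrow?")
few_shot = "\n\n".join(f"User: {q}\nAssistant: {a}" for q, a in selected)
print(few_shot)
```

Only the selected pairs reach the prompt, which keeps it short while still showing the model examples close to the current request.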

微調學習 (Fine-tuning based learning) / Fine-tuning Based Learning(微調學習)

  • 這種方法涉及在推論之前使用更大的特定範例資料集來訓練模型。
  • 這有助於模型理解何時以及如何應用某些工具,甚至在接收到任何使用者查詢之前。
  • 這就像我們送廚師去學習一種新的菜系或一系列菜系(在更大的特定示例資料集上進行預訓練)。這讓廚師能夠以更深入的理解來應對未來未見過的客戶食譜。

  • This method involves training the model with a larger dataset of specific examples before inference.
  • This helps the model understand when and how to apply certain tools, even before it receives any user query.
  • It is like sending a chef to learn a new cuisine or a series of cuisines (pre-training on a larger dataset of specific examples). This allows the chef to handle future, unseen customer recipes with deeper understanding.

總結來說,這些目標式學習方法各有優缺點,在速度、成本和延遲方面有所不同。透過在 Agent 框架中結合這些技術,可以利用各自的優勢並最小化其缺點,從而實現更強大且適應性更強的解決方案。這些方法讓模型能根據情境,提取資訊、工具、以及範例,提升處理複雜任務的能力。

In summary, these targeted learning methods each have their own pros and cons, differing in speed, cost, and latency. By combining these techniques within an Agent framework, one can leverage their respective strengths while minimizing their weaknesses, thereby achieving a more powerful and adaptable solution. These methods enable the model to extract information, tools, and examples based on context, enhancing its ability to handle complex tasks.

VII. 如何在 Google Vertex AI 實踐 AI Agent / VII. How to Implement AI Agents on Google Vertex AI

Vertex AI 平台提供了一個全託管環境,簡化了建構生產級 AI Agent 的流程,其中包含了先前討論的核心元件,以及額外的工具。以下是關於在 Vertex AI 上建構生產應用程式的重點:

The Vertex AI platform provides a fully managed environment that simplifies the process of building production-grade AI Agents, incorporating the core components discussed earlier as well as additional tools. Below are the key points about building production applications on Vertex AI:

簡化開發流程 / Simplifying the Development Process

開發人員可以使用自然語言介面,快速定義 Agent 的關鍵元素,包括目標、任務指示、工具、用於任務分派的子 Agent 和範例。這讓開發人員能夠更專注於建構和完善 Agent,而不必擔心基礎設施、部署和維護的複雜性。

Developers can use a natural-language interface to quickly define the Agent's key elements, including goals, task instructions, tools, sub-Agents for task dispatch, and examples. This allows developers to focus more on building and refining the Agent without worrying about the complexity of infrastructure, deployment, and maintenance.

整合開發工具 / Integrated Development Tools

Vertex AI 平台提供了一系列的開發工具,用於測試、評估、衡量 Agent 效能、除錯和改善 Agent 的整體品質。這確保了開發出的 Agent 是可靠且高效的。

The Vertex AI platform provides a suite of development tools for testing, evaluating, and measuring Agent performance, debugging, and improving the Agent's overall quality. This ensures that the developed Agent is reliable and efficient.

完整的 Agent 架構 / A Complete Agent Architecture

Vertex AI 平台整合了多種功能,例如 Vertex Agent Builder、Vertex Extensions、Vertex Function Calling 和 Vertex Example Store,這些功能可以共同建構一個完整的端對端 Agent 架構。這個架構能滿足生產應用程式的各種需求。

The Vertex AI platform integrates multiple capabilities, such as Vertex Agent Builder, Vertex Extensions, Vertex Function Calling, and Vertex Example Store, which together can build a complete end-to-end Agent architecture. This architecture can meet the various needs of production applications.

可擴展性和管理 / Scalability and Management

Vertex AI 作為一個全託管平台,處理了基礎設施的管理、部署和維護,讓開發人員可以專注於應用程式的開發和優化。

As a fully managed platform, Vertex AI handles infrastructure management, deployment, and maintenance, allowing developers to focus on developing and optimizing the application.

作者提供了一個在 Vertex AI 平台上建構的 Agent 架構範例,展示了如何利用各種功能來創建生產級應用程式。該架構結合了多個必要的元件,確保了 Agent 的有效運作。

The authors provide an example of an Agent architecture built on the Vertex AI platform, demonstrating how to leverage various capabilities to create production-grade applications. This architecture combines several essential components, ensuring the Agent's effective operation.

[圖:在 Vertex AI 平台上建構的 Agent 架構範例 / Figure: example Agent architecture built on the Vertex AI platform]

  • 試用:使用者可以從官方文件中嘗試預先建構的 Agent 架構範例。

  • Try It Out: Users can try out the pre-built Agent architecture examples from the official documentation.

VIII. 結論 / VIII. Conclusion

作者總結了生成式 AI Agent 的基礎構成要素、組成方式以及如何以認知架構的形式有效實施它們。以下是作者在總結中提出的幾個重點:

The authors summarize the foundational building blocks of generative AI Agents, how they are composed, and how to effectively implement them in the form of a cognitive architecture. Below are several key points the authors raise in their conclusion:

Agent 透過利用工具來擴展語言模型的能力。Agent 可以存取即時資訊、提出真實世界的行動建議,並自主規劃和執行複雜的任務。Agent 可以利用一個或多個語言模型來決定何時以及如何轉換狀態,並使用外部工具來完成模型本身難以或不可能完成的複雜任務。

Agents extend the capabilities of language models by leveraging tools. Agents can access real-time information, suggest real-world actions, and autonomously plan and execute complex tasks. An Agent can use one or more language models to decide when and how to transition between states, and use external tools to accomplish complex tasks that are difficult or impossible for the model alone.

Agent 運作的核心是指揮層。指揮層是一種認知架構,它架構了推理、規劃、決策並指導 Agent 的行動。多種推理技術,例如 ReAct、Chain-of-Thought 和 Tree-of-Thoughts,為指揮層提供了一個框架,以接收資訊、執行內部推理並產生明智的決策或回應。

At the core of an Agent's operation is the orchestration layer. The orchestration layer is a cognitive architecture that structures reasoning, planning, and decision-making, and directs the Agent's actions. Various reasoning techniques, such as ReAct, Chain-of-Thought, and Tree-of-Thoughts, provide the orchestration layer with a framework to receive information, perform internal reasoning, and produce informed decisions or responses.

工具 (Tools),例如擴展 (Extensions)、函數 (Functions) 和資料儲存 (Data Stores),是 Agent 通往外部世界的鑰匙。它們允許 Agent 與外部系統互動並存取超出其訓練資料範圍的知識。

Tools(工具), such as Extensions(擴展), Functions(函數), and Data Stores(資料儲存), are the Agent's key to the external world. They allow the Agent to interact with external systems and access knowledge beyond the scope of its training data.

  • 擴展 (Extensions) 提供了 Agent 和外部 API 之間的橋樑,能夠執行 API 呼叫並檢索即時資訊。
  • 函數 (Functions) 透過分工合作為開發者提供了更細緻的控制,允許 Agent 產生函數參數,實際呼叫則在客戶端執行。
  • 資料儲存 (Data Stores) 為 Agent 提供了對結構化或非結構化資料的訪問,從而實現了數據驅動的應用程式。

  • Extensions(擴展) provide a bridge between the Agent and external APIs, able to execute API calls and retrieve real-time information.
  • Functions(函數) provide developers with more fine-grained control through a division of labor: the Agent generates the function parameters, and the call itself is executed on the client side.
  • Data Stores(資料儲存) provide the Agent with access to structured or unstructured data, thereby enabling data-driven applications.

隨著工具變得更加複雜,推理能力得到增強,Agent 將有能力解決日益複雜的問題。此外,「Agent Chain(代理鏈)」的策略方法將繼續獲得發展。透過結合擅長特定領域或任務的專業 Agent,我們可以創建一個「混合 Agent 專家」方法,能夠在各個行業和問題領域提供卓越的成果。

As tools become more sophisticated and reasoning capabilities are enhanced, Agents will be able to solve increasingly complex problems. Moreover, the strategic approach of the "Agent Chain(代理鏈)" will continue to develop. By combining specialized Agents that excel in specific domains or tasks, we can create a "mixture of Agent experts" approach, capable of delivering outstanding results across various industries and problem domains.

構建複雜的 Agent 架構需要迭代的方法。實驗和改進是為特定業務案例和組織需求找到解決方案的關鍵。由於支援其架構的基礎模型的生成特性,沒有兩個 Agent 是完全相同的。然而,透過利用每個基礎元件的優勢,我們可以創建有影響力的應用程式,擴展語言模型的功能並驅動真實世界的價值。

Building complex Agent architectures requires an iterative approach. Experimentation and refinement are key to finding solutions for specific business cases and organizational needs. Due to the generative nature of the foundation models that power their architecture, no two Agents are exactly alike. However, by leveraging the strengths of each foundational component, we can create impactful applications that extend the capabilities of language models and drive real-world value.

I am Simon

大家好,我是 Simon 劉育維,是一位 AI 領域解決方案專家,目前也擔任 Google GenAI 領域開發者專家 (GDE),期待能夠幫助企業導入人工智慧相關技術解決問題。如果這篇文章對您有幫助,請在 Medium 上按一下鼓勵,並追蹤我的個人帳號,這樣您就可以隨時閱讀我所撰寫的文章。歡迎在我的 LinkedIn 上留言提供意見,並與我一起討論有關人工智慧的主題,期待能夠對大家有所幫助!

Hello everyone, I am Simon Liu (劉育維), an AI solutions expert, and I currently also serve as a Google GenAI Developer Expert (GDE). I hope to help enterprises adopt AI-related technologies to solve problems. If this article was helpful to you, please give it a clap on Medium and follow my personal account so you can read my articles anytime. Feel free to leave comments on my LinkedIn to share your thoughts and discuss AI topics with me — I hope I can be of help to everyone!


My Personal Website: https://tinyurl.com/simonliuyuwei

