第 1 期：Agent 核心循环

核心概念：用户消息进入 → LLM 生成 → 如果有 tool_calls 就执行工具并循环 → 否则返回最终文本

概念讲解

想象你在和一个有「手」的助手对话。你问它一个问题，它不只是嘴上回答 —— 它可以动手做事（执行命令、读文件、搜索网页），然后根据做的结果继续想，再做更多事，直到它觉得可以给你一个完整的答案了，才开口。

这就是 Hermes Agent 的核心循环：

用户输入 → [system + history + user] 发给 LLM
                  ↓
         LLM 返回响应
                  ↓
        ┌── 有 tool_calls？──┐
        │                    │
       YES                  NO
        │                    │
  执行每个工具           返回文本给用户
  结果追加到 messages         （循环结束）
        │
        └── 回到"发给 LLM" ──┘

这个 while 循环是 Hermes Agent 的心脏。所有的智能（上下文压缩、重试、fallback、中断）都是围绕这个循环做的「守卫」和「优化」，但循环本身的逻辑极简。

源码关键片段

文件：agent/conversation_loop.py:796 — while 循环入口

python

# 主对话循环入口（简化版）
# 完整版在 agent/conversation_loop.py，约 4752 行
# 这里只展示骨架逻辑

api_call_count = 0           # 已调用 LLM 的次数
final_response = None        # 最终要返回给用户的文本

# 迭代预算：限制单轮对话最多调用 LLM 的次数（默认 90 次）
agent.iteration_budget = IterationBudget(agent.max_iterations)

# 核心循环：只要预算没耗尽，就继续
while (api_call_count < agent.max_iterations               # 硬上限
       and agent.iteration_budget.remaining > 0            # 软预算
      ) or agent._budget_grace_call:                       # 宽限一次机会

    # 检查用户是否发了中断信号（Ctrl+C 或新消息）
    if agent._interrupt_requested:
        break

    api_call_count += 1
    agent.iteration_budget.consume()
    # ... 构建 api_messages，调用 LLM，处理响应 ...

文件：agent/conversation_loop.py:3652 — 检查响应是否包含 tool_calls

python

    # 检查 LLM 响应是否有工具调用
    if assistant_message.tool_calls:
        # 有工具调用 → 验证、执行、把结果追加到 messages
        agent._vprint(f"🔧 Processing {len(assistant_message.tool_calls)} tool call(s)...")

        # 验证工具名是否合法（防止模型幻觉出不存在的工具）
        for tc in assistant_message.tool_calls:
            if tc.function.name not in agent.valid_tool_names:
                repaired = agent._repair_tool_call(tc.function.name)
                if repaired:
                    tc.function.name = repaired

文件：agent/conversation_loop.py:3884 — 执行工具调用

python

        # 把 assistant 消息（含 tool_calls）追加到 messages
        messages.append(assistant_msg)

        # 执行所有 tool_calls（顺序或并发）
        agent._execute_tool_calls(assistant_message, messages, effective_task_id, api_call_count)

        # 执行完成后 → continue → 回到 while 循环顶部，再次调 LLM
        continue

文件：agent/conversation_loop.py:3984-3986 — 无 tool_calls，返回最终响应

python

    else:
        # 没有工具调用 → 这是最终回答
        final_response = assistant_message.content or ""
        # ... 处理 think blocks、空响应恢复等边界情况 ...
        break  # 退出 while 循环

文件：agent/tool_executor.py:849 — 实际调用 handle_function_call

python

# 在 tool_executor.py 中，每个 tool_call 的实际执行：
function_result = _ra().handle_function_call(
    function_name,          # 工具名，如 "terminal"
    function_args,          # 参数字典，如 {"command": "ls -la"}
    effective_task_id,      # 任务隔离 ID
    tool_call_id=tool_call.id,
    session_id=agent.session_id or "",
    enabled_tools=list(agent.valid_tool_names),
    skip_pre_tool_call_hook=True,  # 已在外层检查过
)
# function_result 是 JSON 字符串，追加为 tool role 消息

文件：run_agent.py:4575, 4588 — AIAgent 入口和 run_conversation 转发

python

class AIAgent:
    def __init__(self, ...):
        """约 60 个参数，委托给 agent/agent_init.py"""
        from agent.agent_init import init_agent
        init_agent(self, ...)

    def chat(self, message: str) -> str:
        """简单接口 — 返回最终回答字符串"""
        result = self.run_conversation(message)
        return result.get("final_response", "")

    def run_conversation(self, user_message, system_message=None,
                         conversation_history=None, task_id=None) -> dict:
        """完整接口 — 转发到 agent/conversation_loop.py"""
        from agent.conversation_loop import run_conversation
        return run_conversation(self, user_message, system_message,
                                conversation_history, task_id)

设计决策

Hermes 的选择	为什么这样做	没选的替代方案	替代方案的代价（为什么不选）
循环是同步的（无 async/await）	简化推理和调试；工具执行本身可能很长（等终端命令完成），async 只增加复杂度不增加吞吐	用 asyncio 驱动整个循环	必须处理 event loop 嵌套问题，测试更难写
用 iteration budget 而非简单计数器	budget 可以跨父子 Agent 共享（子 Agent 消耗的也算在父的预算里），避免递归 spawn 导致无限工具调用	每个 Agent 独立计数	子 Agent 可以耗尽所有迭代预算，深度嵌套时难以预测总消耗
工具结果用 tool role 消息追加	OpenAI 格式标准要求；模型能区分"工具说的"和"用户说的"	把工具结果拼进 user 消息	破坏 role alternation，模型分不清谁在说话
响应为空时的多层恢复机制（prior-turn fallback → post-tool nudge → thinking prefill）	弱模型经常在工具执行后返回空内容，直接报错用户体验极差	直接返回错误	浪费额外 1-2 次 API 调用；恢复逻辑复杂，边界情况多
LLM 调用始终走 streaming	即使无展示需求，streaming 也能做 90s 超时检测，避免卡死在无响应连接上	只在有展示回调时才 stream	非 streaming 调用可能被死连接永久挂起

复刻代码说明

本期复刻代码在 dongrealm-hermes-agent (day1 分支) 中迭代，功能说明：一个最小的 Agent 循环 —— 硬编码两个 mock 工具（获取时间、计算器），能跑通「LLM 调用 → 工具执行 → 继续 → 返回文本」的完整流程。

常见陷阱

tool_calls 的 arguments 是 JSON 字符串，不是 dict — OpenAI SDK 返回的 tool_call.function.arguments 是原始 JSON 字符串，需要手动 json.loads()。源码中解决：agent/tool_executor.py:566 做了 json.loads(tool_call.function.arguments)，外加 JSONDecodeError 容错。
空响应不代表出错 — 很多模型（尤其是小模型）在工具执行后会返回空 content。源码中解决：agent/conversation_loop.py:3984-4100 实现了三层恢复（prior-turn content fallback → post-tool nudge → thinking prefill）。
role alternation 必须严格 — OpenAI API 要求 assistant 消息后必须跟 tool 或 user，不能出现 tool → user 等违规序列。源码中解决：agent/conversation_loop.py:932 调用 _repair_message_sequence() 在每次 API 调用前修复序列。
finish_reason="length" 不等于出错 — 表示模型输出被截断（token 上限），需要续写。源码中解决：agent/conversation_loop.py:1583 检测 finish_reason == "length"，注入 continuation prompt 让模型继续。
同一个 while 循环内的 API 调用失败需要重试，而非退出循环 — 源码在 while 循环内部嵌套了一个 while retry_count < max_retries 的重试循环（agent/conversation_loop.py:1157），处理网络错误、限流、模型切换等，和外层的"工具调用迭代循环"是两个不同层次。

第 1 期：Agent 核心循环 ​

概念讲解 ​

源码关键片段 ​

设计决策 ​

复刻代码说明 ​

常见陷阱 ​

第 1 期：Agent 核心循环

概念讲解

源码关键片段

设计决策

复刻代码说明

常见陷阱