CodeZero2Pi
HermesAI · Agents · Technical

The Hermes Architecture: Building Agents That Learn

Prasanjit Dey · 4 min read

The Core Problem Hermes Solves

Most agent frameworks are stateless. You send a task, the agent runs it, you get an output. The next time you send the same task, the agent has no memory of what worked before. It starts from scratch every time.

This is fine for simple, well-defined tasks. It's a fundamental limitation for complex, real-world work where the same general problem appears in slightly different forms and where the right approach changes with context.

Hermes was built around a different assumption: an agent that remembers what worked is more useful than one that doesn't. The entire architecture flows from this premise.

Procedural Memory vs Semantic Memory

When people talk about AI memory, they usually mean semantic memory — facts, context, conversation history. Hermes cares primarily about procedural memory: not what the agent knows, but what it knows how to do.

The distinction matters. Semantic memory degrades as it grows — long context windows get noisy, retrieval becomes imprecise. Procedural memory compounds — each new skill makes the agent more capable, and skills compose.

A Hermes skill is a discrete, versioned unit of behavior:

{
  "id": "skill:debug-typescript-import-error",
  "version": "1.3.0",
  "trigger": "typescript import resolution failure",
  "steps": [...],
  "validated": true,
  "success_rate": 0.94
}

Skills are not prompts. They're structured procedures with defined inputs, outputs, and validation criteria. They're tested before deployment and tracked after.
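The JSON above can be modeled as a small typed record. A minimal sketch in Python; the `matches` helper and default values are illustrative assumptions, not Hermes's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Skill:
    """A discrete, versioned unit of agent behavior (illustrative sketch)."""
    id: str
    version: str
    trigger: str  # textual trigger the router matches against
    steps: list = field(default_factory=list)
    validated: bool = False
    success_rate: float = 0.0

    def matches(self, task_description: str) -> bool:
        # Naive substring match; a real router would use embeddings or rules.
        return self.trigger in task_description.lower()

skill = Skill(
    id="skill:debug-typescript-import-error",
    version="1.3.0",
    trigger="typescript import resolution failure",
    validated=True,
    success_rate=0.94,
)
print(skill.matches("hit a typescript import resolution failure in CI"))  # True
```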

The Learning Loop

Hermes closes the feedback loop in four phases:

Phase 1: Execution Recording

Every task run produces a structured execution record — inputs, outputs, intermediate states, wall-clock time, and an outcome label (success, partial success, failure). This is not ad-hoc logging. It's a first-class data model that every worker writes to.
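One minimal shape for such a record, sketched in Python. The field names here (particularly `error_signature`, which the later clustering step would key on) are assumptions, not Hermes's actual schema:

```python
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class Outcome(Enum):
    SUCCESS = "success"
    PARTIAL = "partial_success"
    FAILURE = "failure"

@dataclass
class ExecutionRecord:
    """First-class record every worker writes after a task run (sketch)."""
    task_id: str
    worker: str
    inputs: dict
    outputs: dict
    intermediate_states: list
    wall_clock_seconds: float
    outcome: Outcome
    error_signature: Optional[str] = None  # set on failure; used for clustering
    recorded_at: float = field(default_factory=time.time)

record = ExecutionRecord(
    task_id="task-001",
    worker="backend",
    inputs={"file": "server.ts"},
    outputs={},
    intermediate_states=[],
    wall_clock_seconds=4.2,
    outcome=Outcome.FAILURE,
    error_signature="typescript import resolution failure",
)
```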

Phase 2: Pattern Extraction

A background worker scans execution records for patterns. The signals it looks for:

  • The same error appearing across multiple runs
  • A success path that consistently works for a class of inputs
  • A slowdown that correlates with a specific input shape

When a pattern crosses a confidence threshold, it becomes a candidate for skill synthesis.
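At its simplest, the confidence threshold is a recurrence count over shared signatures. A toy sketch; the threshold value and the `error_signature` field are assumptions:

```python
from collections import Counter

CONFIDENCE_THRESHOLD = 3  # assumed: min. recurrences before synthesis is triggered

def extract_candidates(records: list, threshold: int = CONFIDENCE_THRESHOLD) -> list:
    """Return error signatures that recur often enough to become skill candidates."""
    counts = Counter(
        r["error_signature"] for r in records if r.get("error_signature")
    )
    return [sig for sig, n in counts.items() if n >= threshold]

records = [
    {"error_signature": "ts-import-failure"},
    {"error_signature": "ts-import-failure"},
    {"error_signature": "ts-import-failure"},
    {"error_signature": "oom-on-large-csv"},
]
print(extract_candidates(records))  # ['ts-import-failure']
```

A production extractor would cluster on fuzzier similarity than exact string equality, but the promotion logic (count, threshold, candidate) is the same shape.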

Phase 3: Skill Synthesis

The synthesis worker takes a pattern — a cluster of execution records with a shared signature — and generates a candidate skill. This uses an LLM to draft the procedure, but the draft goes through a structured validation harness:

  • Unit tests against the recorded inputs
  • Regression check against all prior successful runs for that pattern class
  • Dry-run in a sandboxed worker environment

Only skills that pass validation get merged into the skill library.
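The unit-test stage of that gate can be sketched as replaying recorded inputs through the candidate and merging only on a clean pass. The required pass rate and the callable interface below are assumptions:

```python
from typing import Callable

def validate_candidate(
    candidate: Callable[[dict], bool],
    recorded_inputs: list,
    required_pass_rate: float = 1.0,  # assumed: every recorded case must pass
) -> bool:
    """Gate a candidate skill on its behavior against recorded inputs."""
    if not recorded_inputs:
        return False  # never merge a skill with no evidence behind it
    passed = sum(1 for inputs in recorded_inputs if candidate(inputs))
    return passed / len(recorded_inputs) >= required_pass_rate

# A trivial candidate: "succeeds" when the input names a tsconfig path.
def candidate(inputs: dict) -> bool:
    return "tsconfig" in inputs.get("config_path", "")

print(validate_candidate(candidate, [{"config_path": "app/tsconfig.json"}]))  # True
```

The regression check and sandboxed dry-run would wrap this same predicate with a broader input set and an isolated worker, respectively.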

Phase 4: Skill Application

When a new task arrives, the task router checks the skill library for matches before spawning a general reasoning agent. A skill match means the task goes through the tested procedure, not an open-ended LLM conversation. This is faster, more reliable, and cheaper.
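The routing decision is a lookup that falls back to general reasoning. A sketch, where `general_reasoning` stands in for the open-ended LLM path and the substring trigger match is an assumption:

```python
def route(task: str, skill_library: dict) -> str:
    """Dispatch a task: tested skill if one matches, LLM reasoning otherwise."""
    for trigger, skill_id in skill_library.items():
        if trigger in task.lower():
            return f"apply:{skill_id}"  # fast, tested, cheap path
    return "general_reasoning"          # slow, open-ended fallback

library = {
    "typescript import resolution failure": "skill:debug-typescript-import-error",
}
print(route("Fix this TypeScript import resolution failure", library))
# → apply:skill:debug-typescript-import-error
```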

"The goal is not to make the LLM smarter. It's to build the system so that the LLM is called as rarely as possible, and so that when it is called, it's solving genuinely novel problems."

Worker Architecture

Hermes runs multiple specialized workers in parallel:

  • Domain workers (frontend, backend, data, QA) execute tasks and write execution records
  • Evolution worker runs the learning loop: pattern extraction and skill synthesis
  • Router dispatches incoming tasks to the right worker, checking skill matches first
  • Validation harness runs automated tests on candidate skills

Workers communicate through a task queue (Redis-backed, with dead letter queues). Each worker is stateless — the state lives in execution records and the skill library.
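The queue-plus-dead-letter pattern can be illustrated with an in-memory stand-in (production Hermes uses Redis; the retry budget below is an assumption):

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed retry budget before a task is dead-lettered

def drain(queue: deque, handler) -> tuple:
    """Process tasks; failures are retried, then parked on a dead-letter list."""
    done, dead_letter = [], []
    while queue:
        task = queue.popleft()
        try:
            handler(task)
            done.append(task["id"])
        except Exception:
            task["attempts"] = task.get("attempts", 0) + 1
            if task["attempts"] >= MAX_ATTEMPTS:
                dead_letter.append(task["id"])  # park it for inspection
            else:
                queue.append(task)              # requeue for retry

    return done, dead_letter

def handler(task: dict) -> None:
    if task.get("poison"):
        raise RuntimeError("simulated worker failure")

queue = deque([{"id": "t1"}, {"id": "t2", "poison": True}])
print(drain(queue, handler))  # (['t1'], ['t2'])
```

Keeping the workers stateless means any of them can crash and restart without losing work: the task is either still on the queue, retried, or visible in the dead-letter list.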

What This Enables

The compounding effect is the point. A fresh Hermes instance handles tasks through general reasoning — slow and variable. After 30 days of production traffic, a meaningful fraction of tasks hit skill matches — fast and reliable. After 90 days, the system's effective capability in its domain exceeds what any single LLM call could achieve.

This is not magic. It's the same principle that makes experienced engineers faster than new ones: accumulated, organized experience that transfers to new situations. Hermes makes that principle operational for software.

Current Limitations

The architecture has constraints worth naming:

  • Skill synthesis requires enough examples to extract a reliable pattern — low-volume edge cases don't get skills
  • The validation harness catches most regressions but not all; skills in production are still monitored
  • The system optimizes for the tasks it has seen; genuinely novel problems still require full reasoning

These are engineering problems, not fundamental limits. The direction is toward more sophisticated pattern matching, better validation, and tighter feedback loops between production monitoring and skill evolution.

The architecture is a foundation. The interesting work is building on top of it.
