Autonomous agent · one interface

One agent. Every tool.

The Hermes SDK turns a full autonomous agent into a single call. Give it a goal in plain language — it writes and runs code, browses the web, reads images, creates files and charts, and hands back clean results with downloadable artifacts. No orchestration to wire up.

01 What it can do // the agent picks the right tool for the job
[ exec ]

Run code

Writes and executes Python & shell, captures output, and returns any files it produces.

[ web ]

Search the web

Searches and reads live pages, then answers grounded in what it found.

[ browser ]

Drive a browser

Controls a real Chromium browser: navigate, click, fill forms, take screenshots.

[ vision ]

See images

Analyzes and describes images you attach, and reasons over their contents.

[ charts ]

Make charts

Turns data into plots via code (matplotlib & co.) — returned as downloadable artifacts.

[ files ]

Work with files

Reads, writes, patches and searches files within an isolated working directory.

[ delegate ]

Delegate

Spawns sub-agents to split large jobs into parallel, focused tasks.

[ memory ]

Remember

Per-user memory + full-text search over past conversations.

[ skills ]

Plan & skills

Task planning for multi-step work, plus any installed skills it can invoke.

02 The interface // public surface only

One class, HermesAgent. A general run() for any goal, plus capability-shaped shortcuts when you want to be explicit. Every call returns the same Result.

construct
agent = HermesAgent(
    model      = None,        # default model; None → server default
    permission = "allow",     # "allow" | "deny" | callable(tool) → decision
    user       = None,        # default identity for memory + isolation
    timeout    = 600,         # seconds per call
    retries    = 2,           # auto-retry transient model errors (safe — no dup side effects)
)
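The `permission` argument accepts a callable as well as the two string modes. A minimal sketch of such a callback follows. It assumes the callable receives the tool's id as a string (matching the bracketed ids in section 01, such as `exec` or `web`) and returns `"allow"` or `"deny"`. Neither the argument shape nor the return values are confirmed here, and `read_only_policy` is a hypothetical name, so treat all of it as a placeholder.

```python
# Hypothetical permission policy. Assumes the callback is handed a tool id
# string and answers "allow" or "deny"; both are guesses, not documented.
READ_ONLY_TOOLS = frozenset({"web", "vision"})


def read_only_policy(tool: str) -> str:
    """Allow tools that only read; deny anything that can change state."""
    return "allow" if tool in READ_ONLY_TOOLS else "deny"


# Assumed wiring: agent = HermesAgent(permission=read_only_policy)
```

A deny-by-default allowlist like this fails closed: a tool added later is blocked until someone opts it in.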
agent.run(prompt, *, user="default", session_id=None, attachments=[], tools=None, model=None, timeout=None, on_event=None) → Result
The core call. Give a goal in plain language; the agent decides which tools to use. attachments adds images/files for it to read, tools restricts which capabilities it may use, session_id continues a prior conversation, and on_event streams the live tool timeline as it works.
agent.check() → dict  [ health ]
Lightweight readiness probe: confirms Hermes is reachable and the model answers. Never raises; returns a diagnostic. Use it on startup before serving traffic.
agent.ask(prompt, **opts) → Result  [ reason ]
Pure reasoning, no tools. Fastest and cheapest path for questions and writing.
agent.code(task, **opts) → Result  [ exec ]
Writes and runs code to accomplish task; returns program output and any files in result.artifacts.
agent.research(query, **opts) → Result  [ web ]
Searches and reads the live web, then returns an answer grounded in sources.
agent.browse(task, *, start_url=None, **opts) → Result  [ browser ]
Drives a real browser to complete task: navigation, clicks, forms, screenshots.
agent.see(image, question, **opts) → Result  [ vision ]
Analyzes an image (path, URL or bytes) and answers question about it.
agent.visualize(data, **opts) → Result  [ charts ]
Turns data or a description into a chart/plot image (via code).
agent.resume(session_id, prompt, **opts) → Result  [ session ]
Continues an existing conversation with full prior context.
agent.capabilities() → list[Capability]
Lists the tools this agent has available, so a UI can show or gate them.
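Because check() is described as never raising and as the probe to run on startup, a service could poll it before accepting traffic. The sketch below shows one way to do that. It takes the check function as a parameter so it does not depend on the SDK, and it assumes the returned dict carries a truthy `"ok"` key when healthy. That key name is a guess, since the dict's layout is not listed here, and `wait_until_ready` is a hypothetical helper.

```python
import time
from typing import Any, Callable


def wait_until_ready(check: Callable[[], dict[str, Any]],
                     attempts: int = 5,
                     delay: float = 1.0) -> dict[str, Any]:
    """Poll a check() callable until it reports ready or attempts run out.

    Assumes a truthy "ok" key marks a healthy report (hypothetical key name).
    """
    report: dict[str, Any] = {}
    for attempt in range(attempts):
        report = check()
        if report.get("ok"):
            return report
        if attempt < attempts - 1:
            time.sleep(delay)  # wait before the next probe
    raise RuntimeError(f"agent not ready after {attempts} attempts: {report}")


# Assumed usage: wait_until_ready(agent.check)
```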
All capability methods are shortcuts over run() — same options, same Result. Reach for them when you want intent to be explicit; reach for run() when you just want the goal done.
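To make the "shortcuts over run()" relationship concrete, here is a rough sketch of what one shortcut might reduce to. It is not the SDK's implementation: it assumes `tools` takes a list of ids and that `"exec"` is the id for code execution (borrowed from the bracketed label in section 01). `code_via_run` is a hypothetical name.

```python
from typing import Any


def code_via_run(agent: Any, task: str, **opts: Any) -> Any:
    """Approximate agent.code(task) as run() restricted to a single tool.

    Assumptions: `tools` takes a list of ids, and "exec" names code execution.
    """
    opts["tools"] = ["exec"]  # override any caller-supplied tool list
    return agent.run(task, **opts)
```

All other options (user, timeout, on_event and so on) pass through unchanged, which is what "same options, same Result" suggests.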
Memory is keyed by user — each user gets a persistent, isolated store (conversation history, full-text recall, learned facts) that survives across calls, so the agent learns over time. Pass a stable id per real user. Continuing a specific thread via session_id is best-effort today; durable per-user memory is the reliable path.
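Since memory is keyed by `user`, the id should be the same on every call for a given person and should not collide across people. One possible way to derive such an id from an internal account key is sketched below. The helper name, the `myapp` namespace, and the `u_` prefix are illustrative choices, not SDK requirements.

```python
import hashlib


def stable_user_id(account_key: str, namespace: str = "myapp") -> str:
    """Derive a deterministic, opaque id from an internal account key."""
    # Normalize so " Alice@Example.com " and "alice@example.com" match.
    normalized = account_key.strip().lower()
    digest = hashlib.sha256(f"{namespace}:{normalized}".encode("utf-8")).hexdigest()
    return f"u_{digest[:32]}"


# Assumed usage: agent.run("...", user=stable_user_id("alice@example.com"))
```

Hashing keeps raw email addresses out of the agent's store while still mapping each account to one memory.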
03 Quickstart // three lines to a working agent
do_anything.py
from hermes_sdk import HermesAgent

agent = HermesAgent()
res = agent.run(
    "plot a gaussian histogram"
    " and save it as hist.png",
    user="alice",
)

print(res.text)
# → "Done — 10,000 samples, 50 bins…"

for a in res.artifacts:
    with open(a.name, "wb") as f:  # close the file once written
        f.write(a.bytes())
    # → hist.png  (downloadable)
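The loop above writes each artifact under whatever name the agent chose. A slightly more defensive variant is sketched here. It relies only on the `.name` and `.bytes()` members listed for Artifact, and strips any directory part from the name so a value like `../x.png` cannot land outside the output folder. `save_artifacts` and the `artifact.bin` fallback are hypothetical.

```python
from pathlib import Path
from typing import Any, Iterable


def save_artifacts(artifacts: Iterable[Any], out_dir: str) -> list[Path]:
    """Write each artifact into out_dir, ignoring directory parts of its name."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved: list[Path] = []
    for artifact in artifacts:
        # Keep only the final path component of the suggested name.
        safe_name = Path(artifact.name).name
        if safe_name in ("", ".", ".."):
            safe_name = "artifact.bin"  # fallback for unusable names
        path = target / safe_name
        path.write_bytes(artifact.bytes())
        saved.append(path)
    return saved


# Assumed usage: save_artifacts(res.artifacts, "out")
```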
with_an_image.py
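# continues from do_anything.py: reuses agent and res
from hermes_sdk import Attachment  # assumed to be exported alongside HermesAgent

# attach an image; the agent sees it
seen = agent.see(
    Attachment.image("chart.png"),
    "what trend does this show?",
)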

# stream the live tool timeline
agent.run(
    "research EV sales in 2025",
    on_event=lambda e: print(e.title),
)
# → web_search: EV sales 2025
# → web_extract: iea.org/reports…

# continue the conversation
agent.resume(res.session_id,
    "now use 100 bins")
04 Return types // what every call gives back
types
Result
   .text         str            # the final answer (clean — no tool noise)
   .artifacts    list[Artifact] # files the agent produced
   .tool_calls   list[ToolCall] # what it did, in order
   .reasoning    str            # its thinking (optional)
   .session_id   str            # pass to resume() to continue
   .stop_reason  str            # end_turn | max_turns | refusal
   .elapsed      float          # seconds
   .usage        Usage          # token counts

Artifact
   .name str   .mime str   .size int   .bytes() → bytes   .url str?

ToolCall
   .tool str   .title str   .status "running" | "done" | "error"

Attachment
   Attachment.image(src)   Attachment.file(path)   Attachment.bytes(data, mime)

Capability
   .id str   .name str   .summary str
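Callers will usually want to branch on how a call ended. The sketch below reads only fields listed above (`stop_reason`, and each ToolCall's `title` and `status`) and turns them into a one-line status. The function name and the wording of the messages are illustrative.

```python
from typing import Any


def summarize(result: Any) -> str:
    """Return a one-line status for a Result-like object."""
    failed = [c.title for c in result.tool_calls if c.status == "error"]
    if result.stop_reason == "refusal":
        return "refused"
    if result.stop_reason == "max_turns":
        return "incomplete: hit max_turns"
    if failed:
        return f"done with {len(failed)} failed tool call(s): {', '.join(failed)}"
    return "done"
```

Checking `stop_reason` before trusting `.text` avoids treating a truncated or refused run as a finished answer.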