Autonomous agent · one interface

One agent. Every tool.

The Hermes SDK turns a full autonomous agent into a single call. Give it a goal in plain language — it writes and runs code, browses the web, reads images, creates files and charts, and hands back clean results with downloadable artifacts. No orchestration to wire up.


01 What it can do // the agent picks the right tool for the job

[ exec ]

Run code

Writes and executes Python & shell, captures output, and returns any files it produces.

[ web ]

Search the web

Searches and reads live pages, then answers grounded in what it found.

[ browser ]

Drive a browser

Controls a real Chromium — navigate, click, fill forms, screenshot.

[ vision ]

See images

Analyzes and describes images you attach, and reasons over their contents.

[ charts ]

Make charts

Turns data into plots via code (matplotlib & co.) — returned as downloadable artifacts.

[ files ]

Work with files

Reads, writes, patches and searches files within an isolated working directory.

[ delegate ]

Delegate

Spawns sub-agents to split large jobs into parallel, focused tasks.

[ memory ]

Remember

Per-user memory + full-text search over past conversations.

[ skills ]

Plan & skills

Task planning for multi-step work, plus any installed skills it can invoke.

02 The interface // public surface only

One class, HermesAgent: a general run() for any goal, plus capability-specific shortcuts when you want to be explicit. Every call returns the same Result.

construct

agent = HermesAgent(
    model      = None,        # default model; None → server default
    permission = "allow",     # "allow" | "deny" | callable(tool) → decision
    user       = None,        # default identity for memory + isolation
    timeout    = 600,         # seconds per call
    retries    = 2,           # auto-retry transient model errors (safe — no dup side effects)
)
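permission also accepts a callable. The sketch below is a minimal policy under stated assumptions: that the SDK calls it with a tool id string matching the capability labels above (such as "exec" or "web") and expects "allow" or "deny" back. The exact argument type is not documented here, so read_only_policy and READ_ONLY_TOOLS are hypothetical names.

```python
# Hypothetical policy for HermesAgent(permission=...).
# Assumption: the SDK passes a tool id string and expects "allow" or "deny".
READ_ONLY_TOOLS = frozenset({"web", "vision"})


def read_only_policy(tool: str) -> str:
    """Allow tools that only read; deny anything that executes, writes or drives a browser."""
    return "allow" if tool in READ_ONLY_TOOLS else "deny"


# Usage (assumed): agent = HermesAgent(permission=read_only_policy)
print(read_only_policy("web"))   # allow
print(read_only_policy("exec"))  # deny
```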

agent.run(prompt, *, user="default", session_id=None, attachments=[], tools=None, model=None, timeout=None, on_event=None) → Result

The core call. Give a goal in plain language; the agent decides which tools to use. attachments adds images or files for it to read, tools restricts which capabilities it may use, session_id continues a prior conversation, model and timeout override the constructor defaults for this call, and on_event receives the live tool timeline as it works.
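A minimal sketch of an on_event consumer that keeps the latest status per tool step. It assumes events carry the tool, title and status fields listed for ToolCall under return types; ToolEvent and Timeline are illustrative stand-ins, not the SDK's own event type, and keying steps by (tool, title) is an assumption about how updates for one step are reported.

```python
from dataclasses import dataclass, field


@dataclass
class ToolEvent:
    # Stand-in for the SDK's event object; field names mirror ToolCall.
    tool: str
    title: str
    status: str  # "running" | "done" | "error"


@dataclass
class Timeline:
    """Callable on_event handler that records the most recent status per step."""
    steps: dict = field(default_factory=dict)

    def __call__(self, event: ToolEvent) -> None:
        # Keyed by (tool, title), so a later "done"/"error" overwrites "running".
        self.steps[(event.tool, event.title)] = event.status

    def failed(self) -> list:
        return [title for (_, title), status in self.steps.items() if status == "error"]


timeline = Timeline()
# In real use (assumed): agent.run("research EV sales in 2025", on_event=timeline)
for ev in [
    ToolEvent("web_search", "EV sales 2025", "running"),
    ToolEvent("web_search", "EV sales 2025", "done"),
    ToolEvent("web_extract", "example.com/report", "error"),
]:
    timeline(ev)
print(timeline.failed())  # ['example.com/report']
```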

agent.check() → dict health

Lightweight readiness probe — confirms Hermes is reachable and the model answers. Never raises; returns a diagnostic. Use it on startup before serving traffic.
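Because check() returns a diagnostic instead of raising, a startup gate has to inspect the dict itself. A minimal sketch, assuming the diagnostic exposes a boolean "ok" key (its real keys are not documented here); wait_until_ready is a hypothetical helper that polls a few times and then fails loudly.

```python
import time
from typing import Callable


def wait_until_ready(
    check: Callable[[], dict],
    attempts: int = 5,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll a check()-style probe until it reports healthy; return that diagnostic.

    Assumes the diagnostic dict has a boolean "ok" key (hypothetical).
    Raises RuntimeError if no attempt reports healthy.
    """
    diag: dict = {}
    for attempt in range(attempts):
        diag = check()
        if diag.get("ok"):
            return diag
        if attempt < attempts - 1:
            sleep(delay)  # back off before the next probe
    raise RuntimeError(f"agent not ready after {attempts} checks: {diag}")


# Usage (assumed): wait_until_ready(agent.check) before serving traffic.
```

Injecting sleep keeps the helper testable without real delays.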

agent.ask(prompt, **opts) → Result reason

Pure reasoning — no tools. Fastest and cheapest path for questions and writing.

agent.code(task, **opts) → Result exec

Writes and runs code to accomplish task; returns program output and any files in result.artifacts.

agent.research(query, **opts) → Result web

Searches and reads the live web, then returns an answer grounded in sources.

agent.browse(task, *, start_url=None, **opts) → Result browser

Drives a real browser to complete task — navigation, clicks, forms, screenshots.
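Since browse() drives a real browser from start_url, callers may want to restrict where it begins. A minimal guard, not part of the SDK (validate_start_url is a hypothetical name), that accepts only absolute http(s) URLs:

```python
from urllib.parse import urlparse


def validate_start_url(url: str) -> str:
    """Return url unchanged if it is an absolute http(s) URL; otherwise raise ValueError."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"start_url must be an absolute http(s) URL, got {url!r}")
    return url


# Usage (assumed): agent.browse(task, start_url=validate_start_url(user_supplied_url))
```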

agent.see(image, question, **opts) → Result vision

Analyzes an image (path, URL or bytes) and answers question about it.
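When the image arrives as raw bytes, Attachment.bytes(data, mime) needs an explicit MIME type. A minimal sketch that derives it from the file signature (the leading "magic" bytes), covering only PNG, JPEG and GIF; sniff_image_mime is a hypothetical helper, not an SDK function.

```python
from typing import Optional

# Leading signature bytes for a few common image formats.
_SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
)


def sniff_image_mime(data: bytes) -> Optional[str]:
    """Return the MIME type implied by the first bytes of data, or None if unrecognized."""
    for magic, mime in _SIGNATURES:
        if data.startswith(magic):
            return mime
    return None


# Usage (assumed): Attachment.bytes(raw, sniff_image_mime(raw) or "application/octet-stream")
```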

agent.visualize(data, **opts) → Result charts

Turns data or a description into a chart/plot image (via code).
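The accepted types for data are not listed here. Assuming a plain string works (a description does), one portable option is to serialize tabular data as CSV text that the agent's plotting code can parse. to_csv_text is a hypothetical helper built on the standard csv module.

```python
import csv
import io
from typing import Iterable, Sequence


def to_csv_text(header: Sequence[str], rows: Iterable[Sequence[object]]) -> str:
    """Serialize a header and rows to CSV text with newline line endings."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")  # default QUOTE_MINIMAL quoting
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Usage (assumed): agent.visualize("bar chart of monthly sales:\n" + to_csv_text(hdr, rows))
print(to_csv_text(["month", "sales"], [("Jan", 10), ("Feb", 12)]))
```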

agent.resume(session_id, prompt, **opts) → Result session

Continues an existing conversation with full prior context.
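Because continuing a thread via session_id is best-effort, a caller may want to resume when it holds a stored session and start fresh otherwise. A minimal sketch that relies only on the run() and resume() signatures and Result.session_id documented on this page; continue_or_start and the in-memory sessions dict are illustrative assumptions (a real service would persist them).

```python
from typing import Any, Dict


def continue_or_start(agent: Any, user: str, prompt: str, sessions: Dict[str, str]) -> Any:
    """Resume the user's last session if one is stored, otherwise start a new run.

    Assumes agent.run(prompt, user=...) and agent.resume(session_id, prompt, user=...)
    as documented above, and that every Result carries a session_id.
    """
    session_id = sessions.get(user)
    if session_id is None:
        result = agent.run(prompt, user=user)
    else:
        result = agent.resume(session_id, prompt, user=user)
    sessions[user] = result.session_id  # remember the thread for the next call
    return result
```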

agent.capabilities() → list[Capability]

Lists the tools this agent has available, so a UI can show or gate them.
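A minimal sketch of gating a UI on capabilities(). The Capability dataclass here is a local stand-in mirroring the fields listed under return types (id, name, summary); the assumption that ids match the card labels such as "exec" and "web" is unverified, and visible_capabilities is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Capability:
    # Local stand-in mirroring the documented Capability fields.
    id: str
    name: str
    summary: str


def visible_capabilities(caps: Iterable[Capability], allowed_ids: Iterable[str]) -> List[Capability]:
    """Keep only capabilities whose id is allowed, preserving the agent's order."""
    allowed = set(allowed_ids)
    return [c for c in caps if c.id in allowed]


# Usage (assumed): visible_capabilities(agent.capabilities(), plan.allowed_tool_ids)
```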

All capability methods are shortcuts over run() — same options, same Result. Reach for them when you want intent to be explicit; reach for run() when you just want the goal done.
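To illustrate what "shortcuts over run()" means, here is a minimal sketch of the delegation pattern. It is not the SDK's source: ShortcutSketch is hypothetical, its run() records calls instead of starting an agent, and the tool ids each shortcut defaults to are assumptions based on the capability labels.

```python
from typing import Any, Dict, List


class ShortcutSketch:
    """Each shortcut supplies a default `tools` set, then forwards everything to run()."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def run(self, prompt: str, **opts: Any) -> Dict[str, Any]:
        # Stand-in for the real run(): records the call and returns it as the "Result".
        call = {"prompt": prompt, **opts}
        self.calls.append(call)
        return call

    def ask(self, prompt: str, **opts: Any) -> Dict[str, Any]:
        opts.setdefault("tools", [])  # pure reasoning: no tools unless the caller overrides
        return self.run(prompt, **opts)

    def code(self, task: str, **opts: Any) -> Dict[str, Any]:
        opts.setdefault("tools", ["exec", "files"])  # assumed tool ids
        return self.run(task, **opts)

    def research(self, query: str, **opts: Any) -> Dict[str, Any]:
        opts.setdefault("tools", ["web"])  # assumed tool id
        return self.run(query, **opts)
```

Using setdefault lets a caller still pass tools explicitly, which keeps the "same options" promise instead of raising on a duplicate keyword.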

Memory is keyed by user — each user gets a persistent, isolated store (conversation history, full-text recall, learned facts) that survives across calls, so the agent learns over time. Pass a stable id per real user. Continuing a specific thread via session_id is best-effort today; durable per-user memory is the reliable path.
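The SDK only asks for a stable id per real user; it does not say how to derive one. One common approach, sketched here as an assumption rather than an SDK requirement, is to hash an internal account key so the id never changes for that account but does not expose the raw value. stable_user_id and its namespace default are hypothetical.

```python
import hashlib


def stable_user_id(account_key: str, namespace: str = "myapp") -> str:
    """Derive a stable, opaque memory id from an internal account key.

    The same (normalized) key always yields the same id, so the user's memory
    persists across calls, while the raw key (e.g. an email) is not sent as the id.
    """
    normalized = account_key.strip().lower()
    digest = hashlib.sha256(f"{namespace}:{normalized}".encode("utf-8")).hexdigest()
    return f"u_{digest[:32]}"


# Usage (assumed): agent.run(prompt, user=stable_user_id(current_account.email))
```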

03 Quickstart // three lines to a working agent

do_anything.py

from hermes_sdk import HermesAgent

agent = HermesAgent()
res = agent.run(
    "plot a gaussian histogram"
    " and save it as hist.png",
    user="alice",
)

print(res.text)
# → "Done — 10,000 samples, 50 bins…"

for a in res.artifacts:
    with open(a.name, "wb") as f:   # closes the file deterministically
        f.write(a.bytes())
    # → hist.png  (downloadable)
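Artifact names come from the agent, so writing them straight to the current directory trusts whatever path the model chose. A minimal defensive sketch: save_artifact and the artifacts/ output directory are hypothetical choices, not SDK features, and only the final path component of each name is kept.

```python
from pathlib import Path


def save_artifact(name: str, data: bytes, out_dir: str = "artifacts") -> Path:
    """Write artifact bytes under out_dir using only the final component of name.

    Dropping directory parts stops names like "../../x.png" from escaping out_dir.
    """
    target_dir = Path(out_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    safe_name = Path(name).name
    if safe_name in ("", ".", ".."):
        safe_name = "artifact.bin"  # fallback for names with no usable file component
    path = target_dir / safe_name
    path.write_bytes(data)
    return path


# Usage with the quickstart loop (Artifact API as listed under return types):
# for a in res.artifacts:
#     save_artifact(a.name, a.bytes())
```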

with_an_image.py

# continues do_anything.py: reuses `agent` and `res`
from hermes_sdk import Attachment

# attach an image; the agent sees it
answer = agent.see(
    Attachment.image("chart.png"),
    "what trend does this show?",
)
print(answer.text)

# stream the live tool timeline
agent.run(
    "research EV sales in 2025",
    on_event=lambda e: print(e.title),
)
# → web_search: EV sales 2025
# → web_extract: iea.org/reports…

# continue the histogram conversation (res from do_anything.py)
agent.resume(res.session_id,
    "now use 100 bins")

04 Return types // what every call gives back

types

Result
   .text         str            # the final answer (clean — no tool noise)
   .artifacts    list[Artifact] # files the agent produced
   .tool_calls   list[ToolCall] # what it did, in order
   .reasoning    str            # its thinking (optional)
   .session_id   str            # pass to resume() to continue
   .stop_reason  str            # end_turn | max_turns | refusal
   .elapsed      float          # seconds
   .usage        Usage          # token counts

Artifact
   .name str   .mime str   .size int   .bytes() → bytes   .url str?

ToolCall
   .tool str   .title str   .status "running" | "done" | "error"

Attachment
   Attachment.image(src)   Attachment.file(path)   Attachment.bytes(data, mime)

Capability
   .id str   .name str   .summary str
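Since stop_reason and tool_calls come back on every Result, callers can flag runs that need attention before trusting res.text. A minimal sketch using local dataclasses that mirror a subset of the fields above; they stand in for the SDK's own types, whose constructors are not documented, and problems is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToolCall:
    # Local stand-in mirroring the documented ToolCall fields.
    tool: str
    title: str
    status: str  # "running" | "done" | "error"


@dataclass
class ResultView:
    # Subset of Result fields used by the check below.
    text: str
    stop_reason: str  # end_turn | max_turns | refusal
    tool_calls: List[ToolCall] = field(default_factory=list)


def problems(result: ResultView) -> List[str]:
    """List reasons a Result may need attention: early stops and failed tool calls."""
    issues = []
    if result.stop_reason != "end_turn":
        issues.append(f"stopped early: {result.stop_reason}")
    for call in result.tool_calls:
        if call.status == "error":
            issues.append(f"{call.tool} failed: {call.title}")
    return issues
```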