The boulder doesn't roll back.
Sisyfus runs bounded, auditable AI research loops on your own machine. Each session finishes one concrete task and compacts the evidence to files — so the next push starts higher, not from the bottom.
one research loop, made durable
Run → verify → grade → compact to memory → the next session reads compact state, not raw transcript sludge.
Agents as workers in bounded loops — not one giant chat.
Most agent runs drown in their own transcript. Sisyfus treats every run as one concrete task that ends by distilling its useful evidence into project memory. Context stays small; learnings stay durable; nothing has to be re-derived.
- One session, one task. Finish it, compact it, move on.
- Never trust self-certification. An independent grader scores the artifacts, not the worker's confidence.
- Memory has a lifecycle. Failure note → investigation → verified fact → reusable rule.
- Files are the database. Everything is auditable JSON/Markdown under .sisyfus/, with zero third-party services.
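The memory lifecycle in the list above can be pictured as a small forward-only state machine. This is a hypothetical sketch, not Sisyfus's actual API: Stage, MemoryEntry, and promote are illustrative names. The rule that an entry needs attached evidence before it counts as a verified fact is our assumption.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    # The four stages named on this page; the enum itself is a hypothetical sketch.
    FAILURE_NOTE = 1
    INVESTIGATION = 2
    VERIFIED_FACT = 3
    REUSABLE_RULE = 4


@dataclass
class MemoryEntry:
    text: str
    stage: Stage = Stage.FAILURE_NOTE
    evidence: list[str] = field(default_factory=list)  # paths to supporting artifacts

    def promote(self) -> Stage:
        """Advance exactly one stage; never skip ahead or move backwards."""
        if self.stage is Stage.REUSABLE_RULE:
            raise ValueError("already a reusable rule")
        nxt = Stage(self.stage.value + 1)
        # Assumed rule: nothing becomes a verified fact without evidence attached.
        if nxt is Stage.VERIFIED_FACT and not self.evidence:
            raise ValueError("attach evidence before marking as verified")
        self.stage = nxt
        return nxt


note = MemoryEntry("Install step fails behind the corporate proxy")
note.promote()                                # now INVESTIGATION
note.evidence.append("logs/proxy_check.txt")  # hypothetical artifact path
note.promote()                                # now VERIFIED_FACT
```

Forcing one step at a time keeps the audit trail honest: a rule can only exist if something was first written down as a failure and then verified.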
What it actually does.
Not a chat wrapper. A control plane for long-running, branching, scheduled, multi-machine research that you can watch and steer.
Autonomous goal loops & bounded beam search
Hand it a goal; it explores, verifies against deterministic checks, grades the result, and revises. For open problems it fans into a bounded beam — many directions, fixed width/depth/budget — and each branch is itself one compacted session.
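A bounded beam can be sketched in a few lines. This is a generic beam search written for illustration, not Sisyfus's implementation. Here expand and score are placeholder callables; in Sisyfus each expansion would be a full compacted session. Width, depth, and budget mirror the three bounds named above, with budget read here (an assumption) as a cap on how many states get scored.

```python
from typing import Callable, TypeVar

S = TypeVar("S")


def bounded_beam(
    root: S,
    expand: Callable[[S], list[S]],   # proposes child directions from a state
    score: Callable[[S], float],      # higher is better
    width: int = 3,                   # candidates kept per level
    depth: int = 4,                   # maximum number of levels
    budget: int = 50,                 # maximum number of states scored in total
) -> tuple[S, float]:
    """Generic bounded beam search; returns the best state seen and its score."""
    best, best_score = root, score(root)
    spent = 1
    beam = [root]
    for _ in range(depth):
        candidates: list[tuple[float, S]] = []
        for state in beam:
            if spent >= budget:
                break
            for child in expand(state):
                if spent >= budget:
                    break
                candidates.append((score(child), child))
                spent += 1
        if not candidates:
            break
        # Sort on the score only, so ties never compare the states themselves.
        candidates.sort(key=lambda pair: pair[0], reverse=True)
        beam = [child for _, child in candidates[:width]]
        if candidates[0][0] > best_score:
            best_score, best = candidates[0]
    return best, best_score
```

On a toy problem (get from 1 to 10 by adding one or doubling), bounded_beam(1, lambda n: [n + 1, n * 2], lambda n: -abs(n - 10)) returns (9, -1). The bounds stop it short of 10, which is the trade a bounded search makes: a predictable cost in exchange for completeness.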
Outcomes grading, with a real second model
An independent rubric grader inspects the produced artifacts. Because the grader is just another backend, you can have Claude do the research and Codex grade it: cross-model review, so no model grades its own work.
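To make "scores the artifacts, not the worker's confidence" concrete, here is a minimal weighted-rubric sketch. It is hypothetical throughout: Criterion, grade, and the ROLES mapping are illustrative names. The deterministic checks stand in for what would really be a second model reading the artifacts.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical role-to-backend mapping: one model works, a different one grades.
ROLES = {"worker": "claude", "grader": "codex"}
assert ROLES["worker"] != ROLES["grader"], "cross-model review needs two backends"


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: float
    check: Callable[[dict[str, str]], bool]  # sees artifacts only


def grade(artifacts: dict[str, str], rubric: list[Criterion]) -> tuple[float, list[str]]:
    """Weighted score in [0, 1] plus the names of failed criteria.

    The worker's own summary or confidence is deliberately not a parameter.
    """
    total = sum(c.weight for c in rubric)
    earned, failed = 0.0, []
    for c in rubric:
        if c.check(artifacts):
            earned += c.weight
        else:
            failed.append(c.name)
    return (earned / total if total else 0.0), failed


rubric = [
    Criterion("report exists", 2.0, lambda a: "report.md" in a),
    Criterion("states a result", 1.0, lambda a: "Result:" in a.get("report.md", "")),
]
score, failed = grade({"report.md": "Tried three configs."}, rubric)
# score is 2/3; failed == ["states a result"]
```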
Backends per role: Claude, Codex, or a mix.
Experiment ledger & memory FSM
Research becomes a ledger of experiments — kept / discarded / crashed, structural vs scalar — not a transcript. Durable learnings climb a lifecycle from failure note to consulted rule.
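An experiment ledger fits naturally in an append-only JSON Lines file, in keeping with "files are the database". This sketch is hypothetical: the file name, field names, record, and summarize are illustrative. Reading "structural" as a change of approach and "scalar" as a parameter tweak is our interpretation.

```python
import json
from collections import Counter
from pathlib import Path

STATUSES = {"kept", "discarded", "crashed"}
KINDS = {"structural", "scalar"}


def record(ledger: Path, name: str, status: str, kind: str, note: str = "") -> None:
    """Append one experiment as a JSON line; earlier lines are never rewritten."""
    if status not in STATUSES or kind not in KINDS:
        raise ValueError(f"bad status/kind: {status!r}, {kind!r}")
    entry = {"name": name, "status": status, "kind": kind, "note": note}
    with ledger.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def summarize(ledger: Path) -> Counter:
    """Count experiments by (status, kind) straight from the ledger file."""
    counts: Counter = Counter()
    with ledger.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                e = json.loads(line)
                counts[(e["status"], e["kind"])] += 1
    return counts
```

For example, record(Path("ledger.jsonl"), "swap tokenizer", "kept", "structural") appends a single line, and summarize answers "how many structural changes did we keep?" without opening any transcript.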
Trackers — scheduled research with temporal diffs
Re-run the same signal checklist hourly or daily. Sisyfus diffs consecutive snapshots: which signals flipped, appeared, or moved — so you watch a thesis evolve, not just a single answer.
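The snapshot diff behind trackers can be sketched as a comparison of two dictionaries keyed by signal name. This is an illustrative sketch, not the tracker's real code. The snapshot shape, the tol parameter, and treating boolean changes as "flipped" and numeric or text changes as "moved" are our assumptions.

```python
from typing import Union

Value = Union[bool, float, str]


def diff_snapshots(
    prev: dict[str, Value], curr: dict[str, Value], tol: float = 0.0
) -> dict[str, list[str]]:
    """Compare two consecutive snapshots and classify each changed signal."""
    out: dict[str, list[str]] = {"flipped": [], "appeared": [], "moved": []}
    for name, now in curr.items():
        if name not in prev:
            out["appeared"].append(name)
            continue
        before = prev[name]
        if isinstance(before, bool) and isinstance(now, bool):
            if before != now:
                out["flipped"].append(name)
        elif isinstance(before, (int, float)) and isinstance(now, (int, float)):
            if abs(now - before) > tol:  # ignore moves within the tolerance
                out["moved"].append(name)
        elif before != now:
            out["moved"].append(name)
    return out


yesterday = {"rate_cut_priced_in": False, "spread_bp": 120.0}
today = {"rate_cut_priced_in": True, "spread_bp": 134.0, "new_listing": True}
print(diff_snapshots(yesterday, today, tol=5.0))
# {'flipped': ['rate_cut_priced_in'], 'appeared': ['new_listing'], 'moved': ['spread_bp']}
```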
Fleet — your machines, one console
Manage research agents across machines. Workers dial home; the hub aggregates live state and dispatches commands. Your GPU box, trading server, and laptop in one view.
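We read "workers dial home" as: each worker opens the connection and checks in, and the hub's reply is the channel for commands, so the hub never has to reach into a worker. A minimal in-memory sketch of that pattern, with networking left out. Hub, heartbeat, dispatch, and live_view are hypothetical names, and the 30-second staleness window is an arbitrary assumption.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class Hub:
    """In-memory sketch of a hub that workers check in with (no networking shown)."""
    stale_after: float = 30.0  # seconds without a heartbeat before a worker is stale
    last_seen: dict[str, float] = field(default_factory=dict)
    states: dict[str, dict] = field(default_factory=dict)
    outbox: dict[str, deque] = field(default_factory=lambda: defaultdict(deque))

    def heartbeat(self, worker: str, state: dict, now: float) -> list[str]:
        """A worker reports its state; the reply carries any commands queued for it."""
        self.last_seen[worker] = now
        self.states[worker] = state
        pending = list(self.outbox[worker])
        self.outbox[worker].clear()
        return pending

    def dispatch(self, worker: str, command: str) -> None:
        """Queue a command; it is delivered on the worker's next heartbeat."""
        self.outbox[worker].append(command)

    def live_view(self, now: float) -> dict[str, dict]:
        """Aggregate every worker's last state, flagging ones that stopped checking in."""
        return {
            w: {**self.states[w], "stale": now - seen > self.stale_after}
            for w, seen in self.last_seen.items()
        }
```

Because commands wait in the outbox until the worker calls in, a laptop that sleeps overnight simply picks up its queue when it wakes.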
An observable dashboard, one command
sisyfus up brings up the hub, attaches this machine, and opens a live console: what's running now, beam trees, per-round traces, conclusions you mark correct or wrong — and that human verdict is injected into future sessions.
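One way to feed dashboard verdicts into later sessions is to render them as a short preamble that the next session reads before anything else. This sketch is hypothetical: Verdict and verdict_preamble are illustrative names, and the preamble's wording is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    conclusion: str
    correct: bool  # marked by a human in the dashboard


def verdict_preamble(verdicts: list[Verdict]) -> str:
    """Render human verdicts as text to prepend to the next session's context."""
    if not verdicts:
        return ""
    lines = ["Conclusions reviewed by a human:"]
    for v in verdicts:
        tag = "CONFIRMED" if v.correct else "REJECTED"
        lines.append(f"- [{tag}] {v.conclusion}")
    return "\n".join(lines)
```

Rejected conclusions are kept rather than dropped, so a later session is told not to re-derive a result a human has already ruled out.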
Two loops, one discipline.
An inner loop iterates against verification and a rubric. An outer loop compacts each finished session into memory the next one reads first.
Explore
Read compact memory, propose a minimal plan.
Verify
Deterministic commands & monitors decide — not the worker.
Grade
Independent rubric scores the artifacts.
Compact
Distill facts, failures, next steps into .sisyfus/.
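The two loops can be sketched together. The inner loop runs explore, verify, grade under a round limit. The outer step compacts the session's outcome into a memory file that the next session loads first. Everything here is a hypothetical illustration: run_session, the callables it takes, the 0.8 threshold, and the memory file format are assumptions. A real compaction would distill facts, failures, and next steps rather than store a bare result.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable


def verify(commands: list[list[str]]) -> bool:
    """Deterministic checks decide: every command must exit with status 0."""
    return all(subprocess.run(cmd, capture_output=True).returncode == 0 for cmd in commands)


def run_session(
    task: str,
    attempt: Callable[[str, list[dict]], dict],  # Explore: worker produces artifacts
    checks: list[list[str]],                      # Verify: commands run after each attempt
    grade: Callable[[dict], float],               # Grade: independent score in [0, 1]
    memory: Path,
    max_rounds: int = 3,
    threshold: float = 0.8,
) -> dict:
    # Outer loop, read side: load compact memory, never a raw transcript.
    prior = json.loads(memory.read_text(encoding="utf-8")) if memory.exists() else []
    outcome = {"task": task, "passed": False, "score": 0.0, "rounds": 0}
    for round_no in range(1, max_rounds + 1):  # inner loop
        artifacts = attempt(task, prior)
        outcome["rounds"] = round_no
        if not verify(checks):
            continue  # checks failed: try another round
        outcome["score"] = grade(artifacts)
        if outcome["score"] >= threshold:
            outcome["passed"] = True
            break
    # Outer loop, write side (Compact): persist this session for the next one.
    prior.append(outcome)
    memory.write_text(json.dumps(prior, indent=2), encoding="utf-8")
    return outcome


with tempfile.TemporaryDirectory() as tmp:
    mem = Path(tmp) / "memory.json"
    always_ok = [[sys.executable, "-c", "pass"]]  # a check command that exits 0
    first = run_session("baseline", lambda t, p: {"seen": len(p)}, always_ok,
                        lambda a: 0.9, mem)
    # The second session passes only if it read the first one's compacted outcome.
    second = run_session("follow-up", lambda t, p: {"seen": len(p)}, always_ok,
                         lambda a: 1.0 if a["seen"] == 1 else 0.0, mem)
```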
This site is the front door. The engine runs on your machine.
Sisyfus executes real CLIs and reads your files: it's local-first by design, not a hosted multi-tenant SaaS. sisyfus.ai introduces the project and points you to the install and source; the control plane stays yours.
The front door
What you're reading now — the introduction, the docs, the way in.
- Explains the project & the loop
- Links to install & source
- Static, public, safe to share
Where the work happens
The hub, the agents, your research artifacts — on hardware you control.
- Runs Codex / Claude as backends
- Reads & writes your local files
- Reach it remotely over your own VPN (e.g. Tailscale) if you want — still your box
Up and watching in two commands.
Pure Python standard library — no third-party runtime dependencies. Bring your own agent CLI (Codex, Claude, or both).