local-first · open · yours

The boulder
doesn't roll back.

Sisyfus runs bounded, auditable AI research loops on your own machine. Each session finishes one concrete task and compacts the evidence to files — so the next push starts higher, not from the bottom.


one research loop, made durable

Run → verify → grade → compact to memory → the next session reads compact state, not raw transcript sludge.

The thesis

Agents as workers in bounded loops — not one giant chat.

Most agent runs drown in their own transcript. Sisyfus treats every run as one concrete task that ends by distilling its useful evidence into project memory. Context stays small; learnings stay durable; nothing has to be re-derived.

One session, one task. Finish it, compact it, move on.
Never trust self-certification. An independent grader scores the artifacts, not the worker's confidence.
Memory has a lifecycle. Failure note → investigation → verified fact → reusable rule.
Files are the database. Everything is auditable JSON/Markdown under .sisyfus/ — zero third-party services.
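Here is a minimal sketch of "finish it, compact it" in plain Python. It assumes a hypothetical layout: one JSON record per session under .sisyfus/sessions/ plus an append-only .sisyfus/memory.md summary. The real file names and schema may differ. compact_session and load_compact_state are illustrative names, not Sisyfus APIs.

```python
import json
import tempfile
from pathlib import Path


def compact_session(root: Path, session_id: str, task: str,
                    facts: list[str], failures: list[str],
                    next_steps: list[str]) -> Path:
    """Distill one finished session into small, auditable files.

    Hypothetical layout: .sisyfus/sessions/<session_id>.json holds the
    structured record; .sisyfus/memory.md is the summary read first.
    """
    state_dir = root / ".sisyfus" / "sessions"
    state_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "session": session_id,
        "task": task,
        "facts": facts,
        "failures": failures,
        "next_steps": next_steps,
    }
    path = state_dir / f"{session_id}.json"
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")

    # Append a short Markdown summary instead of the raw transcript.
    with (root / ".sisyfus" / "memory.md").open("a", encoding="utf-8") as fh:
        fh.write(f"## {session_id}: {task}\n")
        fh.writelines(f"- fact: {f}\n" for f in facts)
        fh.writelines(f"- failure: {f}\n" for f in failures)
        fh.writelines(f"- next: {s}\n" for s in next_steps)
    return path


def load_compact_state(root: Path) -> list[dict]:
    """Return every compacted session record, sorted by file name."""
    state_dir = root / ".sisyfus" / "sessions"
    return [json.loads(p.read_text(encoding="utf-8"))
            for p in sorted(state_dir.glob("*.json"))]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        compact_session(root, "s001", "example task",
                        facts=["example fact"], failures=[],
                        next_steps=["example next step"])
        print(load_compact_state(root)[0]["next_steps"])
```

The design keeps the structured record separate from the readable summary. The next session can then load small JSON state, and a human can still audit the Markdown.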

Capabilities

What it actually does.

Not a chat wrapper. A control plane for long-running, branching, scheduled, multi-machine research that you can watch and steer.

01 / research

Autonomous goal loops & bounded beam search

Hand it a goal; it explores, verifies against deterministic checks, grades the result, and revises. For open problems it fans into a bounded beam — many directions, fixed width/depth/budget — and each branch is itself one compacted session.
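The beam's three bounds can be sketched as a plain beam search. Width caps the branches kept per level, depth caps the number of levels, and budget caps the total number of scored paths. The sketch assumes hypothetical expand and score callables. In Sisyfus each expansion would be a compacted session and each score a grade, but here toy stand-ins (digits and a digit sum) make the bounds easy to see.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable


@dataclass(order=True)
class Branch:
    score: float
    path: tuple[str, ...] = field(compare=False)


def bounded_beam(root: str,
                 expand: Callable[[tuple[str, ...]], list[str]],
                 score: Callable[[tuple[str, ...]], float],
                 width: int, depth: int, budget: int) -> Branch:
    """Keep at most `width` branches per level, search at most `depth`
    levels, and score at most `budget` paths in total (root included)."""
    best = Branch(score((root,)), (root,))
    beam, spent = [best], 1
    for _ in range(depth):
        candidates = []
        for branch in beam:
            for step in expand(branch.path):
                if spent >= budget:
                    break
                path = branch.path + (step,)
                candidates.append(Branch(score(path), path))
                spent += 1
        if not candidates:
            break  # budget exhausted or nothing left to expand
        beam = heapq.nlargest(width, candidates)
        best = max(best, beam[0])
    return best


def expand_digits(path: tuple[str, ...]) -> list[str]:
    return ["1", "2", "3"]  # toy stand-in for "possible next directions"


def digit_sum(path: tuple[str, ...]) -> float:
    return float(sum(int(step) for step in path[1:]))  # toy stand-in for a grade


if __name__ == "__main__":
    result = bounded_beam("start", expand_digits, digit_sum,
                          width=2, depth=3, budget=50)
    print(result.path, result.score)  # ('start', '3', '3', '3') 9.0
```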

02 / judgment

Outcomes grading, with a real second model

An independent rubric grader inspects the produced artifacts. Because the grader is just another backend, you can have Claude do the research and Codex grade it. That gives you cross-model review, where the grader is less likely to share the worker's blind spots.

claude · codex · mix per role
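A rubric turns inspected artifacts into a pass/fail decision, and that step can be sketched as a weighted checklist. In Sisyfus the grader is a model backend such as Codex. Here, as an assumption for illustration, plain Python checks stand in for its per-criterion judgments. Criterion, grade, and the 0.8 pass mark are hypothetical.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: float
    check: Callable[[Path], bool]  # looks at the artifacts only


def grade(artifacts: Path, rubric: list[Criterion],
          pass_mark: float = 0.8) -> dict:
    """Weighted rubric score in [0, 1]; the worker's own confidence is
    never an input."""
    results = {c.name: bool(c.check(artifacts)) for c in rubric}
    total = sum(c.weight for c in rubric)
    earned = sum(c.weight for c in rubric if results[c.name])
    score = earned / total if total > 0 else 0.0
    return {"score": score, "passed": score >= pass_mark, "criteria": results}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp)
        (out / "report.md").write_text("# Findings\n", encoding="utf-8")
        rubric = [
            Criterion("report written", 2.0, lambda p: (p / "report.md").exists()),
            Criterion("results table", 1.0, lambda p: (p / "results.csv").exists()),
        ]
        print(grade(out, rubric))  # score is 2/3, so passed is False
```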

03 / memory

Experiment ledger & memory FSM

Research becomes a ledger of experiments — kept / discarded / crashed, structural vs scalar — not a transcript. Durable learnings climb a lifecycle from failure note to consulted rule.
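The memory lifecycle (failure note → investigation → verified fact → reusable rule) maps naturally onto a small finite-state machine. This sketch makes two assumptions: items are promoted one step at a time, and each promotion requires written evidence. It also includes a ledger row type that uses the kept / discarded / crashed and structural / scalar labels above. Stage, MemoryItem, and Experiment are hypothetical names.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    FAILURE_NOTE = "failure note"
    INVESTIGATION = "investigation"
    VERIFIED_FACT = "verified fact"
    REUSABLE_RULE = "reusable rule"


# Hypothetical rule: items climb exactly one stage at a time.
NEXT_STAGE = {
    Stage.FAILURE_NOTE: Stage.INVESTIGATION,
    Stage.INVESTIGATION: Stage.VERIFIED_FACT,
    Stage.VERIFIED_FACT: Stage.REUSABLE_RULE,
}


@dataclass
class MemoryItem:
    text: str
    stage: Stage = Stage.FAILURE_NOTE
    history: list[tuple[Stage, str]] = field(default_factory=list)

    def promote(self, evidence: str) -> Stage:
        """Move up one stage, recording why; refuse unsupported jumps."""
        if self.stage not in NEXT_STAGE:
            raise ValueError(f"{self.stage.value!r} is the final stage")
        if not evidence.strip():
            raise ValueError("promotion needs evidence")
        self.history.append((self.stage, evidence))
        self.stage = NEXT_STAGE[self.stage]
        return self.stage


OUTCOMES = {"kept", "discarded", "crashed"}
CHANGE_KINDS = {"structural", "scalar"}


@dataclass(frozen=True)
class Experiment:
    """One ledger row: what was tried and how it ended, not a transcript."""
    name: str
    outcome: str
    change: str

    def __post_init__(self) -> None:
        if self.outcome not in OUTCOMES or self.change not in CHANGE_KINDS:
            raise ValueError(f"bad ledger row: {self}")


if __name__ == "__main__":
    item = MemoryItem("run crashed with out-of-memory at large batch size")
    item.promote("reproduced on a second run")
    item.promote("confirmed by a deterministic check")
    print(item.promote("applied successfully in a later session").value)  # reusable rule
    ledger = [Experiment("smaller batch", "kept", "scalar")]
```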

04 / over time

Trackers — scheduled research with temporal diffs

Re-run the same signal checklist hourly or daily. Sisyfus diffs consecutive snapshots: which signals flipped, appeared, or moved — so you watch a thesis evolve, not just a single answer.
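Here is a sketch of the snapshot diff. It assumes each tracker run produces a flat mapping from signal name to value. A boolean that changes counts as flipped, and a new name counts as appeared. A number that shifts by more than a tolerance counts as moved, as does any other changed value. diff_snapshots and the tolerance parameter are hypothetical, and the signal names in the demo are toy examples.

```python
from typing import Union

Signal = Union[bool, int, float, str]


def diff_snapshots(prev: dict[str, Signal], curr: dict[str, Signal],
                   tolerance: float = 0.0) -> dict[str, dict]:
    """Compare two consecutive tracker snapshots of the same checklist."""
    flipped, appeared, moved = {}, {}, {}
    for name, new in curr.items():
        if name not in prev:
            appeared[name] = new
            continue
        old = prev[name]
        if isinstance(old, bool) or isinstance(new, bool):
            if old != new:  # yes/no signal changed its answer
                flipped[name] = (old, new)
        elif isinstance(old, (int, float)) and isinstance(new, (int, float)):
            if abs(new - old) > tolerance:  # numeric signal shifted
                moved[name] = (old, new)
        elif old != new:  # any other value changed
            moved[name] = (old, new)
    return {"flipped": flipped, "appeared": appeared, "moved": moved}


if __name__ == "__main__":
    monday = {"benchmark_passes": False, "latency_ms": 120.0}
    tuesday = {"benchmark_passes": True, "latency_ms": 135.0, "new_release": "v2"}
    print(diff_snapshots(monday, tuesday, tolerance=5.0))
```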

05 / at scale

Fleet — your machines, one console

Manage research agents across machines. Workers dial home; the hub aggregates live state and dispatches commands. Your GPU box, trading server, and laptop in one view.
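Because workers dial home, the hub never has to open connections to them. It can queue commands and hand them back when a worker checks in. Below is a minimal in-memory sketch of that pattern. The transport is left out, and Hub, dispatch, heartbeat, and the 30-second staleness window are all hypothetical.

```python
import time
from collections import defaultdict, deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Hub:
    """In-memory hub state; the network transport is deliberately left out."""
    stale_after: float = 30.0
    workers: dict = field(default_factory=dict)  # name -> (last_seen, state)
    pending: defaultdict = field(default_factory=lambda: defaultdict(deque))

    def dispatch(self, worker: str, command: dict) -> None:
        """Queue a command; it is delivered on the worker's next check-in."""
        self.pending[worker].append(command)

    def heartbeat(self, worker: str, state: dict,
                  now: Optional[float] = None) -> list:
        """Handle a worker dialing home: record its state, hand back commands."""
        now = time.time() if now is None else now
        self.workers[worker] = (now, state)
        queue = self.pending[worker]
        commands = list(queue)
        queue.clear()
        return commands

    def view(self, now: Optional[float] = None) -> dict:
        """Aggregate every worker's last report plus an online flag."""
        now = time.time() if now is None else now
        return {name: {**state, "online": now - seen <= self.stale_after}
                for name, (seen, state) in self.workers.items()}


if __name__ == "__main__":
    hub = Hub(stale_after=30.0)
    hub.dispatch("gpu-box", {"op": "pause", "run": "example"})
    print(hub.heartbeat("gpu-box", {"running": 2}, now=100.0))
    hub.heartbeat("laptop", {"running": 0}, now=60.0)
    print(hub.view(now=110.0))
```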

06 / observe

An observable dashboard, one command

sisyfus up brings up the hub, attaches this machine, and opens a live console: what's running now, beam trees, per-round traces, conclusions you mark correct or wrong — and that human verdict is injected into future sessions.
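One way to inject a human verdict into future sessions is to render recent verdicts as a short preamble and prepend it to the next prompt. This is a sketch only. Verdict, verdict_preamble, and the output format are assumptions, not the dashboard's actual mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    conclusion: str
    correct: bool
    note: str = ""


def verdict_preamble(verdicts: list[Verdict], limit: int = 10) -> str:
    """Render the latest human verdicts as text to prepend to the next
    session's prompt; an empty list yields an empty string."""
    if not verdicts:
        return ""
    lines = ["Human review of earlier conclusions:"]
    for v in verdicts[-limit:]:
        mark = "CORRECT" if v.correct else "WRONG"
        suffix = f" ({v.note})" if v.note else ""
        lines.append(f"- [{mark}] {v.conclusion}{suffix}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(verdict_preamble([
        Verdict("example conclusion A", correct=True),
        Verdict("example conclusion B", correct=False, note="wrong baseline"),
    ]))
```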

The loop

Two loops, one discipline.

An inner loop iterates against verification and a rubric. An outer loop compacts each finished session into memory the next one reads first.

Explore (inner loop). Read compact memory, propose a minimal plan.
Verify (inner loop). Deterministic commands and monitors decide, not the worker.
Grade (inner loop). An independent rubric scores the artifacts.
Compact (outer loop). Distill facts, failures, and next steps into .sisyfus/.
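The two loops can be sketched as one function. Verification runs real commands and trusts only their exit codes. Grading happens only after verification passes, and the outcome is compacted once the inner loop ends. run_session, verify, the 0.8 pass mark, and the three-round limit are hypothetical. The explore, grade, and compact callables stand in for the worker, the grader backend, and the memory writer.

```python
import subprocess
import sys
from typing import Callable


def verify(checks: list[list[str]]) -> bool:
    """Deterministic verification: every check command must exit 0."""
    return all(subprocess.run(cmd, capture_output=True).returncode == 0
               for cmd in checks)


def run_session(explore: Callable[[int], None],
                checks: list[list[str]],
                grade: Callable[[], float],
                compact: Callable[[dict], None],
                pass_mark: float = 0.8,
                max_rounds: int = 3) -> dict:
    """Inner loop: explore, verify, grade, revise. Outer step: compact."""
    outcome = {"rounds": 0, "verified": False, "score": 0.0}
    for round_no in range(1, max_rounds + 1):
        outcome["rounds"] = round_no
        explore(round_no)  # the worker acts; its own claims are not consulted
        outcome["verified"] = verify(checks)
        # Grade only artifacts that passed verification.
        outcome["score"] = grade() if outcome["verified"] else 0.0
        if outcome["verified"] and outcome["score"] >= pass_mark:
            break
    compact(outcome)  # distill the result into memory for the next session
    return outcome


if __name__ == "__main__":
    ok = [[sys.executable, "-c", "raise SystemExit(0)"]]
    scores = iter([0.5, 0.9])  # toy grader: improves on the second round
    memory: list[dict] = []
    result = run_session(lambda n: None, ok, lambda: next(scores), memory.append)
    print(result)  # {'rounds': 2, 'verified': True, 'score': 0.9}
```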

Where it runs

This site is the front door. The engine runs on your machine.

Sisyfus executes real CLIs and reads your files, so it is local-first by design, not a hosted multi-tenant SaaS. sisyfus.ai introduces the project and points you to the install steps and source; the control plane stays yours.

sisyfus.ai · this website

The front door

What you're reading now — the introduction, the docs, the way in.

Explains the project & the loop
Links to install & source
Static, public, safe to share

your machine · the control plane

Where the work happens

The hub, the agents, your research artifacts — on hardware you control.

Runs Codex / Claude as backends
Reads & writes your local files
Reach it remotely over your own VPN (e.g. Tailscale) if you want — still your box

Get started

Up and watching in two commands.

Pure Python standard library — no third-party runtime dependencies. Bring your own agent CLI (Codex, Claude, or both).

    # install
    python -m pip install -e .

    # one command: hub + dashboard + this machine as a worker
    sisyfus up --daemon \
      --adapter command \
      --agent-command 'claude -p --model {model} < {prompt_path}'

    # mix backends in one run: Claude researches, Codex grades
    agents.explorer.command = "claude -p ... < {prompt_path}"
    agents.grader.command = "codex exec ... < {prompt_path}"

    # then open the dashboard, write a goal, press run.
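The agent command above is a template with {model} and {prompt_path} placeholders. How Sisyfus fills them is not shown here. Below is a plausible sketch that assumes single-pass substitution with shell quoting; render_agent_command is a hypothetical name. The template uses < redirection, so the rendered string must run through a shell, for example subprocess.run(cmd, shell=True). That is why each value is quoted.

```python
import re
import shlex

_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def render_agent_command(template: str, **values: str) -> str:
    """Fill {name} placeholders in one pass, shell-quoting each value.

    Single-pass substitution means a value that itself contains "{...}"
    is never expanded again.
    """
    def fill(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for placeholder {{{name}}}")
        return shlex.quote(str(values[name]))

    return _PLACEHOLDER.sub(fill, template)


if __name__ == "__main__":
    cmd = render_agent_command(
        "claude -p --model {model} < {prompt_path}",
        model="example-model",
        prompt_path="/tmp/run 1/prompt.md",
    )
    print(cmd)  # claude -p --model example-model < '/tmp/run 1/prompt.md'
```

The same function would apply unchanged to per-role templates such as agents.explorer.command and agents.grader.command.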
