Meet agentic-pi: building optionality back into Last Light

A few weeks ago I wrote about Last Light and made the point that the harness is the product - the artefact worth caring about is the production line, not the code that comes out of it. I am going to stand by that claim, but I want to follow it up with the bit I glossed over: what is actually doing the coding inside the harness.

Last Light’s job is to plan, route, verify and ship. At every phase - Architect, Executor, Reviewer, the smaller workflows like triage and review and health - it needs to spawn one coding agent, hand it a prompt, parse the JSONL stream back out, and exit. Until last week the thing that did that was Claude Code, driven through @anthropic-ai/claude-agent-sdk. Then it was opencode, forked and rewrapped, for about three days. And now it is a small npm package I wrote called agentic-pi, sitting on top of earendil-works/pi.

Three runtimes in four days is not something I would normally write about. But the reasons each one stopped being the right choice are interesting, and the shape of the thing I have ended up with is - I think - close to what most harnesses are going to need. So this is the story of why, and a step-by-step walkthrough of what agentic-pi actually does.

The first thing that changed: goodbye Claude Code

Until last week Last Light was wired straight into Claude Code via @anthropic-ai/claude-agent-sdk. Each phase spawned a sandbox container, the container ran claude -p against the agent SDK, the agent SDK chatted with Anthropic over the user’s Claude Pro/Max subscription, and out came the JSONL stream. No API key, no per-token billing, just the flat subscription I was already paying for. This was great. The combination of “agent SDK in a container” plus “subscription auth” was the whole reason I could afford to leave Last Light running 24/7 on my own repos.

And then Anthropic quietly removed subscription auth from the headless -p path. Interactive claude in a terminal is still fine on a subscription, but the moment you go headless - which is exactly what a harness does - you need an API key. Per-token billing. No more “Pro plan powers my orchestrator at the weekend.”

I want to be careful here. Anthropic are entitled to draw that line wherever they like, and there is a reasonable argument that automated headless agents are not what a $20-a-month plan was ever designed for. I am not annoyed about it. But it did break one of my assumptions about Last Light - that hobby-scale agentic workflows should be financially viable on a personal subscription rather than a corporate-shaped bill - and it made me notice that the runtime layer of my harness was effectively single-vendor. The claude-agent-sdk only talks to Anthropic. When the auth rules change, you have no second option.

The right answer was always going to be optionality. Let the harness pick the model that fits the task and the budget. Cheap providers for triage and review where context windows matter less, bigger models where they earn their keep, and if Anthropic ever re-opens the subscription path then pick that up too. The harness should not care which provider is on the other end of the call.

The second thing that changed: goodbye opencode

The obvious next step was opencode. It supports multiple providers, it has a run --format json mode that emits exactly the shape of JSONL stream a harness wants, and opencode serve gives you a chat backend you can wire a dashboard to. Over a few days I spiked it on a branch, added OpenRouter as a third provider, dropped the @anthropic-ai/claude-agent-sdk dependency, and merged the lot. That was PR #51, and as of two days ago it was running in production.

It ran. It produced answers. And then phases that had clearly completed successfully started being recorded as failed, and the workflow engine refused to move on.

The cause turned out to be a class of bug in opencode run --format json that has nothing to do with opencode being a bad tool and everything to do with what happens when you depend on a terminal event to tell you a run finished. Two specific open issues against opencode bit me in production:

  • sst/opencode#26855 - a race in cmd/run.ts: the run loop observes session.status=idle and exits before draining the final step_finish event to stdout. The model has produced a complete answer, but the terminal accounting line never makes it out of the process. Downstream, the harness sees a clean exit with no step_finish, treats lastReason as undefined, and classifies the phase as failed. Reproduced on a real explore workflow where a one-token “READY” reply from openai/gpt-5.5 came through as text but no step_finish ever followed.
  • sst/opencode#27697 - the JSON formatter drops post-tool-call assistant text, surfacing as step_finish.reason="tool-calls" even though the response actually contains a terminal text completion with no callable tool_use parts. The model is done. Opencode says “the model wanted to call more tools,” the harness says “that is a truncation,” and the phase is classified as failed.

Both manifest as the same thing from where Last Light is sitting: opencode exited cleanly with a final text response, but the terminal accounting event the harness was waiting on never made it through. I patched around it by widening the success classifier to “any clean exit with non-empty finalText and no error events is a success,” but that is exactly the kind of workaround you do not want load-bearing in your harness - you have stopped trusting the runtime’s own signal and started trusting your own heuristics on top of it. And the failures it papers over are not “the agent wrote bad code,” they are “the agent did the right thing and the runtime forgot to tell you.”

That is a tooling problem, not a model problem. The model is fine. The harness is fine. The contract between them is broken in a way I cannot easily fix from the outside, and patching the symptom every time it surfaces is not a strategy.
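To make the difference between the runtime's own signal and the widened heuristic concrete, here is a minimal sketch of both rules. The event shape is a simplified, hypothetical stand-in for opencode's JSON records, and treating "stop" as the normal terminal reason is an assumption; the post only names "tool-calls". The two failure modes from the issues above can be fed through each rule.

```typescript
// Hypothetical, simplified event shape; real opencode records carry more fields.
type RunEvent =
  | { type: "text"; text: string }
  | { type: "step_finish"; reason: string }
  | { type: "error"; message: string };

type Outcome = "succeeded" | "failed";

// Strict rule: trust the runtime's own terminal accounting event.
// Assumption: a normal completion ends with step_finish reason "stop".
function classifyStrict(events: RunEvent[], exitCode: number): Outcome {
  let lastReason: string | undefined;
  for (const e of events) {
    if (e.type === "step_finish") lastReason = e.reason;
  }
  return exitCode === 0 && lastReason === "stop" ? "succeeded" : "failed";
}

// Widened rule: any clean exit with non-empty final text and no error events.
function classifyWidened(events: RunEvent[], exitCode: number): Outcome {
  const hasError = events.some((e) => e.type === "error");
  const finalText = events
    .filter((e): e is Extract<RunEvent, { type: "text" }> => e.type === "text")
    .map((e) => e.text)
    .join("")
    .trim();
  return exitCode === 0 && !hasError && finalText.length > 0 ? "succeeded" : "failed";
}
```

A "READY" text with no step_finish (the race) and a text completion mislabelled as "tool-calls" both come out as failed under the strict rule and as succeeded under the widened one, which is exactly why the widened rule ends up carrying weight it should not have to.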

The shape I actually wanted was:

  • In-process, not subprocess. If the agent runs inside the same Node process as the workflow engine, “did the run complete” is the resolution of a promise, not an event I am hoping reached stdout in the right order.
  • GitHub tools as native tools, not a separate MCP server. The 31 tools I had ported into mcp-github-app could live inside the runtime itself, removing another lifecycle to babysit.
  • Permission profiles enforced at tool-registration time. Not “the LLM tries to call it, the gate refuses, the LLM tries again, every rejection burns tokens.” If a profile cannot use a tool, the tool is not in the system prompt.

That is when I went looking at Pi.

Why Pi

I will not labour this because Pi’s own README does it better than I will, but the short version is: Pi is a deliberately minimal coding-agent harness from earendil-works. It exposes an SDK (createAgentSession, session.subscribe, session.prompt, session.getSessionStats), a multi-provider LLM API via pi-ai, an extension model for registering custom tools, and four run modes (interactive, RPC, JSON, one-shot). No MCP. No plugin store. No opinions about how you orchestrate it.

That last bit is the thing that mattered to me. Opencode is a finished product with a UI and a workflow. Pi is a substrate. If I want a one-shot worker that emits a specific JSONL shape, registers GitHub tools the way Last Light expects, applies a permission profile at registration time rather than runtime, and runs inside a sandbox, none of that is opencode’s problem - it is mine. Pi gives me hooks for all of it without telling me how to use them.

That is also exactly the wrong substrate to drop in front of a workflow engine. Last Light expects to call a function and get JSONL back, with a known shape including a usage snapshot, with GitHub tools available under a permission profile, and with sensible defaults for running inside a sandbox container. Pi will happily emit JSONL in --mode json, but it does not emit the exact shape Last Light wants, it does not know about GitHub, and it has no concept of permission profiles. So I wrapped it.

What agentic-pi is

agentic-pi is a pre-configured, opinionated wrapper around Pi that turns it into a one-shot coding-agent worker for workflow systems. The whole CLI is one command:

echo "list open PRs on owner/repo" | agentic-pi run \
--model anthropic/claude-haiku-4-5 \
--profile read \
--no-session

That is the entire surface area for callers. Read the prompt from stdin, run exactly one agent turn (which may contain many tool calls), emit JSONL on stdout, exit when Pi’s agent_end fires. No REPL, no chat loop, no serve mode. If a phase needs follow-ups, the orchestrator spawns a new process.

That constraint is the most important opinion in the project, so it is worth pausing on: one-shot only. The decision about whether to keep going, retry, or hand off to a different role is the harness’s job, not the agent’s. The agent gets to make every reasoning call inside its phase and then it stops. That is what makes Last Light’s cycle reproducible, and the runtime needs to honour it.
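Keeping that decision in the harness looks roughly like this: every attempt is a fresh one-shot run, and the loop that decides whether to try again lives outside the agent. This is a minimal, synchronous sketch; the runner stands in for one agentic-pi invocation (in practice an awaited run() call or a spawned process), and PhaseResult and runPhaseWithRetries are hypothetical names.

```typescript
// Hypothetical result of one one-shot run: one prompt in, one agent turn, then exit.
interface PhaseResult {
  ok: boolean;
  finalText: string;
}

// Stand-in for a single agent invocation; synchronous here only to keep the sketch small.
type OneShotRunner = (prompt: string, attempt: number) => PhaseResult;

// The harness owns the retry policy; the agent never loops on its own.
function runPhaseWithRetries(
  runner: OneShotRunner,
  prompt: string,
  maxAttempts: number,
): { result: PhaseResult; attempts: number } {
  let result: PhaseResult = { ok: false, finalText: "" };
  let attempts = 0;
  while (attempts < maxAttempts) {
    attempts++;
    // Each attempt is a brand-new run: no chat loop, no agent state carried over.
    result = runner(prompt, attempts);
    if (result.ok) break;
  }
  return { result, attempts };
}
```

Because the runner is stateless from the harness's point of view, the same phase can be replayed, retried or handed to a different role without the agent having any say in it.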

The rest of agentic-pi exists to make that one-shot loop actually useful. Let me walk through it in the order it matters.

1. The JSONL event stream

Pi natively emits a JSONL stream in --mode json. agentic-pi uses Pi’s SDK in-process rather than spawning the Pi CLI, subscribes to the same events, and adds three things on top:

  • A leading {"type":"session","version":3,"id":"<uuid>","cwd":"…"} header so downstream consumers have one place to read the run’s identity.
  • sessionId and timestamp injected onto every subsequent event, so a consumer never has to parse the header line separately to correlate.
  • A terminal {"type":"usage_snapshot","stats":{…}} event synthesised from session.getSessionStats() - because Pi’s per-event payloads do not carry token counts or cost, and Last Light needs that for billing and budget tracking.
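The three augmentations are easy to express as a pure function over Pi's events. A minimal sketch, assuming Pi's events are plain JSON objects and that the session stats have already been read (in agentic-pi they come from session.getSessionStats()); augmentStream and toJsonl are hypothetical names, and the stats shape mirrors the usage_snapshot line in the trimmed sample below.

```typescript
type Json = Record<string, unknown>;

// Mirrors the stats object in the usage_snapshot line of the sample run.
interface SessionStats {
  tokens: { input: number; output: number; total: number };
  cost: number;
}

// Hypothetical helper: wraps Pi's events with a header, per-event ids and a final usage record.
function augmentStream(
  sessionId: string,
  cwd: string,
  piEvents: Json[],
  stats: SessionStats,
  now: () => string = () => new Date().toISOString(),
): Json[] {
  // 1. One leading header carrying the run's identity.
  const out: Json[] = [{ type: "session", version: 3, id: sessionId, timestamp: now(), cwd }];
  // 2. sessionId and timestamp on every subsequent event, so no consumer has to join against the header.
  for (const event of piEvents) {
    out.push({ ...event, sessionId, timestamp: now() });
  }
  // 3. A terminal usage snapshot, because Pi's per-event payloads carry no token counts or cost.
  out.push({ type: "usage_snapshot", stats, sessionId, timestamp: now() });
  return out;
}

// One JSON object per line, the way a CLI would print them.
function toJsonl(records: Json[]): string {
  return records.map((r) => JSON.stringify(r)).join("\n") + "\n";
}
```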

If your orchestrator wants to know what an agent run cost, the usage snapshot is the single line you parse. A trimmed run looks like this:

{"type":"session","version":3,"id":"…","timestamp":"…","cwd":"…"}
{"type":"sandbox_status","backend":"none","status":{"backend":"none"},…}
{"type":"extension_status","extension":"github","status":"configured","profile":"read","toolCount":18,…}
{"type":"agent_start",…}
{"type":"message_update","assistantMessageEvent":{"type":"text_delta","delta":"…"},…}
{"type":"tool_execution_start","toolCallId":"…","toolName":"github_list_pull_requests","args":{},…}
{"type":"tool_execution_end","toolCallId":"…","toolName":"github_list_pull_requests","result":{},"isError":false,…}
{"type":"agent_end","messages":[],"willRetry":false,…}
{"type":"usage_snapshot","stats":{"tokens":{"input":…,"output":…,"total":…},"cost":0.000},…}

This is broadly the same shape Last Light’s parser was already consuming from opencode, plus the augmentations. The migration on Last Light’s side meant dropping a couple of opencode-only fields, adding handling for the session / sandbox_status / extension_status / usage_snapshot records, and - the much bigger win - deleting the subprocess wrapper entirely. Agentic-pi exposes a run() library function as well as a CLI, and the onEvent callback hands you each record in the same order the CLI would have printed it. No child_process.spawn, no fd plumbing, no “did the process exit cleanly” guessing. The records arrive as JS objects.
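Whether the records come from the onEvent callback or from lines on stdout, the consumer is the same fold. A minimal sketch that reduces a run to what an orchestrator usually wants (session id, tools called, tool errors, whether agent_end fired, and the usage snapshot); field names are taken from the trimmed sample above, while RunSummary, summarise and parseJsonl are hypothetical.

```typescript
// Each record is a JSON object with at least a "type" field.
type AgentRecord = { type: string; [key: string]: unknown };

// Hypothetical summary shape an orchestrator might keep per phase.
interface RunSummary {
  sessionId?: string;
  toolCalls: string[];
  toolErrors: number;
  sawAgentEnd: boolean;
  totalTokens?: number;
  cost?: number;
}

function summarise(records: AgentRecord[]): RunSummary {
  const summary: RunSummary = { toolCalls: [], toolErrors: 0, sawAgentEnd: false };
  for (const r of records) {
    switch (r.type) {
      case "session":
        summary.sessionId = String(r.id);
        break;
      case "tool_execution_start":
        summary.toolCalls.push(String(r.toolName));
        break;
      case "tool_execution_end":
        if (r.isError === true) summary.toolErrors += 1;
        break;
      case "agent_end":
        summary.sawAgentEnd = true;
        break;
      case "usage_snapshot": {
        // The cost line: the one record billing and budget tracking need.
        const stats = (r.stats ?? {}) as { tokens?: { total?: number }; cost?: number };
        summary.totalTokens = stats.tokens?.total;
        summary.cost = stats.cost;
        break;
      }
    }
  }
  return summary;
}

// For CLI consumers: one JSON object per non-empty line of stdout.
function parseJsonl(text: string): AgentRecord[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as AgentRecord);
}
```

In-process, you would collect the objects the callback receives (or fold them incrementally) and skip parseJsonl entirely.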

2. GitHub as a first-class native tool surface

Pi explicitly does not support MCP. That is a deliberate choice on Pi’s part - MCP is a moving target, the spec evolves faster than the SDKs, and Pi’s authors would rather have a tight contract for native tools than a permissive one for MCP servers.

That is a problem if you have spent the past month writing an mcp-github-app server with 31 tools and have wired your entire orchestrator to use it. Which I had.

So I ported it. agentic-pi ships a native Pi extension exposing all 31 GitHub tools - clone, push, issues, PRs, reviews, labels, search, the lot. Tools are registered with the github_ prefix to match opencode’s MCP-server-name convention, so prompts that already referenced github_create_pull_request did not need to change. The Octokit wrapper, the retry/backoff, the git credential-store handling - all ported over almost line for line from the old MCP server.

Auth is opinionated: GitHub App credentials are preferred, and a static GITHUB_TOKEN is only a low-trust fallback. Installation tokens minted via the App's JWT are cached for about 50 minutes with a 5-minute refresh buffer, and the git credential-store file is written with mode 600 after the token has been checked against a regex for the expected shape. This is the same auth path the old MCP server used - I did not reinvent it, I just moved it inside the runtime so there is no separate MCP server to stand up.
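The token handling can be sketched as a small cache. This is a minimal, synchronous illustration, not agentic-pi's code: the mint function stands in for the JWT exchange (which would be asynchronous in practice), and the ghs_ prefix regex is an assumed token shape. The cache reuses a token until it is within the refresh buffer of the expiry the mint function reports, and refuses to hand back anything that fails the shape check.

```typescript
interface InstallationToken {
  token: string;
  expiresAt: number; // epoch milliseconds reported by the (hypothetical) mint function
}

// Assumed shape for GitHub App installation tokens; illustrative only.
const INSTALLATION_TOKEN_SHAPE = /^ghs_[A-Za-z0-9]{20,}$/;

class InstallationTokenCache {
  private cached: InstallationToken | undefined = undefined;
  private readonly mint: () => InstallationToken;
  private readonly refreshBufferMs: number;

  constructor(mint: () => InstallationToken, refreshBufferMs: number = 5 * 60 * 1000) {
    this.mint = mint;
    this.refreshBufferMs = refreshBufferMs;
  }

  // Return the cached token unless it is within the refresh buffer of its expiry.
  get(now: number): string {
    const current = this.cached;
    if (current && now < current.expiresAt - this.refreshBufferMs) {
      return current.token;
    }
    const fresh = this.mint();
    // Reject anything that does not look like a token before it can reach git's credential store.
    if (!INSTALLATION_TOKEN_SHAPE.test(fresh.token)) {
      throw new Error("minted token has an unexpected shape");
    }
    this.cached = fresh;
    return fresh.token;
  }
}

// One line in git's credential-store format for an installation token.
function credentialStoreLine(token: string): string {
  return `https://x-access-token:${token}@github.com`;
}
```

The x-access-token username is the form GitHub documents for using installation tokens over HTTPS; writing the line to disk with mode 600 is left out to keep the sketch free of filesystem calls.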

3. Permission profiles as a registration-time gate

This is the part I am most pleased with, and it is the one that took the longest to get right.

The old MCP server had a permission profile at runtime - the LLM could see all 31 tools, but a runtime gate would reject the disallowed ones. The agent kept trying to call them anyway, because the system prompt told it they existed, and every rejection burned tokens and clouded the cycle. It worked, but it leaked.

--profile <name> in agentic-pi picks one of four allowlists:

| Profile | Tool count | What it can do |
|---|---|---|
| read | 18 | Repo/issue/PR reads + search. No mutations. |
| issues-write | 24 | Read + issue/comment/label mutations. |
| review-write | 26 | Read + issues + PR review/comment + create PR. |
| repo-write | 31 | Everything: clone, push, branch, file edits, merge. |

The important difference: tools outside the active profile are never registered. The LLM cannot see them in the system prompt and cannot call them. This is a strictly stronger guarantee than a runtime “ask each time” gate, and it makes the agent’s reasoning visibly cleaner - it stops trying to merge PRs in a review-only context because it does not know merge is a thing it has access to.
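Because each profile in the table is a superset of the one before it, a registration-time gate can be as simple as ordering the profiles and filtering the tool list before anything is registered. A minimal sketch; the four-tool catalogue is illustrative (github_merge_pull_request is a hypothetical name), and the real extension's counts are the 18/24/26/31 in the table, not what this toy list produces.

```typescript
type Profile = "read" | "issues-write" | "review-write" | "repo-write";

// Profiles are nested: each one includes everything the previous one allows.
const PROFILE_ORDER: Profile[] = ["read", "issues-write", "review-write", "repo-write"];

interface ToolDef {
  name: string;
  minProfile: Profile; // lowest profile allowed to see this tool
}

// Illustrative catalogue; github_merge_pull_request is a hypothetical name.
const ALL_TOOLS: ToolDef[] = [
  { name: "github_list_pull_requests", minProfile: "read" },
  { name: "github_create_issue", minProfile: "issues-write" },
  { name: "github_create_pull_request", minProfile: "review-write" },
  { name: "github_merge_pull_request", minProfile: "repo-write" },
];

function allowedIn(tool: ToolDef, active: Profile): boolean {
  return PROFILE_ORDER.indexOf(tool.minProfile) <= PROFILE_ORDER.indexOf(active);
}

// Filter BEFORE registration: tools outside the profile never reach the runtime or the prompt.
function toolsToRegister(all: ToolDef[], active: Profile): ToolDef[] {
  return all.filter((tool) => allowedIn(tool, active));
}
```

The filtered list is the only thing handed to the runtime, so a read-profile agent has no merge tool in its system prompt to be tempted by.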

The extension is also safe by default when credentials are missing or misconfigured. If you pass --profile X but the env vars are not set, the run continues without GitHub tools rather than failing loudly. An extension_status JSONL line reports the outcome programmatically so the orchestrator can log it without parsing stderr.

That last property mattered because Last Light runs a lot of phases that do not need GitHub at all - the Architect, the security scan, anything that only touches .lastlight/ files - and forcing those phases to set up credentials they would never use was friction I wanted to delete.
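The missing-credentials behaviour fits in a similar sketch: pick a credential source, and if there is none, register no GitHub tools, keep the run going, and report the outcome as an extension_status record. GITHUB_APP_ID, GITHUB_APP_PRIVATE_KEY and the "skipped" status value are hypothetical; the post only names GITHUB_TOKEN and shows the "configured" status.

```typescript
type Env = Record<string, string | undefined>;

interface ExtensionStatus {
  type: "extension_status";
  extension: "github";
  status: "configured" | "skipped"; // "skipped" is a hypothetical value
  profile: string;
  toolCount: number;
}

// Prefer GitHub App credentials, fall back to a static GITHUB_TOKEN, otherwise nothing.
// GITHUB_APP_ID and GITHUB_APP_PRIVATE_KEY are hypothetical variable names.
function credentialSource(env: Env): "app" | "token" | "none" {
  if (env["GITHUB_APP_ID"] && env["GITHUB_APP_PRIVATE_KEY"]) return "app";
  if (env["GITHUB_TOKEN"]) return "token";
  return "none";
}

function configureGithub(
  env: Env,
  profile: string,
  profileTools: string[],
): { tools: string[]; status: ExtensionStatus } {
  const source = credentialSource(env);
  // No credentials: register nothing, but do not fail the run.
  const tools = source === "none" ? [] : profileTools;
  return {
    tools,
    status: {
      type: "extension_status",
      extension: "github",
      status: source === "none" ? "skipped" : "configured",
      profile,
      toolCount: tools.length,
    },
  };
}
```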

4. Whatever model Pi can talk to

--model provider/id accepts any model pi-ai knows about - Anthropic, OpenAI, OpenRouter, Ollama, the lot. Credentials come from environment variables (ANTHROPIC_API_KEY, OPENAI_API_KEY, OPENROUTER_API_KEY) or from Pi’s ~/.pi/agent/auth.json if you have run pi /login interactively. Provider/id mapping is delegated entirely to pi-ai’s getModel() - I do not maintain my own registry, and any new model Pi picks up is one agentic-pi can use the next day.

This is the bit that gets me my optionality back. Last Light’s per-phase config can now say “use Haiku for triage, Opus for the architect, GPT-5 for the reviewer, OpenRouter’s cheaper Haiku for the health workflow” without the harness needing to know anything about how those providers authenticate. And if Anthropic ever re-enables subscription auth on a headless path, pi /login will pick it up and the harness inherits it for free.

--thinking <level> maps directly to Pi’s thinking level (off / minimal / low / medium / high / xhigh), and per-provider effort is handled by Pi.
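Two small details fall out of the provider/id convention. The split has to happen on the first slash, assuming provider names never contain one, because OpenRouter ids such as anthropic/claude-opus-4-5 contain a slash themselves. And thinking levels are a closed set. The sketch below validates a per-phase config against both; the phase names and model assignments are illustrative, and since agentic-pi leaves the provider/id mapping to pi-ai's getModel(), treat this as harness-side validation rather than how agentic-pi parses the flag.

```typescript
const THINKING_LEVELS = ["off", "minimal", "low", "medium", "high", "xhigh"] as const;
type ThinkingLevel = (typeof THINKING_LEVELS)[number];

interface ModelRef {
  provider: string;
  id: string;
}

// Split on the FIRST slash only: "openrouter/anthropic/claude-opus-4-5" has a slash inside its id.
function parseModelRef(ref: string): ModelRef {
  const slash = ref.indexOf("/");
  if (slash <= 0 || slash === ref.length - 1) {
    throw new Error(`expected provider/id, got "${ref}"`);
  }
  return { provider: ref.slice(0, slash), id: ref.slice(slash + 1) };
}

function isThinkingLevel(value: string): value is ThinkingLevel {
  return (THINKING_LEVELS as readonly string[]).includes(value);
}

// Hypothetical per-phase config, as it might be loaded from a file.
interface PhaseModelConfig {
  model: string;
  thinking: string;
}

const phases: Record<string, PhaseModelConfig> = {
  triage: { model: "anthropic/claude-haiku-4-5", thinking: "low" },
  architect: { model: "openrouter/anthropic/claude-opus-4-5", thinking: "high" },
  reviewer: { model: "openai/gpt-5.5", thinking: "medium" },
};

// Collect every problem rather than stopping at the first one.
function validatePhases(config: Record<string, PhaseModelConfig>): string[] {
  const problems: string[] = [];
  for (const [phase, cfg] of Object.entries(config)) {
    try {
      parseModelRef(cfg.model);
    } catch (err) {
      problems.push(`${phase}: ${(err as Error).message}`);
    }
    if (!isThinkingLevel(cfg.thinking)) {
      problems.push(`${phase}: unknown thinking level "${cfg.thinking}"`);
    }
  }
  return problems;
}
```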

5. Defaults that match a containerised sandbox

agentic-pi is designed to run inside a container per phase, the way Last Light spawns it. The defaults reflect that:

  • --no-session is what you want in sandboxed runs - session state lives outside the container, so there is nothing to persist inside it.
  • Built-in tools (read, write, edit, bash, grep, find, ls) are enabled by default. Add --no-builtin-tools if you want a GitHub-only agent (useful for triage workflows that should not touch the filesystem).
  • AGENTS.md in the working directory is auto-loaded as the agent’s system prompt - the same convention Pi and opencode share. Drop your workflow’s AGENTS.md into the mounted workspace and the agent picks it up. Last Light does this for every phase, so the Architect sees an AGENTS.md written for architects, and the Executor sees one written for executors.
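For a harness that still drives the CLI rather than calling run() in-process, a per-phase config maps onto these defaults fairly directly. A minimal sketch using only flags described in this post; the PhaseRunConfig shape and buildArgs are hypothetical.

```typescript
// Hypothetical per-phase config; every flag it maps to appears in the post.
interface PhaseRunConfig {
  model: string;
  profile?: "read" | "issues-write" | "review-write" | "repo-write";
  thinking?: string;
  builtinTools: boolean;
  sandbox?: "gondolin";
}

function buildArgs(cfg: PhaseRunConfig): string[] {
  // Sandboxed, one-shot runs keep no session state inside the container.
  const args = ["run", "--model", cfg.model, "--no-session"];
  if (cfg.profile) args.push("--profile", cfg.profile);
  if (cfg.thinking) args.push("--thinking", cfg.thinking);
  // e.g. GitHub-only triage agents that should not touch the filesystem.
  if (!cfg.builtinTools) args.push("--no-builtin-tools");
  if (cfg.sandbox) args.push("--sandbox", cfg.sandbox);
  return args;
}
```

The phase's AGENTS.md is not a flag at all: the harness places it in the working directory before starting the process, and it is picked up from there.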

6. Optional micro-VM sandboxing

This is the bit I am most excited about and have used the least so far - it is genuinely new ground for me, even if the broader idea of QEMU-backed per-run sandboxes is well-trodden elsewhere.

By default Pi’s file and bash tools run on the host. Pass --sandbox gondolin and they get routed through a per-run Gondolin QEMU micro-VM instead. The orchestrator does not need to manage anything - agentic-pi boots the VM, mounts the working directory at /workspace inside it, runs the agent’s read / write / edit / bash tools through it, and tears it down on agent_end.

What this protects against: arbitrary code the agent runs via bash or write executes inside the VM, not on the host. A prompt-injection that gets the agent to rm -rf / only rm’s the guest, which is thrown away seconds later. The host workspace is still mounted in, so legitimate file edits persist - destructive bash against /workspace will still modify host files, the same trade-off chroot and Docker bind-mounts have.

What it does not protect against: GitHub credentials and the LLM API key live in the agentic-pi process outside the VM. The github_* tools run there. A prompt-injection that subverts Pi into calling github_create_issue does not need to escape the VM - the call happens host-side. The VM protects against code execution, not tool misuse. For that you restrict the profile (--profile read).

This is a useful clarification to have written down, because the temptation to think “the agent is in a VM, so it cannot do harm” is exactly the kind of muddy thinking that gets people in trouble. The sandbox is one defence. The profile is another. They protect against different things and you need both.
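The split between what the sandbox covers and what the profile covers can be written down as a routing rule. This sketch encodes the post's description: with --sandbox gondolin the read, write, edit and bash tools run in the VM, and everything else, including every github_* tool, runs host-side. The function is illustrative, not agentic-pi's router.

```typescript
type Sandbox = "none" | "gondolin";
type Site = "vm" | "host";

// Per the post: with --sandbox gondolin these tools are routed through the micro-VM.
const VM_ROUTED_TOOLS = new Set(["read", "write", "edit", "bash"]);

function executionSite(toolName: string, sandbox: Sandbox): Site {
  if (sandbox === "gondolin" && VM_ROUTED_TOOLS.has(toolName)) return "vm";
  // Everything else, including every github_* tool and the credentials it uses, stays host-side.
  return "host";
}
```

executionSite("github_create_issue", "gondolin") is still "host", which is why the profile, not the VM, is the control for tool misuse.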

The latency cost on macOS Apple Silicon, measured from real runs:

| Operation | Time |
|---|---|
| First VM.create post-boot | ~13 s (one-time cache warm-up) |
| Subsequent VM.create | < 100 ms |
| Per-tool overhead | ~200 ms each |
| Realistic shell op (ls /etc && uname -a) | ~2.8 s |
| vm.close | ~10 ms |

Two-and-a-bit seconds per shell op is non-trivial - if you are doing a long-running Executor phase with dozens of bash calls, it adds up. For most of Last Light’s phases (triage, review, the read-only Architect) it is a price worth paying for the isolation. For the heavy Executor phases I am still leaving sandbox off and relying on Docker as the outer boundary.
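To put a number on "it adds up", here is the arithmetic as a tiny function, treating the table's ~2.8 s realistic shell op as the per-call cost (an approximation, since part of that time is the command itself) and the ~13 s first VM.create as a one-off. The call count in the worked example is hypothetical.

```typescript
// Figures from the latency table (macOS, Apple Silicon), in milliseconds.
const FIRST_VM_CREATE_MS = 13_000; // one-time cache warm-up
const SHELL_OP_MS = 2_800; // "realistic shell op", treated here as the per-call cost

function estimateShellTimeMs(shellOps: number, coldCache: boolean): number {
  return (coldCache ? FIRST_VM_CREATE_MS : 0) + shellOps * SHELL_OP_MS;
}
```

For a hypothetical Executor phase with 40 bash calls on a cold cache, that is 13 s + 40 × 2.8 s = 125 s, roughly two minutes of wall-clock time in VM shell ops alone.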

How the migration actually went

The end-to-end swap on Last Light was four things:

  • Replace the opencode run --format json … spawn with an in-process run({...}) call into agentic-pi. No subprocess, no fds, no PID to babysit, no terminal event to wait for on stdout. The agent runs inside the same Node process as the workflow engine, and “did the run complete” is the resolution of a promise.
  • Delete the per-phase MCP-server spawn entirely. --profile read / --profile review-write / etc. handles GitHub tool registration inside the agent process. Three fewer processes per phase.
  • Replace opencode serve for the chat side with pi-ai called directly in-process. One pi-ai conversation per chat thread, rehydrated from history on every turn, no long-lived chat server to supervise.
  • Update the per-phase config to use provider/id model strings (anthropic/claude-sonnet-4-6, openai/gpt-5.5, openrouter/anthropic/claude-opus-4-5) and the per-phase reasoning effort to use --thinking.
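Rehydrating on every turn means each chat turn is a pure function of the thread's stored history, so there is no server-side conversation state to supervise. A minimal sketch, assuming a hypothetical stored-message shape and a hypothetical maxMessages cap; ChatMessage is a stand-in, not pi-ai's actual message type, and the pi-ai call itself is left out.

```typescript
// Hypothetical stored-history shape for one chat thread.
interface StoredMessage {
  role: "user" | "assistant";
  text: string;
  at: number; // epoch milliseconds
}

// Stand-in for whatever message type the LLM call expects (not pi-ai's actual type).
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Rebuild the whole conversation for one turn; nothing survives between turns except stored history.
function rehydrate(history: StoredMessage[], incoming: string, maxMessages = 50): ChatMessage[] {
  // Copy before sorting so the caller's history is left untouched.
  const ordered = [...history].sort((a, b) => a.at - b.at);
  // Hypothetical cap so long threads stay bounded.
  const recent = maxMessages > 0 ? ordered.slice(-maxMessages) : [];
  const messages: ChatMessage[] = recent.map((m) => ({ role: m.role, content: m.text }));
  messages.push({ role: "user", content: incoming });
  return messages;
}
```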

It took an afternoon. The behavioural difference I noticed within an hour of the swap was the one I cared about most: phases reliably complete. The finalText-on-clean-exit workaround is gone because the runtime tells me directly whether the run finished, in a return value, rather than through an event I am hoping made it out of a subprocess in the right order. The agent stops trying to call tools it does not have, because they are not registered. The cost line is reliable because the usage snapshot is always the last record. None of these are revolutionary - they are just the kind of small frictions that quietly disappear when you stop running things as separate processes and start running them in-process behind a sharper contract.

When to use this (and when not to)

I want to be honest about who agentic-pi is for, because it is not for very many people.

Use it if you have an orchestrator that calls a coding agent once per workflow phase, in a container, and parses a JSONL stream. Use it if you used to call opencode run --format json and want a less-opaque replacement built on a more hackable substrate. Use it if you need GitHub repo operations available to the agent without standing up a separate MCP server.

Do not use it if you want a chat UI or a long-running interactive agent - use Pi directly, its interactive and RPC modes are excellent. Do not use it if you want generic MCP support, because it has none by design. Do not use it if you want a different tool surface (Linear, GitLab, internal APIs) - fork the extensions/github/ directory as a template, because agentic-pi does not load arbitrary external extensions and I do not currently plan to add that.

If your harness looks like Last Light’s - phases, JSONL, per-phase containers, GitHub-shaped tool surface - then this should be a small change with an outsized payoff. If it looks like anything else, Pi itself is probably the right starting point.

What I am taking from this

The honest version is that I should have started here. Pi is the substrate I actually needed - not a finished product, not a tool with opinions baked in, just an SDK, an extension model, and four run modes. Both detours through Claude Code and opencode were really me finding out that finished products make the wrong trade-offs when what you want is to own every knob.

What I have now is provider per phase, thinking level per phase, sandbox on or off per phase, GitHub profile per phase, model per phase. As I keep optimising Last Light for the workflows I actually run - and as I find specific quality problems in specific phases - every one of those is a dial I can turn independently. Claude-agent-sdk had none of those dials. Opencode had some of them and dropped the terminal event. agentic-pi has all of them, and because it is a thin layer over Pi, the surface keeps growing as Pi does.

agentic-pi is on npm and the source is on GitHub.

If you are running a harness of your own and the runtime layer is single-vendor, I would gently suggest having a look at what it would take to break that. And if you end up building something similar - or you have a take on what agentic-pi should do differently - I would love to hear it. You can find me on LinkedIn, or have a look at Last Light and tell me what you would change.