Series Note — This is Part 2 of two.
Part 1 — Multi-CLI · On-Device LLM · Hybrid Strategy (the big picture) → AgentZero Lite — Bringing Multi-CLI and On-Device LLM to Windows (Part 1, EN)
Part 2 (this article) — The engine room beneath:
Akka.NET actor model, Gemma 4 + GBNF, AgentReactorActor FSM, STT/LLM/TTS three-layer ensemble
Two questions to carry into this article.
1. An LLM is a text-completion engine. So how did it ever become an Agent?
2. Voice / LLM / TTS run as separate models and processes, so how do they appear to respond concurrently like the OpenAI Realtime API?
The answer to both is the same: the Actor model.
1. An LLM is a genius with no office
Over the last year we kept making LLMs smarter. But the moment you try to make one actually do work, things get oddly frustrating. GPT-OSS, Gemma 4, Nemotron Nano Omni — all great, yet none of them, by themselves, can "send a command to a terminal, wait for the result, then decide what to do next."
That's the point. An LLM is a text-completion engine. Feed in a prompt, it emits the next token. It cannot call tools, read files, or remember yesterday's conversation. It's like a genius without an office — no desk, no phone, no colleagues.
Then who is making ChatGPT's o3 call tools, Claude Code edit files, Gemini CLI run builds? The answer is the runtime around the model. That runtime takes the LLM's output, executes tools, pushes the results back into the LLM's context, and stops when it should.
How should you build that runtime? That's where the Actor model enters.
2. The Actor model in 30 minutes — what's different from objects?
The word Actor was coined by Carl Hewitt in 1973. Bottom line: an actor is a tiny computer with a mailbox, talking only via messages, holding its own state alone. It looks like an object but differs in decisive ways.

    flowchart LR
      classDef obj fill:#1e293b,stroke:#06b6d4,color:#e2e8f0
      classDef act fill:#312e81,stroke:#a855f7,color:#e9d5ff
      subgraph object["Object"]
        direction TB
        Caller1["Caller"]
        Caller1 -->|"obj.foo() direct call<br>sync + lock"| Obj["📦 Object<br>(shared state)"]:::obj
        Obj -.shared var.-> Caller1
      end
      subgraph actor["Actor"]
        direction TB
        Caller2["Sender"]
        Caller2 -->|"actor.Tell(msg)<br>async, returns now"| MB["📬 Mailbox"]:::act
        MB -->|"one at a time"| Act["🤖 Actor<br>(isolated state)"]:::act
        Act -.replies via msg only.-> Caller2
      end

Object vs Actor — six axes of difference
- Call style — Object: obj.foo() direct method call. Actor: actor.Tell(msg) message send.
- Synchrony — Object: usually synchronous, caller waits. Actor: async, drop into the mailbox and return immediately.
- Concurrency — Object: hand-managed lock / mutex / volatile. Actor: mailbox processes one at a time — no lock needed.
- State — Object: directly accessible from outside (fields, getters). Actor: lives only inside the actor — outside touches it only via messages.
- Failure handling — Object: try-catch propagating up the call stack. Actor: parent supervises children (Restart / Resume / Stop).
- Location — Object: same process only. Actor: same machine or remote — call sites unchanged.
Compress it to one line: an actor is "a lightweight process with no shared mutable state." That's why it needs no locks, why concurrent work sidesteps lock-ordering deadlocks, and why one actor's death leaves its neighbors alive.
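The mailbox idea can be sketched in plain C# without any actor library. This is a minimal illustration, not how Akka.NET implements mailboxes: CounterActor and MailboxDemo are hypothetical names, and the mailbox is a stand-in built on System.Threading.Channels. It shows eight threads sending concurrently while a single consumer loop mutates the state, so no lock is needed.

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical minimal actor: one mailbox, one consumer loop, private state.
public sealed class CounterActor
{
    private readonly Channel<int> _mailbox = Channel.CreateUnbounded<int>();
    private readonly Task _loop;
    private int _total; // touched only by the single consumer loop, so no lock

    public CounterActor() => _loop = Task.Run(ProcessAsync);

    // Tell: enqueue the message and return immediately (fire-and-forget).
    public void Tell(int amount) => _mailbox.Writer.TryWrite(amount);

    // Stop accepting messages, drain the mailbox, then return the final state.
    public async Task<int> StopAsync()
    {
        _mailbox.Writer.Complete();
        await _loop;
        return _total;
    }

    private async Task ProcessAsync()
    {
        // Messages are handled strictly one at a time.
        await foreach (var amount in _mailbox.Reader.ReadAllAsync())
            _total += amount;
    }
}

public static class MailboxDemo
{
    public static async Task<int> RunAsync()
    {
        var actor = new CounterActor();
        // 8 concurrent senders, 1000 messages each.
        Parallel.For(0, 8, _ => { for (var i = 0; i < 1000; i++) actor.Tell(1); });
        return await actor.StopAsync(); // 8000
    }
}
```

The senders never touch _total; they only enqueue. The count comes out exact because a single loop owns the state, which is the property the six axes above describe.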
One more thing: an actor can change how it behaves at runtime. In Akka.NET a single call, Become(state), switches the actor to a different set of message handlers. That is a Finite State Machine (FSM): the same actor handles messages differently while idle vs. thinking.
These six things (mailbox, message passing, isolated state, supervision, Become, location transparency) are all capabilities an AI Agent needs. Quick mapping.
Actor essentials → AI Agent capabilities
- Mailbox (one at a time) → execute exactly one tool safely at a time.
- Message passing (async) → keep LLM inference / tool execution / result feedback in continuous ensemble without blocking.
- State isolation → each agent holds its own KV cache · conversation history · page context.
- Supervision (Restart/Stop) → recover into a fresh context when the model's response breaks.
- Become / FSM → the same agent transitions through chat mode → tool-using mode → wrap-up mode.
- Location transparency → route on-device ↔ cloud ↔ another machine without code change.
This is no coincidence. Akka.io officially announced "Akka Agents" in late 2025 with the explicit pitch "actors are the natural runtime for stateful AI agents" (Akka Agents). Aaron Stannard sums it up: "There is a natural synergy between the Actor pattern and agentic AI" (Real-time Marketing Automation with Akka.NET). Both models solve the same problems the same way.
3. AgentZero Lite's actual actor topology
Now the code. AgentZero Lite spins up the following tree on top of Akka.NET.

    flowchart TB
      classDef root fill:#1e3a8a,stroke:#3b82f6,color:#dbeafe
      classDef bot fill:#312e81,stroke:#a855f7,color:#e9d5ff
      classDef ws fill:#064e3b,stroke:#10b981,color:#a7f3d0
      classDef voice fill:#7c2d12,stroke:#f97316,color:#fed7aa
      Stage["/user/stage<br>StageActor<br>(top supervisor)"]:::root
      Bot["/user/stage/bot<br>AgentBotActor<br>(Chat / Key / AI)"]:::bot
      Reactor["/user/stage/bot/reactor<br>AgentReactorActor<br>(AIMODE FSM)"]:::bot
      Voice["/user/stage/voice<br>VoiceStreamActor"]:::voice
      STT["STT pool<br>SmallestMailbox"]:::voice
      TTS["TTS pool"]:::voice
      WS["/user/stage/ws-{name}<br>WorkspaceActor"]:::ws
      Term["/ws-*/term-{id}<br>TerminalActor<br>(one ConPTY)"]:::ws
      Stage --> Bot
      Bot --> Reactor
      Stage --> Voice
      Voice --> STT
      Voice --> TTS
      Stage --> WS
      WS --> Term

Each actor's responsibility in one line.
- StageActor — supervises children's lifecycles. Holds the supervision strategy.
- AgentBotActor — controller for user input. Switches modes Chat ↔ Key ↔ AI via Become().
- AgentReactorActor — AIMODE inference FSM. Runs exactly one cycle (send→wait→read→done) at a time.
- VoiceStreamActor — owns the Akka.Streams INPUT/OUTPUT graph. Routes STT/TTS worker pools.
- WorkspaceActor / TerminalActor — wraps one real ConPTY terminal session each.
The supervision strategy is interesting. Code from StageActor (Project/ZeroCommon/Actors/StageActor.cs:132-143).

    protected override SupervisorStrategy SupervisorStrategy()
    {
        return new OneForOneStrategy(
            maxNrOfRetries: 5,
            withinTimeRange: TimeSpan.FromMinutes(1),
            localOnlyDecider: ex => ex switch
            {
                ArgumentException      => Directive.Resume,   // ignore bad messages
                NullReferenceException => Directive.Restart,  // restart on common bugs
                _                      => Directive.Escalate  // hand up if unknown
            });
    }

This block, by itself, is the AI Agent's resilience policy. When one agent throws inside a tool call, its sibling agents are unaffected. And which exception class warrants survival vs death is spelled out in code — the opposite of "the AI system dies quietly."
4. Four steps to turn an LLM into an Agent — code walk
The real point. To turn a text-completion engine into an Agent that acts, you need exactly four moves.

    flowchart LR
      classDef step fill:#1e293b,stroke:#06b6d4,color:#e2e8f0
      S1["1. Output constraint<br>(GBNF)"]:::step
      S2["2. Tool execution"]:::step
      S3["3. Result injection<br>(KV cache)"]:::step
      S4{{"done ?"}}:::step
      Out["reply to user"]:::step
      S1 --> S2 --> S3 --> S4
      S4 -- "no, continue" --> S1
      S4 -- "yes" --> Out

That loop is the generate → act → observe cycle, and an Agent is whatever turns this loop one full round per turn. Let's see how AgentZero implements each step.
4-1. Force the output to a tool call — GBNF
When an LLM emits free prose, the host can't parse it. Asking "please answer in JSON" via prompt is just prompt engineering, not enforcement at the sampler level. A flaky model breaks it.
GBNF (GGML BNF) blocks this at the sampling stage. At every token, the grammar masks the next-token distribution down to tokens it allows. Free prose becomes impossible at the token level.

    flowchart LR
      classDef raw fill:#1e293b,stroke:#06b6d4,color:#e2e8f0
      classDef mask fill:#7c2d12,stroke:#f97316,color:#fed7aa
      classDef out fill:#064e3b,stroke:#10b981,color:#a7f3d0
      LLM["LLM logits<br>(full vocab dist)"]:::raw
      Mask["GBNF grammar mask<br>(only allowed tokens pass)"]:::mask
      Sample["Sampler"]:::raw
      Out["next token<br>(JSON shape guaranteed)"]:::out
      X["✗ free-prose token<br>(blocked)"]:::mask
      LLM --> Mask --> Sample --> Out
      Mask -.block.-> X

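The masking step can be sketched in a few lines of C#. This is a simplified illustration under stated assumptions, not the llama.cpp or LLamaSharp implementation: GrammarMask and SampleGreedy are hypothetical names, the set of grammar-allowed token ids is passed in directly (a real grammar sampler derives it from the grammar state at each step), and sampling is reduced to a greedy argmax.

```csharp
using System;
using System.Collections.Generic;

public static class GrammarMask
{
    // Returns the id of the highest-scoring token among those the grammar allows.
    // Tokens outside the allowed set are skipped, which is equivalent to
    // setting their logit to negative infinity before sampling.
    public static int SampleGreedy(float[] logits, ISet<int> allowedTokenIds)
    {
        var best = -1;
        var bestScore = float.NegativeInfinity;
        for (var id = 0; id < logits.Length; id++)
        {
            if (!allowedTokenIds.Contains(id)) continue; // masked out
            if (best < 0 || logits[id] > bestScore)
            {
                best = id;
                bestScore = logits[id];
            }
        }
        if (best < 0)
            throw new InvalidOperationException("The grammar allows no token at this step.");
        return best;
    }
}
```

With a toy vocabulary where token 0 is "Sure", token 1 is "{" and token 2 is a space, logits of { 5.0, 1.0, 0.5 } and an allowed set of { 1, 2 } select token 1: the model's preferred free-prose opener scores highest but never reaches the sampler.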
The full GBNF in AgentZero lives in AgentToolGrammar.cs:184-200.

    root     ::= ws "{" ws "\"tool\"" ws ":" ws toolname ws "," ws "\"args\"" ws ":" ws args ws "}" ws
    toolname ::= "\"list_terminals\"" | "\"read_terminal\"" | "\"send_to_terminal\"" | "\"send_key\"" | "\"wait\"" | "\"done\""
    args     ::= "{" ws "}" | "{" ws kv (ws "," ws kv)* ws "}"
    kv       ::= string ws ":" ws value
    value    ::= string | integer | boolean

That's the core of the grammar (the ws, string, integer, and boolean primitives it references are not shown here). Because Gemma 4 lacks Llama-3.1-style native tool-calling SFT, GBNF is the most reliable way to force Gemma into grammar-clean tool-call shape.
There are exactly six tools.
Six-tool surface
- list_terminals — no args → list current workspace's tab catalog.
- read_terminal — {group, tab, last_n} → last N bytes of a tab's scrollback.
- send_to_terminal — {group, tab, text} → send text.
- send_key — {group, tab, key} → control key (cr/lf/esc/tab/ctrlc, etc.).
- wait — {seconds: 1..30} → wait for response.
- done — {message} → end loop + final message to user.
These six cover every scenario for talking to a terminal AI. The smaller the tool surface, the less room for an LLM to get confused.
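On the host side, the grammar-constrained JSON still has to be turned into a typed call. The following is a minimal sketch using System.Text.Json; ToolCall and ToolCallParser are hypothetical names for illustration and are not taken from AgentToolLoop.cs, whose ParseToolCall may work differently. The tool-name check repeats what the grammar already guarantees, as a cheap second line of defense.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public sealed record ToolCall(string Tool, IReadOnlyDictionary<string, JsonElement> Args);

public static class ToolCallParser
{
    // The six tool names from the grammar's toolname rule.
    private static readonly HashSet<string> KnownTools = new()
    {
        "list_terminals", "read_terminal", "send_to_terminal", "send_key", "wait", "done",
    };

    public static ToolCall Parse(string rawJson)
    {
        using var doc = JsonDocument.Parse(rawJson);
        var root = doc.RootElement;

        var tool = root.GetProperty("tool").GetString()
                   ?? throw new FormatException("\"tool\" must be a string.");
        if (!KnownTools.Contains(tool))
            throw new FormatException($"Unknown tool: {tool}");

        var args = new Dictionary<string, JsonElement>();
        foreach (var prop in root.GetProperty("args").EnumerateObject())
            args[prop.Name] = prop.Value.Clone(); // Clone so values outlive the JsonDocument

        return new ToolCall(tool, args);
    }
}
```

For example, parsing {"tool":"wait","args":{"seconds":5}} yields Tool = "wait" and Args["seconds"].GetInt32() == 5, while a tool name outside the six throws FormatException.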
4-2. Execute the tool — AgentToolLoop's main loop
When the GBNF-constrained JSON arrives, parsing it and calling the real tool is the job of AgentToolLoop.RunAsync (Llm/Tools/AgentToolLoop.cs:66-168).

    public async Task<AgentToolSession> RunAsync(string userRequest, CancellationToken ct)
    {
        for (var iter = 0; iter < _opts.MaxIterations; iter++)   // default 12
        {
            ct.ThrowIfCancellationRequested();

            // (1) first turn = system prompt + user; later turns inject prior tool_result only
            var turnInput = (iter == 0)
                ? FormatFirstTurn(userRequest)
                : FormatToolResultTurn(turns[^1].ToolResult);

            // (2) GBNF-forced single JSON
            var rawJson = await GenerateOneTurnAsync(turnInput, ct);
            var call = ParseToolCall(rawJson);

            // (3) done signal → end loop
            if (call.Tool == "done")
                return new AgentToolSession(turns, call.Args["message"], true);

            // (4) execute tool + record turn
            var toolResult = await ExecuteToolAsync(call, ct);
            turns.Add(new ToolTurn(call, toolResult));
            _opts.OnTurnCompleted?.Invoke(turns[^1]);   // ← UI callback
        }
        return new AgentToolSession(turns, "max iterations", false);
    }

Two key details.
- MaxIterations = 12. send → wait → read = 3 calls, so about 4 rounds. This cap alone bounds the runaway scenario where an LLM repeats the same tool. Separately, ToolLoopGuards catches repeated calls in two stages: stage 1 feeds an error back to the model, stage 2 hard-stops.
- OnTurnCompleted callback. Every time a tool runs, the turn lands back in the actor as AgentReactorActor.Self.Tell(TurnCompletedInternal(turn)). That is, the loop reports progress through the actor's mailbox. The UI hears those messages and renders progress.
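A two-stage repeat guard of the kind described in the first point can be sketched as follows. This is a hypothetical illustration, not the ToolLoopGuards code: the RepeatGuard class, the GuardVerdict enum, the thresholds (warn on the 2nd identical consecutive call, stop on the 3rd), and the idea of comparing a serialized call signature are all assumptions.

```csharp
using System;

public enum GuardVerdict { Allow, WarnModel, HardStop }

// Hypothetical two-stage guard; the default thresholds are illustrative only.
public sealed class RepeatGuard
{
    private readonly int _warnAt;
    private readonly int _stopAt;
    private string? _lastCall;
    private int _count; // how many times in a row the same call has been seen

    public RepeatGuard(int warnAt = 2, int stopAt = 3)
    {
        if (warnAt < 2 || stopAt <= warnAt)
            throw new ArgumentException("Require 2 <= warnAt < stopAt.");
        _warnAt = warnAt;
        _stopAt = stopAt;
    }

    // callSignature: tool name plus serialized args, e.g. "wait{\"seconds\":5}".
    public GuardVerdict Check(string callSignature)
    {
        _count = callSignature == _lastCall ? _count + 1 : 1;
        _lastCall = callSignature;

        if (_count >= _stopAt) return GuardVerdict.HardStop;  // stage 2: end the run
        if (_count >= _warnAt) return GuardVerdict.WarnModel; // stage 1: feed an error back
        return GuardVerdict.Allow;
    }
}
```

Calling Check with the same signature three times in a row returns Allow, WarnModel, HardStop; any different call in between resets the count. The loop would treat WarnModel as a tool result containing an error message and HardStop as a reason to return early.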
4-3. Preserve results in the KV cache
"With many turns, resending the system prompt every time blows up tokens." Exactly. So we create one LLamaContext from LLamaSharp and keep it alive for the loop instance's lifecycle (AgentToolLoop.cs:45-59).

    var (weights, modelParams) = llm.GetInternals();
    _context  = weights.CreateContext(modelParams);   // ← KV cache here
    _executor = new InteractiveExecutor(_context);
    _grammar  = new Grammar(AgentToolGrammar.Gbnf, "root");

The _isFirstUserSend flag injects the full system prompt only on the very first turn; later turns inject only the previous tool result. The KV cache holds everything in between. So one loop run = one cycle, and the memory between cycles is the KV cache's job.
4-4. ONE CYCLE PER RUN — the philosophy that blocks runaway
A single line gets repeated insistently in AgentZero's system prompt (AgentToolGrammar.cs:97-101).

    CRITICAL principle — ONE CYCLE PER RUN, BUT DO THE CYCLE.
    Each tool chain run = ONE complete round trip with the terminal AI:
    send_to_terminal → wait → read_terminal → react → done.
    Subsequent cycles are triggered by the user OR an arriving peer signal.
    The KV cache preserves history across runs.

Translation — don't ask the LLM to script a five-turn debate as one giant tool chain. Each turn ends after one cycle and stops. The next cycle is triggered by the user pressing again, or a peer sending a signal.
Why this rule? If you stuff N cycles into one run, then the moment the LLM misreads a response or starts repeating the same tool, the runaway happens inside that run with nothing to interrupt it. Cap it at one cycle and the start of the next run becomes a natural handle: the user can step in, another peer can signal, or the system can simply stop.
That's the actor model's essence — "one message = one unit of work." The mailbox is the best tool we have to keep an LLM from emitting tokens forever.
5. AgentReactorActor — the LLM that becomes an FSM
AgentToolLoop alone can't reconcile async-progressing inference with the UI. While tokens stream you need to draw progress, and when the user hits Cancel you need to stop instantly. That's AgentReactorActor's job.
This actor is a two-state FSM (Actors/AgentReactorActor.cs:50-226).

    stateDiagram-v2
      [*] --> Idle
      Idle --> Running: StartReactor(userRequest)<br>BecomeRunning()
      state Running {
        [*] --> Thinking
        Thinking --> Generating: prefill ends
        Generating --> Acting: tool_call arrives<br>OnTurnCompleted
        Acting --> Generating: tool_result injected
      }
      Running --> Idle: RunCompletedInternal<br>BecomeIdle()
      Running --> Idle: RunFailedInternal<br>BecomeIdle()
      Running --> Running: CancelReactor<br>(_cts.Cancel)

- Idle: waits for a StartReactor(userRequest) message. On receipt, fires Task.Run(() => loop.RunAsync(...)).PipeTo(Self) to launch the loop async, then BecomeRunning().
- Running: the loop drops internal messages on the actor via OnGenerationProgress / OnTurnCompleted callbacks. The actor relays them to its parent as ReactorProgress(Phase, Round, Tokens, ToolCall). The parent draws to the UI.
- End: RunCompletedInternal(session) arrives → BecomeIdle(), an automatic return to Idle.
- Cancel: CancelReactor message → _cts.Cancel() → loop.RunAsync exits via OperationCanceledException → RunFailedInternal → BecomeIdle().
The point: inference itself runs in a separate Task, but progress and termination signals all pass through the actor's mailbox. UI code just listens to actor messages — it doesn't hold the token stream itself.
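The Idle/Running handoff can be sketched with Akka.NET's ReceiveActor. This is a minimal sketch, not AgentReactorActor itself: ReactorSketchActor, the RunCompleted / RunFailed records, and the runLoop delegate (standing in for loop.RunAsync) are hypothetical, and the code assumes the Akka NuGet package is referenced.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Akka.Actor;

public sealed record StartReactor(string UserRequest);
public sealed record CancelReactor;
public sealed record RunCompleted(string Result);
public sealed record RunFailed(string Reason);

public sealed class ReactorSketchActor : ReceiveActor
{
    private readonly Func<string, CancellationToken, Task<string>> _runLoop;
    private CancellationTokenSource? _cts;
    private IActorRef? _requester;

    public ReactorSketchActor(Func<string, CancellationToken, Task<string>> runLoop)
    {
        _runLoop = runLoop;
        Idle(); // initial behavior
    }

    private void Idle()
    {
        Receive<StartReactor>(msg =>
        {
            _cts = new CancellationTokenSource();
            _requester = Sender;
            var ct = _cts.Token;

            // Run the loop off the actor; its outcome returns later as a message.
            Task.Run(() => _runLoop(msg.UserRequest, ct))
                .PipeTo(Self,
                    success: result => new RunCompleted(result),
                    failure: ex => new RunFailed(ex?.GetType().Name ?? "canceled"));

            Become(Running);
        });
    }

    private void Running()
    {
        // The mailbox stays open while the loop runs, so Cancel is seen right away.
        Receive<CancelReactor>(_ => _cts?.Cancel());
        Receive<RunCompleted>(msg => Finish(msg));
        Receive<RunFailed>(msg => Finish(msg));
        Receive<StartReactor>(_ => Sender.Tell(new RunFailed("busy")));
    }

    private void Finish(object outcome)
    {
        _requester?.Tell(outcome);
        _cts?.Dispose();
        _cts = null;
        Become(Idle);
    }
}
```

The design choice worth noticing is that the actor never awaits the loop. It starts the Task, switches behavior, and goes back to reading its mailbox; completion, failure, and cancellation all come back as ordinary messages.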
That single line, Task.Run(...).PipeTo(Self), matters a lot. If the actor awaited the loop directly inside an async receive handler, its mailbox would stay suspended until the Task finished, and a Cancel message would sit unread. PipeTo instead turns the Task's result into a message and drops it into the actor's mailbox. That gives complete async separation: the actor can keep receiving other messages (e.g. Cancel) while the loop runs.
6. Why STT × LLM × TTS look like they respond at the same time
Now the second question — how do you make "respond while listening" work with three completely separate models, the way OpenAI Realtime API does?
OpenAI Realtime API runs audio in/out + function calls round-robin on a single stateful WebSocket. That's possible because the model is a unified speech model.
AgentZero uses three completely separate free models.
- STT — Whisper.net (~466 MB GGML, offline)
- LLM — Gemma 4 (LLamaSharp, on-device)
- TTS — OpenAI tts-1 / Windows SAPI (plumbing in, output staged)
The secret to making the three play together as one system is actor + Akka.Streams.
6-1. VoiceStreamActor owns Akka.Streams graphs

    flowchart TB
      classDef in fill:#064e3b,stroke:#10b981,color:#a7f3d0
      classDef act fill:#312e81,stroke:#a855f7,color:#e9d5ff
      classDef out fill:#7c2d12,stroke:#f97316,color:#fed7aa
      Mic["🎤 Mic<br>NAudio MicFrame"]:::in
      Q["Source.Queue<br>(DropHead)"]:::in
      VAD["VoiceSegmenter<br>(VAD split)"]:::in
      STT["SttWorkerActor<br>SmallestMailboxPool"]:::in
      AB["AgentBotActor<br>(AIMODE active)"]:::act
      R["AgentReactorActor<br>(LLM inference)"]:::act
      Tools["Terminal Actors<br>(tool execution)"]:::act
      TQ["token Source.Queue"]:::out
      Chunk["SentenceChunker"]:::out
      TTS["TtsWorkerActor pool"]:::out
      Spk["🔊 Speaker"]:::out
      Mic --> Q --> VAD --> STT
      STT -->|"VoiceTranscriptReady"| AB
      AB -->|"StartReactor"| R
      R -->|"send_to_terminal"| Tools
      R -->|"final tokens"| TQ
      TQ --> Chunk --> TTS --> Spk

VoiceStreamActor materializes two Akka.Streams graphs at startup (Actors/VoiceStreamActor.cs:159-172).

    // INPUT graph: MicFrame → VAD → STT → transcript
    var materialized = Source.Queue<MicFrame>(cmd.MicBufferSize, OverflowStrategy.DropHead)
        .Via(VoiceSegmenterFlow.Create(vadCfg))               // VAD + segmentation
        .Async()
        .SelectAsync(parallelism, async (PcmSegment seg) =>
        {
            var reply = await sttPool.Ask<TranscribeReply>(   // ← delegate to STT pool
                new TranscribeRequest(seg, language),
                TimeSpan.FromSeconds(120));
            return new VoiceTranscriptReady(reply.Transcript, reply.DurationSeconds);
        })
        .Where(t => !string.IsNullOrWhiteSpace(t.Transcript))
        .ToMaterialized(sink, Keep.Left)
        .Run(_materializer);

That graph does the heavy lifting.
- Source.Queue<MicFrame> accepts audio frames from the mic fire-and-forget (the mic thread never blocks).
- VoiceSegmenterFlow uses VAD to cut out only the speaking region into PCM segments.
- SelectAsync(parallelism, ...) Asks the STT worker pool per segment, asynchronously. Workers process N segments in parallel.
- The STT pool uses SmallestMailboxPool routing to send work to the least-busy worker (Voice/Streams/SttWorkerActor.cs:78-80).
- The result returns as a VoiceTranscriptReady message back to VoiceStreamActor's mailbox.
- The Sink.ActorRefWithAck protocol: every message must be Acked with VoiceFrameAck before the next segment enters. That's backpressure.
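The drop-oldest behavior at the head of that graph can be tried without Akka.Streams. The sketch below uses a bounded System.Threading.Channels queue in DropOldest mode as an analogy for Source.Queue with OverflowStrategy.DropHead; it is not the Akka.Streams implementation, and DropHeadDemo plus the capacity of 3 are illustrative assumptions.

```csharp
using System.Collections.Generic;
using System.Threading.Channels;

public static class DropHeadDemo
{
    // Pushes frames 1..5 into a queue of capacity 3 with no consumer running,
    // then returns whatever survived.
    public static List<int> Run()
    {
        var queue = Channel.CreateBounded<int>(new BoundedChannelOptions(3)
        {
            // When full, discard the oldest buffered item to admit the new one.
            FullMode = BoundedChannelFullMode.DropOldest,
        });

        for (var frame = 1; frame <= 5; frame++)
            queue.Writer.TryWrite(frame); // never blocks the producer (the "mic thread")

        queue.Writer.Complete();

        var kept = new List<int>();
        while (queue.Reader.TryRead(out var frame))
            kept.Add(frame);
        return kept; // 3, 4, 5
    }
}
```

Frames 1 and 2 are discarded and 3, 4, 5 remain. For live audio that is the right trade: when transcription falls behind, stale frames are worth less than fresh ones, and the producer is never stalled.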
6-2. STT → LLM → TTS message sequence
The whole sequence from a person's perspective.

    sequenceDiagram
      autonumber
      actor U as 👤 User
      participant Mic as 🎤
      participant V as VoiceStreamActor
      participant ST as STT pool
      participant B as AgentBotActor
      participant R as AgentReactorActor
      participant T as Terminal
      participant TT as TTS pool
      participant Sp as 🔊
      U->>Mic: "summarize today's PRs"
      Mic->>V: MicFrame stream
      V->>ST: TranscribeRequest (VAD segment)
      ST-->>V: TranscribeReply
      V->>B: VoiceTranscriptReady("today PRs ...")
      B->>R: StartReactor("...")
      R->>T: send_to_terminal(tab=1, "summarize PRs")
      R->>R: wait(5)
      R->>T: read_terminal(tab=1)
      T-->>R: "PRs summary..."
      R->>R: done(message)
      R->>V: SpeakResponse(token stream) [P3]
      V->>TT: SynthesizeRequest (sentence chunks)
      TT-->>V: audio chunks
      V->>Sp: playback

At every step nothing blocks. The mic thread sees only the mic. STT workers only transcribe. The Reactor only watches the LLM. TTS workers only synthesize. Everything connects via messages, and the queues and Ack protocol between stages provide backpressure.
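The step that lets TTS start before the LLM has finished is sentence chunking: tokens are buffered only until a sentence boundary, then handed on. Here is a minimal sketch of that idea. SentenceChunkerSketch is a hypothetical stand-in, not AgentZero's SentenceChunker, and its boundary rule (split on '.', '!' or '?') is a deliberate simplification that would also split "3.14".

```csharp
using System.Collections.Generic;
using System.Text;

public sealed class SentenceChunkerSketch
{
    private readonly StringBuilder _buffer = new();

    // Feed one streamed token; returns the sentences it completed (possibly none).
    public IReadOnlyList<string> Push(string token)
    {
        var completed = new List<string>();
        foreach (var ch in token)
        {
            _buffer.Append(ch);
            if (ch is '.' or '!' or '?')
            {
                var sentence = _buffer.ToString().Trim();
                if (sentence.Length > 0) completed.Add(sentence);
                _buffer.Clear();
            }
        }
        return completed;
    }

    // Call at end of stream to emit any trailing text that has no terminator.
    public string? Flush()
    {
        var rest = _buffer.ToString().Trim();
        _buffer.Clear();
        return rest.Length > 0 ? rest : null;
    }
}
```

Pushing "Two PRs", then " merged. One", then " is open" yields "Two PRs merged." on the second push, and Flush() returns "One is open". Each completed sentence could be sent to a TTS worker as its own request while later tokens are still arriving.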
6-3. Compared to Realtime API — same outcome, different infra

    flowchart LR
      classDef api fill:#1e3a8a,stroke:#3b82f6,color:#dbeafe
      classDef ens fill:#312e81,stroke:#a855f7,color:#e9d5ff
      subgraph realtime["OpenAI Realtime API"]
        direction TB
        WS["1 stateful WebSocket"]:::api
        Omni["unified omni model"]:::api
        WS <-->|"audio in/out + tool calls<br>round-robin"| Omni
      end
      subgraph actor1["AgentZero actor ensemble"]
        direction TB
        A1["STT actor<br>📬 mailbox"]:::ens
        A2["LLM actor<br>📬 mailbox"]:::ens
        A3["TTS actor<br>📬 mailbox"]:::ens
        A1 -.message.-> A2 -.message.-> A3
      end

OpenAI Realtime API vs AgentZero actor ensemble — seven axes
- Transport — Realtime: 1 stateful WebSocket. AgentZero: actor messages + Akka.Streams.
- Model — Realtime: 1 unified speech model (gpt-realtime). AgentZero: three separate models (STT / LLM / TTS).
- Function call — Realtime: WebSocket events (round-robin). AgentZero: OnTurnCompleted → actor message.
- Cost model — Realtime: token + audio-minute billing (vendor). AgentZero: on-device free + optional OpenAI TTS.
- Barge-in — Realtime: server VAD turn detection. AgentZero: BargeIn message → cancel OUTPUT graph.
- Extension — Realtime: add new events on the same socket. AgentZero: add new actor + message.
- Failure isolation — Realtime: if the socket drops, everything drops. AgentZero: one worker dies, others stay alive.
Same user experience ("speak and it listens and responds") — completely different infrastructure. What Realtime API solves with one chunk of model, the actor solves with small composable modules.
The trade-offs are clear.
- Realtime API strength — small latency. One model that also sees speech meaning.
- Actor ensemble strength — combination freedom. You pick the best Korean-strong STT model, the best reasoning LLM, the most natural TTS separately and plug them in. And when one vendor changes its policy, the rest stays alive. That's the infrastructure answer to the vendor-lock-free point in Part 1.
7. peer-signal bidirectional channel — why actors were born for remote
One last piece — the mechanism we mentioned in Part 1, where a peer terminal calls back to the bot directly, in code.
Scenario: AIMODE sent "summarize" to the Claude tab and is now polling wait + read. But Claude wants to actively signal AgentBot the moment its own work is done. We'd like the next cycle to trigger immediately, before wait ends.
Here's how: the Claude tab runs this one line.

    AgentZeroLite.exe -cli bot-chat "DONE(summary done)" --from Claude

The path that line takes into the actor system (CliHandler.cs:679-742 + MainWindow.xaml.cs:413-419, 796-819).

    sequenceDiagram
      autonumber
      participant CT as Claude tab (external process)
      participant CLI as CliHandler.BotChat()
      participant MW as MainWindow
      participant AB as AgentBotActor
      participant R as AgentReactorActor
      CT->>CLI: bot-chat "DONE(summary done)" --from Claude
      CLI->>MW: WM_COPYDATA(0x414C "AL")
      MW->>AB: ActorSelection("/user/stage/bot")<br>.Tell(TerminalSentToBot("Claude", "summary done"))
      AB->>AB: check _activeConversations
      AB->>R: StartReactor("[from Claude] summary done")
      Note over R: even mid-wait, fires a fresh run<br>(not waking wait inside same run)

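The first hop in that sequence, turning command-line arguments into a plain message, can be sketched as below. This is a hypothetical adapter, not the CliHandler code: the PeerSignal class, the regex, and the choice to unwrap DONE(...) at this stage are assumptions made to match the TerminalSentToBot("Claude", "summary done") shape shown in the diagram.

```csharp
using System;
using System.Text.RegularExpressions;

public sealed record TerminalSentToBot(string From, string Text);

public static class PeerSignal
{
    // Matches a payload of the form DONE(...) and captures the inner text.
    private static readonly Regex DonePattern =
        new(@"^DONE\((?<text>.*)\)$", RegexOptions.Singleline);

    // Expects args such as: -cli bot-chat "DONE(summary done)" --from Claude
    // Returns null when the command or the sender is missing.
    public static TerminalSentToBot? FromCliArgs(string[] args)
    {
        var cmd = Array.IndexOf(args, "bot-chat");
        var from = Array.IndexOf(args, "--from");
        if (cmd < 0 || cmd + 1 >= args.Length) return null;
        if (from < 0 || from + 1 >= args.Length) return null;

        var payload = args[cmd + 1];
        var match = DonePattern.Match(payload);
        var text = match.Success ? match.Groups["text"].Value : payload;
        return new TerminalSentToBot(args[from + 1], text);
    }
}
```

Once the arguments have become a TerminalSentToBot record, nothing downstream depends on the command line. A different transport would only need its own small adapter producing the same record.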
This is the actor model showing why it was born for remote. A signal sent by an external process (Claude CLI), not a direct method call on an in-process object, drops as a message into a mailbox. And the actor doesn't have to know where it came from. WM_COPYDATA, gRPC, or Akka.Cluster: the same actor code receives it.
The AgentZeroRemote / AgentZeroCluster roadmap from Part 1 makes natural sense here. Tell works the same whether the target is in the same process or on another machine, with practically no code change. Systems built on actors grow naturally into distributed ones; moving a single-machine IDE into a multi-machine AI assistant cluster looks more like a configuration change than a rewrite.
8. So — Akka becomes AI Agent's runtime, the hypothesis
If you've followed this far, one thing should be visible. Almost everything that makes an AI Agent different from regular code, Actors already have.
AI Agent's hard parts → Actor's existing answers
- LLM responses can break → Supervision (Restart / Resume / Stop).
- The same tool can repeat infinitely → mailbox = one at a time + Become for state transitions.
- Tools finish async → Tell + PipeTo, mailbox is the queue.
- Many agents work concurrently → an actor is essentially a unit of concurrency.
- Signals come from user / peer / self anywhere → message = source-agnostic.
- Want to host the same agent remotely → Location transparency.
That's the same direction Akka.io is pursuing with its "Akka Agents" line and NVIDIA is going with WASM-based agent sandbox standardization. All converge on the same conclusion: the runtime around AI models, more than the models themselves, is the real infrastructure.
So what is AgentZero Lite actually doing?
On top of Akka.NET, mount one LLM (Gemma 4 + GBNF) and stick STT/TTS/terminal actors next to it as neighbors. That builds our own little Realtime API inside one desktop. Token cost: 0. Code: all open.
This is a small preview of the real AI infrastructure of the next five years.
9. Closing — two invitations
- Junior developers. The AI Agent design instinct of someone who has hand-coded an actor once vs someone who hasn't is starting to diverge. For C#, the Akka.NET official tutorial is a one-day course. Going through the pattern once (throw a message, seal state, recover via supervision) is worth a semester of regular OOP.
- Senior engineers. Take a look at the runtime layer of the AI Agent your company is building. If it's a giant while True loop inside a single function, it's worth one hour of prototyping to see how actor + FSM + supervision simplifies that code. AgentZero Lite's AgentReactorActor.cs is a good starting point.
Appendix — references
- AgentZero Lite — <https://github.com/psmon/AgentZeroLite>
  - Project/ZeroCommon/Actors/StageActor.cs — actor supervision
  - Project/ZeroCommon/Llm/Tools/AgentToolGrammar.cs — GBNF + system prompt
  - Project/ZeroCommon/Llm/Tools/AgentToolLoop.cs — main loop + KV cache
  - Project/ZeroCommon/Actors/AgentReactorActor.cs — FSM
  - Project/ZeroCommon/Actors/VoiceStreamActor.cs — STT/TTS Akka.Streams
- Akka Agents announcement — Lightbend/Akka's official AI agent runtime