
Two Doors, One Bridge — AgentZero's Wasm Sandbox for Web + Native Plugin Extension

AgentZero Lite just grew a sandbox where the web's flexibility and the native runtime's power live in the same room — separated by a bridge, not by a wall. This is mission M0005 (harness/missions/M0005).

Why this matters — the value story

Plugins are usually a one-door problem.
  • A native plugin gives you direct access to the runtime — files, audio, GPU, the LLM gateway. But the iteration loop is slow: edit a class, rebuild, restart, retest. UI is whatever the host framework gives you (in our case WPF), so the visual surface costs more than the logic.
  • A web plugin gives you the fastest iteration loop on Earth: change a div, save, reload, done. HTML5, CSS and JS are the most accessible UI stack around, one that almost any developer can be productive in from day one. The cost is that the web sandbox is too sandboxed: no direct access to native audio devices, no spawning processes, no on-device LLM unless you build a whole HTTP layer to expose them.
A sandbox where the web side and the native side share one bridge flips both tradeoffs in your favour: web-speed iteration on the surface, native-power access underneath. That's why the WebDev tab AgentZero just shipped is the highest-leverage extension point we've built so far. Every future plugin can choose its own mix of web flair and native muscle without negotiating a new architecture each time.

The idea — IZeroBrowser as the contract

The bridge isn't magic. It's a .NET interface that lives in the WPF-free common project:
```csharp
// Project/ZeroCommon/Browser/IZeroBrowser.cs
using System.Threading;
using System.Threading.Tasks;

// The contract the WebDev bridge calls into; WPF-free by design.
public interface IZeroBrowser
{
    string GetAppVersion();
    VoiceProvidersInfo GetVoiceProviders();
    Task<TtsResult> SpeakAsync(string text, CancellationToken ct = default);
    void StopSpeaking();
}

// Records give the DTOs a named, stable shape on the wire.
public sealed record VoiceProvidersInfo(string Stt, string Tts, string LlmBackend);
public sealed record TtsResult(bool Ok, string? Provider, int Bytes, string? Format, string? Error);
```
Two design moves to notice:
  1. The contract sits in ZeroCommon, not in the WPF host. ZeroCommon is the shared library that has zero WPF or Win32 dependencies — this is a hard rule in the codebase, enforced because future WASM plugins must be able to compile against this contract without dragging in PresentationFramework. Read the rule in CLAUDE.md.
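  2. The DTOs are records, not anonymous objects. A record is a named shape that stays stable across versions: if a future .wasm plugin caches the type and the host adds a field, the cached shape still resolves. Anonymous types get compiler-generated names that no other assembly can reference, so nothing outside the host can rely on them.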
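The same stability argument carries over to the JS side: a sandbox app can check a result structurally and simply ignore fields it doesn't know about. A minimal sketch, assuming the host serializes TtsResult with camelCase keys (as in the wire-format sample later in this post) and may omit null fields; `isTtsResult` is a hypothetical helper, not part of zero-bridge.js.

```javascript
// Hypothetical structural check for a TtsResult as it arrives in JS.
// Assumes camelCase keys (ok, provider, bytes, format, error); the nullable
// record members may arrive as null or be omitted entirely.
function isOptionalString(value) {
  return value === undefined || value === null || typeof value === "string";
}

function isTtsResult(value) {
  if (value === null || typeof value !== "object") return false;
  // Unknown extra keys are deliberately ignored: if a newer host adds a
  // member to the record, older sandbox apps keep accepting the result.
  return (
    typeof value.ok === "boolean" &&
    Number.isInteger(value.bytes) &&
    value.bytes >= 0 &&
    isOptionalString(value.provider) &&
    isOptionalString(value.format) &&
    isOptionalString(value.error)
  );
}
```

Ignoring unknown keys is the JS mirror of the record argument: additions are safe, while removals and renames are the breaking changes.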
The implementation lives in the WPF side and reuses the existing voice services — no parallel pipeline, no drift between the WebDev sandbox and the Settings/Voice tab the user already trusts:
```csharp
// Project/AgentZeroWpf/Services/Browser/WebDevHost.cs
public async Task<TtsResult> SpeakAsync(string text, CancellationToken ct = default)
{
    // Same settings store and factory the Settings/Voice tab uses.
    var v = VoiceSettingsStore.Load();
    var tts = VoiceRuntimeFactory.BuildTts(v);
    if (tts is null)
        return new TtsResult(false, v.TtsProvider, 0, null, "TTS provider is Off");

    var bytes = await tts.SynthesizeAsync(text, ResolveVoiceId(v), ct);
    _playback.Play(bytes, tts.AudioFormat);  // audio plays natively; JS only gets metadata back
    return new TtsResult(true, tts.ProviderName, bytes.Length, tts.AudioFormat, null);
}
```

The bridge — postMessage RPC over WebView2

Two technical choices carry this design:

1. Virtual host mapping, not embedded resources

A web app is a multi-file beast: HTML pulls CSS, CSS pulls fonts, JS pulls more JS. Embedding each as a managed resource and stitching them together with NavigateToString is workable for one file (Mermaid does it) but painful for an app. Instead, we mount the on-disk folder as a fake host and let the browser resolve relative paths normally:
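```csharp
using Microsoft.Web.WebView2.Core; // CoreWebView2HostResourceAccessKind

// CoreWebView2 is null until the control has finished initializing.
await webDevView.EnsureCoreWebView2Async();

// Serve https://zero.local/* from the Wasm/ folder next to the executable.
webDevView.CoreWebView2.SetVirtualHostNameToFolderMapping(
    "zero.local",
    Path.Combine(AppContext.BaseDirectory, "Wasm"),
    CoreWebView2HostResourceAccessKind.Allow);
webDevView.CoreWebView2.Navigate("https://zero.local/voice-test/index.html");
```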
https://zero.local/ is intercepted by WebView2 and served from the local Wasm/ folder. No actual network hits the wire. Caching, dev-tools, relative URLs — all behave like a real web server, which means the iteration loop is exactly the loop a web developer expects.
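To make the mapping concrete: a request for `https://zero.local/<path>` is answered from `<Wasm folder>/<path>`. The Node sketch below models that lookup, including the check that a path cannot climb out of the mapped folder. It illustrates the semantics only and is not how WebView2 implements the mapping internally; `resolveVirtualPath` and the example root are hypothetical.

```javascript
const path = require("node:path");

// Illustrative model of virtual host mapping:
//   https://zero.local/<p>  ->  <rootFolder>/<p>
// Returns null for other hosts or for paths that would leave rootFolder.
function resolveVirtualPath(requestUrl, hostName, rootFolder) {
  const url = new URL(requestUrl);
  if (url.hostname !== hostName) return null; // not mapped: a normal network request

  let relative;
  try {
    // The URL parser already collapses "." and ".." segments; %-escapes are decoded here.
    relative = decodeURIComponent(url.pathname).replace(/^\/+/, "");
  } catch {
    return null; // malformed escape sequence
  }

  const root = path.resolve(rootFolder);
  const candidate = path.resolve(root, relative);
  // Containment check: an encoded "..%2f" must not escape the mapped folder.
  if (candidate !== root && !candidate.startsWith(root + path.sep)) return null;
  return candidate;
}
```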

2. JSON RPC over chrome.webview.postMessage

For the bridge itself, two patterns compete in WebView2 land. AddHostObjectToScript lets JS reach into a COM-visible .NET object directly; it sounds elegant but ties the contract to [ComVisible] rules and hurts cross-platform portability. postMessage is the modern, message-passing alternative — async-friendly, debuggable in DevTools, and trivially extensible to host-pushed event streams later.
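The "extensible to host-pushed event streams" point comes down to one convention: replies carry the `id` of the request they answer, while host-initiated messages carry no `id` and name an event instead. A minimal sketch of a single incoming-message handler that separates the two; the `event`/`data` fields, the `stt.partial` event name, and `createMessageRouter` are all assumptions, since the wire format shipped in M0005 only defines request and reply.

```javascript
// Sketch: separate RPC replies from host-pushed events.
// Assumed convention (not in the shipped wire format):
//   reply: { id, ok, result | error }     event: { event, data }
function createMessageRouter() {
  const pending = new Map();   // id -> { resolve, reject }
  const listeners = new Map(); // event name -> Set of callbacks

  return {
    expectReply(id) {
      return new Promise((resolve, reject) => pending.set(id, { resolve, reject }));
    },
    on(eventName, callback) {
      if (!listeners.has(eventName)) listeners.set(eventName, new Set());
      listeners.get(eventName).add(callback);
      return () => listeners.get(eventName).delete(callback); // unsubscribe
    },
    handle(message) {
      if (typeof message.id === "number" && pending.has(message.id)) {
        const { resolve, reject } = pending.get(message.id);
        pending.delete(message.id);
        if (message.ok) resolve(message.result);
        else reject(new Error(message.error || "bridge call failed"));
      } else if (typeof message.event === "string") {
        for (const cb of listeners.get(message.event) || []) cb(message.data);
      }
      // Anything else (late reply, unknown shape) is dropped.
    },
  };
}
```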
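Wire format (the inner `result` is the serialized TtsResult, so it carries its own `ok`, separate from the envelope's):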
```jsonc
// JS → host
{ "id": 7, "op": "tts.speak", "args": { "text": "hello" } }

// host → JS
{ "id": 7, "ok": true, "result": { "ok": true, "provider": "WindowsTTS", "bytes": 36044, "format": "wav" } }
```
The host-side router parses, dispatches, and replies:
```csharp
// Project/AgentZeroWpf/Services/Browser/WebDevBridge.cs
// Maps an incoming "op" to an IZeroBrowser call; the return value becomes "result".
private async Task<object?> DispatchAsync(string? op, JsonElement? args)
{
    return op switch
    {
        "version"         => new { version = _host.GetAppVersion() },
        "voice.providers" => _host.GetVoiceProviders(),
        "tts.speak"       => await _host.SpeakAsync(ReadString(args, "text")),
        "tts.stop"        => Stop(),
        // Unknown ops throw rather than silently returning null.
        _ => throw new InvalidOperationException($"unknown op '{op}'"),
    };
}
```
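When iterating on a sandbox app in an ordinary browser tab, there is no WebView2 host to answer. A small JS stand-in that mirrors the op table above and wraps results in the same `{ id, ok, result }` envelope keeps the page usable. This is a hypothetical test double: the canned values are placeholders, and the `{ id, ok: false, error }` failure shape is an assumption, since only the success envelope is shown in this post.

```javascript
// Hypothetical in-browser stand-in for WebDevBridge's dispatch table.
// Canned values are placeholders, not what the real host returns.
function createFakeHost() {
  let speaking = false;
  const ops = new Map([
    ["version", () => ({ version: "0.0.0-fake" })],
    ["voice.providers", () => ({ stt: "Off", tts: "FakeTTS", llmBackend: "Off" })],
    ["tts.speak", (args) => {
      const text = (args && args.text) || "";
      speaking = text.length > 0;
      return { ok: true, provider: "FakeTTS", bytes: text.length * 2, format: "wav" };
    }],
    ["tts.stop", () => { speaking = false; return { stopped: true }; }],
  ]);

  // Mirrors the host: unknown ops produce an error reply, not an empty result.
  async function dispatch(request) {
    const handler = ops.get(request.op);
    if (!handler) return { id: request.id, ok: false, error: `unknown op '${request.op}'` };
    try {
      return { id: request.id, ok: true, result: await handler(request.args) };
    } catch (err) {
      return { id: request.id, ok: false, error: String((err && err.message) || err) };
    }
  }
  return { dispatch, isSpeaking: () => speaking };
}
```

Using a `Map` rather than a plain object keeps names like `toString` from accidentally resolving to an inherited method.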
JS-side, a tiny Promise wrapper makes calls feel native:
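```javascript
// Wasm/common/zero-bridge.js
// invoke(op, args) posts { id, op, args } to the host and resolves with the reply's "result".
window.zero = {
  invoke,
  version: () => invoke('version'),
  voice: {
    providers: () => invoke('voice.providers'),
    speak: (t) => invoke('tts.speak', { text: t }),
    stop: () => invoke('tts.stop'),
  },
};
```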
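The excerpt above leaves out `invoke` itself. A minimal version needs only a counter for request `id`s and a map of pending promises keyed by `id`. The sketch below is an assumption about how such a wrapper can work, not the actual contents of zero-bridge.js; it takes the transport as a parameter (normally `window.chrome.webview`) so it can be exercised without WebView2. `postMessage` and the `message` event are the standard WebView2 JS APIs: a host reply sent with `PostWebMessageAsJson` arrives in `event.data` already parsed, one sent with `PostWebMessageAsString` arrives as a string, so both are handled. The timeout is an added assumption.

```javascript
// Sketch of a Promise-based invoke over a WebView2-style transport.
// `webview` is normally window.chrome.webview; it only needs
// postMessage(obj) and addEventListener("message", handler).
function createInvoke(webview, timeoutMs = 10000) {
  let nextId = 1;
  const pending = new Map(); // id -> { resolve, reject, timer }

  webview.addEventListener("message", (event) => {
    // PostWebMessageAsJson delivers an object; PostWebMessageAsString a string.
    const msg = typeof event.data === "string" ? JSON.parse(event.data) : event.data;
    const entry = msg && pending.get(msg.id);
    if (!entry) return; // not a reply we are waiting for
    pending.delete(msg.id);
    clearTimeout(entry.timer);
    if (msg.ok) entry.resolve(msg.result);
    else entry.reject(new Error(msg.error || `bridge call ${msg.id} failed`));
  });

  return function invoke(op, args) {
    const id = nextId++;
    return new Promise((resolve, reject) => {
      // Reject instead of hanging forever if the host never answers.
      const timer = setTimeout(() => {
        pending.delete(id);
        reject(new Error(`bridge call '${op}' timed out`));
      }, timeoutMs);
      pending.set(id, { resolve, reject, timer });
      webview.postMessage({ id, op, args });
    });
  };
}
```

In the page this would be wired up as `const invoke = createInvoke(window.chrome.webview);` before `window.zero` is defined.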
Inside any sandbox app, calling native is one line:
```javascript
const r = await window.zero.voice.speak("Hello from the WebDev sandbox.");
// r = { ok: true, provider: "WindowsTTS", bytes: 36044, format: "wav" }
```
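Because `speak` resolves with the TtsResult itself, a caller has to look at `r.ok` and also handle rejections: a provider switched Off comes back as a normal result with `ok: false` and an error message (see `SpeakAsync` above), while a failed bridge call presumably surfaces as a rejected promise. An illustrative voice-test-style click handler; `speakAndReport`, the injected `zero` and `setStatus` parameters, and the status strings are hypothetical.

```javascript
// Illustrative speak handler for a sandbox page. `zero` is normally
// window.zero; `setStatus` writes to the page's status line.
async function speakAndReport(zero, text, setStatus) {
  const trimmed = text.trim();
  if (!trimmed) {
    setStatus("Nothing to speak.");
    return null;
  }
  setStatus("Speaking...");
  try {
    const r = await zero.voice.speak(trimmed);
    if (r.ok) {
      setStatus(`Spoke ${r.bytes} bytes of ${r.format} audio via ${r.provider}.`);
    } else {
      // Host-side "soft" failure, e.g. the TTS provider is switched Off.
      setStatus(`TTS unavailable: ${r.error}`);
    }
    return r;
  } catch (err) {
    // Call-level failure: unknown op, host exception, timeout.
    setStatus(`Bridge error: ${err.message}`);
    return null;
  }
}
```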

Architecture at a glance

```mermaid
flowchart LR
    subgraph WPF[AgentZero Lite — WPF host]
        Settings[Settings panel]
        WebDev[WebDev tab]
        Host[WebDevHost : IZeroBrowser]
        Voice[Voice services<br/>STT / TTS / LlmGateway]
        Settings --> WebDev
        Host -. uses .-> Voice
    end
    subgraph Sandbox[Wasm/ — local web app]
        HTML[voice-test/index.html]
        JS[voice-test.js]
        Bridge[common/zero-bridge.js]
        HTML --> JS
        JS --> Bridge
    end
    WebDev -- WebView2<br/>virtual host mapping --> HTML
    Bridge <-- postMessage<br/>JSON RPC --> Host
```
The bridge is the only edge that crosses the JS/native boundary. Everything else is conventional code in its own world.

The first sandbox — voice-test

The first occupant of Wasm/ is a re-implementation of the existing Settings/Voice "Voice Test" panel as a web UI. It's deliberately small: a textarea, a speak button, a status line, and a bridge log. The point isn't the feature — the point is that the whole feature lives in three flat files under Wasm/voice-test/ and reaches into native .NET TTS through one line of JS.
```
Project/AgentZeroWpf/Wasm/
├── README.md
├── common/
│   └── zero-bridge.js   ← JS Promise RPC wrapper (~40 lines)
└── voice-test/          ← one app, one folder
    ├── index.html
    ├── voice-test.css
    └── voice-test.js
```
Adding a second app is a mkdir plus three files. No host code change, no XAML edits, no rebuild.
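For "no host code change" to hold, something has to find new apps without being told about them. One way to do that, sketched in Node under the assumption that an app is any direct subfolder of Wasm/ containing an index.html, with common/ reserved for shared scripts. `listSandboxApps` is hypothetical; the post doesn't say how AgentZero enumerates apps.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical discovery: every direct subfolder of the Wasm root that has an
// index.html is an app; "common" holds shared scripts and is skipped.
function listSandboxApps(wasmRoot, hostName = "zero.local") {
  return fs
    .readdirSync(wasmRoot, { withFileTypes: true })
    .filter((entry) => entry.isDirectory() && entry.name !== "common")
    .filter((entry) => fs.existsSync(path.join(wasmRoot, entry.name, "index.html")))
    .map((entry) => ({
      name: entry.name,
      url: `https://${hostName}/${encodeURIComponent(entry.name)}/index.html`,
    }))
    .sort((a, b) => a.name.localeCompare(b.name));
}
```

Under this convention, a half-finished folder without an index.html stays invisible until it is ready.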

Why the contract lives in ZeroCommon

This deserves its own section because it's the move that makes the rest worth doing.
ZeroCommon is net10.0 — no net10.0-windows, no UseWPF, no System.Windows.*. The headless xUnit suite runs against it without a desktop session. Anything that needs WPF lives one project up in AgentZeroWpf.
Putting IZeroBrowser in ZeroCommon means:
  • A future .wasm plugin compiled from C# to WebAssembly via WASI can reference only Agent.Common.dll and still be ABI-compatible with whatever AgentZero exposes; one written in Rust and run through wasmtime has only this one small contract to mirror.
  • Unit tests can mock IZeroBrowser and exercise the WebDevBridge dispatch logic without spinning up WebView2.
  • If we later swap WebView2 for, say, an embedded Chromium (CefSharp) host, the contract doesn't move — only the bridge does.
The discipline is small now and pays back constantly later. It's the same lesson as keeping actor logic out of ITerminalSession (the seam pattern that already saves us in actor tests).

Roadmap — from JS today to .wasm plugins tomorrow

The folder is named Wasm/ deliberately. The contents today are HTML/JS/CSS — that's fine, and arguably the right starting point because the sandbox earns its keep before any WebAssembly module shows up. But the structure is set so that future plugins can ride the same bridge.
| Stage | What lands | Why it's there |
|---|---|---|
| Now (M0005) | TTS speak/stop end-to-end through IZeroBrowser | Prove the bridge with a feature the user already trusts |
| Next | Mic capture + streaming partial transcripts (host-pushed events) | Validate the streaming direction of postMessage |
| Next+1 | Full STT → LLM → TTS loop, all via the bridge | The voice agent stops being a WPF panel and becomes a portable web sandbox |
| Plugin | .wasm modules in sub-folders, side-loadable | Third parties (or the user) add capabilities without recompiling AgentZero |
The "Plugin" stage is the strategic payoff. Today AgentZero's extension points are CLI definitions, terminal sessions, and skill folders — all native, all requiring careful integration. With the Wasm/ sandbox, an extension point becomes "drop a folder, it shows up." That's the kind of leverage that compounds.
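The roadmap leaves open whether plugin modules will run host-side (the WASI/wasmtime route mentioned in the ZeroCommon section) or inside the page. For the in-page route, the standard WebAssembly JS API is enough: fetch the module from the virtual host and instantiate it with an import object that can expose bridge calls. A minimal sketch; the `env` import namespace, the `plugin.wasm` file name and folder layout, and the hand-assembled test module are assumptions for illustration.

```javascript
// Sketch: side-loading a .wasm plugin inside a sandbox page with the
// standard WebAssembly JS API. Import names under `env` are hypothetical.
async function instantiatePlugin(wasmBytes, hostImports = {}) {
  const { instance } = await WebAssembly.instantiate(wasmBytes, { env: hostImports });
  return instance.exports;
}

// In the page, a plugin folder would be served by the virtual host, e.g.
// https://zero.local/<plugin>/plugin.wasm (layout assumed, not specified).
async function loadPluginFromUrl(url, hostImports) {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`failed to fetch ${url}: ${response.status}`);
  return instantiatePlugin(await response.arrayBuffer(), hostImports);
}

// Smallest useful module: exports add(i32, i32) -> i32. Hand-assembled bytes.
const ADD_WASM = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic "\0asm", version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // function 0 uses type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add" = func 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code: 1 body, 7 bytes, 0 locals
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,                   // local.get 0, local.get 1, i32.add, end
]);
```

Fetching the bytes and calling `instantiate` avoids depending on the server sending the `application/wasm` MIME type that `WebAssembly.instantiateStreaming` requires.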

Closing

The big idea is simple: two sandboxes, one bridge. The web side gets the fast iteration loop and the universal UI surface. The native side gets to keep its on-device LLM, its NAudio capture, its Akka.NET pipelines. The bridge — IZeroBrowser over postMessage — is the only thing that has to be designed carefully, and it now is.
For us this matters because every future feature is no longer a binary choice between "rebuild the WPF panel" and "stand up an HTTP server." It's a third path, and the third path is the cheapest one to extend.
Mission M0005 is the seed. The sandbox is open. Drop a folder.

Demo


TECH LINKS