0. 30-second summary
- In March–April 2026, Anthropic and OpenAI raised "harness" as an official topic of discussion about three weeks apart.
- Anthropic — 2026.3.24, "Designing harnesses for long-running applications" (Prithvi Rajasekaran)
- OpenAI — 2026.4.15, Agents SDK 2.0 — long-horizon harness, sandbox, subagents
- MCP, skills, and CLIs are being pushed as standardized and open — but for the harness layer, both vendors are signaling "design it inside our SDK."
- At the same time, infra costs for both PCs and local inference are exploding. Even though Google's TurboQuant (2026.3.25) cuts memory by ~6×, the Jevons Paradox means total demand goes up, not down.
- We have to pick one of two paths — play on top of a vendor-locked harness, or design a hybrid native harness ourselves.
1. Why is everyone suddenly talking about "the harness"?
A "harness" is originally horse tack — gear that converts a horse's raw power into controllable direction. In AI, the term now refers to the entire outer-layer system that turns the LLM (the horse) into a controllable worker.
"Claude Code serves as the agentic harness around Claude: it provides the tools, context management, and execution environment that turn a language model into a capable coding agent."
— Anthropic official docs (Memorizer: How Claude Code Works — Harness, AI Agent, Agentic Loop)
The six core components are:
flowchart LR Model["LLM<br/>(the horse = inference engine)"] subgraph Harness["Harness (the tack)"] Tools["Tools<br/>Bash / Read / Write"] Perm["Permissions &<br/>approval gates"] Sandbox["Sandbox"] Session["Session &<br/>memory"] Context["Context window<br/>management"] Ext["Extensions:<br/>MCP / Skills / Hooks"] end Model --> Harness Harness --> Agent["AI Agent<br/>(judges & acts on its own)"]
Looking at the last two years, the standardization wave moved from model → tools → MCP → skills → CLI. Then in spring 2026, both vendors simultaneously tried to grab the reins back on one specific layer: the harness.
2. Anthropic's harness — "Start simple, add complexity only when needed"
Primary source: <https://www.anthropic.com/engineering/harness-design-long-running-apps> (2026.3.24)
Anthropic published its design principles for harnesses targeting long-running apps (jobs that span hours or days).
2.1 Design philosophy
"Find the simplest solution possible, and only increase complexity when needed."
— Prithvi Rajasekaran, Anthropic Labs
"Every component in a harness encodes an assumption about what the model can't do on its own."
In other words, the harness is code that compensates for the model's weaknesses, and as the model gets stronger, parts of the harness become candidates for deletion. Discarding is part of the design.
2.2 Core pattern — Generator / Evaluator / Planner (GAN-inspired)
flowchart LR User["User<br/>(one-line prompt)"] --> Planner["Planner<br/>(turns prompt into detailed PRD)"] Planner --> Generator["Generator<br/>(actual builder)"] Generator -->|"Sprint Contract<br/>(agreed definition of done)"| Evaluator["Evaluator<br/>(QA via Playwright MCP)"] Evaluator -->|"FAIL"| Generator Evaluator -->|"PASS"| Done["Done"]
"The generator and evaluator negotiated a sprint contract: agreeing on what 'done' looked like for that chunk of work before any code was written."
2.3 Two findings Anthropic emphasized
Finding | Implication |
Models tend to evaluate their own work with "confident praise" | Self-evaluation is unreliable → split out a separate Evaluator agent |
Context "full reset (new session)" beats "compression" | Context-window compression tricks have hit a wall; session splitting is more stable |
2.4 Cost data (real numbers)
Prompt | Single agent | Full harness |
"Create a 2D retro game maker" | 20 min / $9 | 6 hours / $200 (20×, with significantly higher quality) |
Building a DAW (Opus 4.6, improved harness) | — | 3 h 50 min / $124.70 |
3. OpenAI's harness — "long-horizon harness × sandbox × 100+ LLMs"
Primary sources:
- TechCrunch, "OpenAI updates its Agents SDK..." (2026.4.15)
In Agents SDK 2.0, OpenAI shipped:
3.1 Key concepts
Element | Description |
long-horizon harness | Orchestration layer for hour-to-day multi-step jobs. You bring your own compute and storage; the harness provides the coordination layer. |
in-distribution harness | A harness "shaped the way the frontier model was trained" — tools, approvals, and tracing that match the model's training distribution to guarantee performance. |
sandboxing | Run agents inside controlled compute environments. Contains the risk of unsupervised execution. |
subagents / code mode | Codex-style subagents plus a code execution mode. |
100+ LLM support | Non-OpenAI models also run on the same harness. |
Python first | Python ships first; TypeScript comes later. |
"The harness refers to the other components of an agent besides the model it's running on, and an in-distribution harness allows companies to both deploy and test agents running on frontier models."
— TechCrunch (2026.4.15)
3.2 OpenAI's message
OpenAI is bundling AgentKit + Agent Builder + Apps SDK + Agents SDK 2.0 to run the full design → build → deploy → monetize stack inside their own platform (Memorizer: OpenAI DevDay 2025 — ChatGPT's Super-App Strategy).
The pitch:
- MCP is "open" — but the harness is meant to be assembled visually inside OpenAI's workbench.
- In return you get ChatKit / Widget Builder / Connector Registry as one bundle.
4. OpenAI vs Anthropic — harness comparison
Axis | Anthropic Harness | OpenAI Harness (Agents SDK 2.0) |
Announced | 2026.3.24 | 2026.4.15 |
Core concept | "Start simple → add complexity only as needed" | "long-horizon × in-distribution × sandbox" |
Signature pattern | Planner / Generator / Evaluator (3 agents) + Sprint Contract | Sandbox + Subagents + Code Mode + Tracing/Approvals |
Model lock-in | Effectively assumes Claude Opus / Sonnet | "100+ LLM compatible" — but in-distribution guarantee only on OpenAI models |
Core SDK | Claude Agent SDK + Claude Code | Agents SDK + AgentKit + ChatKit |
Visual tooling | None (code-first, Skills/Hooks) | Agent Builder (drag-and-drop canvas) |
Openness | "Open-source the harness inspection" — you can look inside | Some open-source + visual builder is cloud-only |
Target workload | Full-stack coding, experimental long-running apps | Enterprise workflows, multi-model mixes |
Pricing signal | Comfortably bills $124–$200 per task | BYOC (Bring Your Own Compute) — customer pays for compute |
flowchart TB subgraph Anthropic["Anthropic Harness — code-first minimalism"] A1["Claude Agent SDK"] A2["Skills + Hooks"] A3["Sub-agent pattern<br/>Planner → Generator → Evaluator"] A1 --> A2 --> A3 end subgraph OpenAI["OpenAI Harness — full-stack workbench"] O1["Agents SDK 2.0<br/>(long-horizon harness)"] O2["AgentKit / Agent Builder<br/>(visual canvas)"] O3["Sandbox + Subagents<br/>+ Code Mode"] O4["ChatKit / Widget Builder<br/>(distribution channels)"] O1 --> O2 --> O3 --> O4 end
5. Why the harness layer specifically is becoming vendor-locked
Skills, MCP, and CLIs have all been standardized or opened up to some degree. The harness is different. There are two structural reasons.
5.1 The "enterprises will pay for this eventually" learned behavior
flowchart LR GH["GitHub<br/>(Enterprise)"] -->|"ACL/SAML/Audit"| ENT1["Enterprise ISMS requirement"] Slack["Slack<br/>(Enterprise Grid)"] -->|"DLP/eDiscovery"| ENT1 Notion["Notion<br/>(Enterprise)"] -->|"SCIM/audit logs"| ENT1 Claude["Claude Team/Enterprise"] -->|"RBAC/SSO"| ENT1 OpenAI["ChatGPT Business/Enterprise"] -->|"audit logs/usage analytics"| ENT1
Pattern repeated across SaaS history — basic ACL, audit, and SSO always end up gated behind the most expensive plan. The harness is going down the same road:
- Free / MAX = personal coding assistant
- Team / Enterprise = harness governance (audit, tracing, multi-seat, policy gates)
The 90% ChatGPT Pro discount in Korea (Memorizer: ChatGPT Pro 90% Discount Promotion in South Korea (대한민국)) shows this clearly. Even though Sam Altman openly admitted that the $200/mo Pro plan runs at a loss, OpenAI subsidized it 90% in Korea solely to buy a single metric — conversion rate — ahead of an IPO. Call it a prelude to the harness license war.
5.2 OpenAI's "harness = part of the model's training distribution" claim
OpenAI used the phrase in-distribution harness. The implication: the model's training distribution (behavior cloning, RLHF, RLAIF) already encodes the shape of their harness, and you can't extract 100% of the model's performance without it.
If true, this means no matter how standardized MCP becomes, the harness is the layer the model provider can build most efficiently — i.e., the strongest lock-in point.
6. Infra costs are exploding — both cloud and local get more expensive
6.1 LLMs are getting cheaper like air, but "everything around them" is getting more expensive
"AI's enemy is time. Today's Pro-tier inference becomes free-tier in 6 months."
Token prices are dropping. But with the rise of agents, the surrounding infra is getting expensive:
- GPU: for LLM inference
- Memory / SSD / NAND: local PC prices spiking (as users have noted)
- Compute (BYOC): OpenAI explicitly says bring your own compute for the long-horizon harness
6.2 Google's counter — TurboQuant + the Jevons Paradox
Primary source: <https://v.daum.net/v/20260329050205578> (2026.3.29)
On 2026.3.25, Google announced TurboQuant (KV-cache compression + quantized inference).
Metric | Change |
Memory usage | as low as 1/6 of baseline |
Throughput | up to 8× |
Reference: 70B-param LLM serving 512 concurrent users | 512 GB → under 100 GB with TurboQuant |
Naively this looks bad for memory makers — demand should fall. Right after the announcement, foreign net selling of Samsung Electronics hit ₩2.94T and SK Hynix took a market shock too.
But KB Securities analyst Kim Dong-won and Sungkyunkwan University professor Kwon Seok-jun cited the Jevons Paradox for a different read:
"If inference truly gets cheaper, the freed memory gets used for applications that were previously priced out — total memory demand could explode rather than shrink."
flowchart LR A["Efficiency 6×↑<br/>(TurboQuant)"] --> B["Unit cost ↓"] B --> C["Agent workloads<br/>explode"] C --> D["Total memory · GPU · power<br/>demand ↑↑"] D --> E["Local PC parts<br/>+ cloud bills<br/>both rise"]
In short: as the harness era kicks in, LLMs get cheaper, but the agent full-stack bill gets bigger.
6.3 Cloud vs local — both get more expensive at the same time
Category | 2024 | 2026 (now) |
Production workloads | Mostly backend servers | • agent compute (BYOC) |
Dev machine | Code editor + browser | • local agent + Pencil/Playwright + context cache → 64 GB RAM no longer enough; 96–128 GB becoming standard |
SSD / NAND | 1 TB sufficed | 2 TB+ NVMe (vector DB, model weights, session logs) |
GPU | Optional | 24 GB+ VRAM is effectively the entry bar (for local sLM / on-device inference) |
7. So what should we prepare?
7.1 Three scenarios
flowchart TD Choice["Our choice"] Choice --> S1["Scenario A<br/>Play on top of vendor-locked harness"] Choice --> S2["Scenario B<br/>Design a native harness ourselves"] Choice --> S3["Scenario C<br/>Hybrid — partly on-device, partly frontier model"] S1 -->|"Pros"| S1P["Fast start, full-stack tooling provided"] S1 -->|"Cons"| S1C["Hostage to license & pricing tier, model lock-in"] S2 -->|"Pros"| S2P["Full control, multi-vendor possible"] S2 -->|"Cons"| S2C["Engineering cost, learning curve"] S3 -->|"Pros"| S3P["Cost-optimal, risk diversified"] S3 -->|"Cons"| S3C["Vendor harness may not allow this kind of bridging"]
7.2 The trap — "forced upgrade to Team/Enterprise"
Claude's
MAX plan is for personal use. Blocking auth above N devices is the exact same playbook Netflix used to kill household sharing — tolerated at first, blocked once IPO and revenue pressure mounts. If you start using it seriously inside a company, you'll likely be forced to migrate to Claude Team / OpenAI Business.7.3 Recommended internal actions
Action | Why |
1. Build our own harness notation standard | Translate Anthropic's Planner-Generator-Evaluator into our domain vocabulary as an internal harness guide (one page). No matter how vendor SDKs change, the layer naming stays under our control. |
2. Self-host the MCPs we depend on | Self-host memorizer / Atlassian / Notion MCPs — even if a vendor pushes lock-in, the tool layer survives. |
3. Define a BYOC infra catalog | The OpenAI BYOC model may become standard — pre-define internal GPU / VRAM / NVMe baseline specs. |
4. Always separate the Evaluator | Anthropic's key finding — never trust model self-eval. Bake an Evaluator step into every writing / code / planning workflow. |
5. Treat harness tracing as a contract | What does the vendor send back from our data, and what do they train on? Tracing standards will be the next ISMS clause. |
8. Closing — Which side are we betting on?
"Every component in a harness encodes an assumption about what the model can't do on its own."
That one sentence is the whole point. The harness doesn't go away. As the model gets stronger, only its shape changes. Yesterday's required guardrail disappears tomorrow; yesterday's missing governance becomes core tomorrow.
Two questions remain.
- "Should we become native harness engineers?" — Yes. But not at every layer — only at the domain-specific layer (e.g., our domain's Evaluator, our 5-axis evaluation, our governance policies).
- "Is it more rational to just play on top of the vendor harness?" — Partly yes. But keep three escape hatches reserved in advance: MCP, harness tracing, and data governance.
Vendors will eventually nudge us toward more expensive tiers. To have leverage when that day comes, we need to keep one option always alive: "we can throw the harness away."
References
External
- Anthropic — Designing harnesses for long-running applications, Prithvi Rajasekaran, 2026.3.24
- OpenAI — The next evolution of the Agents SDK, 2026.4.15
- OpenAI — Harness engineering (KO)
- TechCrunch — OpenAI updates its Agents SDK to help enterprises build safer, more capable agents, 2026.4.15
- Daum News — TurboQuant and the Jevons Paradox, 2026.3.29
Internal (Memorizer)
- Claude Code — Part 1: Harness, AI Agent, Agentic Loop — 악분, 2026.3.22
- HARNESS — Agent Team & Skill Architect Architecture — Minho Hwang, 2026.3.18
TECH LINKS
- 𝕏 @webnori