The decision
We orchestrate scenario generation on Claude Opus 4.6. We don't use it for in-session dialog, we don't use it for biometric interpretation, and we don't use it as a coach. It writes the scene brief — the structural document that describes who appears in the VR session, what they want, how the conflict escalates, and what success looks like.
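A scene brief of this shape can be sketched as a small data structure. This is a hypothetical illustration of the four elements named above (who appears, what they want, how conflict escalates, what success looks like); the class and field names are ours, not the production schema.

```python
from dataclasses import dataclass

@dataclass
class NPC:
    role: str         # e.g. "board member"
    wants: str        # the NPC's goal in the scene
    disposition: str  # e.g. "right but hostile"

@dataclass
class SceneBrief:
    participant_id: str
    npcs: list[NPC]
    conflict_arc: list[str]      # ordered escalation beats
    success_criteria: list[str]  # what a good outcome looks like

brief = SceneBrief(
    participant_id="p-012",
    npcs=[NPC(role="peer",
              wants="to test the participant without admitting it",
              disposition="probing, non-committal")],
    conflict_arc=["friendly opening", "indirect challenge", "open disagreement"],
    success_criteria=["participant names the tension without escalating"],
)
```

The point of the structure is auditability: a brief is a document the operator can read and veto before a session, not an opaque prompt.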
Why not GPT-class or open-weight?
Three reasons.
Extended thinking. A scenario brief is not a single-shot generation task. It involves consulting the participant's PTI history, cross-referencing prior scenarios to avoid repetition, and stress-testing the conflict structure. Opus' extended thinking gives us multi-step planning inside a single call. That collapses what would otherwise be a five-step prompt chain into one auditable artifact.
Long context. Each participant accumulates PTI deltas, session notes, and operator observations over twelve sessions. By session ten that context window is dense. Opus handles it cleanly. We tested smaller-context models early and watched them lose track of session-five subtleties by session seven.
Quality at the edges. Generic scenarios are useless. We need scenarios that are subtle — a boss who is wrong but charming, a board member who is right but hostile, a peer who is testing you without admitting it. The output quality at the edges of the distribution is what separates Opus from cheaper models, and the edges are where identity work happens.
Where we deliberately don't use it
In-session dialog. Every NPC in the headset is voiced by a local Gemma instance. That decision is non-negotiable for three reasons: latency (NPC responses must land in under 500ms to preserve immersion), privacy (audio never leaves the room), and cost (per-token API pricing breaks the business model at scale). Gemma loses to Opus on nuance; we accept that trade because real-time identity work cannot tolerate cloud latency.
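The 500ms budget can be sanity-checked with back-of-envelope arithmetic. All stage timings below are illustrative assumptions, not measurements from this system; the point is only that two cloud round trips plus provider-side variance eat the budget before synthesis even starts.

```python
# Back-of-envelope latency budget for one NPC reply.
# All numbers are illustrative assumptions, not measurements.
BUDGET_MS = 500

local = {
    "asr": 120,              # on-device speech-to-text
    "llm_first_token": 150,  # local model time-to-first-token
    "tts_first_audio": 100,  # streaming TTS onset
}

# Same pipeline, but with the LLM call routed to a cloud API.
cloud = dict(local)
cloud["network_rtt"] = 2 * 80  # two round trips to a hosted endpoint
cloud["api_queueing"] = 200    # provider-side variance under load

def total(stages: dict[str, int]) -> int:
    return sum(stages.values())

print(total(local), total(cloud))  # local fits the budget; cloud does not
```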
Biometric interpretation. The PTI is a statistical instrument, not an LLM call. We do not ask a language model to look at an HRV trace and tell us what it means. That is a category error.
Coaching. We will not use LLMs to give participants advice. The reflection conversations are with humans — operators and consulting psychologists — because identity work requires a person on the other end of the table.
What this implies architecturally
The protocol is the IP. The models are commodity. When Claude Opus 5 ships, we re-prompt. When a better-than-Gemma open-weight model ships, we swap it. The interfaces are stable; the engines behind them rotate. Lock-in is not the architecture we want.
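The "stable interface, rotating engine" idea reduces to a thin abstraction at the code level. A minimal sketch, assuming a Python codebase; the class and method names here are hypothetical, not the production API.

```python
from typing import Protocol

class ScenarioWriter(Protocol):
    """The stable interface callers depend on."""
    def write_brief(self, participant_context: str) -> str: ...

class OpusWriter:
    """Current engine: a hosted frontier model (call stubbed here)."""
    def write_brief(self, participant_context: str) -> str:
        return f"[opus brief for] {participant_context}"

class NextGenWriter:
    """Drop-in replacement when a better engine ships."""
    def write_brief(self, participant_context: str) -> str:
        return f"[next-gen brief for] {participant_context}"

def generate(writer: ScenarioWriter, ctx: str) -> str:
    # Callers see only the interface; the engine behind it can rotate.
    return writer.write_brief(ctx)

print(generate(OpusWriter(), "session 7"))
```

Swapping engines is then a one-line change at the call site, which is the whole anti-lock-in argument in miniature.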
This is the most boring possible answer to "which AI do you use?" — which is precisely why we trust it.