The research behind Parity.

Parity was built on a body of controlled experiments — not assumptions. The research below documents what we learned about compiled execution, governed AI, and why the architecture is built the way it is.

Featured

Opus Research · O-02 · Preview

When Runtime Inference Fails as an Execution Substrate

For high-stakes operational flows, LLMs are useful upstream reasoning tools but are not reliable runtime execution substrates. Compiled execution satisfies correctness, determinism, temporal fidelity, and governance by construction — not by statistical chance.

5 system classes compared · Compiled execution: 1.000 determinism · Agent systems: 0.499 determinism · 70.8% temporal misbinding in LLM systems
Nesean Crofford

Opus Research — Compiled Execution

Four experiments studying whether LLMs can serve as reliable execution systems — and what a compiled execution architecture provides that probabilistic inference cannot. O-01 through O-04 form a sequential proof chain: from the foundational negative result to full-pipeline validation.

Opus · O-01 · Preview

Breaking Confidence in LLMs as Execution Systems

LLMs exhibit multiple orthogonal failure modes — reasoning errors, non-determinism, schema violations, spec-compliance failures — even on tasks within their apparent capability. Compiled execution produces deterministic, exact outputs across all scenarios.

1,260 runs · 4 systems (Claude, GPT, Gemini, Opus) · 4 flow types
Nesean Crofford

Opus · O-02 · Preview

When Runtime Inference Fails as an Execution Substrate

For high-stakes operational flows, LLMs are useful upstream reasoning tools but are not reliable runtime execution substrates. Compiled execution satisfies correctness, determinism, temporal fidelity, and governance by construction.

5 system classes compared · 1.000 determinism for compiled execution · 0.499 for agent systems
Nesean Crofford

Opus · O-03 · Preview

Compiled Execution at Scale

Compiled execution guarantees are structural, not statistical: they hold at 120 flows exactly as they did at 4. A batch of 120 simultaneous policy changes was analyzed with full causal attribution in under 60 seconds.

120 enterprise flows · 12 domains · Built in 6 days · 1.000 determinism throughout
Nesean Crofford

Opus · O-04 · Preview

From Intent to Governance: Full-Pipeline Compiled Execution

The complete pipeline — natural language intent through governed output — works end-to-end on diverse, realistic inputs. Each run produces a 7-link evidence chain from user prompt to governance artifacts.

~100 runs/day over 4–6 weeks · 7-link evidence chain per run · 7 flow categories
Nesean Crofford

Probity Research — Governed AI Execution

Four experiments studying whether AI agents can operate under enforceable governance — and what the architecture of a purpose-built governance layer looks like. Available as a combined summary.

Probity · P-01 – P-04 · Summary available

Establishing the Failure Baseline

Every model tested fails at runtime execution under policy-constrained evaluation. Standard testing cannot detect these failures; outcome-based evaluation overstates failure severity by 2.1×. No existing architecture resolves them.

540 runs · 5 frontier models (OpenAI, Anthropic, Google) · 0% pass rate across all models on constrained operational tasks
Nesean Crofford

Want to go deeper?

The research above informed every architecture decision in Parity. If you have questions about the experimental methodology, results, or how they apply to your organization, reach out.