The research behind Parity.
Parity was built on a body of controlled experiments, not assumptions. The research below documents what we learned about compiled execution and governed AI, and why the architecture is built the way it is.
Featured
When Runtime Inference Fails as an Execution Substrate
For high-stakes operational flows, LLMs are useful upstream reasoning tools but are not reliable runtime execution substrates. Compiled execution satisfies correctness, determinism, temporal fidelity, and governance by construction — not by statistical chance.
Opus Research — Compiled Execution
Four experiments studying whether LLMs can serve as reliable execution systems — and what a compiled execution architecture provides that probabilistic inference cannot. O-01 through O-04 form a sequential proof chain: from the foundational negative result to full-pipeline validation.
Breaking Confidence in LLMs as Execution Systems
LLMs exhibit multiple orthogonal failure modes — reasoning errors, non-determinism, schema violations, spec-compliance failures — even on tasks within their apparent capability. Compiled execution produces deterministic, exact outputs across all scenarios.
When Runtime Inference Fails as an Execution Substrate
For high-stakes operational flows, LLMs are useful upstream reasoning tools but are not reliable runtime execution substrates. Compiled execution satisfies correctness, determinism, temporal fidelity, and governance by construction.
Compiled Execution at Scale
Compiled execution guarantees are structural, not statistical: they hold at 120 flows exactly as they did at 4. In the experiment, 120 simultaneous policy changes were analyzed with full causal attribution in under 60 seconds.
From Intent to Governance: Full-Pipeline Compiled Execution
The complete pipeline — natural language intent through governed output — works end-to-end on diverse, realistic inputs. Each run produces a 7-link evidence chain from user prompt to governance artifacts.
Probity Research — Governed AI Execution
Four experiments studying whether AI agents can operate under enforceable governance — and what the architecture of a purpose-built governance layer looks like. Available as a combined summary.
Establishing the Failure Baseline
Every model tested fails at runtime execution under policy-constrained evaluation. Standard testing cannot detect these failures: outcome-based evaluation overstates failure severity by 2.1×. No existing architecture resolves them.
Want to go deeper?
The research above informed every architecture decision in Parity. If you have questions about the experimental methodology, results, or how they apply to your organization, reach out.