Architectural Invariants

HilbertBench records scientific evidence, and these eight rules are what make that evidence trustworthy. They are referenced throughout the source as INV-001 … INV-008. A change that breaks one is rejected in review no matter how convenient it is; if an invariant genuinely needs to change, that is a design discussion, not a patch.

Each entry below states the rule, then explains the failure it exists to prevent — because the rule only makes sense once you have seen what goes wrong without it.

INV-001 — Execution Parity

The recorder must not change the number of shots, circuit executions, parameter bindings, or observable evaluations the user's code performs.

This is the one that justifies the word "non-intrusive". On superconducting hardware, a circuit's noise depends on when it runs, so a diagnostic that quietly fires an extra job — or pads the shot count "just to be safe" — has changed the very thing it claims to observe, and on paid hardware it has also spent the user's money. The proxy therefore only watches. It forwards each call to the real backend, waits for the result the user would have gotten anyway, and copies it aside. The optimizer cannot tell HilbertBench is attached.

Broken by an integration hook that calls backend.run() outside the user's original call, or that touches the shots argument on the way through.
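
The watch-only behaviour described above can be sketched as a thin wrapper. This is a minimal illustration, not HilbertBench's actual proxy: `RecordingProxy`, `FakeBackend`, and the `events` list are hypothetical names, and the only assumption about the backend is that it exposes a `run()` method.

```python
import copy
from typing import Any


class FakeBackend:
    """Hypothetical stand-in for a real backend; remembers every run() call."""

    def __init__(self) -> None:
        self.calls = []

    def run(self, circuit: Any, shots: int = 1024) -> dict:
        self.calls.append((circuit, shots))
        return {"counts": {"00": shots}}


class RecordingProxy:
    """Forwards run() unchanged and copies each result aside."""

    def __init__(self, backend: Any) -> None:
        self._backend = backend
        self.events = []

    def run(self, *args: Any, **kwargs: Any) -> Any:
        # Exactly one forwarded call per user call; arguments pass through untouched.
        result = self._backend.run(*args, **kwargs)
        # Store a deep copy so later changes to the result do not alter the record.
        self.events.append({"args": args, "kwargs": dict(kwargs),
                            "result": copy.deepcopy(result)})
        return result


backend = FakeBackend()
proxy = RecordingProxy(backend)
result = proxy.run("bell_circuit", shots=200)
```

The backend sees one call with the user's own `shots` value, and the caller gets back the same object the backend returned.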

INV-002 — Trace Immutability

A finalized trace is never modified.

A flight recorder you can edit after the crash is worthless. During recording the event log only ever grows; when the HilbertTape context closes, the trace is sealed with a SHA-256 checksum over the whole event stream. Nothing on the read side can write — not the reader, not an analyzer, not a migration. If a trace must be reshaped, that produces a new derived trace and leaves the original untouched.

Checked by trace.verify(), which recomputes the seal and raises IntegrityError if a single byte has moved.
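
As a rough sketch of how a seal of this kind can work, the following computes a SHA-256 digest over a canonical JSON encoding of the event list. Only `verify()` and `IntegrityError` are named in this document; the `Trace` class, its `append()` and `seal()` methods, and the JSON serialization are assumptions made for illustration, and the real on-disk encoding may differ.

```python
import hashlib
import json


class IntegrityError(Exception):
    """Raised when a trace no longer matches its seal."""


def _digest(events: list) -> str:
    # Canonical JSON (sorted keys, fixed separators) so equal events hash equally.
    payload = json.dumps(events, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class Trace:
    """Hypothetical append-only trace that is sealed once recording ends."""

    def __init__(self) -> None:
        self._events = []
        self._seal = None

    def append(self, event: dict) -> None:
        if self._seal is not None:
            raise IntegrityError("trace is sealed; no further events accepted")
        self._events.append(event)

    def seal(self) -> str:
        self._seal = _digest(self._events)
        return self._seal

    def verify(self) -> None:
        # Recompute the digest and compare it with the stored seal.
        if self._seal is None or _digest(self._events) != self._seal:
            raise IntegrityError("event stream does not match its seal")


trace = Trace()
trace.append({"type": "SPAN_START", "span": 1})
trace.append({"type": "SPAN_END", "span": 1})
trace.seal()
trace.verify()  # passes silently while the events are untouched
```

In this sketch an unsealed trace also fails `verify()`, on the reasoning that a trace without a seal has nothing to be checked against.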

INV-003 — The Schema is the Only Source of Truth

The Python models in hilbertbench/models/ are generated from the JSON schemas in schemas/. They are never hand-edited.

A trace is meant to be read years from now by tools written in languages that may not exist yet. If the format lived only in Python, the "specification" would be whatever the code happened to do that week. So the JSON Schema is authoritative and the Python is compiled output, the way an object file is compiled from source. Want to change a field? Edit the schema, run python schemas/scripts/generate_models.py, commit both together.

Broken by a commit that edits hilbertbench/models/v*.py without a matching schema change — the tell-tale sign of a hand-edit.
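
One way a review check for this might look is a function over the list of paths a commit touches. This is a hedged sketch: `looks_hand_edited` is a hypothetical helper, and the example file names (`v1_0.py`, `trace.json`) are invented for illustration. Only the `hilbertbench/models/v*.py` pattern and the `schemas/` directory come from the rule above.

```python
import re

# Generated model files, following the hilbertbench/models/v*.py pattern.
MODEL_RE = re.compile(r"^hilbertbench/models/v[^/]*\.py$")


def looks_hand_edited(changed_paths: list[str]) -> bool:
    """True if generated models changed but nothing under schemas/ did."""
    models_changed = any(MODEL_RE.match(p) for p in changed_paths)
    schemas_changed = any(p.startswith("schemas/") for p in changed_paths)
    return models_changed and not schemas_changed


# Hypothetical commits, described by the files they touch.
suspicious = looks_hand_edited(["hilbertbench/models/v1_0.py"])
paired = looks_hand_edited(["hilbertbench/models/v1_0.py", "schemas/v1.0/trace.json"])
```

A path-based check is only a heuristic. A stricter version could regenerate the models from the schemas and fail if the output differs from what is committed.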

INV-004 — The Core Imports Only the Standard Library

recorder/, reader/, and models/ depend on nothing but the Python standard library and each other.

Recording evidence is too important to be hostage to a heavy, fast-moving quantum stack. Keeping the core dependency-free means it stays small enough to audit by reading, and a researcher can open and verify someone else's traces without installing Qiskit at all. Everything volatile — qiskit, pennylane, pyarrow — lives in integrations/, analysis/, and the storage layer, where it belongs.

INV-005 — Tagged Schema Versions Are Frozen

Once a schema version is tagged in Git, no file in that version's directory ever changes again.

The whole promise of INV-003 collapses if v1.0 can quietly mean something different next month. So a released schema version is permanent. A new or changed field means a new version directory (v1.1/, v2.0/), so a trace written today can still be parsed by any future reader that understands its version.

INV-006 — Evidence and Interpretation Stay Separate

A span records what physically happened — circuits, parameters, timestamps, outcomes. It records no judgement about what any of it means.

The moment a verdict like "barren plateau detected" is written into a trace, that trace is frozen around one analysis, made with one set of thresholds, by one version of the tool. Keep interpretation out, and the same raw trace can be re-diagnosed tomorrow with a sharper analyzer or a different threshold — which is exactly the workflow the project is built for. Diagnoses are computed on read, by the analysis layer, and never flow back.

Broken by adding fields such as is_converged, error_rate, or quality_score to the trace schema.
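
To make "computed on read" concrete, here is a toy analyzer that derives a verdict from raw events without writing anything back. The event shape (`type`, `value`), the function names, and the variance-below-threshold criterion are all illustrative assumptions, not the project's real analysis.

```python
from statistics import pvariance

# Raw evidence only: recorded gradient values, with no verdict attached.
trace_events = [
    {"type": "GRADIENT", "value": 0.01},
    {"type": "GRADIENT", "value": -0.02},
    {"type": "GRADIENT", "value": 0.015},
    {"type": "GRADIENT", "value": -0.005},
]


def gradient_variance(events: list) -> float:
    """Population variance of the recorded gradient values."""
    return pvariance([e["value"] for e in events if e["type"] == "GRADIENT"])


def gradients_look_flat(events: list, threshold: float) -> bool:
    """Toy diagnosis: returned to the caller, never stored in the trace."""
    return gradient_variance(events) < threshold


# The same evidence gives different verdicts under different thresholds.
strict = gradients_look_flat(trace_events, threshold=1e-4)
lenient = gradients_look_flat(trace_events, threshold=1e-3)
```

Because the threshold is an argument of the analyzer and not a property of the trace, changing it requires no change to the recorded evidence.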

INV-007 — Failures Are Always Visible

Every span that starts must end in a way you can see: a clean SPAN_END, an explicit ERROR event, or a structurally detectable crash.

The worst outcome for a diagnostic tool is to make a problem disappear. If the user's circuit throws, the adapter catches it, writes an ERROR event into the trace, and re-raises — the failure is both surfaced to the user and preserved as evidence. What the recorder must never do is swallow an exception to keep the trace looking tidy; a tidy trace that hides a failure is a lie.

Broken by try: ... except Exception: pass anywhere in recorder/ or integrations/.
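
The catch, record, re-raise pattern can be sketched as a context manager. The event names `SPAN_END` and `ERROR` come from the rule above; `SPAN_START`, the `span()` helper, and the event dictionaries are assumptions made for this example.

```python
from contextlib import contextmanager


@contextmanager
def span(events: list, name: str):
    """Hypothetical span helper: every start ends in SPAN_END or ERROR."""
    events.append({"type": "SPAN_START", "name": name})
    try:
        yield
    except Exception as exc:
        # Preserve the failure as evidence, then let it propagate unchanged.
        events.append({"type": "ERROR", "name": name, "error": repr(exc)})
        raise
    else:
        events.append({"type": "SPAN_END", "name": name})


events = []
caught = False
try:
    with span(events, "evaluate"):
        raise ValueError("circuit failed")
except ValueError:
    caught = True  # the user's code still sees the original exception
```

The bare `raise` is what keeps the failure visible to the caller; the `ERROR` event is what keeps it visible in the trace.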

INV-008 — Missing Data Degrades, Never Hallucinates

A reader of an older trace must not invent values the trace does not contain.

Absence of evidence is not evidence of absence. If an optional field — say, a calibration snapshot — was never recorded, it resolves to None, and any diagnostic that needed it reports Insufficient Data and stops. What it must not do is substitute a 0 or a False and proceed, because that manufactures a confident answer out of nothing.

Broken by reader logic that fills missing fields with defaults just to keep a diagnostic from short-circuiting.
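
A minimal sketch of the intended behaviour, assuming a span is a plain dictionary with an optional `calibration` entry. The field `t1_us`, the 50.0 cutoff, and both function names are hypothetical and chosen only to give the diagnostic something to compute.

```python
from typing import Optional

INSUFFICIENT_DATA = "Insufficient Data"


def read_calibration(span: dict) -> Optional[dict]:
    """Return the calibration snapshot, or None if it was never recorded."""
    return span.get("calibration")  # no default value is substituted


def t1_diagnostic(span: dict) -> str:
    """Toy diagnostic that refuses to run without the data it needs."""
    calibration = read_calibration(span)
    if calibration is None or calibration.get("t1_us") is None:
        return INSUFFICIENT_DATA  # stop here instead of assuming 0 or False
    # The 50.0 cutoff is an arbitrary illustrative value.
    return "short T1" if calibration["t1_us"] < 50.0 else "T1 ok"


old_trace_span = {"circuit": "ansatz_1"}  # written before calibration was recorded
new_trace_span = {"circuit": "ansatz_1", "calibration": {"t1_us": 30.0}}

old_result = t1_diagnostic(old_trace_span)
new_result = t1_diagnostic(new_trace_span)
```

Had the reader filled the missing snapshot with `{"t1_us": 0}`, the older span would have been reported as "short T1", a confident answer with no evidence behind it.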