Extending HilbertBench

HilbertBench is built to be extended in three places: new analyzers, new framework integrations, and new trace-schema fields. Each has a safe path and a few rules that keep an extension from quietly breaking the guarantees the framework rests on. The full contributor checklist is in CONTRIBUTING.md; this page is the architectural how-to.

Adding an analyzer

An analyzer is the easiest thing to add, because it only reads.

A new analyzer is a plain function: it takes a trace (or a run-directory path) and returns a dictionary. It lives in hilbertbench/analysis/ and must follow the same contract as the six existing analyzers:

  • Read-only. It never writes to the trace (INV-002). Take a HilbertTrace, resolve what you need, compute, return.
  • Evidence in, dictionary out. Return a status string plus the quantitative evidence behind it. Do not invent a verdict the numbers do not support.
  • Quantify uncertainty. If you estimate a statistic, attach a bootstrap confidence interval (see analysis/_util.py for the shared helper) and let the verdict report low confidence when the interval straddles a threshold.
  • Degrade, don't guess. If the trace lacks what you need (an old trace with no calibration snapshot, say), return an "insufficient data" status — never a fabricated default (INV-008).
  • Document the threshold. Any decision cutoff is a named constant with a comment explaining where the number came from. Expose it as a function argument so callers can override it.
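The following sketch shows an analyzer that follows this contract. It is illustrative only: the trace is modeled as a plain dict rather than a HilbertTrace, and the key names (calibration, readout_errors), the function names, and the 0.05 cutoff are all hypothetical. The bootstrap helper is written out inline so the example runs on its own; a real analyzer would use the shared helper in analysis/_util.py instead.

```python
from __future__ import annotations

import random
import statistics
from typing import Any, Optional, Sequence

# Hypothetical cutoff: name and value are illustrative only. A real analyzer
# would say here where the number came from (paper, device spec, study).
READOUT_ERROR_THRESHOLD = 0.05


def bootstrap_ci(samples: Sequence[float], n_resamples: int = 1000,
                 alpha: float = 0.05, seed: int = 0) -> tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean of `samples`."""
    rng = random.Random(seed)  # fixed seed keeps the verdict reproducible
    means = sorted(
        statistics.fmean(rng.choices(samples, k=len(samples)))
        for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


def analyze_readout_error(trace: dict[str, Any],
                          threshold: float = READOUT_ERROR_THRESHOLD) -> dict[str, Any]:
    """Read-only analyzer: evidence in, dictionary out. Never mutates `trace`."""
    # Degrade, don't guess: an old trace may have no calibration snapshot.
    calibration: Optional[dict[str, Any]] = trace.get("calibration")
    errors = (calibration or {}).get("readout_errors")
    if not errors:
        return {"status": "insufficient data", "reason": "no calibration snapshot"}

    mean = statistics.fmean(errors)
    lo, hi = bootstrap_ci(errors)
    # Report low confidence when the interval straddles the threshold.
    straddles = lo <= threshold <= hi
    return {
        "status": "fail" if mean > threshold else "pass",
        "confidence": "low" if straddles else "high",
        "mean_readout_error": mean,
        "ci95": (lo, hi),
        "threshold": threshold,
        "n": len(errors),
    }
```

Because the threshold is a keyword argument, a caller can re-judge the same evidence against a stricter cutoff, for example analyze_readout_error(trace, threshold=0.02), without editing the analyzer.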

Add a test in tests/analysis/ that plants a known condition in a constructed trace and asserts the verdict, then regenerate the test catalog.
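A test in that style might look like the following sketch. The analyzer under test, the trace layout, and the make_trace builder are hypothetical stand-ins; what carries over is the pattern: build a clean trace, plant exactly one known defect, and assert that both the verdict and its evidence point at it. The functions are plain pytest-style tests.

```python
from __future__ import annotations

from typing import Any


# Hypothetical analyzer under test: flags executions whose recorded shot
# count differs from the requested one.
def analyze_shot_consistency(trace: dict[str, Any]) -> dict[str, Any]:
    executions = trace.get("executions")
    if not executions:
        return {"status": "insufficient data", "mismatches": []}
    mismatches = [
        i for i, ex in enumerate(executions)
        if ex.get("shots_requested") != ex.get("shots_recorded")
    ]
    return {
        "status": "inconsistent" if mismatches else "consistent",
        "mismatches": mismatches,
        "n_executions": len(executions),
    }


def make_trace(n: int = 3, shots: int = 1024, **overrides: Any) -> dict[str, Any]:
    """Build a minimal, internally consistent trace; overrides plant conditions."""
    trace: dict[str, Any] = {
        "executions": [{"shots_requested": shots, "shots_recorded": shots}
                       for _ in range(n)]
    }
    trace.update(overrides)
    return trace


def test_planted_mismatch_is_reported() -> None:
    trace = make_trace()
    trace["executions"][1]["shots_recorded"] = 1000  # the planted condition
    result = analyze_shot_consistency(trace)
    assert result["status"] == "inconsistent"
    assert result["mismatches"] == [1]


def test_clean_trace_passes() -> None:
    assert analyze_shot_consistency(make_trace())["status"] == "consistent"


def test_missing_section_degrades() -> None:
    result = analyze_shot_consistency(make_trace(executions=None))
    assert result["status"] == "insufficient data"
```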

Adding a framework integration

An integration is a transparent proxy. It is the only place a heavy quantum library may be imported, and it carries the heaviest responsibility, because it runs during the user's experiment.

  • Parity is sacred. The proxy must not change the number of shots, executions, parameter bindings, or observables (INV-001). Forward the call to the real backend, wait for the result the user would have gotten anyway, and copy it aside. Never re-execute.
  • Record after, not instead. Run the real call first; record from its result. Recording must never sit on the critical path in a way that could alter timing or fail the user's job.
  • Failures are visible. Wrap recording so that a problem writing the trace cannot break the user's run, but is itself logged as an ERROR event, not silently dropped (INV-007).
  • Import locally. Keep the framework import inside the integration module so the core stays dependency-free (INV-004), and add a clear message if an optional dependency is missing.
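The following sketch shows the shape of such a proxy in plain Python, assuming a backend object with a run method. RecordingProxy, require, the record callback, and the errors list (standing in for ERROR events in the trace) are hypothetical names, not the API of the Qiskit or PennyLane integrations. Recording runs synchronously after the real call returns to keep the sketch short; the backend is called exactly once, and a recording failure is captured rather than raised to the caller.

```python
from __future__ import annotations

import copy
import importlib
import logging
from types import ModuleType
from typing import Any, Callable

log = logging.getLogger("hilbertbench.sketch")  # hypothetical logger name


def require(module: str) -> ModuleType:
    """Import an optional framework lazily, with a clear message if it is missing."""
    try:
        return importlib.import_module(module)
    except ImportError as exc:
        raise ImportError(
            f"The optional dependency '{module}' is not installed; "
            "install it to use this integration."
        ) from exc


class RecordingProxy:
    """Transparent proxy: forwards each call once and records from its result."""

    def __init__(self, backend: Any, record: Callable[[dict[str, Any]], None]) -> None:
        self._backend = backend
        self._record = record
        self.errors: list[dict[str, Any]] = []  # stand-in for ERROR trace events

    def run(self, *args: Any, **kwargs: Any) -> Any:
        # Parity: one forwarded call, with the caller's arguments unchanged.
        result = self._backend.run(*args, **kwargs)
        try:
            # Record after, not instead: copy the real result aside.
            self._record({
                "args": copy.deepcopy(args),
                "kwargs": copy.deepcopy(kwargs),
                "result": copy.deepcopy(result),
            })
        except Exception as exc:  # a trace problem must not fail the user's job
            self.errors.append(
                {"level": "ERROR", "kind": "recording_failed", "detail": repr(exc)}
            )
            log.error("trace recording failed: %r", exc)
        return result

    def __getattr__(self, name: str) -> Any:
        # Everything not intercepted passes straight through to the real backend.
        return getattr(self._backend, name)
```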

The existing Qiskit and PennyLane proxies are the templates to copy.

Adding a trace-schema field

This is the one with the strictest rules, because the trace format is a long-lived contract.

  • Edit the schema, never the models. Change the JSON Schema in schemas/, then regenerate the Python models (INV-003). See the schema guide.
  • Respect the version freeze. Once a schema version is tagged it is frozen forever (INV-005). A new field goes into a new version directory (v1.1/), not into a released one.
  • New fields are optional and degrade gracefully. Anything not present since the first release must be optional and nullable, so a reader of an older trace resolves it to None (INV-008).
  • Evidence only. The new field records what happened, never an interpretation of it (INV-006). Fields like is_converged or quality_score do not belong in a trace.
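A compliance-style check for the optional-and-nullable rule can be sketched by diffing two schema versions. The schema fragments and the transpiler_seed field are invented for illustration, and check_additive is a hypothetical helper rather than part of the HilbertBench compliance suite. It flags any field added in the newer version that is required or does not allow null; it cannot tell evidence from interpretation, so the evidence-only rule (INV-006) is not covered by it.

```python
from __future__ import annotations

from typing import Any

# Hypothetical schema fragments; real ones live under schemas/<version>/.
V1_0: dict[str, Any] = {
    "type": "object",
    "required": ["run_id", "shots"],
    "properties": {
        "run_id": {"type": "string"},
        "shots": {"type": "integer"},
    },
}
V1_1: dict[str, Any] = {
    "type": "object",
    "required": ["run_id", "shots"],  # new fields are never added here
    "properties": {
        **V1_0["properties"],
        # New in v1.1: optional and nullable, so older traces resolve it to None.
        "transpiler_seed": {"type": ["integer", "null"]},
    },
}


def check_additive(old: dict[str, Any], new: dict[str, Any]) -> list[str]:
    """Return violations of 'new fields are optional and nullable'."""
    problems: list[str] = []
    added = set(new["properties"]) - set(old["properties"])
    for name in sorted(added):
        if name in new.get("required", []):
            problems.append(f"{name}: new field must not be required")
        types = new["properties"][name].get("type")
        types = types if isinstance(types, list) else [types]
        if "null" not in types:
            problems.append(f"{name}: new field must allow null")
    return problems


def read_field(trace: dict[str, Any], name: str) -> Any:
    # A reader of an older trace gets None for a field that did not exist yet.
    return trace.get(name)
```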

The rule behind the rules

Every extension rule traces back to one question: does this change keep the trace trustworthy? If an addition could let the recorder perturb an experiment, let a trace be edited after the fact, or let an analyzer invent data, it is rejected — regardless of how useful it seems. The compliance suite exists to catch the cases where good intentions would have broken a guarantee.