AINA Data Engine Room · AIN-510 · 2026-06-13

Sensitive Guardrail Eval Runs Handoff

Local evaluator proof for sensitive top-band runtime candidates, without unlocking runtime, embeddings, batch, or production authority.

Ali Mehdi Mukadam · co-authored with Codex · local VDS checkpoint

The Single Idea

This slice executed the 22 sensitive guardrail fixture rows locally and proved that every evaluator assertion passes, which is required before any sensitive runtime bridge is allowed. Passing rows are only candidates for a later caveated fallback-bridge contract.

01 · What changed

Eval Runs Became A First-Class Receipt

The new top_band_sensitive_guardrail_eval_runs_v1 lane consumes the fixture pack, executes each local mismatch fixture with deterministic assertion checks, carries forward blocked rows unchanged, and marks passing rows as bridge_candidate_after_eval only.

Input: 22 fixture rows from the sensitive guardrail pack.
Execute: deterministic local assertions by risk bucket.
Record: passed, failed, and blocked-carryover JSONL receipts.
Hold: no bridge, embedding, batch, or production unlock.
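The lane's control flow can be sketched as follows. This is a minimal illustration, not the code in top_band_sensitive_guardrail_eval_runs.py: the FixtureRow shape, the "executable"/"blocked"/"failed"/"not_eligible" labels, and the per-bucket assertion table are assumptions; only the bridge_candidate_after_eval marker comes from the handoff.

```python
import json
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FixtureRow:
    # Hypothetical shape; the real fixture-pack schema is not shown in this handoff.
    row_id: str
    risk_bucket: str
    status: str  # assumed labels: "executable" or "blocked"
    payload: dict = field(default_factory=dict)


# An assertion checks one guardrail expectation and returns True when it holds.
Assertion = Callable[[FixtureRow], bool]


def run_evals(rows: list[FixtureRow], assertions_by_bucket: dict[str, list[Assertion]]) -> list[dict]:
    receipts = []
    for row in rows:
        if row.status == "blocked":
            # Blocked rows are carried forward unchanged and never executed.
            receipts.append({"row_id": row.row_id, "risk_bucket": row.risk_bucket,
                             "eval_status": "blocked_carryover", "bridge_status": "blocked"})
            continue
        results = [check(row) for check in assertions_by_bucket.get(row.risk_bucket, [])]
        # A row with no assertions fails closed instead of passing vacuously.
        passed = bool(results) and all(results)
        receipts.append({
            "row_id": row.row_id,
            "risk_bucket": row.risk_bucket,
            "eval_status": "passed" if passed else "failed",
            "assertions_passed": sum(results),
            "assertions_total": len(results),
            # Passing earns candidacy only; nothing here emits a bridge.
            "bridge_status": "bridge_candidate_after_eval" if passed else "not_eligible",
        })
    return receipts


def to_jsonl(receipts: list[dict]) -> str:
    # sort_keys keeps each receipt line byte-stable across reruns.
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in receipts)
```

Failing closed on an empty assertion list matters for a sensitive lane: a bucket with no registered checks should never be mistaken for a bucket whose checks all passed.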

The CLI now exposes a top-band-sensitive-guardrail-eval-runs command, and validate now checks receipt existence, JSONL row counts, fixture-pack parity, assertion pass status, and the locked runtime boundaries.
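A minimal sketch of the receipt-side checks that validate is described as running. It assumes a JSONL receipt whose rows carry row_id, eval_status, assertions_passed, and assertions_total fields; those names, and the helpers load_jsonl and validate_eval_receipt, are hypothetical. It covers existence, row count, fixture-pack parity, and assertion status; the locked-boundary check is left out.

```python
import json
from pathlib import Path


def load_jsonl(path: Path) -> list[dict]:
    # One JSON object per non-empty line.
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def validate_eval_receipt(receipt_path: Path, fixture_ids: set[str], expected_rows: int) -> list[str]:
    # Returns a list of problems; an empty list means the receipt is consistent.
    if not receipt_path.exists():
        return [f"missing receipt: {receipt_path}"]
    errors = []
    rows = load_jsonl(receipt_path)
    if len(rows) != expected_rows:
        errors.append(f"row count {len(rows)} != expected {expected_rows}")
    # Parity: every fixture row appears in the receipt and nothing extra does.
    if {row["row_id"] for row in rows} != fixture_ids:
        errors.append("receipt row ids do not match the fixture pack")
    for row in rows:
        if row.get("eval_status") == "passed" and row.get("assertions_passed") != row.get("assertions_total"):
            errors.append(f"{row['row_id']}: marked passed with failing assertions")
    return errors
```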

02 · Live corpus result

All 22 Executable Rows Passed

Passed eval runs: 22
Passed assertions: 285
Runtime bridge approvals: 0
Blocked carryover rows: 3
Embedding approvals: 0
Production unlocks: 0

Eval rows by risk bucket:
people_hr_sensitive: 15
legal_compliance: 14
education_minors: 8
finance_regulatory: 7
public_sector: 3
customer_data: 2
healthcare_privacy: 2

The three blocked rows remain the case manager, teacher special education, and paralegal corporate documentation rows. They stay out of the bridge and embedding paths until each row's specific context, confirmation, or repair condition is satisfied.
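A fail-closed release check for the blocked rows might look like the sketch below. The condition kinds (context, confirmation, repair) come from the text; the BlockedRow type, the per-row condition sets, and the rule that every listed condition must be met are assumptions. Passing this check would only make a row eligible for a bridge contract, not for embedding.

```python
from dataclasses import dataclass, field


@dataclass
class BlockedRow:
    # Hypothetical record; the real carryover receipt schema is not shown here.
    row_id: str
    required: frozenset[str]  # subset of {"context", "confirmation", "repair"}
    satisfied: set[str] = field(default_factory=set)

    def releasable(self) -> bool:
        # Fail closed: a row with no recorded conditions stays blocked,
        # and partial progress on its conditions is not enough.
        return bool(self.required) and self.required <= self.satisfied


def releasable_ids(rows: list[BlockedRow]) -> list[str]:
    return [row.row_id for row in rows if row.releasable()]
```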

03 · Validation passed

The Same Gate Standard Stayed Green

uv run pytest tests/test_top_band_sensitive_guardrail_eval_runs.py tests/test_top_band_sensitive_guardrail_fixture_pack.py tests/test_top_band_sensitive_source_authority_triage.py tests/test_production_runtime_contracts.py -q
uv run ruff check src/aina_data_engine/top_band_sensitive_guardrail_eval_runs.py src/aina_data_engine/cli.py src/aina_data_engine/reports.py tests/test_top_band_sensitive_guardrail_eval_runs.py
uv run aina-data-engine --root /srv/aina/aina-data-engine-room top-band-sensitive-guardrail-eval-runs
uv run aina-data-engine --root /srv/aina/aina-data-engine-room validate
uv run aina-data-engine --root /srv/aina/aina-data-engine-room ain-506-p0-gate
uv run aina-data-engine --root /srv/aina/aina-data-engine-room ain-510-retrieval-promotion-gate
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-runtime-readiness

The focused pytest run reported 12 passed, Ruff reported "All checks passed", validate passed, the AIN-506 P0 gate passed, the AIN-510 retrieval promotion gate returned promotion_ready, and production-runtime-readiness returned ready_to_harden_headless_production_runtime.
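The gate standard above amounts to a fixed table of expected verdicts. A minimal sketch, assuming the verdict strings have already been captured from each command's output; how the CLI actually reports them is not shown here, and the "passed" labels for validate and ain-506-p0-gate are assumptions.

```python
# Expected verdict per gate command; the two "passed" strings are assumed labels.
EXPECTED_VERDICTS = {
    "validate": "passed",
    "ain-506-p0-gate": "passed",
    "ain-510-retrieval-promotion-gate": "promotion_ready",
    "production-runtime-readiness": "ready_to_harden_headless_production_runtime",
}


def gate_regressions(observed: dict[str, str]) -> dict[str, str]:
    # Report every gate whose verdict drifted; a gate that never reported shows as "missing".
    return {
        gate: observed.get(gate, "missing")
        for gate, expected in EXPECTED_VERDICTS.items()
        if observed.get(gate) != expected
    }
```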

AIN-510 still shows 6,510 valid Gemini vector rows, complete top-500 and top-1,000 vector coverage, a known-pair mean cosine gap of 0.190463, and zero failed promotion or snapshot checks.
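The handoff does not define the known-pair mean cosine gap. One common reading, assumed here, is the mean cosine similarity over known related pairs minus the mean over a baseline set of unrelated pairs. The sketch computes that quantity in plain Python with hypothetical inputs, so it may not match how AIN-510 arrives at 0.190463.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / norm


def mean_cosine_gap(vectors: dict[str, list[float]],
                    known_pairs: list[tuple[str, str]],
                    baseline_pairs: list[tuple[str, str]]) -> float:
    # Assumed definition: mean(known-pair cosine) - mean(baseline-pair cosine).
    known = [cosine(vectors[a], vectors[b]) for a, b in known_pairs]
    baseline = [cosine(vectors[a], vectors[b]) for a, b in baseline_pairs]
    return sum(known) / len(known) - sum(baseline) / len(baseline)
```

Under this reading, a positive gap means known pairs sit closer together in vector space than the baseline pairs, and a shrinking gap would be an early sign of retrieval regression.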

04 · Boundaries preserved

No Authority Was Accidentally Promoted

This slice made the sensitive lane more testable without making it more permissive. It made no live Gemini API call, created no embedding or batch manifest, emitted no runtime bridge, unlocked no public runtime, used no real-user data, wrote nothing externally, enabled no production telemetry, promoted no runtime embedding authority, and mutated no donor repo.
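One way to make these boundaries machine-checkable is a flag-per-boundary assertion on each receipt. In this sketch the flag names are hypothetical labels for the boundaries listed above, not fields known to exist in the real receipts; a missing flag is treated as a violation so that an incomplete receipt cannot pass silently.

```python
# Hypothetical flag names, one per boundary listed in the paragraph above.
LOCKED_BOUNDARIES = (
    "live_gemini_api_call",
    "embedding_manifest_created",
    "batch_manifest_created",
    "runtime_bridge_emitted",
    "public_runtime_unlocked",
    "real_user_data_used",
    "external_write",
    "production_telemetry_enabled",
    "runtime_embedding_authority_promoted",
    "donor_repo_mutated",
)


def boundary_violations(receipt: dict) -> list[str]:
    # Fail closed: only an explicit False counts as "boundary held".
    return [flag for flag in LOCKED_BOUNDARIES if receipt.get(flag, True) is not False]
```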

05 · Artifacts added

The Proof Is Replayable

The added code lives in /srv/aina/aina-data-engine-room/src/aina_data_engine/top_band_sensitive_guardrail_eval_runs.py, with focused coverage in /srv/aina/aina-data-engine-room/tests/test_top_band_sensitive_guardrail_eval_runs.py. The durable receipts live under artifacts/validation/top_band_sensitive_guardrail_eval_runs_v1*, and the report pair lives under artifacts/reports/top_band_sensitive_guardrail_eval_runs_v1.*.

06 · Next best move

Bridge Only After Eval Proof

The next production-quality slice is to build the caveated sensitive fallback bridge contract from the 22 passing eval rows. It should carry source refs, risk buckets, required caveats, and evaluator assertion proof into the bridge; keep the three blocked rows out; and continue to block embedding until the bridge contract and AIN-510 gates prove no sensitive mismatch regression.
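The planned contract could be assembled from the eval receipts roughly as follows. This is a hedged sketch: BridgeEntry, the receipt field names source_refs, required_caveats, and assertion_proof, and the rule of raising on a missing source ref or caveat are assumptions rather than the planned implementation. embedding_allowed defaults to False so embedding stays blocked until the AIN-510 gates say otherwise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BridgeEntry:
    # Hypothetical contract row; field names are illustrative, not a settled schema.
    row_id: str
    source_refs: tuple[str, ...]
    risk_bucket: str
    required_caveats: tuple[str, ...]
    assertion_proof: str  # e.g. a pointer back to the passing eval receipt row
    embedding_allowed: bool = False  # stays False until the AIN-510 gates clear it


def build_bridge_contract(receipts: list[dict], blocked_ids: set[str]) -> list[BridgeEntry]:
    entries = []
    for receipt in receipts:
        # Blocked rows are excluded by id, and only eval-passing candidates enter.
        if receipt["row_id"] in blocked_ids:
            continue
        if receipt.get("bridge_status") != "bridge_candidate_after_eval":
            continue
        # A sensitive row without sources or caveats is an error, not a silent skip.
        if not receipt.get("source_refs") or not receipt.get("required_caveats"):
            raise ValueError(f"{receipt['row_id']}: missing source refs or required caveats")
        entries.append(BridgeEntry(
            row_id=receipt["row_id"],
            source_refs=tuple(receipt["source_refs"]),
            risk_bucket=receipt["risk_bucket"],
            required_caveats=tuple(receipt["required_caveats"]),
            assertion_proof=receipt["assertion_proof"],
        ))
    return entries
```

Freezing the dataclass keeps each contract row immutable once built, so a later step cannot quietly drop a caveat or flip embedding_allowed on an existing entry.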

cd /srv/aina/aina-data-engine-room
git status --short --branch
uv run aina-data-engine --root /srv/aina/aina-data-engine-room top-band-sensitive-guardrail-fixture-pack
uv run aina-data-engine --root /srv/aina/aina-data-engine-room top-band-sensitive-guardrail-eval-runs
uv run aina-data-engine --root /srv/aina/aina-data-engine-room validate
uv run aina-data-engine --root /srv/aina/aina-data-engine-room ain-510-retrieval-promotion-gate

Where to start

Start with the passing eval-run receipt, then build the smallest caveated bridge that preserves every source, caveat, and locked boundary already proven here.