AINA Data Engine Room · Handoff · 2026-06-11

Runtime Eval Runs Handoff

Observed deterministic local answers and assertion outcomes for the 1,000-title runtime fixture sample.

Ali Mehdi Mukadam · co-authored with Codex · branch ali/personalization-engine-mission-2026-06-09

The Single Idea

The engine room now has observed local eval runs for the current 1,000-title runtime fixture sample. This moves the system from expected evaluator cases to deterministic local answer text, assertion outcomes, and pass/fail status for every fixture.

01

What Changed

Added src/aina_data_engine/runtime_eval_runs.py, wired it to the aina-data-engine runtime-eval-runs CLI subcommand, and extended the tests to cover observed answers, assertion outcomes, blocked refusals, and CLI wiring.

No model calls were made. These are deterministic local fixtures for proving safety boundaries, answer shape, caveat visibility, blocked holds, and basic role/workflow fit.

02

Live Artifacts

Artifact | Rows | SHA-256
/srv/aina/aina-data-engine-room/artifacts/validation/runtime_eval_runs_v1.jsonl | 1000 | 3455608747e6eac88e07f1473667fd1ee7f4fa38a41860fda617de7c8d8fb90c
/srv/aina/aina-data-engine-room/artifacts/validation/runtime_eval_runs_v1_packet_quality_eval_runs.jsonl | 295 | 0c4fda90b6f344c2fb36f8c47b5a34b41025a60a3c10d3c059a6bfef038ceb4e
/srv/aina/aina-data-engine-room/artifacts/validation/runtime_eval_runs_v1_caveat_eval_runs.jsonl | 670 | 0e97c6c2730a8bed0af00021d2b6b3ac8aaed43058cf5465f599b38ccf0900e9
/srv/aina/aina-data-engine-room/artifacts/validation/runtime_eval_runs_v1_source_ref_rerun_eval_runs.jsonl | 1 | 11a78bab821b3428f19b71aaf07466c03d4b2e83502256f3470a6b0ba6c2269a
/srv/aina/aina-data-engine-room/artifacts/validation/runtime_eval_runs_v1_hold_mining_eval_runs.jsonl | 34 | 306594d77d3601c569563a08a7e889f9b057652652b5fee1b9d2beccf17622de
/srv/aina/aina-data-engine-room/artifacts/validation/runtime_eval_runs_v1_semantic_followup_eval_runs.jsonl | 462 | 8f22b6e953c370695f00dea14060b504aa64206c8f5b2b47c62497a386d985c6
/srv/aina/aina-data-engine-room/artifacts/validation/runtime_eval_runs_v1_failing_eval_runs.jsonl | 0 | e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
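The row counts and digests listed for these artifacts can be rechecked locally. The following is a minimal sketch, not code from the repository: the helper names are hypothetical, and it assumes that "Rows" means the number of non-empty lines in each JSONL file.

```python
import hashlib
from pathlib import Path


def jsonl_rows_and_sha256(path: Path) -> tuple[int, str]:
    """Return (non-empty line count, SHA-256 hex digest) for one file."""
    digest = hashlib.sha256()
    rows = 0
    with path.open("rb") as handle:
        for line in handle:
            # Hash every byte of the file, including blank lines.
            digest.update(line)
            # Assumption: a "row" is any line with non-whitespace content.
            if line.strip():
                rows += 1
    return rows, digest.hexdigest()


def verify_artifact(path: Path, expected_rows: int, expected_sha256: str) -> bool:
    """Compare one artifact against its recorded row count and digest."""
    rows, sha256 = jsonl_rows_and_sha256(path)
    return rows == expected_rows and sha256 == expected_sha256
```

One cross-check needs no file access: the digest recorded for runtime_eval_runs_v1_failing_eval_runs.jsonl is the SHA-256 of empty input, which agrees with its row count of 0.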

03

Live Result

Metric | Count
Eval run rows | 1000
Local serviceable eval runs | 965
Packet-quality eval runs | 295
Caveat eval runs | 670
Blocked eval runs | 35
Semantic follow-up eval runs | 462
Failing eval runs | 0

Eval status | Count
Passed | 538
Passed with semantic follow-up | 427
Passed blocking refusal | 35
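The reported counts reconcile with one another, and that arithmetic can be made explicit. The sketch below is illustrative only: the dictionary keys are hypothetical labels, and the relationships it checks (for example, that packet-quality and caveat runs together make up the serviceable runs) are inferred from the numbers rather than stated by the engine.

```python
# Counts copied from the Live Result tables; key names are illustrative only.
METRICS = {
    "eval_run_rows": 1000,
    "local_serviceable": 965,
    "packet_quality": 295,
    "caveat": 670,
    "blocked": 35,
    "semantic_followup": 462,
    "failing": 0,
}
STATUSES = {
    "passed": 538,
    "passed_with_semantic_followup": 427,
    "passed_blocking_refusal": 35,
}


def reconcile(metrics: dict[str, int], statuses: dict[str, int]) -> list[str]:
    """Return the name of every inferred count relationship that fails."""
    checks = {
        "status counts sum to total rows":
            sum(statuses.values()) == metrics["eval_run_rows"],
        "serviceable + blocked == total rows":
            metrics["local_serviceable"] + metrics["blocked"]
            == metrics["eval_run_rows"],
        "packet-quality + caveat == serviceable":
            metrics["packet_quality"] + metrics["caveat"]
            == metrics["local_serviceable"],
        "blocking refusals == blocked runs":
            statuses["passed_blocking_refusal"] == metrics["blocked"],
        "no failing runs": metrics["failing"] == 0,
    }
    return [name for name, ok in checks.items() if not ok]


print(reconcile(METRICS, STATUSES))
```

An empty list means every inferred relationship holds for the reported numbers.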

04

Assertion Coverage

Assertion | Count
Mentions role title | 1000
Names workflow or artifact | 1000
Preserves human judgment boundary | 1000
Blocks production claim | 1000
Avoids real-user data | 1000
References local synthetic scope | 1000
Displays and keeps caveat visible | 670
Uses source-backed context | 295
Is packet-hardening ready | 295
Refuses runtime plan until repair | 35
Requires source evidence repair | 35

05

Spot Check

I inspected 50 actual rows, reviewing each row's eval status, fixture lane, display title, function, pass flag, semantic flags, assertion count, and observed answer text.

Title | Status | Runtime behavior
Seasonal Sales Associate | Passed | Source-backed packet-hardening answer.
Support Associate - Soma | Passed | Customer support answer with local scope.
Director of Business Intelligence | Passed | Data/insight answer with judgment boundary.
Customer Service Representative | Passed | Fallback precision caveat visible.
Salesperson | Passed with semantic follow-up | Sales workflow served locally; correction preserved.
Business Analyst | Passed with semantic follow-up | Data/analysis workflow served locally; correction preserved.
family law attorney | Passed blocking refusal | Runtime plan refused until source evidence repair.
teacher-special education | Passed blocking refusal | Runtime plan refused until source refs are attached.
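A spot check of this kind can be made repeatable by drawing the rows with a fixed seed. This is a sketch under assumptions: how the 50 rows above were actually chosen is not recorded here, the function name is hypothetical, and no field names are assumed because the row schema is not shown in this handoff.

```python
import json
import random
from pathlib import Path


def sample_jsonl_rows(path: Path, k: int = 50, seed: int = 0) -> list[dict]:
    """Return up to k parsed rows, chosen reproducibly from a JSONL file."""
    with path.open("r", encoding="utf-8") as handle:
        rows = [json.loads(line) for line in handle if line.strip()]
    # A dedicated Random instance keeps the draw independent of global state.
    rng = random.Random(seed)
    return rng.sample(rows, min(k, len(rows)))
```

Calling it twice with the same seed on an unchanged file returns the same rows in the same order, so a reviewer can reopen exactly the sample that was inspected.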

06

Scope Boundaries

07

Validation

All summary checks are true, including fixture validity, observed answers, assertion results, serviceable row pass, blocked row refusal, caveat visibility, external-domain blocking, and production-claim blocking.

cd /srv/aina/aina-data-engine-room
.venv/bin/python -m ruff check src tests
.venv/bin/python -m pytest -q

All checks passed.
199 passed in 219.78s.

08

Resume Commands

cd /srv/aina/aina-data-engine-room
.venv/bin/aina-data-engine --root /srv/aina/aina-data-engine-room runtime-eval-runs

cd /srv/aina/aina-data-engine-room
.venv/bin/aina-data-engine --root /srv/aina/aina-data-engine-room harvest-source-map
.venv/bin/aina-data-engine --root /srv/aina/aina-data-engine-room source-import-recipes
.venv/bin/aina-data-engine --root /srv/aina/aina-data-engine-room semantic-harvest-gate --sample-limit 1000
.venv/bin/aina-data-engine --root /srv/aina/aina-data-engine-room semantic-repair-queue
.venv/bin/aina-data-engine --root /srv/aina/aina-data-engine-room deterministic-semantic-repairs
.venv/bin/aina-data-engine --root /srv/aina/aina-data-engine-room semantic-patch-replay
.venv/bin/aina-data-engine --root /srv/aina/aina-data-engine-room runtime-intake
.venv/bin/aina-data-engine --root /srv/aina/aina-data-engine-room runtime-payloads
.venv/bin/aina-data-engine --root /srv/aina/aina-data-engine-room runtime-evaluator-fixtures
.venv/bin/aina-data-engine --root /srv/aina/aina-data-engine-room runtime-eval-runs
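For an unattended rerun, the full command list above can be driven from a short script that stops at the first failing step. This is a sketch, not part of the repository: build_commands and run_chain are hypothetical names, and it assumes the listed order is the intended execution order. The subcommands and arguments are copied verbatim from the list.

```python
import subprocess
from pathlib import Path

ROOT = Path("/srv/aina/aina-data-engine-room")

# Subcommands in the order listed above; arguments are copied verbatim.
STEPS: list[list[str]] = [
    ["harvest-source-map"],
    ["source-import-recipes"],
    ["semantic-harvest-gate", "--sample-limit", "1000"],
    ["semantic-repair-queue"],
    ["deterministic-semantic-repairs"],
    ["semantic-patch-replay"],
    ["runtime-intake"],
    ["runtime-payloads"],
    ["runtime-evaluator-fixtures"],
    ["runtime-eval-runs"],
]


def build_commands(root: Path = ROOT) -> list[list[str]]:
    """Build one argv list per step, using the CLI in the project venv."""
    cli = str(root / ".venv" / "bin" / "aina-data-engine")
    return [[cli, "--root", str(root), *step] for step in STEPS]


def run_chain(root: Path = ROOT) -> None:
    """Run every step in order; check=True raises on the first failure."""
    for command in build_commands(root):
        print("running:", " ".join(command))
        subprocess.run(command, cwd=root, check=True)
```

Calling run_chain() executes the ten steps from the project root. Because check=True raises subprocess.CalledProcessError on a non-zero exit, later steps never run against output from a failed earlier step.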