Runtime Evaluator Fixtures Handoff
Local evaluator cases for packet hardening, caveat service, source reruns, and hold mining.
The engine room now has local evaluator fixtures for the current 1,000-title runtime payload sample. This converts packet-hardening and caveat-service payloads into concrete test cases a tutor, learner agent, and evaluator can exercise locally, while keeping all real-user, external-write, and production claims blocked.
What Changed
Added src/aina_data_engine/runtime_evaluator_fixtures.py, wired up the aina-data-engine runtime-evaluator-fixtures command, extended the tests, and improved the sidecar function resolver in runtime_payloads.py.
Healthcare, finance/lending, frontline operations, field service, construction, housekeeping, material handling, and retail sales roles now get more realistic local workflow prompts.
Live Artifacts
| Artifact | Rows | SHA-256 |
|---|---|---|
| /srv/aina/aina-data-engine-room/artifacts/validation/runtime_evaluator_fixtures_v1.jsonl | 1000 | 8fecf3b95084519ecbd022fede387980c4e680e7198362acb3d530ec9ee812be |
| /srv/aina/aina-data-engine-room/artifacts/validation/runtime_evaluator_fixtures_v1_packet_quality_fixtures.jsonl | 295 | 7432a551b3b7505535539d9e0c5d37e1a50ca93e3e2f59845c3a4d82d591bab4 |
| /srv/aina/aina-data-engine-room/artifacts/validation/runtime_evaluator_fixtures_v1_caveat_evaluator_fixtures.jsonl | 670 | 380df385e460a9fac1abf9052fe2728bc92a1b3f4e6bcc1c7bd7306108029415 |
| /srv/aina/aina-data-engine-room/artifacts/validation/runtime_evaluator_fixtures_v1_source_ref_rerun_fixtures.jsonl | 1 | 41397d96fd19aa64996ca82b0c648301828d71522de8ef3b7e2b86ea628821a0 |
| /srv/aina/aina-data-engine-room/artifacts/validation/runtime_evaluator_fixtures_v1_hold_mining_fixtures.jsonl | 34 | 4cb6bfbde6d6a023f72338044f20af399c1b2df03bc89a151828d202f2b44412 |
| /srv/aina/aina-data-engine-room/artifacts/validation/runtime_evaluator_fixtures_v1_semantic_anomalies.jsonl | 462 | 3f28997c8e18f3892473094aef6806110f0cd6a9f333e491550fb08a03917575 |
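The row counts and digests in the table above can be re-checked before trusting a copied artifact. The following is a minimal sketch, not part of the engine: the helper names sha256_and_rows and matches_manifest are hypothetical, and it assumes a "row" is any non-empty line of the JSONL file.

```python
import hashlib
from pathlib import Path


def sha256_and_rows(path: Path) -> tuple[str, int]:
    """Return the hex SHA-256 of a file and its count of non-empty lines."""
    digest = hashlib.sha256()
    rows = 0
    with path.open("rb") as handle:
        for line in handle:
            # Hashing line by line gives the same digest as hashing the whole file.
            digest.update(line)
            if line.strip():
                rows += 1
    return digest.hexdigest(), rows


def matches_manifest(path: Path, expected_rows: int, expected_sha256: str) -> bool:
    """Compare one artifact against the row count and digest listed for it."""
    actual_sha256, actual_rows = sha256_and_rows(path)
    return actual_rows == expected_rows and actual_sha256 == expected_sha256
```

For example, calling matches_manifest with the path of runtime_evaluator_fixtures_v1_hold_mining_fixtures.jsonl, 34, and its listed digest should return True if the file is unchanged since this handoff.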
runtime_payloads_v1.jsonl now has SHA-256 856dde3aa8130906eeb29ca57ac4b2034e448ade7475951adaf46e9a13d504d9.
Live Result
| Fixture lane | Count |
|---|---|
| Packet quality fixtures | 295 |
| Caveat evaluator fixtures | 670 |
| Source-ref rerun fixtures | 1 |
| Hold-mining fixtures | 34 |
| Locally serviceable evaluator fixtures | 965 |

| Runtime function | Serviceable count |
|---|---|
| General business | 261 |
| Sales | 139 |
| Operations | 111 |
| Administration | 78 |
| Finance | 73 |
| Customer success | 68 |
| Data analytics | 53 |
| Healthcare | 48 |
| Legal/compliance | 38 |
| Marketing | 32 |
| People/HR | 22 |
| Design/creative | 19 |
| Product | 9 |
| Strategy consulting | 8 |
| Education | 6 |
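The counts in the two tables above reconcile with each other, which can be confirmed with a few lines of arithmetic. This sketch copies the numbers from the tables; the dictionary keys are illustrative labels, and treating "locally serviceable" as packet quality plus caveat evaluator is an inference from the totals rather than something the engine states.

```python
# Counts copied from the fixture lane table; keys are illustrative labels.
lane_counts = {
    "packet_quality": 295,
    "caveat_evaluator": 670,
    "source_ref_rerun": 1,
    "hold_mining": 34,
}

# Counts copied from the runtime function table.
function_counts = {
    "general_business": 261, "sales": 139, "operations": 111,
    "administration": 78, "finance": 73, "customer_success": 68,
    "data_analytics": 53, "healthcare": 48, "legal_compliance": 38,
    "marketing": 32, "people_hr": 22, "design_creative": 19,
    "product": 9, "strategy_consulting": 8, "education": 6,
}

# Assumption: the serviceable lanes are packet quality and caveat evaluator.
serviceable = lane_counts["packet_quality"] + lane_counts["caveat_evaluator"]
total_fixtures = sum(lane_counts.values())
function_total = sum(function_counts.values())

print(total_fixtures, serviceable, function_total)
```

The four lanes add up to the 1,000-row sample, and the per-function counts add up to the same 965 as the serviceable row, so no fixture is double-counted or dropped between the two views.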
Anomaly Queue
| Flag | Count |
|---|---|
| Serviceable general-business context still broad | 261 |
| Function changed for runtime | 171 |
| Missing source refs | 35 |
| Not source backed | 35 |
| Non-runtime fixture blocked | 35 |
| Hold not runtime allowed | 34 |
| Source-ref rerun required | 1 |
| Unknown source function defaulted | 1 |
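A flag table like the one above can be rebuilt from the anomalies JSONL with a small tally. This is a sketch under assumptions: the field name semantic_flags and its list-of-strings shape are hypothetical and should be adjusted to the actual row schema.

```python
import json
from collections import Counter
from typing import Iterable


def tally_flags(lines: Iterable[str], field: str = "semantic_flags") -> Counter:
    """Count how often each flag appears across JSONL rows.

    `field` is an assumed key name holding a list of flag strings.
    """
    counts: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        row = json.loads(line)
        counts.update(row.get(field) or [])
    return counts
```

The flag counts in the table sum to 573, more than the 462 rows in the anomalies artifact, which suggests that a single row can carry several flags. A tally per flag, rather than per row, is therefore the natural shape for this report.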
Semantic Spot Check
I inspected 50 actual rows, checking each one's fixture lane, display title, source function, runtime function, local runtime flag, expected action, semantic flags, and artifact under test. Ten of those rows are shown here.
| Title | Lane | Runtime function | Result |
|---|---|---|---|
| Seasonal Sales Associate | Packet quality | sales | Source-backed local packet-quality fixture. |
| Support Associate - Soma | Packet quality | customer_success | Source-backed support/customer fixture. |
| Customer Service Assistant | Packet quality | customer_success | Corrected from administration for runtime use. |
| Salesperson | Caveat evaluator | sales | Corrected from general business. |
| Business Analyst | Packet quality | data_analytics | Corrected from sales to analysis/reporting. |
| Patient Care Technician | Caveat evaluator | healthcare | Healthcare-specific, caveat-visible fixture. |
| Phlebotomist | Caveat evaluator | healthcare | Healthcare-specific, caveat-visible fixture. |
| Mortgage Loan Officer | Caveat evaluator | finance | Finance/lending-specific fixture. |
| Housekeeper | Caveat evaluator | operations | Operations/process fixture. |
| Family Law Attorney | Hold mining | legal_compliance | Not runtime-allowed; needs source mining. |
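This handoff does not record how the 50 spot-check rows were chosen. One reproducible option is a seeded random sample, sketched here; the function name and the field keys are hypothetical stand-ins for the inspected columns, not the engine's actual schema.

```python
import random
from typing import Mapping, Sequence

# Hypothetical keys mirroring the columns inspected in the spot check.
SPOT_CHECK_FIELDS = (
    "fixture_lane", "display_title", "source_function", "runtime_function",
    "local_runtime", "expected_action", "semantic_flags", "artifact_under_test",
)


def spot_check_sample(rows: Sequence[Mapping], size: int = 50, seed: int = 0) -> list[dict]:
    """Pick a repeatable random sample and keep only the inspected fields."""
    rng = random.Random(seed)  # fixed seed so a rerun reviews the same rows
    picked = rng.sample(list(rows), k=min(size, len(rows)))
    return [{name: row.get(name) for name in SPOT_CHECK_FIELDS} for row in picked]
```

Fixing the seed means a later reviewer can pull the same 50 rows and compare notes against an earlier pass.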
Validation
All fixture validation checks pass: payload validity, count preservation, lane split, evaluator assertions, caveat requirements, non-runtime blocking, removal of the human reviewer gate, and blocking of production claims.
cd /srv/aina/aina-data-engine-room
.venv/bin/python -m ruff check src tests
.venv/bin/python -m pytest -q
All checks passed.
198 passed in 192.55s.
Resume Commands
cd /srv/aina/aina-data-engine-room
.venv/bin/aina-data-engine --root /srv/aina/aina-data-engine-room runtime-evaluator-fixtures
cd /srv/aina/aina-data-engine-room
.venv/bin/aina-data-engine --root /srv/aina/aina-data-engine-room harvest-source-map
.venv/bin/aina-data-engine --root /srv/aina/aina-data-engine-room source-import-recipes
.venv/bin/aina-data-engine --root /srv/aina/aina-data-engine-room semantic-harvest-gate --sample-limit 1000
.venv/bin/aina-data-engine --root /srv/aina/aina-data-engine-room semantic-repair-queue
.venv/bin/aina-data-engine --root /srv/aina/aina-data-engine-room deterministic-semantic-repairs
.venv/bin/aina-data-engine --root /srv/aina/aina-data-engine-room semantic-patch-replay
.venv/bin/aina-data-engine --root /srv/aina/aina-data-engine-room runtime-intake
.venv/bin/aina-data-engine --root /srv/aina/aina-data-engine-room runtime-payloads
.venv/bin/aina-data-engine --root /srv/aina/aina-data-engine-room runtime-evaluator-fixtures
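The full chain above can also be driven from a short script that stops at the first failing step. This is a sketch, not a shipped tool: build_command and run_chain are hypothetical helpers, while the paths, subcommands, and flags are copied from the commands listed above.

```python
import subprocess
from pathlib import Path

ROOT = Path("/srv/aina/aina-data-engine-room")
CLI = ROOT / ".venv/bin/aina-data-engine"

# Subcommands in the order listed in the resume commands.
CHAIN = [
    ["harvest-source-map"],
    ["source-import-recipes"],
    ["semantic-harvest-gate", "--sample-limit", "1000"],
    ["semantic-repair-queue"],
    ["deterministic-semantic-repairs"],
    ["semantic-patch-replay"],
    ["runtime-intake"],
    ["runtime-payloads"],
    ["runtime-evaluator-fixtures"],
]


def build_command(step: list[str]) -> list[str]:
    """Prefix one chain step with the CLI path and the --root option."""
    return [str(CLI), "--root", str(ROOT), *step]


def run_chain(chain: list[list[str]] = CHAIN) -> None:
    """Run each step in order; check=True raises on the first non-zero exit."""
    for step in chain:
        subprocess.run(build_command(step), cwd=ROOT, check=True)
```

Calling run_chain() from the engine-room host reruns the whole path. Because check=True raises CalledProcessError, a failure in an early step prevents later steps from writing fixtures built on stale inputs.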
Recommended Next Slices
- Turn the 965 local evaluator fixtures into deterministic answer/eval runs.
- Mine or specialize the 261 serviceable-but-broad general-business rows.
- Recover source references for the 35 missing-source-ref rows and rerun the chain.
- Expand the sample beyond 1,000 rows using the same path.
- Build a compact founder dashboard from the summary JSON files.