Runtime Title Specialization Handoff
A local checkpoint that turns broad ICP title coverage into role-native runtime payloads with provenance and evaluator follow-up.
This slice moved the local runtime engine from broad title coverage toward role-native serviceability. The title universe still has the same 1,000 runtime rows and 965 locally serviceable rows, but the vague general_business serviceable bucket dropped from 261 rows at the start of the slice to 12 rows after deterministic specialization.
- 01 What changed
- 02 Before and after
- 03 Current coverage
- 04 Semantic sanity check
- 05 Artifact inventory
- 06 Validation
- 07 What this means
- 08 Next best slice
What Changed
The runtime payload builder now has an ordered deterministic resolver in src/aina_data_engine/runtime_payloads.py. It uses title text to assign sidecar runtime functions when upstream function data is missing, generic, or plainly wrong, while preserving the original source_function and a function_resolution record for every change.
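The resolver behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in runtime_payloads.py: the rule table, the `resolve_function` name, the `runtime_function` key, and the shape of the `function_resolution` record are all assumptions made for the example. Only `source_function` and `function_resolution` are names taken from this handoff.

```python
import re

# Hypothetical ordered rule table: the first matching pattern wins.
# The real rules in runtime_payloads.py are not shown in this handoff.
RULES = [
    (re.compile(r"\bstore manager\b"), "retail_operations"),
    (re.compile(r"\bsystems? administrator\b"), "technology"),
    (re.compile(r"\bphysician\b"), "healthcare"),
    (re.compile(r"\bteller\b"), "finance"),
    (re.compile(r"\bchef\b"), "hospitality"),
    (re.compile(r"\bpolice officer\b"), "public_safety"),
]


def resolve_function(title, source_function):
    """Assign a runtime function from title text, keeping the upstream value."""
    normalized = title.lower().strip()
    # Default: keep the upstream function, or fall back to the broad bucket.
    resolved = source_function or "general_business"
    reason = "no rule matched; kept upstream or fallback value"
    for pattern, function in RULES:
        if pattern.search(normalized):
            resolved = function
            reason = f"title matched {pattern.pattern}"
            break

    payload = {
        "title": title,
        "source_function": source_function,  # never overwritten
        "runtime_function": resolved,
    }
    # Record provenance only when the resolved value differs from upstream.
    if resolved != source_function:
        payload["function_resolution"] = {
            "from": source_function,
            "to": resolved,
            "reason": reason,
        }
    return payload
```

Because the rules are evaluated in a fixed order and depend only on the title text, the same input always produces the same mapping, which is what makes the changes auditable row by row.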
The resolver now covers sales, finance, healthcare, technology, retail operations, hospitality/travel, supply chain, manufacturing, quality/safety, facilities, real estate, community services, personal services, engineering/hardware, legal/compliance, public safety, research/science, leadership strategy, people/HR, customer success, marketing, product, data/analytics, design/creative, education, operations, administration, and strategy consulting.
The generated learner and tutor payloads also gained workflow language for the new functions, so titles like Store Manager, System Administrator, Primary Care Physician, Teller, Executive Chef, Police Officer, and Postdoctoral Researcher no longer receive generic business exercises.
Before And After
| Metric | Before | After |
|---|---|---|
| Runtime rows | 1,000 | 1,000 |
| Locally serviceable rows | 965 | 965 |
| Packet-hardening rows | 295 | 295 |
| Caveat-service rows | 670 | 670 |
| Blocked/non-runtime rows | 35 | 35 |
| Serviceable general_business rows | 261 | 12 |
| Function changes flagged for semantic follow-up | 171 | 484 |
| Semantic follow-up rows | 462 | 526 |
| Failing eval rows | 0 | 0 |
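The row accounting in the table can be checked with a few lines of arithmetic. The sketch below assumes that packet-hardening and caveat-service rows together partition the locally serviceable rows, which the numbers are consistent with; the dictionary keys are illustrative labels, not field names from the artifacts.

```python
# "After" column of the table above; keys are illustrative labels only.
after = {
    "runtime_rows": 1000,
    "serviceable": 965,
    "packet_hardening": 295,
    "caveat_service": 670,
    "blocked": 35,
}
general_business = {"before": 261, "after": 12}

# Assumed partition: serviceable = packet-hardening + caveat-service.
buckets_ok = after["packet_hardening"] + after["caveat_service"] == after["serviceable"]
# Every runtime row is either serviceable or blocked.
total_ok = after["serviceable"] + after["blocked"] == after["runtime_rows"]

# Rows moved out of the vague bucket by deterministic specialization.
moved = general_business["before"] - general_business["after"]

print(buckets_ok, total_ok, moved)
```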
Current Function Coverage
| Runtime function | Rows | Runtime function | Rows |
|---|---|---|---|
| sales | 140 | operations | 91 |
| finance | 88 | healthcare | 81 |
| administration | 55 | customer_success | 54 |
| retail_operations | 48 | data_analytics | 45 |
| legal_compliance | 38 | technology | 35 |
| marketing | 34 | hospitality | 26 |
| supply_chain | 26 | manufacturing | 25 |
| people_hr | 25 | engineering_hardware | 22 |
| facilities | 21 | design_creative | 19 |
| quality_safety | 17 | general_business | 12 |
| real_estate | 11 | leadership_strategy | 10 |
| product | 9 | personal_services | 8 |
| education | 7 | community_services | 5 |
| research_science | 5 | strategy_consulting | 5 |
| public_safety | 3 | | |
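As a quick consistency check, the per-function counts above should add up to the 965 locally serviceable rows. The snippet below transcribes the table into a dictionary and verifies the total; it is a sanity sketch over the published numbers, not output from the engine.

```python
# Counts transcribed from the coverage table above.
coverage = {
    "sales": 140, "operations": 91, "finance": 88, "healthcare": 81,
    "administration": 55, "customer_success": 54, "retail_operations": 48,
    "data_analytics": 45, "legal_compliance": 38, "technology": 35,
    "marketing": 34, "hospitality": 26, "supply_chain": 26,
    "manufacturing": 25, "people_hr": 25, "engineering_hardware": 22,
    "facilities": 21, "design_creative": 19, "quality_safety": 17,
    "general_business": 12, "real_estate": 11, "leadership_strategy": 10,
    "product": 9, "personal_services": 8, "education": 7,
    "community_services": 5, "research_science": 5,
    "strategy_consulting": 5, "public_safety": 3,
}

total = sum(coverage.values())

# Five largest functions, to see where review volume concentrates.
top_five = sorted(coverage.items(), key=lambda item: item[1], reverse=True)[:5]
top_five_rows = sum(rows for _, rows in top_five)

print(total, top_five_rows)
```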
Semantic Sanity Check
I inspected 50 changed runtime rows with these columns: title, source function, resolved function, deterministic reason, and first module goal. The sample looked semantically coherent from learner, tutor, platform, and evaluator perspectives.
| Title | Runtime function | Runtime artifact |
|---|---|---|
| Store Manager | retail_operations | store operations brief, shift plan, or merchandising checklist |
| Primary Care Physician | healthcare | intake summary, shift note, or safety checklist |
| System Administrator | technology | technical runbook, troubleshooting note, or implementation plan |
| Executive Chef | hospitality | shift brief, service-recovery note, or prep checklist |
| Teller | finance | variance note, forecast assumption, or control checklist |
| Case Manager | community_services | case summary, referral note, or follow-up plan |
| Police Officer | public_safety | incident note, patrol brief, or escalation checklist |
| Postdoctoral Researcher | research_science | experiment note, literature brief, or findings summary |
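A spot check like the one described above could be drawn reproducibly with a seeded sample. This is a sketch under assumptions: the row keys (`runtime_function`, `first_module_goal`, and the `reason` field inside `function_resolution`) are hypothetical stand-ins for whatever the payload schema actually uses, and `sample_changed_rows` is not a function from the repository.

```python
import random


def sample_changed_rows(rows, k=50, seed=0):
    """Pick up to k rows whose function was changed, with a fixed seed."""
    # A row counts as changed if it carries a function_resolution record.
    changed = [row for row in rows if row.get("function_resolution")]
    rng = random.Random(seed)  # seeded so the same sample can be re-drawn
    picked = rng.sample(changed, min(k, len(changed)))
    # Project each row down to the five review columns.
    return [
        {
            "title": row["title"],
            "source_function": row["source_function"],
            "runtime_function": row["runtime_function"],
            "reason": row["function_resolution"].get("reason"),
            "first_module_goal": row.get("first_module_goal"),
        }
        for row in picked
    ]
```

Fixing the seed means a second reviewer can pull the identical 50 rows and compare judgments.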
The remaining 12 general_business serviceable rows are deliberately broad: Entry Level Professionals, Associate, Intern, Community Manager, Team Lead, Referee, Summer Intern, Immediate Entry Level Opportunity No Experience Needed, Experience Management-Senior Manager, Cognitive Performance Specialist, Manager, and Attendant.
Artifact Inventory
Generated runtime artifacts live under /srv/aina/aina-data-engine-room/artifacts/validation/. The artifacts/ directory is ignored by default, so this checkpoint explicitly preserves the runtime v1 outputs with git add -f.
| Artifact | Rows | Bytes | SHA-256 |
|---|---|---|---|
| runtime_payloads_v1.json | 122 lines | 4,627 | 5db2bfff... |
| runtime_payloads_v1.jsonl | 1,000 | 4,568,307 | 664381de... |
| runtime_evaluator_fixtures_v1.json | 141 lines | 5,656 | 6d83eba8... |
| runtime_evaluator_fixtures_v1.jsonl | 1,000 | 4,560,762 | e53ea495... |
| runtime_eval_runs_v1.json | 162 lines | 6,517 | d9293c92... |
| runtime_eval_runs_v1.jsonl | 1,000 | 5,429,598 | 2438fe24... |
| runtime_eval_runs_v1_failing_eval_runs.jsonl | 0 | 0 | e3b0c442... |
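One way to re-verify an artifact against this inventory is to recompute its row count, byte size, and SHA-256 locally. The helper names below are hypothetical. Because the table shows only the leading hex digits of each hash, the comparison is by prefix.

```python
import hashlib
import tempfile
from pathlib import Path


def describe_artifact(path):
    """Return non-empty line count, byte size, and SHA-256 hex digest."""
    data = Path(path).read_bytes()
    return {
        "rows": sum(1 for line in data.splitlines() if line.strip()),
        "bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def matches_inventory(path, rows, size, sha_prefix):
    """Compare a file against one inventory row (hash checked by prefix)."""
    actual = describe_artifact(path)
    return (
        actual["rows"] == rows
        and actual["bytes"] == size
        and actual["sha256"].startswith(sha_prefix)
    )


# Demo on a temporary empty file, mirroring the zero-row failing-eval artifact.
with tempfile.TemporaryDirectory() as tmp:
    empty = Path(tmp) / "empty.jsonl"
    empty.write_bytes(b"")
    empty_ok = matches_inventory(empty, rows=0, size=0, sha_prefix="e3b0c442")

    sample = Path(tmp) / "sample.jsonl"
    sample.write_bytes(b'{"a":1}\n{"b":2}\n')
    sample_info = describe_artifact(sample)
```

The `e3b0c442...` prefix on the failing-eval file is the SHA-256 of zero bytes, so an empty file there confirms that no eval rows failed.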
Validation
runtime_payloads.py
- .venv/bin/python -m ruff check src tests
- .venv/bin/python -m pytest -q: 200 passed in 92.88s
- runtime evals: 1,000 rows · 965 local serviceable · 35 blocked · 0 failing
What This Means In Reality
As of this checkpoint, the data engine can generate local synthetic runtime payloads for a much broader and more realistic set of ICP titles. It can serve role-native first-module goals and evaluator cases for frontline retail, banking, healthcare, technology, food/hospitality, manufacturing, safety, real estate, leadership, research, and other domains instead of collapsing them into generic business work.
It still cannot truthfully claim production readiness or real-user personalization. All outputs remain local-only, source-preserving, and blocked from external writes, real-user runtime, and production claims. The next trust move is not more broad deterministic mapping; it is semantic adjudication of the 526 follow-up rows, source/BLS enrichment for the 12 residual broad rows, and packet-quality hardening for the 295 packet candidates.
Next Best Slice
Start with the semantic follow-up queue and prioritize high-volume, high-impact domains. Review the 484 deterministic function changes with multi-LLM adjudication, resolve the 12 residual broad titles using source context, attach BLS/SOC context where it helps, build packet-quality fixtures for the 295 packet candidates, and keep the 670 caveat-service rows available for local testing with explicit caveats.
Start from artifacts/validation/runtime_eval_runs_v1_semantic_followup_eval_runs.jsonl: it is now the best queue for deciding which specialized runtime mappings are strong enough to graduate from local caveat/testing into hardened packets.
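To prioritize by volume, the follow-up queue could be grouped by runtime function before review. The sketch below assumes each JSONL row carries a `runtime_function` key. That key name is a guess, and the demo parses in-memory sample lines rather than the real artifact.

```python
import json
from collections import Counter


def load_jsonl(lines):
    """Parse an iterable of JSONL lines, skipping blank ones."""
    return [json.loads(line) for line in lines if line.strip()]


def prioritize(rows, key="runtime_function"):
    """Return (function, row count) pairs, largest queue first."""
    counts = Counter(row.get(key, "unknown") for row in rows)
    return counts.most_common()


# In-memory sample standing in for the follow-up JSONL file.
sample_lines = [
    '{"title": "Account Executive", "runtime_function": "sales"}',
    '{"title": "Teller", "runtime_function": "finance"}',
    '',
    '{"title": "Sales Associate", "runtime_function": "sales"}',
]
queue = prioritize(load_jsonl(sample_lines))
```

For the real file, the same functions can be applied to an open file handle, for example `prioritize(load_jsonl(open(path, encoding="utf-8")))`.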