AINA Data Engine Room · 2026-06-11

Runtime Title Specialization Handoff

A local checkpoint that turns broad ICP title coverage into role-native runtime payloads with provenance and evaluator follow-up.

Ali Mehdi Mukadam · co-authored with Codex · 6 minute read · /srv/aina/aina-data-engine-room

The Single Idea

This slice moved the local runtime engine from broad title coverage toward role-native serviceability. The title universe still has the same 1,000 runtime rows and 965 locally serviceable rows, but the vague general_business serviceable bucket dropped from 261 rows at the start of the slice to 12 rows after deterministic specialization.

Before: 261 serviceable rows still used broad business context, so titles like Store Manager, Teller, Executive Chef, and System Administrator received generic module goals.

After: Only 12 serviceable rows remain broad; the rest now receive domain-specific runtime functions, artifacts, and evaluator cases while preserving source provenance.

01 What changed
02 Before and after
03 Current coverage
04 Semantic sanity check
05 Artifact inventory
06 Validation
07 What this means
08 Next best slice

What Changed

The runtime payload builder now has an ordered deterministic resolver in src/aina_data_engine/runtime_payloads.py. It uses title text to assign sidecar runtime functions when upstream function data is missing, generic, or plainly wrong, while preserving the original source_function and a function_resolution record for every change.
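The resolver idea can be sketched in a few lines. This is a minimal illustration, not the code in runtime_payloads.py: the rule table, the GENERIC set, the resolve_function name, and the keys inside function_resolution are all assumptions made for the example. It only handles the missing and generic cases; the "plainly wrong" case would need an explicit conflict list that is not shown here.

```python
from typing import Optional

# Hypothetical rule table (not the project's real rules): checked top to bottom,
# first match wins, so more specific phrases belong earlier in the list.
RULES = [
    ("store manager", "retail_operations", "title names a store-management role"),
    ("system administrator", "technology", "title names a systems role"),
    ("physician", "healthcare", "title names a clinical role"),
    ("teller", "finance", "title names a bank-branch role"),
    ("chef", "hospitality", "title names a kitchen role"),
]

# Upstream values treated as missing or too generic to keep (assumed set).
GENERIC = {"", "general_business"}


def resolve_function(title: str, source_function: Optional[str]) -> dict:
    """Pick a runtime function for a title while keeping the upstream value."""
    normalized = " " + " ".join(title.lower().split()) + " "
    upstream = (source_function or "").strip()
    result = {
        "title": title,
        "source_function": source_function,  # provenance: never overwritten
        "runtime_function": upstream or "general_business",
        "function_resolution": None,  # filled in only when a rule changes the value
    }
    if upstream not in GENERIC:
        return result  # a specific upstream function is kept as-is in this sketch
    for phrase, function, reason in RULES:
        if f" {phrase} " in normalized:  # phrase must be bounded by spaces
            result["runtime_function"] = function
            result["function_resolution"] = {
                "rule": phrase,
                "reason": reason,
                "from": source_function,
                "to": function,
            }
            break
    return result
```

Because the rules are an ordered list rather than a scoring model, the same title always resolves the same way, and the function_resolution record shows which rule fired.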

The resolver now covers sales, finance, healthcare, technology, retail operations, hospitality/travel, supply chain, manufacturing, quality/safety, facilities, real estate, community services, personal services, engineering/hardware, legal/compliance, public safety, research/science, leadership strategy, people/HR, customer success, marketing, product, data/analytics, design/creative, education, operations, administration, and strategy consulting.

The generated learner and tutor payloads also gained workflow language for the new functions, so titles like Store Manager, System Administrator, Primary Care Physician, Teller, Executive Chef, Police Officer, and Postdoctoral Researcher no longer receive generic business exercises.

Before And After

Metric	Before	After
Runtime rows	1,000	1,000
Locally serviceable rows	965	965
Packet-hardening rows	295	295
Caveat-service rows	670	670
Blocked/non-runtime rows	35	35
Serviceable `general_business` rows	261	12
Function changes flagged for semantic follow-up	171	484
Semantic follow-up rows	462	526
Failing eval rows	0	0

The rise in flagged function changes (171 to 484) and semantic follow-up rows (462 to 526) is intentional. The evaluator now sees deterministic function changes instead of silent broad context, so those rows become reviewable by the multi-LLM lane.
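The table can be checked with simple arithmetic. The sketch below copies the figures from the table and assumes, based only on the fact that the numbers add up, that packet-hardening and caveat-service rows partition the serviceable rows, and that serviceable plus blocked rows make up the runtime total. The dictionary keys are shorthand invented for this example.

```python
# Figures copied from the before/after table; key names are illustrative shorthand.
BEFORE = {
    "runtime": 1000, "serviceable": 965, "packet_hardening": 295,
    "caveat_service": 670, "blocked": 35, "general_business": 261,
    "flagged_changes": 171, "followup_rows": 462, "failing": 0,
}
AFTER = {
    "runtime": 1000, "serviceable": 965, "packet_hardening": 295,
    "caveat_service": 670, "blocked": 35, "general_business": 12,
    "flagged_changes": 484, "followup_rows": 526, "failing": 0,
}


def check_partition(metrics: dict) -> None:
    """Assumed invariants: the row buckets add up to their parent totals."""
    assert metrics["packet_hardening"] + metrics["caveat_service"] == metrics["serviceable"]
    assert metrics["serviceable"] + metrics["blocked"] == metrics["runtime"]


for snapshot in (BEFORE, AFTER):
    check_partition(snapshot)

# Only the metrics that moved during the slice.
CHANGED = {key: AFTER[key] - BEFORE[key] for key in BEFORE if AFTER[key] != BEFORE[key]}
print(CHANGED)
# {'general_business': -249, 'flagged_changes': 313, 'followup_rows': 64}
```

The deltas show the trade: 249 rows left the broad bucket, and the evaluator gained 313 flagged changes and 64 follow-up rows to review.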

Current Function Coverage

Runtime function	Rows
sales	140
operations	91
finance	88
healthcare	81
administration	55
customer_success	54
retail_operations	48
data_analytics	45
legal_compliance	38
technology	35
marketing	34
hospitality	26
supply_chain	26
manufacturing	25
people_hr	25
engineering_hardware	22
facilities	21
design_creative	19
quality_safety	17
general_business	12
real_estate	11
leadership_strategy	10
product	9
personal_services	8
education	7
community_services	5
research_science	5
strategy_consulting	5
public_safety	3
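As a quick cross-check, the per-function counts can be totalled against the 965 locally serviceable rows. The sketch below hard-codes the table values; in practice the counts would presumably come from the payload JSONL, whose field names are not shown in this report.

```python
# Row counts copied from the coverage table above.
COVERAGE = {
    "sales": 140, "operations": 91, "finance": 88, "healthcare": 81,
    "administration": 55, "customer_success": 54, "retail_operations": 48,
    "data_analytics": 45, "legal_compliance": 38, "technology": 35,
    "marketing": 34, "hospitality": 26, "supply_chain": 26,
    "manufacturing": 25, "people_hr": 25, "engineering_hardware": 22,
    "facilities": 21, "design_creative": 19, "quality_safety": 17,
    "general_business": 12, "real_estate": 11, "leadership_strategy": 10,
    "product": 9, "personal_services": 8, "education": 7,
    "community_services": 5, "research_science": 5, "strategy_consulting": 5,
    "public_safety": 3,
}

TOTAL = sum(COVERAGE.values())
BROAD_SHARE = round(100 * COVERAGE["general_business"] / TOTAL, 1)
# Functions with fewer than 10 rows, where evaluator samples will be small.
THIN = sorted(name for name, rows in COVERAGE.items() if rows < 10)

print(TOTAL, len(COVERAGE), BROAD_SHARE, len(THIN))
# 965 29 1.2 7
```

The total matches the serviceable row count, and the residual broad bucket is about 1.2% of serviceable rows.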

Semantic Sanity Check

I inspected 50 changed runtime rows with these columns: title, source function, resolved function, deterministic reason, and first module goal. The sample looked semantically coherent from learner, tutor, platform, and evaluator perspectives.
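A spot check like this can be made repeatable with a seeded sample. The sketch below is an assumption-heavy illustration: the row shape (function_resolution, runtime_function, modules[0].goal) and the sample_changed_rows helper are hypothetical, and the demo rows use placeholder goal text rather than real payload content.

```python
import random


def sample_changed_rows(rows, k=50, seed=0):
    """Return up to k changed rows, reduced to the five review columns."""
    changed = [row for row in rows if row.get("function_resolution")]
    rng = random.Random(seed)  # fixed seed so the same sample can be re-inspected
    picked = rng.sample(changed, min(k, len(changed)))
    return [
        {
            "title": row["title"],
            "source_function": row["source_function"],
            "resolved_function": row["runtime_function"],
            "deterministic_reason": row["function_resolution"]["reason"],
            "first_module_goal": row["modules"][0]["goal"],
        }
        for row in picked
    ]


# Demo rows with placeholder values (not taken from the real artifacts).
DEMO_ROWS = [
    {"title": "Store Manager", "source_function": "general_business",
     "runtime_function": "retail_operations",
     "function_resolution": {"reason": "placeholder reason"},
     "modules": [{"goal": "placeholder goal A"}]},
    {"title": "Teller", "source_function": None,
     "runtime_function": "finance",
     "function_resolution": {"reason": "placeholder reason"},
     "modules": [{"goal": "placeholder goal B"}]},
    {"title": "Account Executive", "source_function": "sales",
     "runtime_function": "sales", "function_resolution": None,
     "modules": [{"goal": "placeholder goal C"}]},
]

REVIEW_SAMPLE = sample_changed_rows(DEMO_ROWS, k=50, seed=7)
```

Unchanged rows are filtered out before sampling, so the review budget is spent only on rows the resolver actually touched.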

Title	Runtime function	Runtime artifact
Store Manager	retail_operations	store operations brief, shift plan, or merchandising checklist
Primary Care Physician	healthcare	intake summary, shift note, or safety checklist
System Administrator	technology	technical runbook, troubleshooting note, or implementation plan
Executive Chef	hospitality	shift brief, service-recovery note, or prep checklist
Teller	finance	variance note, forecast assumption, or control checklist
Case Manager	community_services	case summary, referral note, or follow-up plan
Police Officer	public_safety	incident note, patrol brief, or escalation checklist
Postdoctoral Researcher	research_science	experiment note, literature brief, or findings summary

The remaining 12 general_business serviceable rows are deliberately broad: Entry Level Professionals, Associate, Intern, Community Manager, Team Lead, Referee, Summer Intern, Immediate Entry Level Opportunity No Experience Needed, Experience Management-Senior Manager, Cognitive Performance Specialist, Manager, and Attendant.

Artifact Inventory

Generated runtime artifacts live under /srv/aina/aina-data-engine-room/artifacts/validation/. The artifacts/ directory is ignored by default, so this checkpoint explicitly preserves the runtime v1 outputs with git add -f.

Artifact	Rows or lines	Bytes	SHA-256
`runtime_payloads_v1.json`	122 lines	4,627	`5db2bfff...`
`runtime_payloads_v1.jsonl`	1,000	4,568,307	`664381de...`
`runtime_evaluator_fixtures_v1.json`	141 lines	5,656	`6d83eba8...`
`runtime_evaluator_fixtures_v1.jsonl`	1,000	4,560,762	`e53ea495...`
`runtime_eval_runs_v1.json`	162 lines	6,517	`d9293c92...`
`runtime_eval_runs_v1.jsonl`	1,000	5,429,598	`2438fe24...`
`runtime_eval_runs_v1_failing_eval_runs.jsonl`	0	0	`e3b0c442...`
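The inventory columns can be reproduced with the standard library. The sketch below is one way to do it, assuming line counts mean newline characters; the describe_stream and describe_artifact helpers are names made up for this example. The `e3b0c442...` prefix on the failing-runs file is what SHA-256 produces for empty input, which agrees with its 0 rows and 0 bytes.

```python
import hashlib
import io


def describe_stream(stream, chunk_size=1 << 20):
    """Count newline-terminated lines and bytes, and hash the content."""
    digest = hashlib.sha256()
    size = 0
    lines = 0
    while True:
        chunk = stream.read(chunk_size)  # read in chunks so large JSONL files fit
        if not chunk:
            break
        digest.update(chunk)
        size += len(chunk)
        lines += chunk.count(b"\n")
    return {"lines": lines, "bytes": size, "sha256": digest.hexdigest()}


def describe_artifact(path):
    """Same summary for a file on disk, opened in binary mode."""
    with open(path, "rb") as handle:
        return describe_stream(handle)


# An empty artifact hashes to the well-known empty-input digest.
EMPTY = describe_stream(io.BytesIO(b""))
print(EMPTY["lines"], EMPTY["bytes"], EMPTY["sha256"][:8])
# 0 0 e3b0c442
```

A final line without a trailing newline is not counted by this sketch, so the line figure can be one lower than the record count for files written that way.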

Validation

01 Code: Resolver and workflow copy updated in runtime_payloads.py.

02 Tests: Black-box runtime test added for broad-title specialization.

03 Artifacts: Payloads, fixtures, and eval runs regenerated locally.

04 Proof: Ruff passed; focused tests 10 passed; full tests 200 passed.

.venv/bin/python -m ruff check src tests
.venv/bin/python -m pytest -q

200 passed in 92.88s
runtime evals: 1,000 rows · 965 local serviceable · 35 blocked · 0 failing

What This Means In Reality

As of this checkpoint, the data engine can generate local synthetic runtime payloads for a much broader and more realistic set of ICP titles. It can serve role-native first-module goals and evaluator cases for frontline retail, banking, healthcare, technology, food/hospitality, manufacturing, safety, real estate, leadership, research, and other domains instead of collapsing them into generic business work.

It still cannot truthfully claim production readiness or real-user personalization. All outputs remain local-only, source-preserving, and blocked from external writes, real-user runtime, and production claims. The next trust move is not more broad deterministic mapping; it is semantic adjudication of the 526 follow-up rows, source/BLS enrichment for the 12 residual broad rows, and packet-quality hardening for the 295 packet candidates.

Next Best Slice

Start with the semantic follow-up queue and prioritize high-volume, high-impact domains. Review the 484 deterministic function changes with multi-LLM adjudication, resolve the 12 residual broad titles using source context, attach BLS/SOC context where it helps, build packet-quality fixtures for the 295 packet candidates, and keep the 670 caveat-service rows available for local testing with explicit caveats.

Where To Start

Start from artifacts/validation/runtime_eval_runs_v1_semantic_followup_eval_runs.jsonl: it is now the best queue for deciding which specialized runtime mappings are strong enough to graduate from local caveat/testing into hardened packets.
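One way to turn that queue into a review order is to rank runtime functions by how many follow-up rows they hold. The sketch below covers only the volume half of "high-volume, high-impact", since the report does not define an impact measure. It assumes each JSONL row carries a runtime_function field, which is not confirmed here, and rank_followup_queue is a hypothetical helper.

```python
import json
from collections import Counter


def rank_followup_queue(lines):
    """Rank runtime functions by follow-up row count, largest first."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the JSONL file
        row = json.loads(line)
        counts[row.get("runtime_function", "unknown")] += 1
    # Sort by descending count, then by name so ties are stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Demo input standing in for lines read from the follow-up JSONL file.
DEMO_LINES = [
    '{"title": "Store Manager", "runtime_function": "retail_operations"}',
    '{"title": "Teller", "runtime_function": "finance"}',
    '{"title": "Branch Teller", "runtime_function": "finance"}',
    "",
    '{"title": "Executive Chef", "runtime_function": "hospitality"}',
]

RANKED = rank_followup_queue(DEMO_LINES)
print(RANKED)
# [('finance', 2), ('hospitality', 1), ('retail_operations', 1)]
```

With a real file, the same function can be fed an open file handle, since iterating a text file yields one line at a time.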