AINA Data Engine Room · 2026-06-11

Runtime Title Specialization Handoff

A local checkpoint that turns broad ICP title coverage into role-native runtime payloads with provenance and evaluator follow-up.

Ali Mehdi Mukadam · co-authored with Codex · 6 minute read · /srv/aina/aina-data-engine-room

The Single Idea

This slice moved the local runtime engine from broad title coverage toward role-native serviceability. The title universe still has the same 1,000 runtime rows and 965 locally serviceable rows, but the vague general_business serviceable bucket dropped from 261 rows at the start of the slice to 12 rows after deterministic specialization.

Before: 261 serviceable rows still used broad business context, so titles like Store Manager, Teller, Executive Chef, and System Administrator received generic module goals.
After: Only 12 serviceable rows remain broad; the rest now receive domain-specific runtime functions, artifacts, and evaluator cases while preserving source provenance.
  1. What changed
  2. Before and after
  3. Current coverage
  4. Semantic sanity check
  5. Artifact inventory
  6. Validation
  7. What this means
  8. Next best slice
01

What Changed

The runtime payload builder now has an ordered deterministic resolver in src/aina_data_engine/runtime_payloads.py. It uses title text to assign sidecar runtime functions when upstream function data is missing, generic, or plainly wrong, while preserving the original source_function and a function_resolution record for every change.
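
To make the resolver's shape concrete, here is a minimal sketch of an ordered, first-match-wins title resolver. It is not the code in runtime_payloads.py: the rule table, the keyword lists, the GENERIC_FUNCTIONS set, and the fields inside function_resolution are all assumptions chosen for illustration. Only the names source_function and function_resolution come from this report. The sketch also handles only missing or generic upstream values, not the "plainly wrong" case.

```python
from typing import Optional

# Hypothetical ordered rules: the first matching keyword wins, so order matters.
RULES: list[tuple[str, tuple[str, ...]]] = [
    ("retail_operations", ("store manager",)),
    ("technology", ("system administrator",)),
    ("healthcare", ("physician",)),
    ("finance", ("teller",)),
    ("hospitality", ("chef",)),
]

# Upstream values treated as missing or too generic to trust (assumed set).
GENERIC_FUNCTIONS = {None, "", "general_business"}


def resolve_function(title: str, source_function: Optional[str]) -> dict:
    """Resolve a runtime function from title text, keeping the source value."""
    result = {
        "title": title,
        "source_function": source_function,  # preserved, never overwritten
        "runtime_function": source_function or "general_business",
        "function_resolution": None,  # filled in only when a rule fires
    }
    if source_function not in GENERIC_FUNCTIONS:
        return result  # upstream value is specific enough; leave it alone

    lowered = title.lower()
    for function, keywords in RULES:
        for keyword in keywords:
            if keyword in lowered:
                result["runtime_function"] = function
                result["function_resolution"] = {
                    "method": "deterministic_title_rule",
                    "matched_keyword": keyword,
                    "from": source_function,
                    "to": function,
                }
                return result
    return result  # no rule matched; the row stays broad


if __name__ == "__main__":
    print(resolve_function("Store Manager", "general_business"))
    print(resolve_function("Referee", "general_business"))
```

Because every change writes a function_resolution record next to the untouched source_function, a reviewer can always see what the upstream value was and which rule replaced it.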

The resolver now covers sales, finance, healthcare, technology, retail operations, hospitality/travel, supply chain, manufacturing, quality/safety, facilities, real estate, community services, personal services, engineering/hardware, legal/compliance, public safety, research/science, leadership strategy, people/HR, customer success, marketing, product, data/analytics, design/creative, education, operations, administration, and strategy consulting.

The generated learner and tutor payloads also gained workflow language for the new functions, so titles like Store Manager, System Administrator, Primary Care Physician, Teller, Executive Chef, Police Officer, and Postdoctoral Researcher no longer receive generic business exercises.

02

Before And After

Metric | Before | After
Runtime rows | 1,000 | 1,000
Locally serviceable rows | 965 | 965
Packet-hardening rows | 295 | 295
Caveat-service rows | 670 | 670
Blocked/non-runtime rows | 35 | 35
Serviceable general_business rows | 261 | 12
Function changes flagged for semantic follow-up | 171 | 484
Semantic follow-up rows | 462 | 526
Failing eval rows | 0 | 0
The increased semantic follow-up count is intentional. The evaluator now sees deterministic function changes instead of silent broad context, so those rows become reviewable by the multi-LLM lane.
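
The figures in the table can be cross-checked with a few lines of arithmetic. The sketch below copies the reported numbers into two dictionaries (the key names are made up here) and computes the deltas. The partition check, packet-hardening plus caveat-service equals serviceable, is an inference from the numbers lining up, not a rule stated elsewhere in this report.

```python
# Figures copied from the before/after table; key names are illustrative.
BEFORE = {
    "runtime": 1000, "serviceable": 965, "packet_hardening": 295,
    "caveat_service": 670, "blocked": 35, "general_business": 261,
    "function_changes": 171, "semantic_followup": 462, "failing": 0,
}
AFTER = {**BEFORE, "general_business": 12,
         "function_changes": 484, "semantic_followup": 526}


def is_consistent(metrics: dict) -> bool:
    """Check the assumed partitions of the 1,000-row title universe."""
    return (
        metrics["packet_hardening"] + metrics["caveat_service"]
        == metrics["serviceable"]
        and metrics["serviceable"] + metrics["blocked"] == metrics["runtime"]
    )


# Per-metric change, keeping only the metrics that actually moved.
deltas = {k: AFTER[k] - BEFORE[k] for k in BEFORE if AFTER[k] != BEFORE[k]}

if __name__ == "__main__":
    print(is_consistent(BEFORE), is_consistent(AFTER))
    print(deltas)
```

Read this way, 249 rows left the broad bucket, while the evaluator gained 313 flagged function changes and 64 follow-up rows.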
03

Current Function Coverage

Runtime function | Rows | Runtime function | Rows
sales | 140 | operations | 91
finance | 88 | healthcare | 81
administration | 55 | customer_success | 54
retail_operations | 48 | data_analytics | 45
legal_compliance | 38 | technology | 35
marketing | 34 | hospitality | 26
supply_chain | 26 | manufacturing | 25
people_hr | 25 | engineering_hardware | 22
facilities | 21 | design_creative | 19
quality_safety | 17 | general_business | 12
real_estate | 11 | leadership_strategy | 10
product | 9 | personal_services | 8
education | 7 | community_services | 5
research_science | 5 | strategy_consulting | 5
public_safety | 3
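
As a quick integrity check, the per-function counts above should add up to the 965 locally serviceable rows. The sketch below copies the counts from the table into a dictionary and confirms the total; the variable names are illustrative.

```python
# Row counts copied from the coverage table above.
COVERAGE = {
    "sales": 140, "operations": 91, "finance": 88, "healthcare": 81,
    "administration": 55, "customer_success": 54, "retail_operations": 48,
    "data_analytics": 45, "legal_compliance": 38, "technology": 35,
    "marketing": 34, "hospitality": 26, "supply_chain": 26,
    "manufacturing": 25, "people_hr": 25, "engineering_hardware": 22,
    "facilities": 21, "design_creative": 19, "quality_safety": 17,
    "general_business": 12, "real_estate": 11, "leadership_strategy": 10,
    "product": 9, "personal_services": 8, "education": 7,
    "community_services": 5, "research_science": 5,
    "strategy_consulting": 5, "public_safety": 3,
}

SERVICEABLE_ROWS = 965  # locally serviceable rows reported in this checkpoint

total = sum(COVERAGE.values())
specialized = total - COVERAGE["general_business"]

if __name__ == "__main__":
    print(f"functions: {len(COVERAGE)}, rows: {total}")
    print(f"specialized rows: {specialized}")
    assert total == SERVICEABLE_ROWS, "coverage table does not match serviceable rows"
```

The 29 functions sum to 965, so every serviceable row is accounted for, and 953 of them now sit outside general_business.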
04

Semantic Sanity Check

I inspected 50 changed runtime rows with these columns: title, source function, resolved function, deterministic reason, and first module goal. The sample looked semantically coherent from learner, tutor, platform, and evaluator perspectives.

Title | Runtime function | Runtime artifact
Store Manager | retail_operations | store operations brief, shift plan, or merchandising checklist
Primary Care Physician | healthcare | intake summary, shift note, or safety checklist
System Administrator | technology | technical runbook, troubleshooting note, or implementation plan
Executive Chef | hospitality | shift brief, service-recovery note, or prep checklist
Teller | finance | variance note, forecast assumption, or control checklist
Case Manager | community_services | case summary, referral note, or follow-up plan
Police Officer | public_safety | incident note, patrol brief, or escalation checklist
Postdoctoral Researcher | research_science | experiment note, literature brief, or findings summary

The remaining 12 general_business serviceable rows are deliberately broad: Entry Level Professionals, Associate, Intern, Community Manager, Team Lead, Referee, Summer Intern, Immediate Entry Level Opportunity No Experience Needed, Experience Management-Senior Manager, Cognitive Performance Specialist, Manager, and Attendant.

05

Artifact Inventory

Generated runtime artifacts live under /srv/aina/aina-data-engine-room/artifacts/validation/. The artifacts/ directory is ignored by default, so this checkpoint explicitly preserves the runtime v1 outputs with git add -f.

Artifact | Rows | Bytes | SHA-256
runtime_payloads_v1.json | 122 lines | 4,627 | 5db2bfff...
runtime_payloads_v1.jsonl | 1,000 | 4,568,307 | 664381de...
runtime_evaluator_fixtures_v1.json | 141 lines | 5,656 | 6d83eba8...
runtime_evaluator_fixtures_v1.jsonl | 1,000 | 4,560,762 | e53ea495...
runtime_eval_runs_v1.json | 162 lines | 6,517 | d9293c92...
runtime_eval_runs_v1.jsonl | 1,000 | 5,429,598 | 2438fe24...
runtime_eval_runs_v1_failing_eval_runs.jsonl | 0 | 0 | e3b0c442...
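
An inventory like this can be regenerated with the standard library. The sketch below is one way to do it, not the script that produced the table: describe_artifact and its output keys are hypothetical names. It reads a file, counts lines and bytes, and computes the SHA-256 digest. One useful property: an empty file always hashes to a digest starting with e3b0c442, which matches the empty failing-runs artifact above.

```python
import hashlib
from pathlib import Path


def describe_artifact(path: Path) -> dict:
    """Return line count, byte size, and SHA-256 digest for one artifact."""
    data = path.read_bytes()
    return {
        "name": path.name,
        "lines": len(data.splitlines()),  # one JSONL record per line
        "bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def describe_directory(directory: Path) -> list[dict]:
    """Describe every .json and .jsonl file directly under a directory."""
    paths = sorted(p for p in directory.iterdir()
                   if p.suffix in {".json", ".jsonl"})
    return [describe_artifact(p) for p in paths]


if __name__ == "__main__":
    # Path taken from this report; adjust if the checkout lives elsewhere.
    root = Path("/srv/aina/aina-data-engine-room/artifacts/validation")
    if root.is_dir():
        for info in describe_directory(root):
            print(info["name"], info["lines"], info["bytes"],
                  info["sha256"][:8] + "...")
```

Comparing the first eight hex characters against the table is enough to spot an accidental regeneration; the full digest is what should be stored if these files are ever used as fixtures elsewhere.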
06

Validation

01 Code: Resolver and workflow copy updated in runtime_payloads.py.
02 Tests: Black-box runtime test added for broad-title specialization.
03 Artifacts: Payloads, fixtures, and eval runs regenerated locally.
04 Proof: Ruff passed; focused tests 10 passed; full tests 200 passed.
.venv/bin/python -m ruff check src tests
.venv/bin/python -m pytest -q

200 passed in 92.88s
runtime evals: 1,000 rows · 965 local serviceable · 35 blocked · 0 failing
07

What This Means In Reality

As of this checkpoint, the data engine can generate local synthetic runtime payloads for a much broader and more realistic set of ICP titles. It can serve role-native first-module goals and evaluator cases for frontline retail, banking, healthcare, technology, food/hospitality, manufacturing, safety, real estate, leadership, research, and other domains instead of collapsing them into generic business work.

It still cannot truthfully claim production readiness or real-user personalization. All outputs remain local-only, source-preserving, and blocked from external writes, real-user runtime, and production claims. The next trust move is not more broad deterministic mapping; it is semantic adjudication of the 526 follow-up rows, source/BLS enrichment for the 12 residual broad rows, and packet-quality hardening for the 295 packet candidates.

08

Next Best Slice

Start with the semantic follow-up queue and prioritize high-volume, high-impact domains. Review the 484 deterministic function changes with multi-LLM adjudication, resolve the 12 residual broad titles using source context, attach BLS/SOC context where it helps, build packet-quality fixtures for the 295 packet candidates, and keep the 670 caveat-service rows available for local testing with explicit caveats.

Where To Start

Start from artifacts/validation/runtime_eval_runs_v1_semantic_followup_eval_runs.jsonl: it is now the best queue for deciding which specialized runtime mappings are strong enough to graduate from local caveat/testing into hardened packets.
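
A first pass over that queue could simply rank runtime functions by how many follow-up rows they carry, which lines up with the "high-volume domains first" priority. The sketch below assumes each JSONL row has a runtime_function field; that field name is a guess, since the row schema is not shown in this report, and the helper name is hypothetical.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterable


def rank_followup_functions(
    lines: Iterable[str], key: str = "runtime_function"
) -> list[tuple[str, int]]:
    """Count follow-up rows per function, highest volume first."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the JSONL file
        row = json.loads(line)
        counts[row.get(key, "unknown")] += 1
    return counts.most_common()


if __name__ == "__main__":
    queue = Path(
        "artifacts/validation/"
        "runtime_eval_runs_v1_semantic_followup_eval_runs.jsonl"
    )
    if queue.is_file():
        with queue.open(encoding="utf-8") as handle:
            for function, rows in rank_followup_functions(handle):
                print(f"{function}: {rows}")
```

Rows missing the assumed field land in an "unknown" bucket instead of raising, so a schema mismatch shows up as one large bucket and not as a crash.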