Local handoff · AINA data engine room · 2026-06-11

AINA Personalization Engine Final Title-Coverage Handoff

Final handoff for agents and humans: mission match, exact repo state, inventory, validation, and next commands.

Ali Mehdi Mukadam · co-authored with Codex · 2026-06-11 · Source: docs/handoff/2026-06-11-final-personalization-engine-title-coverage-handoff.md

The Single Idea

The mission moved from a planning document into a working VDS-local data engine with source-backed packets, Hugging Face and O*NET provenance, local runtime/evaluator proof, and a completed ICP title adjudication run. The title-review backlog is no longer waiting on unprocessed prompt candidates: the current receipt reports prompt_candidates=0, prompt_rows=0, 50,053 serviceable rows after adjudication, and 491 residual rows that were already reviewed and intentionally kept in adjudication rather than overclaimed.

Original mission

Build an evidence-backed personalization engine.

Current proof

Title prompt queue exhausted; validation green; local checkpoint committed.

Publication line: AINA data engine room - final local handoff - 2026-06-11
Author: Codex execution lane
Audience: next agent, technical collaborator, and Ali
Repo: /srv/aina/aina-data-engine-room
Branch: ali/personalization-engine-mission-2026-06-09
Checkpoint: a65af71 Complete ICP title adjudication backlog

01. Current Landed State

This work is self-contained on the VDS. It was committed locally and not pushed or merged, per Ali's instruction.

Item	Current value
Repo	`/srv/aina/aina-data-engine-room`
Branch	`ali/personalization-engine-mission-2026-06-09`
Latest commit	`a65af71 Complete ICP title adjudication backlog`
Working tree	Clean before this report was authored
Push/merge	Not performed by design
Full validation	`pass`
Full pytest	`174 passed`
Engine scope	VDS-local synthetic/internal beta only
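
The landed state in the table above can be re-checked from the repo itself. The following is a minimal sketch, not part of the engine: `parse_status` and `landed_state` are hypothetical helper names, and the parser assumes the simple header form that `git status --short --branch` prints for a normal branch (detached HEAD and empty repositories are not handled).

```python
import subprocess


def parse_status(porcelain: str) -> dict:
    """Parse `git status --short --branch` output into branch name and dirty flag."""
    lines = porcelain.splitlines()
    header = lines[0] if lines else ""
    branch = ""
    if header.startswith("## "):
        # Header looks like "## branch" or "## branch...upstream [ahead 1]".
        branch = header[3:].split("...")[0].split(" ")[0]
    # Any non-empty line after the header is a changed or untracked path.
    dirty = any(line.strip() for line in lines[1:])
    return {"branch": branch, "dirty": dirty}


def landed_state(repo: str) -> dict:
    """Collect branch, cleanliness, and short HEAD hash for a local repo."""

    def git(*args: str) -> str:
        return subprocess.run(
            ["git", "-C", repo, *args],
            check=True,
            capture_output=True,
            text=True,
        ).stdout

    state = parse_status(git("status", "--short", "--branch"))
    state["head"] = git("rev-parse", "--short", "HEAD").strip()
    return state
```

Calling `landed_state("/srv/aina/aina-data-engine-room")` on the VDS should report the branch and a `head` value that starts with `a65af71` if no later commits have landed. The tree will show as dirty if this report or other files have been added since the checkpoint.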

02. Mission Reconciliation

The original mission at docs/planning/aina-personalization-engine-mission-2026-06-09.md said AINA should become a personalized AI capability transformation engine: role, workflow, readiness, goals, capacity, and evidence should produce a learning path, realistic practice, evaluated progression, and a durable record.

That mission broke into M0-M7. Current status:

Milestone	Original target	Current status	Evidence
M0. Mission and source lock	Convert the source docs into one product/data charter	Complete	`docs/planning/aina-personalization-engine-mission-2026-06-09.md` and HTML companion
M1. Canonical source warehouse v1	Reproducible public/HF snapshots and provenance	Complete locally, with BLS full-cache follow-up still separate	`artifacts/validation/source_authority_beta_wedge_audit_v1.json`, `artifacts/sources/public_source_snapshot_v1.json`
M2. Work Intelligence Graph v1	Role/task/workflow packets and fallback mappings	Complete for the local beta engine	`artifacts/packets/`, `src/aina_data_engine/packets.py`, `artifacts/validation/packet_quality_gate_v1.json`
M3. CurriculumInputPacket and planner v0	Assessment input to planner-ready packet and path	Complete locally	`src/aina_data_engine/schemas.py`, `src/aina_data_engine/planner.py`, `tests/test_contract_spine.py`
M4. Practice and evaluator loop v0	Practice submission, evaluator result, learner events	Complete locally	`src/aina_data_engine/evaluator.py`, `src/aina_data_engine/event_replay.py`, `artifacts/validation/learner_event_replay_v1.json`
M5. GDPval sandbox and HF proof	GDPval-linked tasks, rubrics, source evidence	Complete locally with safety holds	`artifacts/validation/gdpval_*`, `artifacts/sandbox/`, `artifacts/validation/hf_runtime_map_receipt_v1.json`
M6. Beta API and product surface	Local `/assess`, `/curriculum`, sandbox, submit loop	Complete as local fixture, not public runtime	`artifacts/api/`, `artifacts/ui/`, `tests/test_api_runtime.py`, `tests/test_beta_ui_shell.py`
M7. ICP title coverage and scale path	Route titles into service, fallback, adjudication, or exclusion	Complete for the promptable backlog	`artifacts/validation/icp_title_adjudication_v1.json`, `artifacts/reports/icp_title_adjudication_v1.md`

The important change since the earlier handoff is M7. At the prior pause, thousands of promptable title rows remained. Now the promptable queue is exhausted.

03. Title-Coverage Result

Current receipt: artifacts/validation/icp_title_adjudication_v1.json.

Metric	Value
Ambiguous input rows	16,089
Serviceable rows before adjudication	45,564
Serviceable rows after adjudication	50,053
Multi-LLM decisions applied	10,957
Multi-LLM fallback promotions	3,880
Multi-LLM exclusions	6,586
Deterministic fallback promotions	609
Deterministic exclusions	4,523
Remaining adjudication rows	491
Prompt candidates remaining	0
Prompt rows remaining	0
Reviewed rows skipped from prompt	491

The remaining 491 are not unprocessed backlog. Every one has already been reviewed and carries the `multi_llm_kept_for_adjudication` label. They are the intentionally conservative residue: mostly low-signal, ambiguous `general_business` titles where two-reviewer evidence did not justify either fallback promotion or exclusion.
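
To make the "kept for adjudication" outcome concrete, here is a hedged sketch of a two-reviewer consensus rule. This handoff does not spell out the actual merge logic in `src/aina_data_engine/title_adjudication.py`, so the rule below is an assumption: a decision is applied only when both reviewers agree on promotion or exclusion, and everything else stays in adjudication. The vote labels `promote_fallback` and `exclude` and both function names are hypothetical; only `multi_llm_kept_for_adjudication` comes from the receipt.

```python
KEEP = "multi_llm_kept_for_adjudication"  # label reported in the receipt
ACTIONABLE = {"promote_fallback", "exclude"}  # hypothetical vote labels


def consensus(vote_a: str, vote_b: str) -> str:
    """Apply a decision only when both reviewers agree on an actionable vote."""
    if vote_a == vote_b and vote_a in ACTIONABLE:
        return vote_a
    # Disagreement, or agreement on anything non-actionable, stays conservative.
    return KEEP


def merge_batch(reviews: dict) -> dict:
    """Map each title to its consensus outcome.

    `reviews` maps a title string to a (vote_a, vote_b) tuple.
    """
    return {title: consensus(a, b) for title, (a, b) in reviews.items()}
```

Under this assumed rule, a split vote never promotes or excludes a title, which matches the conservative intent described above.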

Residual shape:

Residual view	Count
`general_business`	404
`administration`	24
`customer_success`	18
`marketing`	13
`finance`	10
All other functions	22

Example residuals include front end entry level, service leader, family law attorney, student intern, document reviewer, qa tester, sql dba, and security professional. These should stay out of public claims until a future rule or domain-specific evidence resolves them.
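
The counts in this section reconcile arithmetically, which is a useful sanity check when re-reading the receipt. The sketch below copies the numbers from the tables above. The four identities are inferred from those numbers rather than taken from a documented receipt schema, and the dictionary keys are illustrative labels, not the receipt's field names.

```python
# Counts copied from the metric and residual tables in this section.
counts = {
    "ambiguous_input_rows": 16_089,
    "serviceable_before": 45_564,
    "serviceable_after": 50_053,
    "multi_llm_decisions": 10_957,
    "multi_llm_promotions": 3_880,
    "multi_llm_exclusions": 6_586,
    "deterministic_promotions": 609,
    "deterministic_exclusions": 4_523,
    "remaining": 491,
}
residual_by_function = {
    "general_business": 404,
    "administration": 24,
    "customer_success": 18,
    "marketing": 13,
    "finance": 10,
    "all_other": 22,
}


def reconcile(c: dict, residual: dict) -> dict:
    """Return a named True/False result for each inferred identity."""
    return {
        # Promotions account for the growth in serviceable rows.
        "serviceable_growth": c["serviceable_before"]
        + c["multi_llm_promotions"]
        + c["deterministic_promotions"]
        == c["serviceable_after"],
        # Multi-LLM decisions split into promoted, excluded, and kept rows.
        "multi_llm_partition": c["multi_llm_promotions"]
        + c["multi_llm_exclusions"]
        + c["remaining"]
        == c["multi_llm_decisions"],
        # Ambiguous rows split into multi-LLM and deterministic handling.
        "ambiguous_partition": c["multi_llm_decisions"]
        + c["deterministic_promotions"]
        + c["deterministic_exclusions"]
        == c["ambiguous_input_rows"],
        # The residual table sums to the remaining adjudication rows.
        "residual_total": sum(residual.values()) == c["remaining"],
    }


results = reconcile(counts, residual_by_function)
for name, ok in results.items():
    print(name, ok)
```

All four checks hold for the numbers reported here. If a later receipt breaks one of them, that is a prompt to re-read the receipt rather than proof of a defect, since the identities are inferred.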

04. What Was Done In This Final Run

The final run repaired the reviewer lane and then processed the title backlog to exhaustion.

Workstream	What changed
BLS fallback	Added official BLS API/cache fallback handling and tests, while keeping BLS lower priority after Ali confirmed title cleanup came first.
Deterministic filters	Added low-signal knowledge-work fallback routing and stronger obvious non-ICP/frontline/clinical/physical-work exclusions.
Batch 039	Repaired and merged the previously broken batch with filtered current prompt rows.
GPT runner	Fixed schema generation for arbitrary batch sizes and used `codex exec --ignore-user-config --ignore-rules` so review subprocesses did not load MCPs/connectors.
Batch size tuning	Tested 200, 400, and 600. Settled on 400 as the reliable throughput lane.
Model policy	Stopped the Claude/Spark direction and used only standard GPT/Codex for the final lane.
Final title sweep	Ran waves through `078` until `prompt_candidates=0`.
Validation	Ran targeted ruff/tests, full `validate`, and full pytest.
Git checkpoint	Committed local-only checkpoint `a65af71`.

Notable throughput evidence:

Wave	New consensus decisions
055-056, 200-row wave	302
057-058, 400-row test	542
059, 600-row partial	425
061-063, 400-row wave	956
064-066, 400-row wave	1,040
067-069, 400-row wave	838
070-071, partial 400-row wave	643
073-075, tail wave	850
076	71
077	43
078	4

Batch 072 produced raw outputs but was not merged because one reviewer needed more row repairs than the runner allows. Its artifacts remain under ignored artifacts/review/model_outputs/ as evidence history, not accepted title state.

05. Repo And Data Map

The repo is now an engine room with code, tests, source documents, generated data, validation receipts, and review evidence.

Area	Purpose	Key paths
Mission and planning	Product/data charter, runbooks, prior reports	`docs/planning/`, `docs/runbooks/`, `docs/handoff/`, `docs/reports/`
Imported source docs	Original AINA and curriculum source material	`docs/source_foundations/ainpe-files-shared/`, `docs/source_foundations/aina-curriculum/`
Engine code	Data ingestion, title coverage, runtime, evaluator, validation	`src/aina_data_engine/`
Operational scripts	Rebuild and review-batch orchestration	`scripts/rebuild_all.sh`, `scripts/run_icp_spark_backlog.py`
Tests	Regression and validation tests	`tests/`
Raw/downloaded caches	Hugging Face, O*NET/BLS/public source caches	`artifacts/raw/`
Warehouse	DuckDB and derived warehouse outputs	`artifacts/aina_data_engine.duckdb`, `artifacts/warehouse/`
Packets	45,564 generated role/workflow packet JSON files	`artifacts/packets/`
Validation receipts	Machine-readable proof of engine state	`artifacts/validation/`
Human reports	Markdown/HTML reports	`artifacts/reports/`, `docs/reports/`
Review evidence	Prompts, model outputs, prepared outputs, merge receipts	`artifacts/review/`
Runtime fixtures	Local API, beta UI, sandbox, events, telemetry	`artifacts/api/`, `artifacts/ui/`, `artifacts/sandbox/`, `artifacts/events/`, `artifacts/telemetry/`

Current inventory appendix:

Artifact	Meaning
`docs/handoff/2026-06-11-final-file-inventory.csv`	Exhaustive file-level inventory outside `.git`, including ignored generated/model-output evidence.
`docs/handoff/2026-06-11-final-file-inventory-summary.json`	Machine-readable counts, top-level sizes, tracked/ignored counts, and `.git` internal summary.

Inventory snapshot:

Classification	Count
Generated runtime data	45,592
Environment cache	30,887
Generated review output	685
Source material	184
Generated evidence	173
Tool cache	148
Hand-authored code	59
Hand-authored tests	46
Downloaded source cache	46
Agent deliverables	24
Generated review input	10
Hand-authored scripts	2
Hand-authored config	4
Lockfile	1

Disk shape:

Path	Approx size	Meaning
repo root	13G	Entire VDS-local project
`.venv/`	5.2G	Python environment
`artifacts/`	7.5G	Generated data, packets, receipts, review evidence
`artifacts/aina_data_engine.duckdb`	5.5G	Local DuckDB warehouse
`artifacts/packets/`	892M	Generated packet JSON files
`artifacts/raw/`	499M	Downloaded/source caches
`artifacts/semantic_review/`	210M	Deterministic semantic review outputs
`artifacts/validation/`	105M	Validation receipts
`artifacts/review/`	54M	Review prompts and model output evidence
`docs/`	21M	Planning, handoffs, reports, source foundations

Important Git note: artifacts/ is ignored for new files, but many receipt/report artifacts are already tracked. The raw model outputs and review logs are intentionally ignored, while the accepted adjudication state is tracked through artifacts/review/icp_title_adjudication_input_v1.json, artifacts/review/icp_title_adjudication_merge_v1.json, and validation/report files.

06. Source And Provenance Status

Hugging Face is real engine input now, not a footnote.

Source layer	Current proof
Hugging Face source ledger	2 HF sources, revisions locked: `Anthropic/EconomicIndex` and `openai/gdpval`
HF selected files	15 files verified
HF downloaded bytes	186,743,670
EconomicIndex SOC rows	756
EconomicIndex legacy signals	821
GDPval tasks	220
Mapped SOC entries	907
Runtime packets checked	45,564
O*NET occupation rows	1,016
O*NET task rows	18,796
BLS wage/employment fact rows	0 in full validation snapshot; official full-file follow-up remains separate

The current system still refuses to fake BLS facts. That is correct. O*NET is usable; BLS full-cache integration should be a follow-up now that Ali has added more BLS data to the VDS.

07. Safety Boundary

The engine is useful locally, but production stays blocked.

Boundary	Current state
Public runtime	Blocked
External writes	Blocked
Real-user data	Blocked
Production telemetry	Blocked
Deployment promotion	Blocked
External beta	Blocked by review holds
Local internal synthetic beta	Ready with review holds

artifacts/validation/production_deployment_approval_v1.json reports production_blocked_approval_required, 0 approved unlocks, 5 blocked domains, and 1 open review hold.
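
A fail-closed gate over that receipt could look like the sketch below. The JSON field names `status`, `approved_unlocks`, `blocked_domains`, and `open_review_holds` are assumptions for illustration, because this handoff quotes the receipt's values but not its keys. Only the status string is taken from the text above.

```python
BLOCKED_STATUS = "production_blocked_approval_required"  # value quoted from the receipt


def is_production_unlocked(receipt: dict) -> bool:
    """Fail closed: any missing field or outstanding blocker keeps production blocked."""
    # Missing fields default to the blocking value (hypothetical key names).
    status = receipt.get("status", BLOCKED_STATUS)
    approved = receipt.get("approved_unlocks", 0)
    blocked_domains = receipt.get("blocked_domains", 1)
    open_holds = receipt.get("open_review_holds", 1)
    return (
        status != BLOCKED_STATUS
        and approved > 0
        and blocked_domains == 0
        and open_holds == 0
    )


# The state described in this handoff, expressed with the assumed keys.
current = {
    "status": BLOCKED_STATUS,
    "approved_unlocks": 0,
    "blocked_domains": 5,
    "open_review_holds": 1,
}
print(is_production_unlocked(current))
```

The design choice worth keeping, whatever the real schema turns out to be, is that an empty or partial receipt evaluates to blocked.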

08. Validation

Commands run:

uv run ruff check src/aina_data_engine/title_adjudication.py src/aina_data_engine/public_source_snapshots.py src/aina_data_engine/config.py scripts/run_icp_spark_backlog.py tests/test_icp_title_adjudication.py tests/test_public_source_snapshots.py
uv run pytest tests/test_icp_title_adjudication.py tests/test_public_source_snapshots.py -q
uv run aina-data-engine --root /srv/aina/aina-data-engine-room validate
uv run pytest -q

Results:

Check	Result
Targeted ruff	Pass
Targeted pytest	`19 passed`
Engine validation	Pass
Full pytest	`174 passed in 153.58s`

09. Resume Commands

Start here:

cd /srv/aina/aina-data-engine-room
git status --short --branch
git log -1 --oneline
uv run aina-data-engine --root /srv/aina/aina-data-engine-room validate
uv run python - <<'PY'
import json

# Print the headline title-coverage metrics from the adjudication receipt.
with open('artifacts/validation/icp_title_adjudication_v1.json') as fh:
    m = json.load(fh)['metrics']
for key in ['serviceable_rows_after_adjudication','remaining_adjudication_required_count','review_prompt_candidate_count','review_prompt_row_count','multi_llm_decision_rows_applied']:
    print(key, m[key])
PY

Expected current title state:

serviceable_rows_after_adjudication 50053
remaining_adjudication_required_count 491
review_prompt_candidate_count 0
review_prompt_row_count 0
multi_llm_decision_rows_applied 10957
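
Rather than comparing the printed values by eye, the expected state can be asserted. This is a small sketch using the metric keys and values listed above. The helper names `diff_metrics` and `check_receipt` are hypothetical, and the sketch assumes the receipt keeps these keys under a top-level `metrics` object, as the resume snippet does.

```python
import json
from pathlib import Path

# Expected values copied from the listing above.
EXPECTED = {
    "serviceable_rows_after_adjudication": 50053,
    "remaining_adjudication_required_count": 491,
    "review_prompt_candidate_count": 0,
    "review_prompt_row_count": 0,
    "multi_llm_decision_rows_applied": 10957,
}


def diff_metrics(metrics: dict, expected: dict = EXPECTED) -> dict:
    """Return {key: (expected, actual)} for every key that does not match."""
    return {
        key: (want, metrics.get(key))
        for key, want in expected.items()
        if metrics.get(key) != want
    }


def check_receipt(path: str) -> dict:
    """Load a receipt file and diff its `metrics` block against EXPECTED."""
    metrics = json.loads(Path(path).read_text())["metrics"]
    return diff_metrics(metrics)
```

From the repo root, `check_receipt("artifacts/validation/icp_title_adjudication_v1.json")` should return an empty dict while the title state is unchanged. Any non-empty result names the metric that drifted and shows both values.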

10. Recommended Next Milestone

Do not restart title adjudication from scratch. The promptable backlog is exhausted.

Recommended next sequence:

1. BLS full-file follow-through: parse the newly added VDS BLS files into the official-cache path, prove row counts and SOC joins, update public_source_snapshot_v1.
2. Residual 491 policy: decide whether to leave the residual as a conservative hold, route by stronger deterministic domain rules, or send a small domain-specialist GPT review only for those 491.
3. Serve-now upgrade: define the threshold that moves a title from serve_with_fallback to serve_now; do not use raw model confidence alone.
4. Beta runtime hardening: keep local synthetic beta ready, then add real auth/data/privacy gates before any real learner data.
5. Founder review surface: use the reports and validation receipts, not raw JSON/model logs, as Ali's main review path.

11. What Not To Do Next

Do not:

  - Re-run all title batches from zero.
  - Treat the ignored raw model-output logs as the source of truth over the accepted merge/validation receipts.
  - Unlock public runtime, external writes, real-user data, production telemetry, or deployment promotion.
  - Claim BLS wage/employment coverage is complete until the full official VDS cache is parsed and validated.
  - Promote the 491 residual titles without either deterministic evidence or a bounded model-review receipt.

Ali Mehdi Mukadam - co-authored with Codex - 2026-06-11

topics:
  - aina-personalization-engine
  - data-engine-room
  - icp-title-coverage
  - vds-local-execution
subtopics:
  - final-handoff
  - repo-inventory
  - multi-llm-adjudication
  - hugging-face-provenance
  - production-boundaries

Where to start

Start with the current validation receipts and the local checkpoint, then work only on the next bounded milestone.
