AINA Personalization Engine Final Title-Coverage Handoff
Final handoff for agents and humans: mission match, exact repo state, inventory, validation, and next commands.
Build an evidence-backed personalization engine.
Title prompt queue exhausted; validation green; local checkpoint committed.
Publication line: AINA data engine room - final local handoff - 2026-06-11
Author: Codex execution lane
Audience: next agent, technical collaborator, and Ali
Repo: /srv/aina/aina-data-engine-room
Branch: ali/personalization-engine-mission-2026-06-09
Checkpoint: a65af71 Complete ICP title adjudication backlog
The Single Idea
The mission moved from a planning document into a working VDS-local data engine with source-backed packets, Hugging Face and O*NET provenance, local runtime/evaluator proof, and a completed ICP title adjudication run. The title-review backlog is no longer waiting on unprocessed prompt candidates: the current receipt reports prompt_candidates=0, prompt_rows=0, 50,053 serviceable rows after adjudication, and 491 residual rows that were already reviewed and intentionally kept in adjudication rather than overclaimed.
01. Current Landed State
This work is self-contained on the VDS. It was committed locally and not pushed or merged, per Ali's instruction.
| Item | Current value |
|---|---|
| Repo | /srv/aina/aina-data-engine-room |
| Branch | ali/personalization-engine-mission-2026-06-09 |
| Latest commit | a65af71 Complete ICP title adjudication backlog |
| Working tree | Clean before this report was authored |
| Push/merge | Not performed by design |
| Full validation | pass |
| Full pytest | 174 passed |
| Engine scope | VDS-local synthetic/internal beta only |
02. Mission Reconciliation
The original mission at docs/planning/aina-personalization-engine-mission-2026-06-09.md said AINA should become a personalized AI capability transformation engine: role, workflow, readiness, goals, capacity, and evidence should produce a learning path, realistic practice, evaluated progression, and a durable record.
That mission broke into M0-M7. Current status:
| Milestone | Original target | Current status | Evidence |
|---|---|---|---|
| M0. Mission and source lock | Convert the source docs into one product/data charter | Complete | docs/planning/aina-personalization-engine-mission-2026-06-09.md and HTML companion |
| M1. Canonical source warehouse v1 | Reproducible public/HF snapshots and provenance | Complete locally, with BLS full-cache follow-up still separate | artifacts/validation/source_authority_beta_wedge_audit_v1.json, artifacts/sources/public_source_snapshot_v1.json |
| M2. Work Intelligence Graph v1 | Role/task/workflow packets and fallback mappings | Complete for the local beta engine | artifacts/packets/, src/aina_data_engine/packets.py, artifacts/validation/packet_quality_gate_v1.json |
| M3. CurriculumInputPacket and planner v0 | Assessment input to planner-ready packet and path | Complete locally | src/aina_data_engine/schemas.py, src/aina_data_engine/planner.py, tests/test_contract_spine.py |
| M4. Practice and evaluator loop v0 | Practice submission, evaluator result, learner events | Complete locally | src/aina_data_engine/evaluator.py, src/aina_data_engine/event_replay.py, artifacts/validation/learner_event_replay_v1.json |
| M5. GDPval sandbox and HF proof | GDPval-linked tasks, rubrics, source evidence | Complete locally with safety holds | artifacts/validation/gdpval_*, artifacts/sandbox/, artifacts/validation/hf_runtime_map_receipt_v1.json |
| M6. Beta API and product surface | Local /assess, /curriculum, sandbox, submit loop | Complete as local fixture, not public runtime | artifacts/api/, artifacts/ui/, tests/test_api_runtime.py, tests/test_beta_ui_shell.py |
| M7. ICP title coverage and scale path | Route titles into service, fallback, adjudication, or exclusion | Complete for the promptable backlog | artifacts/validation/icp_title_adjudication_v1.json, artifacts/reports/icp_title_adjudication_v1.md |
The important change since the earlier handoff is M7. At the prior pause, thousands of promptable title rows remained. Now the promptable queue is exhausted.
03. Title-Coverage Result
Current receipt: artifacts/validation/icp_title_adjudication_v1.json.
| Metric | Value |
|---|---|
| Ambiguous input rows | 16,089 |
| Serviceable rows before adjudication | 45,564 |
| Serviceable rows after adjudication | 50,053 |
| Multi-LLM decisions applied | 10,957 |
| Multi-LLM fallback promotions | 3,880 |
| Multi-LLM exclusions | 6,586 |
| Deterministic fallback promotions | 609 |
| Deterministic exclusions | 4,523 |
| Remaining adjudication rows | 491 |
| Prompt candidates remaining | 0 |
| Prompt rows remaining | 0 |
| Reviewed rows skipped from prompt | 491 |
The remaining 491 are not unprocessed backlog. Every one has already been reviewed and carries multi_llm_kept_for_adjudication. They are the intentionally conservative residue: mostly low-signal, ambiguous general_business titles where two-reviewer evidence did not justify either fallback promotion or exclusion.
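The figures in the table above appear to reconcile arithmetically. The sketch below hard-codes the table values and checks three identities. The identities are inferred from the numbers, not stated by the receipt, and the variable names are illustrative rather than taken from the repo.

```python
# Values copied from the title-coverage table; variable names are illustrative only.
ambiguous_input = 16_089
serviceable_before = 45_564
serviceable_after = 50_053
llm_applied = 10_957
llm_promotions = 3_880
llm_exclusions = 6_586
det_promotions = 609
det_exclusions = 4_523
remaining = 491


def reconcile() -> dict[str, bool]:
    """Check identities the table numbers appear to satisfy (an inference, not receipt logic)."""
    return {
        # Promotions of either kind add to the serviceable pool.
        "after = before + promotions": (
            serviceable_before + llm_promotions + det_promotions == serviceable_after
        ),
        # Every ambiguous row ends up promoted, excluded, or kept for adjudication.
        "ambiguous = promoted + excluded + remaining": (
            llm_promotions + llm_exclusions + det_promotions + det_exclusions + remaining
            == ambiguous_input
        ),
        # Multi-LLM decisions cover promotions, exclusions, and the kept residue.
        "llm_applied = llm promotions + llm exclusions + remaining": (
            llm_promotions + llm_exclusions + remaining == llm_applied
        ),
    }


if __name__ == "__main__":
    for name, ok in reconcile().items():
        print("ok  " if ok else "FAIL", name)
```

If a later receipt changes any of these counts, rerunning the check is a quick way to see whether the routing buckets still account for every ambiguous row.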
Residual shape:
| Residual view | Count |
|---|---|
| general_business | 404 |
| administration | 24 |
| customer_success | 18 |
| marketing | 13 |
| finance | 10 |
| All other functions | 22 |
Example residuals include front end entry level, service leader, family law attorney, student intern, document reviewer, qa tester, sql dba, and security professional. These should stay out of public claims until a future rule or domain-specific evidence resolves them.
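As a quick sanity check on the residual shape, the following sketch sums the per-function counts and computes the general_business share. The dictionary and the "other" key (standing in for "All other functions") are illustrative, not a structure from the receipt.

```python
# Counts copied from the residual table; "other" stands for "All other functions".
residual_by_function = {
    "general_business": 404,
    "administration": 24,
    "customer_success": 18,
    "marketing": 13,
    "finance": 10,
    "other": 22,
}

total = sum(residual_by_function.values())


def share(function: str) -> float:
    """Fraction of the residual rows that belong to one function."""
    return residual_by_function[function] / total


if __name__ == "__main__":
    print(total)                                  # 491
    print(f"{share('general_business'):.1%}")     # 82.3%
```

Roughly four in five residual rows sit in general_business, which is consistent with the description of the residue as mostly low-signal, ambiguous titles.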
04. What Was Done In This Final Run
The final run repaired the reviewer lane and then processed the title backlog to exhaustion.
| Workstream | What changed |
|---|---|
| BLS fallback | Added official BLS API/cache fallback handling and tests, while keeping BLS lower priority after Ali confirmed title cleanup came first. |
| Deterministic filters | Added low-signal knowledge-work fallback routing and stronger obvious non-ICP/frontline/clinical/physical-work exclusions. |
| Batch 039 | Repaired and merged the previously broken batch with filtered current prompt rows. |
| GPT runner | Fixed schema generation for arbitrary batch sizes and used codex exec --ignore-user-config --ignore-rules so review subprocesses did not load MCPs/connectors. |
| Batch size tuning | Tested 200, 400, and 600. Settled on 400 as the reliable throughput lane. |
| Model policy | Dropped the Claude/Spark direction; the final lane used standard GPT/Codex only. |
| Final title sweep | Ran waves through 078 until prompt_candidates=0. |
| Validation | Ran targeted ruff/tests, full validate, and full pytest. |
| Git checkpoint | Committed local-only checkpoint a65af71. |
Notable throughput evidence:
| Wave | New consensus decisions |
|---|---|
| 055-056, 200-row wave | 302 |
| 057-058, 400-row test | 542 |
| 059, 600-row partial | 425 |
| 061-063, 400-row wave | 956 |
| 064-066, 400-row wave | 1,040 |
| 067-069, 400-row wave | 838 |
| 070-071, partial 400-row wave | 643 |
| 073-075, tail wave | 850 |
| 076 | 71 |
| 077 | 43 |
| 078 | 4 |
Batch 072 produced raw outputs but was not merged because one reviewer needed more row repairs than the runner allows. Its artifacts remain under ignored artifacts/review/model_outputs/ as evidence history, not accepted title state.
05. Repo And Data Map
The repo is now an engine room with code, tests, source documents, generated data, validation receipts, and review evidence.
| Area | Purpose | Key paths |
|---|---|---|
| Mission and planning | Product/data charter, runbooks, prior reports | docs/planning/, docs/runbooks/, docs/handoff/, docs/reports/ |
| Imported source docs | Original AINA and curriculum source material | docs/source_foundations/ainpe-files-shared/, docs/source_foundations/aina-curriculum/ |
| Engine code | Data ingestion, title coverage, runtime, evaluator, validation | src/aina_data_engine/ |
| Operational scripts | Rebuild and review-batch orchestration | scripts/rebuild_all.sh, scripts/run_icp_spark_backlog.py |
| Tests | Regression and validation tests | tests/ |
| Raw/downloaded caches | Hugging Face, O*NET/BLS/public source caches | artifacts/raw/ |
| Warehouse | DuckDB and derived warehouse outputs | artifacts/aina_data_engine.duckdb, artifacts/warehouse/ |
| Packets | 45,564 generated role/workflow packet JSON files | artifacts/packets/ |
| Validation receipts | Machine-readable proof of engine state | artifacts/validation/ |
| Human reports | Markdown/HTML reports | artifacts/reports/, docs/reports/ |
| Review evidence | Prompts, model outputs, prepared outputs, merge receipts | artifacts/review/ |
| Runtime fixtures | Local API, beta UI, sandbox, events, telemetry | artifacts/api/, artifacts/ui/, artifacts/sandbox/, artifacts/events/, artifacts/telemetry/ |
Current inventory appendix:
| Artifact | Meaning |
|---|---|
| docs/handoff/2026-06-11-final-file-inventory.csv | Exhaustive file-level inventory outside .git, including ignored generated/model-output evidence. |
| docs/handoff/2026-06-11-final-file-inventory-summary.json | Machine-readable counts, top-level sizes, tracked/ignored counts, and .git internal summary. |
Inventory snapshot:
| Classification | Count |
|---|---|
| Generated runtime data | 45,592 |
| Environment cache | 30,887 |
| Generated review output | 685 |
| Source material | 184 |
| Generated evidence | 173 |
| Tool cache | 148 |
| Hand-authored code | 59 |
| Hand-authored tests | 46 |
| Downloaded source cache | 46 |
| Agent deliverables | 24 |
| Generated review input | 10 |
| Hand-authored scripts | 2 |
| Hand-authored config | 4 |
| Lockfile | 1 |
Disk shape:
| Path | Approx size | Meaning |
|---|---|---|
| repo root | 13G | Entire VDS-local project |
| .venv/ | 5.2G | Python environment |
| artifacts/ | 7.5G | Generated data, packets, receipts, review evidence |
| artifacts/aina_data_engine.duckdb | 5.5G | Local DuckDB warehouse |
| artifacts/packets/ | 892M | Generated packet JSON files |
| artifacts/raw/ | 499M | Downloaded/source caches |
| artifacts/semantic_review/ | 210M | Deterministic semantic review outputs |
| artifacts/validation/ | 105M | Validation receipts |
| artifacts/review/ | 54M | Review prompts and model output evidence |
| docs/ | 21M | Planning, handoffs, reports, source foundations |
Important Git note: artifacts/ is ignored for new files, but many receipt/report artifacts are already tracked. The raw model outputs and review logs are intentionally ignored, while the accepted adjudication state is tracked through artifacts/review/icp_title_adjudication_input_v1.json, artifacts/review/icp_title_adjudication_merge_v1.json, and validation/report files.
06. Source And Provenance Status
Hugging Face is real engine input now, not a footnote.
| Source layer | Current proof |
|---|---|
| Hugging Face source ledger | 2 HF sources, revisions locked: Anthropic/EconomicIndex and openai/gdpval |
| HF selected files | 15 files verified |
| HF downloaded bytes | 186,743,670 |
| EconomicIndex SOC rows | 756 |
| EconomicIndex legacy signals | 821 |
| GDPval tasks | 220 |
| Mapped SOC entries | 907 |
| Runtime packets checked | 45,564 |
| O*NET occupation rows | 1,016 |
| O*NET task rows | 18,796 |
| BLS wage/employment fact rows | 0 in full validation snapshot; official full-file follow-up remains separate |
The current system still refuses to fake BLS facts. That is correct. O*NET is usable; BLS full-cache integration should be a follow-up now that Ali has added more BLS data to the VDS.
07. Safety Boundary
The engine is useful locally, but production stays blocked.
| Boundary | Current state |
|---|---|
| Public runtime | Blocked |
| External writes | Blocked |
| Real-user data | Blocked |
| Production telemetry | Blocked |
| Deployment promotion | Blocked |
| External beta | Blocked by review holds |
| Local internal synthetic beta | Ready with review holds |
artifacts/validation/production_deployment_approval_v1.json reports production_blocked_approval_required, 0 approved unlocks, 5 blocked domains, and 1 open review hold.
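One way calling code might fail closed against these boundaries is sketched below. The states are copied from the safety table, but the dictionary keys, the state strings, and both helper functions are hypothetical. They are not the schema of production_deployment_approval_v1.json or an API in src/aina_data_engine/.

```python
# Boundary states copied from the safety table; all names here are hypothetical.
BOUNDARIES = {
    "public_runtime": "blocked",
    "external_writes": "blocked",
    "real_user_data": "blocked",
    "production_telemetry": "blocked",
    "deployment_promotion": "blocked",
    "external_beta": "blocked_by_review_holds",
    "local_internal_synthetic_beta": "ready_with_review_holds",
}


def allowed_surfaces(boundaries: dict[str, str]) -> list[str]:
    """Return the surfaces whose state does not start with 'blocked'."""
    return [name for name, state in boundaries.items() if not state.startswith("blocked")]


def require_allowed(surface: str, boundaries: dict[str, str] = BOUNDARIES) -> None:
    """Raise if a caller tries to use a blocked or unknown surface."""
    if surface not in allowed_surfaces(boundaries):
        raise PermissionError(f"{surface} is blocked or unknown; approval required")


if __name__ == "__main__":
    print(allowed_surfaces(BOUNDARIES))
    require_allowed("local_internal_synthetic_beta")  # passes silently
```

Treating unknown surfaces the same as blocked ones keeps the default conservative: anything not explicitly listed as ready is refused.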
08. Validation
Commands run:
uv run ruff check src/aina_data_engine/title_adjudication.py src/aina_data_engine/public_source_snapshots.py src/aina_data_engine/config.py scripts/run_icp_spark_backlog.py tests/test_icp_title_adjudication.py tests/test_public_source_snapshots.py
uv run pytest tests/test_icp_title_adjudication.py tests/test_public_source_snapshots.py -q
uv run aina-data-engine --root /srv/aina/aina-data-engine-room validate
uv run pytest -q
Results:
| Check | Result |
|---|---|
| Targeted ruff | Pass |
| Targeted pytest | 19 passed |
| Engine validation | Pass |
| Full pytest | 174 passed in 153.58s |
09. Resume Commands
Start here:
cd /srv/aina/aina-data-engine-room
git status --short --branch
git log -1 --oneline
uv run aina-data-engine --root /srv/aina/aina-data-engine-room validate
uv run python - <<'PY'
import json

# Load the metrics block from the title adjudication receipt.
with open('artifacts/validation/icp_title_adjudication_v1.json') as f:
    m = json.load(f)['metrics']
for key in ['serviceable_rows_after_adjudication','remaining_adjudication_required_count','review_prompt_candidate_count','review_prompt_row_count','multi_llm_decision_rows_applied']:
    print(key, m[key])
PY
Expected current title state:
serviceable_rows_after_adjudication 50053
remaining_adjudication_required_count 491
review_prompt_candidate_count 0
review_prompt_row_count 0
multi_llm_decision_rows_applied 10957
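The expected state above can also be checked mechanically instead of by eye. This is a minimal sketch: the metric keys and values come from the lines above, while the helper names find_drift and check_receipt are illustrative and do not exist in the repo.

```python
import json
from pathlib import Path

# Expected values copied from the "Expected current title state" lines.
EXPECTED = {
    "serviceable_rows_after_adjudication": 50053,
    "remaining_adjudication_required_count": 491,
    "review_prompt_candidate_count": 0,
    "review_prompt_row_count": 0,
    "multi_llm_decision_rows_applied": 10957,
}


def find_drift(metrics: dict) -> dict:
    """Return {key: (expected, actual)} for every key that is missing or differs."""
    return {k: (v, metrics.get(k)) for k, v in EXPECTED.items() if metrics.get(k) != v}


def check_receipt(path: str = "artifacts/validation/icp_title_adjudication_v1.json") -> dict:
    """Load the receipt and compare its 'metrics' block against EXPECTED."""
    metrics = json.loads(Path(path).read_text())["metrics"]
    return find_drift(metrics)


if __name__ == "__main__":
    drift = check_receipt()
    print("title state matches handoff" if not drift else f"drift detected: {drift}")
```

Run it from the repo root. An empty result means the receipt still matches this handoff; any entry shows the expected value next to the actual one.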
10. Recommended Next Milestone
Do not restart title adjudication from scratch. The promptable backlog is exhausted.
Recommended next sequence:
- BLS full-file follow-through: parse the newly added VDS BLS files into the official-cache path, prove row counts and SOC joins, update public_source_snapshot_v1.
- Residual 491 policy: decide whether to leave the residual as a conservative hold, route by stronger deterministic domain rules, or send a small domain-specialist GPT review only for those 491.
- Serve-now upgrade: define the threshold that moves a title from serve_with_fallback to serve_now; do not use raw model confidence alone.
- Beta runtime hardening: keep local synthetic beta ready, then add real auth/data/privacy gates before any real learner data.
- Founder review surface: use the reports and validation receipts, not raw JSON/model logs, as Ali's main review path.
11. What Not To Do Next
Do not:
- Re-run all title batches from zero.
- Treat the ignored raw model-output logs as the source of truth over the accepted merge/validation receipts.
- Unlock public runtime, external writes, real-user data, production telemetry, or deployment promotion.
- Claim BLS wage/employment coverage is complete until the full official VDS cache is parsed and validated.
- Promote the 491 residual titles without either deterministic evidence or a bounded model-review receipt.
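The last rule in the list above can be expressed as a small gate. This is only a sketch of the rule as worded here: the ResidualTitle class and its field names are hypothetical and are not taken from src/aina_data_engine/schemas.py or any receipt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResidualTitle:
    # All field names are hypothetical; they are not taken from the repo schema.
    title: str
    has_deterministic_evidence: bool = False
    has_bounded_review_receipt: bool = False


def may_promote(row: ResidualTitle) -> bool:
    """A residual title may be promoted only with at least one accepted kind of proof."""
    return row.has_deterministic_evidence or row.has_bounded_review_receipt


if __name__ == "__main__":
    held = ResidualTitle("qa tester")
    ruled = ResidualTitle("sql dba", has_deterministic_evidence=True)
    print(may_promote(held), may_promote(ruled))  # False True
```

Note that model confidence does not appear as an input at all, so a high-confidence guess cannot promote a title on its own.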
Ali Mehdi Mukadam - co-authored with Codex - 2026-06-11
topics:
- aina-personalization-engine
- data-engine-room
- icp-title-coverage
- vds-local-execution
subtopics:
- final-handoff
- repo-inventory
- multi-llm-adjudication
- hugging-face-provenance
- production-boundaries
Start with the current validation receipts and the local checkpoint, then work only on the next bounded milestone.