Local handoff · AINA data engine room · 2026-06-11

AINA Personalization Engine Final Title-Coverage Handoff

Final handoff for agents and humans: mission match, exact repo state, inventory, validation, and next commands.

Ali Mehdi Mukadam · co-authored with Codex · 2026-06-11 · Source: docs/handoff/2026-06-11-final-personalization-engine-title-coverage-handoff.md

The Single Idea

The mission moved from a planning document into a working VDS-local data engine with source-backed packets, Hugging Face and O*NET provenance, local runtime/evaluator proof, and a completed ICP title adjudication run. The title-review backlog is no longer waiting on unprocessed prompt candidates: the current receipt reports prompt_candidates=0, prompt_rows=0, 50,053 serviceable rows after adjudication, and 491 residual rows that were already reviewed and intentionally kept in adjudication rather than overclaimed.

Original mission

Build an evidence-backed personalization engine.

Current proof

Title prompt queue exhausted; validation green; local checkpoint committed.

Publication line: AINA data engine room - final local handoff - 2026-06-11
Author: Codex execution lane
Audience: next agent, technical collaborator, and Ali
Repo: /srv/aina/aina-data-engine-room
Branch: ali/personalization-engine-mission-2026-06-09
Checkpoint: a65af71 Complete ICP title adjudication backlog

01. Current Landed State

This work is self-contained on the VDS. It was committed locally and not pushed or merged, per Ali's instruction.

| Item | Current value |
| --- | --- |
| Repo | /srv/aina/aina-data-engine-room |
| Branch | ali/personalization-engine-mission-2026-06-09 |
| Latest commit | a65af71 Complete ICP title adjudication backlog |
| Working tree | Clean before this report was authored |
| Push/merge | Not performed by design |
| Full validation | Pass |
| Full pytest | 174 passed |
| Engine scope | VDS-local synthetic/internal beta only |

02. Mission Reconciliation

The original mission at docs/planning/aina-personalization-engine-mission-2026-06-09.md said AINA should become a personalized AI capability transformation engine: role, workflow, readiness, goals, capacity, and evidence should produce a learning path, realistic practice, evaluated progression, and a durable record.

That mission broke into M0-M7. Current status:

| Milestone | Original target | Current status | Evidence |
| --- | --- | --- | --- |
| M0. Mission and source lock | Convert the source docs into one product/data charter | Complete | docs/planning/aina-personalization-engine-mission-2026-06-09.md and HTML companion |
| M1. Canonical source warehouse v1 | Reproducible public/HF snapshots and provenance | Complete locally, with BLS full-cache follow-up still separate | artifacts/validation/source_authority_beta_wedge_audit_v1.json, artifacts/sources/public_source_snapshot_v1.json |
| M2. Work Intelligence Graph v1 | Role/task/workflow packets and fallback mappings | Complete for the local beta engine | artifacts/packets/, src/aina_data_engine/packets.py, artifacts/validation/packet_quality_gate_v1.json |
| M3. CurriculumInputPacket and planner v0 | Assessment input to planner-ready packet and path | Complete locally | src/aina_data_engine/schemas.py, src/aina_data_engine/planner.py, tests/test_contract_spine.py |
| M4. Practice and evaluator loop v0 | Practice submission, evaluator result, learner events | Complete locally | src/aina_data_engine/evaluator.py, src/aina_data_engine/event_replay.py, artifacts/validation/learner_event_replay_v1.json |
| M5. GDPval sandbox and HF proof | GDPval-linked tasks, rubrics, source evidence | Complete locally with safety holds | artifacts/validation/gdpval_*, artifacts/sandbox/, artifacts/validation/hf_runtime_map_receipt_v1.json |
| M6. Beta API and product surface | Local /assess, /curriculum, sandbox, submit loop | Complete as local fixture, not public runtime | artifacts/api/, artifacts/ui/, tests/test_api_runtime.py, tests/test_beta_ui_shell.py |
| M7. ICP title coverage and scale path | Route titles into service, fallback, adjudication, or exclusion | Complete for the promptable backlog | artifacts/validation/icp_title_adjudication_v1.json, artifacts/reports/icp_title_adjudication_v1.md |

The important change since the earlier handoff is M7. At the prior pause, thousands of promptable title rows remained. Now the promptable queue is exhausted.

03. Title-Coverage Result

Current receipt: artifacts/validation/icp_title_adjudication_v1.json.

| Metric | Value |
| --- | --- |
| Ambiguous input rows | 16,089 |
| Serviceable rows before adjudication | 45,564 |
| Serviceable rows after adjudication | 50,053 |
| Multi-LLM decisions applied | 10,957 |
| Multi-LLM fallback promotions | 3,880 |
| Multi-LLM exclusions | 6,586 |
| Deterministic fallback promotions | 609 |
| Deterministic exclusions | 4,523 |
| Remaining adjudication rows | 491 |
| Prompt candidates remaining | 0 |
| Prompt rows remaining | 0 |
| Reviewed rows skipped from prompt | 491 |

The remaining 491 are not unprocessed backlog. Every one has already been reviewed and carries multi_llm_kept_for_adjudication. They are the intentionally conservative residue: mostly low-signal, ambiguous general_business titles where two-reviewer evidence did not justify either fallback promotion or exclusion.
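The figures in the table can be cross-checked with plain arithmetic. The sketch below hard-codes the table values and asserts three identities. These identities are an inference from the numbers, not invariants documented by the receipt, and the dictionary keys are illustrative labels rather than the receipt's real field names.

```python
# Values copied from the title-coverage table. Keys are illustrative labels
# (an assumption), not the actual field names in the receipt JSON.
metrics = {
    "ambiguous_input_rows": 16_089,
    "serviceable_before": 45_564,
    "serviceable_after": 50_053,
    "multi_llm_decisions_applied": 10_957,
    "multi_llm_fallback_promotions": 3_880,
    "multi_llm_exclusions": 6_586,
    "deterministic_fallback_promotions": 609,
    "deterministic_exclusions": 4_523,
    "remaining_adjudication_rows": 491,
}


def check_title_coverage(m):
    """Reconcile the table; raise AssertionError if any identity fails."""
    promotions = (
        m["multi_llm_fallback_promotions"] + m["deterministic_fallback_promotions"]
    )
    exclusions = m["multi_llm_exclusions"] + m["deterministic_exclusions"]
    remaining = m["remaining_adjudication_rows"]

    # 1. Every ambiguous row ends up promoted, excluded, or held.
    dispositions = promotions + exclusions + remaining
    assert dispositions == m["ambiguous_input_rows"]

    # 2. The serviceable gain equals the fallback promotions.
    assert m["serviceable_before"] + promotions == m["serviceable_after"]

    # 3. Multi-LLM decisions cover promotions, exclusions, and held rows.
    multi_llm = (
        m["multi_llm_fallback_promotions"] + m["multi_llm_exclusions"] + remaining
    )
    assert multi_llm == m["multi_llm_decisions_applied"]

    return promotions, dispositions, multi_llm


print(check_title_coverage(metrics))
```

If a future receipt changes any of these counts, a failed assertion shows which relationship no longer holds.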

Residual shape:

| Residual view | Count |
| --- | --- |
| general_business | 404 |
| administration | 24 |
| customer_success | 18 |
| marketing | 13 |
| finance | 10 |
| All other functions | 22 |

Example residuals include front end entry level, service leader, family law attorney, student intern, document reviewer, qa tester, sql dba, and security professional. These should stay out of public claims until a future rule or domain-specific evidence resolves them.
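To see how concentrated the residual is, the counts from the residual table can be turned into shares. This is a minimal sketch over the published counts only; the `share` helper and the `all_other_functions` key are illustrative names, not part of the engine.

```python
# Residual counts copied from the residual-shape table.
residual = {
    "general_business": 404,
    "administration": 24,
    "customer_success": 18,
    "marketing": 13,
    "finance": 10,
    "all_other_functions": 22,
}

total = sum(residual.values())


def share(count, whole):
    """Percentage of `whole` represented by `count`, to one decimal place."""
    return round(100 * count / whole, 1)


shares = {name: share(count, total) for name, count in residual.items()}

print(total)
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

The counts sum to the 491 residual rows, and general_business accounts for roughly four in five of them, which suggests that any future residual policy is mostly a decision about that one bucket.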

04. What Was Done In This Final Run

The final run repaired the reviewer lane and then processed the title backlog to exhaustion.

| Workstream | What changed |
| --- | --- |
| BLS fallback | Added official BLS API/cache fallback handling and tests, while keeping BLS lower priority after Ali confirmed title cleanup came first. |
| Deterministic filters | Added low-signal knowledge-work fallback routing and stronger obvious non-ICP/frontline/clinical/physical-work exclusions. |
| Batch 039 | Repaired and merged the previously broken batch with filtered current prompt rows. |
| GPT runner | Fixed schema generation for arbitrary batch sizes and used codex exec --ignore-user-config --ignore-rules so review subprocesses did not load MCPs/connectors. |
| Batch size tuning | Tested 200, 400, and 600. Settled on 400 as the reliable throughput lane. |
| Model policy | Stopped Claude/Spark direction and used normal GPT/Codex only for the final lane. |
| Final title sweep | Ran waves through 078 until prompt_candidates=0. |
| Validation | Ran targeted ruff/tests, full validate, and full pytest. |
| Git checkpoint | Committed local-only checkpoint a65af71. |

Notable throughput evidence:

| Wave | New consensus decisions |
| --- | --- |
| 055-056, 200-row wave | 302 |
| 057-058, 400-row test | 542 |
| 059, 600-row partial | 425 |
| 061-063, 400-row wave | 956 |
| 064-066, 400-row wave | 1,040 |
| 067-069, 400-row wave | 838 |
| 070-071, partial 400-row wave | 643 |
| 073-075, tail wave | 850 |
| 076 | 71 |
| 077 | 43 |
| 078 | 4 |

Batch 072 produced raw outputs but was not merged because one reviewer needed more row repairs than the runner allows. Its artifacts remain under ignored artifacts/review/model_outputs/ as evidence history, not accepted title state.

05. Repo And Data Map

The repo is now an engine room with code, tests, source documents, generated data, validation receipts, and review evidence.

| Area | Purpose | Key paths |
| --- | --- | --- |
| Mission and planning | Product/data charter, runbooks, prior reports | docs/planning/, docs/runbooks/, docs/handoff/, docs/reports/ |
| Imported source docs | Original AINA and curriculum source material | docs/source_foundations/ainpe-files-shared/, docs/source_foundations/aina-curriculum/ |
| Engine code | Data ingestion, title coverage, runtime, evaluator, validation | src/aina_data_engine/ |
| Operational scripts | Rebuild and review-batch orchestration | scripts/rebuild_all.sh, scripts/run_icp_spark_backlog.py |
| Tests | Regression and validation tests | tests/ |
| Raw/downloaded caches | Hugging Face, O*NET/BLS/public source caches | artifacts/raw/ |
| Warehouse | DuckDB and derived warehouse outputs | artifacts/aina_data_engine.duckdb, artifacts/warehouse/ |
| Packets | 45,564 generated role/workflow packet JSON files | artifacts/packets/ |
| Validation receipts | Machine-readable proof of engine state | artifacts/validation/ |
| Human reports | Markdown/HTML reports | artifacts/reports/, docs/reports/ |
| Review evidence | Prompts, model outputs, prepared outputs, merge receipts | artifacts/review/ |
| Runtime fixtures | Local API, beta UI, sandbox, events, telemetry | artifacts/api/, artifacts/ui/, artifacts/sandbox/, artifacts/events/, artifacts/telemetry/ |

Current inventory appendix:

| Artifact | Meaning |
| --- | --- |
| docs/handoff/2026-06-11-final-file-inventory.csv | Exhaustive file-level inventory outside .git, including ignored generated/model-output evidence. |
| docs/handoff/2026-06-11-final-file-inventory-summary.json | Machine-readable counts, top-level sizes, tracked/ignored counts, and .git internal summary. |

Inventory snapshot:

| Classification | Count |
| --- | --- |
| Generated runtime data | 45,592 |
| Environment cache | 30,887 |
| Generated review output | 685 |
| Source material | 184 |
| Generated evidence | 173 |
| Tool cache | 148 |
| Hand-authored code | 59 |
| Hand-authored tests | 46 |
| Downloaded source cache | 46 |
| Agent deliverables | 24 |
| Generated review input | 10 |
| Hand-authored scripts | 2 |
| Hand-authored config | 4 |
| Lockfile | 1 |

Disk shape:

| Path | Approx size | Meaning |
| --- | --- | --- |
| repo root | 13G | Entire VDS-local project |
| .venv/ | 5.2G | Python environment |
| artifacts/ | 7.5G | Generated data, packets, receipts, review evidence |
| artifacts/aina_data_engine.duckdb | 5.5G | Local DuckDB warehouse |
| artifacts/packets/ | 892M | Generated packet JSON files |
| artifacts/raw/ | 499M | Downloaded/source caches |
| artifacts/semantic_review/ | 210M | Deterministic semantic review outputs |
| artifacts/validation/ | 105M | Validation receipts |
| artifacts/review/ | 54M | Review prompts and model output evidence |
| docs/ | 21M | Planning, handoffs, reports, source foundations |

Important Git note: artifacts/ is ignored for new files, but many receipt/report artifacts are already tracked. The raw model outputs and review logs are intentionally ignored, while the accepted adjudication state is tracked through artifacts/review/icp_title_adjudication_input_v1.json, artifacts/review/icp_title_adjudication_merge_v1.json, and validation/report files.
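Because artifacts/ mixes tracked receipts with ignored evidence, it can help to ask Git directly how it treats a given path. The sketch below is one way to do that with two standard Git commands, `git ls-files --error-unmatch` and `git check-ignore -q`. The `classify` and `git_ok` helpers and the injectable `run` parameter are assumptions made for illustration, not part of the repo.

```python
import subprocess


def git_ok(repo, *args, run=subprocess.run):
    """Return True when `git -C <repo> <args>` exits with status 0."""
    result = run(["git", "-C", repo, *args], capture_output=True)
    return result.returncode == 0


def classify(repo, path, run=subprocess.run):
    """Classify a path as 'tracked', 'ignored', or 'untracked'.

    `run` is injectable so the logic can be exercised without a real
    repository (a hypothetical convenience for this sketch).
    """
    # ls-files --error-unmatch exits non-zero when the path is not in the index.
    if git_ok(repo, "ls-files", "--error-unmatch", "--", path, run=run):
        return "tracked"
    # check-ignore -q exits 0 only when an ignore rule matches the path.
    # Any other exit status, including a Git error, falls through to 'untracked'.
    if git_ok(repo, "check-ignore", "-q", "--", path, run=run):
        return "ignored"
    return "untracked"
```

For example, `classify("/srv/aina/aina-data-engine-room", "artifacts/review/icp_title_adjudication_merge_v1.json")` should report `tracked` if the note above holds, while paths under artifacts/review/model_outputs/ should report `ignored`.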

06. Source And Provenance Status

Hugging Face is real engine input now, not a footnote.

| Source layer | Current proof |
| --- | --- |
| Hugging Face source ledger | 2 HF sources, revisions locked: Anthropic/EconomicIndex and openai/gdpval |
| HF selected files | 15 files verified |
| HF downloaded bytes | 186,743,670 |
| EconomicIndex SOC rows | 756 |
| EconomicIndex legacy signals | 821 |
| GDPval tasks | 220 |
| Mapped SOC entries | 907 |
| Runtime packets checked | 45,564 |
| O*NET occupation rows | 1,016 |
| O*NET task rows | 18,796 |
| BLS wage/employment fact rows | 0 in full validation snapshot; official full-file follow-up remains separate |

The current system still refuses to fake BLS facts. That is correct. O*NET is usable; BLS full-cache integration should be a follow-up now that Ali has added more BLS data to the VDS.

07. Safety Boundary

The engine is useful locally, but production stays blocked.

| Boundary | Current state |
| --- | --- |
| Public runtime | Blocked |
| External writes | Blocked |
| Real-user data | Blocked |
| Production telemetry | Blocked |
| Deployment promotion | Blocked |
| External beta | Blocked by review holds |
| Local internal synthetic beta | Ready with review holds |

artifacts/validation/production_deployment_approval_v1.json reports production_blocked_approval_required, 0 approved unlocks, 5 blocked domains, and 1 open review hold.

08. Validation

Commands run:

uv run ruff check src/aina_data_engine/title_adjudication.py src/aina_data_engine/public_source_snapshots.py src/aina_data_engine/config.py scripts/run_icp_spark_backlog.py tests/test_icp_title_adjudication.py tests/test_public_source_snapshots.py
uv run pytest tests/test_icp_title_adjudication.py tests/test_public_source_snapshots.py -q
uv run aina-data-engine --root /srv/aina/aina-data-engine-room validate
uv run pytest -q

Results:

| Check | Result |
| --- | --- |
| Targeted ruff | Pass |
| Targeted pytest | 19 passed |
| Engine validation | Pass |
| Full pytest | 174 passed in 153.58s |

09. Resume Commands

Start here:

cd /srv/aina/aina-data-engine-room
git status --short --branch
git log -1 --oneline
uv run aina-data-engine --root /srv/aina/aina-data-engine-room validate
uv run python - <<'PY'
import json

# Print the headline title-coverage metrics from the adjudication receipt.
with open('artifacts/validation/icp_title_adjudication_v1.json') as fh:
    m = json.load(fh)['metrics']
for key in ['serviceable_rows_after_adjudication', 'remaining_adjudication_required_count', 'review_prompt_candidate_count', 'review_prompt_row_count', 'multi_llm_decision_rows_applied']:
    print(key, m[key])
PY

Expected current title state:

serviceable_rows_after_adjudication 50053
remaining_adjudication_required_count 491
review_prompt_candidate_count 0
review_prompt_row_count 0
multi_llm_decision_rows_applied 10957
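Rather than comparing the printed values by eye, the expected state can be checked programmatically. The sketch below reuses the receipt path, the `metrics` key, and the five metric names from the resume commands. The `find_drift` and `load_metrics` helpers are hypothetical names introduced for illustration.

```python
import json
from pathlib import Path

# Expected values copied from the "Expected current title state" listing.
EXPECTED = {
    "serviceable_rows_after_adjudication": 50053,
    "remaining_adjudication_required_count": 491,
    "review_prompt_candidate_count": 0,
    "review_prompt_row_count": 0,
    "multi_llm_decision_rows_applied": 10957,
}


def find_drift(metrics, expected=EXPECTED):
    """Return {key: (expected, actual)} for each metric that differs.

    A key missing from `metrics` is reported with an actual value of None.
    """
    return {
        key: (want, metrics.get(key))
        for key, want in expected.items()
        if metrics.get(key) != want
    }


def load_metrics(receipt_path):
    """Load the 'metrics' object from an adjudication receipt file."""
    with Path(receipt_path).open(encoding="utf-8") as fh:
        return json.load(fh)["metrics"]
```

Run from the repo root, `find_drift(load_metrics("artifacts/validation/icp_title_adjudication_v1.json"))` should return an empty dict while the checkpoint state is intact. Any non-empty result names the metric that moved and shows both values.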
10

Do not restart title adjudication from scratch. The promptable backlog is exhausted.

Recommended next sequence:

  1. BLS full-file follow-through: parse the newly added VDS BLS files into the official-cache path, prove row counts and SOC joins, update public_source_snapshot_v1.
  2. Residual 491 policy: decide whether to leave the residual as a conservative hold, route by stronger deterministic domain rules, or send a small domain-specialist GPT review only for those 491.
  3. Serve-now upgrade: define the threshold that moves a title from serve_with_fallback to serve_now; do not use raw model confidence alone.
  4. Beta runtime hardening: keep local synthetic beta ready, then add real auth/data/privacy gates before any real learner data.
  5. Founder review surface: use the reports and validation receipts, not raw JSON/model logs, as Ali's main review path.

11. What Not To Do Next

Do not:

topics:
  - aina-personalization-engine
  - data-engine-room
  - icp-title-coverage
  - vds-local-execution
subtopics:
  - final-handoff
  - repo-inventory
  - multi-llm-adjudication
  - hugging-face-provenance
  - production-boundaries
Where to start

Start with the current validation receipts and the local checkpoint, then work only on the next bounded milestone.