AINA data engine room - VDS local execution - 2026-06-10

AINA ICP Title Coverage Goal Run Handoff

Technical handoff for the local personalization-engine title coverage lane.

Branch: ali/personalization-engine-mission-2026-06-09 - previous checkpoint: ac3e076

The Single Idea

The repo now has a self-contained, source-backed local engine that can route every title in the ICP surface into one of three outcomes: serve locally now, serve with fallback caveats, or hold until stronger evidence exists. In the current working state, 46,401 title rows are serviceable locally, 17,477 rows are excluded or fall outside the ICP wedge, and 10,347 rows remain ambiguous. From here on, the reviewer lane is Codex-Spark only.

74,225 raw title rows
46,401 serviceable locally
17,477 excluded or not ICP
10,347 still ambiguous

01 - Current State

Verified Local Metrics

| Measure | Count | Meaning |
| --- | --- | --- |
| Raw input title rows | 74,225 | All title rows in the source-backed coverage receipt. |
| Deduped title count | 74,213 | Title-level working surface. |
| Current serviceable rows | 46,401 | 45,564 base serviceable rows plus 837 adjudicated fallback promotions. |
| Current excluded/not ICP rows | 17,477 | 12,572 base exclusions plus 4,905 adjudicated exclusions. |
| Remaining ambiguous rows | 10,347 | Rows still needing deterministic evidence or Codex-Spark adjudication. |
| Structured model decisions applied | 2,545 | Accepted decisions in the adjudication input. |
| Production unlocks | 0 | No public runtime, external writes, real-user data, telemetry, or deployment promotion. |

62.5% serviceable locally
23.5% excluded or not ICP
13.9% still ambiguous

Flow: 74,225 rows (source-backed surface) -> 46,401 serve, 17,477 exclude, 10,347 review (Codex-Spark lane, 100-row two-reviewer batches)

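The three buckets are meant to partition the raw title rows, so the percentage shares can be recomputed from the counts alone. The sketch below is a standalone arithmetic check, not part of the repo; the names `RAW_ROWS`, `BUCKETS`, and `bucket_shares` are illustrative assumptions, and the counts are copied from the Verified Local Metrics table.

```python
# Counts copied from the Verified Local Metrics table (names are illustrative).
RAW_ROWS = 74_225
BUCKETS = {
    "serviceable": 45_564 + 837,            # base serviceable + fallback promotions
    "excluded_or_not_icp": 12_572 + 4_905,  # base exclusions + adjudicated exclusions
    "ambiguous": 10_347,
}


def bucket_shares(buckets: dict[str, int], total: int) -> dict[str, float]:
    """Return each bucket's share of `total` as a percentage rounded to 0.1."""
    if sum(buckets.values()) != total:
        # The buckets must account for every raw row exactly once.
        raise ValueError("buckets do not partition the raw row count")
    return {name: round(100 * count / total, 1) for name, count in buckets.items()}


if __name__ == "__main__":
    for name, share in bucket_shares(BUCKETS, RAW_ROWS).items():
        print(f"{name}: {BUCKETS[name]:,} rows, {share}%")
```

Running a check like this after each merge is one cheap way to catch a headline percentage that has drifted from the underlying counts.
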
02 - What Changed

Reviewer Lane Moved To Codex-Spark

Before the policy change, complete Claude-era batches 012 and 014 were merged through the stricter gate and added 192 accepted decisions. Incomplete Claude outputs for 013 and 015 were not merged. After Ali asked to stop Claude reviews, the live lane moved to gpt-5.3-codex-spark only.

| Change | Outcome |
| --- | --- |
| Stopped Claude/Haiku/Sonnet reviewer lane | No new Claude outputs should be used from here. |
| Killed GPT-5.4-mini job | Do not use GPT-5.4-mini unless Ali changes policy. |
| Promoted Codex-Spark | Default reviewer model is now gpt-5.3-codex-spark. |
| Tested 200-row Spark batch | Worked once, but the second reviewer can hit context limits. |
| Settled on 100-row Spark batches | Safer repeatable throughput unit. |

| Spark batch | Prompt rows | Consensus decisions | Gate note |
| --- | --- | --- | --- |
| gpt_001_200 | 200 | 97 | Two Spark reviewers; 6 exact-title SOC corrections. |
| gpt_002_100 | 100 | 47 | Two Spark reviewers; 1 exact-title SOC correction. |
| gpt_003_100-gpt_007_100 | 500 | 419 | Five 100-title batches; targeted repairs; exact-prompt gate passed. |
| gpt_008_100-gpt_012_100 | 500 | 317 | Five more 100-title batches; exact-prompt gate passed. |

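To compare batches of different sizes, it can help to look at the share of prompt rows that ended in a consensus decision. The sketch below computes that ratio from the Spark batch table; "consensus rate" is a term introduced here for illustration, not a metric the repo reports, and `SPARK_BATCHES` and `consensus_rate` are hypothetical names.

```python
# (batch label, prompt rows, consensus decisions), copied from the Spark batch table.
SPARK_BATCHES = [
    ("gpt_001_200", 200, 97),
    ("gpt_002_100", 100, 47),
    ("gpt_003_100-gpt_007_100", 500, 419),
    ("gpt_008_100-gpt_012_100", 500, 317),
]


def consensus_rate(prompt_rows: int, consensus_decisions: int) -> float:
    """Fraction of prompt rows that produced a two-reviewer consensus decision."""
    return consensus_decisions / prompt_rows


TOTAL_ROWS = sum(rows for _, rows, _ in SPARK_BATCHES)
TOTAL_CONSENSUS = sum(decisions for _, _, decisions in SPARK_BATCHES)

if __name__ == "__main__":
    for label, rows, decisions in SPARK_BATCHES:
        print(f"{label}: {decisions}/{rows} = {consensus_rate(rows, decisions):.1%}")
    print(f"all Spark batches: {TOTAL_CONSENSUS}/{TOTAL_ROWS} = "
          f"{consensus_rate(TOTAL_ROWS, TOTAL_CONSENSUS):.1%}")
```

The ratio says nothing about why a row failed to reach consensus; the merge receipts remain the place to look for that.
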
03 - What Was Built

Repo Surfaces

| File | Role |
| --- | --- |
| src/aina_data_engine/title_coverage.py | First-pass ICP title coverage receipt. |
| src/aina_data_engine/title_adjudication.py | Deterministic routing and structured model-review decisions. |
| src/aina_data_engine/cli.py | Coverage, adjudication, merge-review, HF ingest, source authority, and validation commands. |
| tests/test_icp_title_adjudication.py | Prompt-window, parser, evidence-ref, SOC-normalization, and CLI regression tests. |
| artifacts/validation/ | Machine-readable proof of current state. |
| artifacts/review/ | Review inputs, prompts, merge receipts, and model-output evidence. |

The merge gate now supports --expected-review-prompt, expected-row coverage receipts, off-prompt rejection, exact-title SOC normalization, and two-reviewer consensus.

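As a rough illustration of how off-prompt rejection and two-reviewer consensus can fit together, here is a simplified sketch. It is not the code in title_adjudication.py: the `Decision` type, the route labels, the rejection reasons, and `merge_two_reviews` are all assumptions made for this example, and SOC normalization is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    title: str  # exact title string as it appeared in the review prompt
    route: str  # hypothetical label, e.g. "serve", "exclude", "hold"


def merge_two_reviews(
    expected_titles: list[str],
    review_a: list[Decision],
    review_b: list[Decision],
) -> tuple[dict[str, str], dict[str, str]]:
    """Return (accepted, rejected) under a simplified two-reviewer gate."""
    expected = set(expected_titles)
    a = {d.title: d.route for d in review_a}
    b = {d.title: d.route for d in review_b}
    accepted: dict[str, str] = {}
    rejected: dict[str, str] = {}

    # Off-prompt rejection: a reviewer returned a title the prompt never asked about.
    for title in (set(a) | set(b)) - expected:
        rejected[title] = "off-prompt"

    for title in expected_titles:
        if title not in a or title not in b:
            rejected[title] = "missing-review"  # expected-row coverage gap
        elif a[title] != b[title]:
            rejected[title] = "no-consensus"    # the two reviewers disagree
        else:
            accepted[title] = a[title]          # both reviewers agree
    return accepted, rejected


if __name__ == "__main__":
    prompt = ["VP Sales", "Line Cook", "Chief of Staff"]
    out_a = [Decision("VP Sales", "serve"), Decision("Line Cook", "exclude"),
             Decision("Chief of Staff", "serve"), Decision("Astronaut", "exclude")]
    out_b = [Decision("VP Sales", "serve"), Decision("Line Cook", "exclude"),
             Decision("Chief of Staff", "hold")]
    print(merge_two_reviews(prompt, out_a, out_b))
```

The design point the sketch tries to capture is that acceptance is decided per title against the expected prompt, so a reviewer output that parses cleanly can still contribute nothing.
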
04 - Process Now

Codex-Spark Loop

The repeatable loop is to generate a 100-row prompt, run two independent Codex-Spark sessions, preserve raw outputs, merge with the expected prompt, refresh adjudication artifacts, validate, and commit locally.

Codex - Merge Spark Review - exact prompt coverage required
uv run aina-data-engine --root /srv/aina/aina-data-engine-room title-adjudication-merge-reviews \
  --batch-id icp_ambiguous_batch_gpt_013_100_codexspark \
  --expected-review-prompt artifacts/review/model_outputs/icp_batch_gpt_013_100_prompt.md \
  --review-output "Codex Spark A:gpt-5.3-codex-spark=artifacts/review/model_outputs/icp_batch_gpt_013_100_codexspark_a.json" \
  --review-output "Codex Spark B:gpt-5.3-codex-spark=artifacts/review/model_outputs/icp_batch_gpt_013_100_codexspark_b.json"
Watch-out: model success is not acceptance. The merge receipt is the acceptance gate.
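For repeated batches, the merge command above could be assembled programmatically rather than retyped. The sketch below only reuses the flags and paths shown in that command; the assumption that later batches follow the same file-naming pattern as batch 013 is mine and should be checked against artifacts/review/ before relying on it. `merge_command` is a hypothetical helper, not a repo function.

```python
import shlex

ROOT = "/srv/aina/aina-data-engine-room"
MODEL = "gpt-5.3-codex-spark"


def merge_command(batch_number: int, rows: int = 100) -> list[str]:
    """Build the argv for one merge, assuming the batch-013 naming pattern holds."""
    stem = f"artifacts/review/model_outputs/icp_batch_gpt_{batch_number:03d}_{rows}"
    argv = [
        "uv", "run", "aina-data-engine", "--root", ROOT,
        "title-adjudication-merge-reviews",
        "--batch-id", f"icp_ambiguous_batch_gpt_{batch_number:03d}_{rows}_codexspark",
        "--expected-review-prompt", f"{stem}_prompt.md",
    ]
    # One --review-output per independent reviewer session.
    for label, suffix in (("Codex Spark A", "a"), ("Codex Spark B", "b")):
        argv += ["--review-output", f"{label}:{MODEL}={stem}_codexspark_{suffix}.json"]
    return argv


if __name__ == "__main__":
    # Print a shell-quoted command line for inspection; nothing is executed.
    print(shlex.join(merge_command(13)))
```

Printing the command instead of running it keeps the merge receipt, not this helper, as the acceptance gate.
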
05 - Source Provenance

HF, O*NET, GDPval Chain

| Source signal | Evidence |
| --- | --- |
| Hugging Face downloaded files | 15 |
| Hugging Face downloaded bytes | 186,743,670 |
| GDPval task count | 220 |
| EconomicIndex legacy signal count | 821 |
| Mapped SOC count | 907 |
| O*NET occupation rows | 1,016 |
| O*NET task rows | 18,796 |

06 - Validation

Current Verification

| Command | Result |
| --- | --- |
| uv run ruff check . | Passed. |
| uv run pytest -q | 171 passed. |
| uv run aina-data-engine --root /srv/aina/aina-data-engine-room validate | Pass. |

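The three verification commands can be run as one ordered gate that stops at the first failure. The sketch below is one way to do that with the standard library; `CHECKS` and `run_checks` are hypothetical names, and the injectable `run` parameter exists only so the logic can be exercised without invoking uv.

```python
import subprocess
from typing import Callable, Optional, Sequence

# The three commands from the Current Verification table, in the same order.
CHECKS: list[list[str]] = [
    ["uv", "run", "ruff", "check", "."],
    ["uv", "run", "pytest", "-q"],
    ["uv", "run", "aina-data-engine", "--root",
     "/srv/aina/aina-data-engine-room", "validate"],
]


def run_checks(
    checks: Sequence[Sequence[str]],
    run: Callable[[Sequence[str]], int] = subprocess.call,
) -> Optional[list[str]]:
    """Run each command in order; return the first failing argv, or None if all pass."""
    for argv in checks:
        # subprocess.call returns the process exit code; non-zero means failure.
        if run(argv) != 0:
            return list(argv)
    return None


# Example use from the repo root (not executed here):
#     failed = run_checks(CHECKS)
#     if failed is not None:
#         raise SystemExit(f"verification failed: {' '.join(failed)}")
```

Stopping at the first failure keeps a local commit from following a red lint or test run.
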
07 - What AINA Can Serve

Current Capability Boundary

AINA can serve 46,401 title rows locally today, strongest in white-collar knowledge-work areas: sales, marketing, operations, customer success, analytics, finance, HR, legal/compliance, product, administration, education, strategy, and consulting. It should not serve 17,477 title rows today, and 10,347 rows still need adjudication.

| Boundary | Status |
| --- | --- |
| Local internal personalization | Allowed |
| Fallback precision caveats | Required for fallback-routed titles |
| Public runtime | Not allowed |
| External writes | Not allowed |
| Real-user data | Not allowed |
| Production telemetry | Not allowed |
| Deployment promotion | Not allowed |

08 - Resume Prompt

Prompt For Next Agent

Codex - Resume ICP Title Coverage - Codex-Spark only
Continue in /srv/aina/aina-data-engine-room on branch ali/personalization-engine-mission-2026-06-09.

Goal: continue the AINA Personalization Engine ICP title-coverage milestone on the VDS, self-contained in local git, no push/merge ceremony.

Reviewer policy from Ali: stop Claude reviews; use Codex/Codex-Spark only. Do not use Haiku, Sonnet, or GPT-5.4-mini unless Ali explicitly changes the policy.

Current verified metrics:
- serviceable rows: 46,401
- excluded/not ICP rows: 17,477
- remaining adjudication queue: 10,347
- structured model decisions applied: 2,545
- production/external/real-user/deployment unlocks: 0

Use the current 100-row prompt batch, run two independent gpt-5.3-codex-spark outputs, merge with --expected-review-prompt, run title-adjudication, run ruff, pytest, validate, and commit locally.
Watch-out: exact prompt coverage and two-reviewer consensus are the acceptance criteria.

Where to start

Start with the current 100-row prompt batch and keep shrinking the 10,347-row ambiguity queue through Codex-Spark-only two-reviewer consensus.