AINA Data Engine Room · local handoff · 2026-06-15

O*NET Task 125k Embedding Checkpoint

The task-evidence semantic layer is almost family-complete, with only 6,095 repaired chunks left.

The Single Idea

The O*NET task evidence family has crossed 125,000 live Gemini vectors. The guarded 25,000-row tranche that followed the 100k checkpoint completed through Vertex ADC with zero failed rows, and AIN-510 still reports promotion-ready local exact-cosine retrieval with zero stale vectors.
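"Exact-cosine" retrieval means scoring a query against every stored vector rather than going through an approximate index. A minimal stdlib sketch, assuming vectors are plain float lists keyed by chunk id; `normalize` and `exact_cosine_top_k` are hypothetical names, not aina-data-engine functions:

```python
import math
from typing import Dict, List, Tuple


def normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def exact_cosine_top_k(
    query: List[float], index: Dict[str, List[float]], k: int
) -> List[Tuple[str, float]]:
    """Brute-force (exact) cosine search: score every stored vector, keep the best k."""
    q = normalize(query)
    scored = []
    for chunk_id, vec in index.items():
        v = normalize(vec)
        scored.append((chunk_id, sum(a * b for a, b in zip(q, v))))
    # Highest similarity first; ties broken by chunk id for deterministic output.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


if __name__ == "__main__":
    # Toy 3-dimensional vectors; real Gemini embeddings are much wider.
    index = {
        "task-a": [1.0, 0.0, 0.0],
        "task-b": [0.7, 0.7, 0.0],
        "task-c": [0.0, 0.0, 1.0],
    }
    print(exact_cosine_top_k([1.0, 0.1, 0.0], index, k=2))
```

Normalizing stored vectors once at write time would avoid repeating that work per query; it is done inline here only to keep the sketch short.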

01 · Progress

What changed

Selector: 25,000 clean candidates, all from onet_task_evidence, with zero quality exclusions.
Live run: 25,000 new Gemini vectors through Vertex ADC on aina-495702.
Remainder: 6,095 repaired O*NET task chunks remain, across 2 shards.

The operational rule is unchanged: use the guarded foreground selector for the small remainder. Do not submit the older full repaired manifest.
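The remainder arithmetic can be checked directly. A minimal sketch, assuming `--max-new` caps how many rows a progressive tranche embeds and `--shard-size` splits the remaining-only manifest into full shards plus one smaller final shard; `plan_shards` and `tranche_size` are hypothetical helpers, not aina-data-engine functions:

```python
from typing import List


def plan_shards(remaining: int, shard_size: int) -> List[int]:
    """Split `remaining` rows into shards of at most `shard_size` rows."""
    if shard_size <= 0:
        raise ValueError("shard_size must be positive")
    full, rest = divmod(remaining, shard_size)
    return [shard_size] * full + ([rest] if rest else [])


def tranche_size(remaining: int, max_new: int) -> int:
    """Assumed semantics: a tranche embeds at most `max_new` rows, never more than remain."""
    return min(remaining, max_new)


if __name__ == "__main__":
    remaining = 6_095
    print(plan_shards(remaining, 5_000))    # two shards for the remainder
    print(tranche_size(remaining, 10_000))  # a 10,000 cap covers the whole remainder
```

Under these assumptions, one foreground tranche capped at 10,000 finishes both shards.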

02 · Retrieval

Current vector authority

Metric                       Value
Total Gemini vectors         144,912
O*NET task vectors           125,000
O*NET occupation vectors     2,828
Top 1,000 vector coverage    1,000
Top 500 vector coverage      500
Stale vectors                0
Known-pair cosine gap        0.190463
Unvectorized chunks overall  322,048
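The stale and unvectorized counts presumably come from reconciling chunks against stored vectors. A minimal sketch of one way to do that, assuming each vector records a SHA-256 hash of the chunk text it was embedded from and that a vector counts as stale when that hash no longer matches; `content_hash`, `Reconciliation`, and `reconcile` are hypothetical and may not mirror how production-chunk-vector-reconciliation works:

```python
import hashlib
from dataclasses import dataclass
from typing import Dict


def content_hash(text: str) -> str:
    """SHA-256 hex digest of a chunk's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Reconciliation:
    vectorized: int    # chunks that have a stored vector
    stale: int         # vectorized chunks whose text changed since embedding
    unvectorized: int  # chunks with no stored vector


def reconcile(chunks: Dict[str, str], vector_hashes: Dict[str, str]) -> Reconciliation:
    """Compare current chunk text with the hash recorded when each vector was written.

    chunks:        chunk_id -> current chunk text
    vector_hashes: chunk_id -> content hash stored alongside the vector
    Vectors whose chunk_id no longer exists in `chunks` are ignored here.
    """
    vectorized = stale = 0
    for chunk_id, text in chunks.items():
        recorded = vector_hashes.get(chunk_id)
        if recorded is None:
            continue
        vectorized += 1
        # Assumption: a vector is stale when its recorded hash no longer matches.
        if recorded != content_hash(text):
            stale += 1
    return Reconciliation(vectorized, stale, len(chunks) - vectorized)


if __name__ == "__main__":
    chunks = {"t1": "Operate forklifts.", "t2": "Inspect loads."}
    hashes = {"t1": content_hash("Operate forklifts.")}
    print(reconcile(chunks, hashes))
```

In this model, a clean checkpoint is one where `stale` is 0 for every vectorized chunk, as in the table.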

Runtime embedding authority remains unpromoted. Public runtime, real-user data, external writes, and production telemetry remain off.

03 · Commands

Proof commands run

uv run aina-data-engine --root /srv/aina/aina-data-engine-room ain-506-p0-gate
uv run aina-data-engine --root /srv/aina/aina-data-engine-room gemini-embedding-run --source-family onet_task_evidence --include-repaired --dry-run --max-new 25000 --selection-mode progressive
uv run aina-data-engine --root /srv/aina/aina-data-engine-room gemini-embedding-run --source-family onet_task_evidence --include-repaired --max-new 25000 --selection-mode progressive --allow-live-gemini --confirm-paid-api --workers 16 --timeout-seconds 120 --max-retries 5 --write-every 2500
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-embedding-semantic-qa --source-family onet_task_evidence --include-repaired --limit 50
uv run aina-data-engine --root /srv/aina/aina-data-engine-room ain-510-retrieval-promotion-gate
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-embedding-repaired-corpus --source-family onet_task_evidence --remaining-only --shard-size 5000
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-chunk-vector-reconciliation
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-runtime-readiness
uv run aina-data-engine --root /srv/aina/aina-data-engine-room source-authority-registry-v2
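To replay the proof gates in a fixed order, a small wrapper helps. A minimal sketch, assuming Python 3.8+ (for `shlex.join`) and that stopping at the first non-zero exit is the desired behavior; the dry run and the paid live embedding run are deliberately left out, and `cli`, `PROOF_STEPS`, and `run_steps` are hypothetical helpers:

```python
import shlex
import subprocess
from typing import List, Sequence

ROOT = "/srv/aina/aina-data-engine-room"


def cli(*args: str) -> List[str]:
    """argv for `uv run aina-data-engine --root ROOT <args...>`."""
    return ["uv", "run", "aina-data-engine", "--root", ROOT, *args]


# Gate and report steps from the proof list, in the same order.
# The dry run and the paid live Gemini run are intentionally omitted.
PROOF_STEPS = [
    cli("ain-506-p0-gate"),
    cli("production-embedding-semantic-qa", "--source-family", "onet_task_evidence",
        "--include-repaired", "--limit", "50"),
    cli("ain-510-retrieval-promotion-gate"),
    cli("production-embedding-repaired-corpus", "--source-family", "onet_task_evidence",
        "--remaining-only", "--shard-size", "5000"),
    cli("production-chunk-vector-reconciliation"),
    cli("production-runtime-readiness"),
    cli("source-authority-registry-v2"),
]


def run_steps(steps: Sequence[List[str]], execute: bool = False) -> List[str]:
    """Render each step as a shell command; if execute=True, also run it.

    subprocess.run(check=True) raises CalledProcessError on a non-zero exit,
    so execution stops at the first failing step.
    """
    rendered = []
    for argv in steps:
        rendered.append(shlex.join(argv))
        if execute:
            subprocess.run(argv, check=True)
    return rendered


if __name__ == "__main__":
    for line in run_steps(PROOF_STEPS):
        print(line)
```

The default `execute=False` only prints the sequence, so it can be reviewed before anything runs.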

04 · Next

Resume from here

There are 6,095 repaired O*NET task candidate chunks left. Run the final, smaller foreground tranche, then refresh the remaining-only manifest (production-embedding-repaired-corpus --remaining-only) and record the family-complete checkpoint.

uv run aina-data-engine --root /srv/aina/aina-data-engine-room gemini-embedding-run --source-family onet_task_evidence --include-repaired --dry-run --max-new 10000 --selection-mode progressive
uv run aina-data-engine --root /srv/aina/aina-data-engine-room gemini-embedding-run --source-family onet_task_evidence --include-repaired --max-new 10000 --selection-mode progressive --allow-live-gemini --confirm-paid-api --workers 16 --timeout-seconds 120 --max-retries 5 --write-every 1000
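Building the dry-run and live commands from the same parameters keeps them from drifting apart. A minimal sketch that reproduces the two commands above plus the remaining-only refresh; the `max_new >= remaining` guard is an added safety check, not an aina-data-engine rule, and `embedding_run` and `resume_plan` are hypothetical:

```python
import shlex
from typing import List

ROOT = "/srv/aina/aina-data-engine-room"
BASE = ["uv", "run", "aina-data-engine", "--root", ROOT]
SELECTOR = ["--source-family", "onet_task_evidence", "--include-repaired"]


def embedding_run(max_new: int, live: bool, write_every: int = 1000) -> List[str]:
    """argv for gemini-embedding-run; dry and live runs share the same selector."""
    argv = BASE + ["gemini-embedding-run"] + SELECTOR
    if not live:
        return argv + ["--dry-run", "--max-new", str(max_new),
                       "--selection-mode", "progressive"]
    return argv + [
        "--max-new", str(max_new), "--selection-mode", "progressive",
        "--allow-live-gemini", "--confirm-paid-api",
        "--workers", "16", "--timeout-seconds", "120",
        "--max-retries", "5", "--write-every", str(write_every),
    ]


def resume_plan(remaining: int, max_new: int = 10_000) -> List[str]:
    """Dry run, live run, then a remaining-only refresh, as shell strings."""
    # Added guard (assumption): the final tranche should cover every remaining row.
    if max_new < remaining:
        raise ValueError(f"max_new={max_new} would leave {remaining - max_new} rows")
    refresh = BASE + ["production-embedding-repaired-corpus", "--source-family",
                      "onet_task_evidence", "--remaining-only", "--shard-size", "5000"]
    steps = [embedding_run(max_new, live=False),
             embedding_run(max_new, live=True),
             refresh]
    return [shlex.join(step) for step in steps]


if __name__ == "__main__":
    for line in resume_plan(6_095):
        print(line)
```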

Where to start

Finish the final 6,095-row O*NET task foreground run; a batch job is unnecessary for a remainder this small.