AINA Data Engine Room · local handoff · 2026-06-15

O*NET Task 100k Embedding Checkpoint

The task-evidence semantic layer crossed six figures, with gates still green.

The Single Idea

The O*NET task evidence family has crossed 100,000 live Gemini vectors. The second 25,000-row tranche after the 75k checkpoint completed through Vertex ADC with zero failed rows, and AIN-510 still reports promotion-ready local exact-cosine retrieval with zero stale vectors.

01 · Progress

What changed

Selector: 25,000 clean candidates, all from onet_task_evidence, with zero quality exclusions.
Live run: 25,000 new Gemini vectors through Vertex ADC on aina-495702.
Remainder: 31,095 repaired O*NET task chunks remain, across 7 shards.
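
The shard count above can be sanity-checked with a few lines of arithmetic. This is a minimal sketch, not the engine's own sharding code: it assumes the remaining-only manifest fills shards in order at the 5,000-row size passed as --shard-size in the proof commands, with one partial shard at the end. The function name shard_sizes is hypothetical.

```python
def shard_sizes(remaining: int, shard_size: int) -> list[int]:
    """Split `remaining` rows into full shards plus one trailing partial shard.

    Assumption: shards are filled in order, so only the last one can be short.
    """
    if remaining < 0 or shard_size <= 0:
        raise ValueError("remaining must be >= 0 and shard_size must be > 0")
    full, rest = divmod(remaining, shard_size)
    return [shard_size] * full + ([rest] if rest else [])


sizes = shard_sizes(31_095, 5_000)
print(len(sizes), sizes[-1])  # 7 shards, the last holding 1,095 rows
```

Under that assumption, 31,095 rows give six full shards and one 1,095-row shard, which matches the 7 shards reported.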

The operational rule is unchanged: use the guarded foreground selector or the remaining-only manifest. Do not submit the older full repaired manifest.
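
One way to keep the dry run in front of every paid run is to script the pair. The sketch below only reuses the flags from the commands recorded in this handoff; the wrapper itself (build_command, run_guarded) is hypothetical, and it assumes the CLI exits non-zero when the dry run finds a problem, which this note does not confirm.

```python
import subprocess

ROOT = "/srv/aina/aina-data-engine-room"
PREFIX = [
    "uv", "run", "aina-data-engine", "--root", ROOT, "gemini-embedding-run",
    "--source-family", "onet_task_evidence", "--include-repaired",
]
SELECTION = ["--max-new", "25000", "--selection-mode", "progressive"]
LIVE_FLAGS = [
    "--allow-live-gemini", "--confirm-paid-api", "--workers", "16",
    "--timeout-seconds", "120", "--max-retries", "5", "--write-every", "2500",
]


def build_command(live: bool) -> list[str]:
    """Return the argv for the dry-run or the live tranche, as recorded in this handoff."""
    if live:
        return PREFIX + SELECTION + LIVE_FLAGS
    return PREFIX + ["--dry-run"] + SELECTION


def run_guarded(runner=subprocess.run) -> bool:
    """Run the dry run first; start the paid run only if the dry run exits 0.

    Assumption: a non-zero exit code signals a failed dry run.
    """
    dry = runner(build_command(live=False))
    if dry.returncode != 0:
        return False
    live = runner(build_command(live=True))
    return live.returncode == 0
```

Calling run_guarded() with no arguments would invoke the real CLI, including the paid API call, so treat it as an illustration of the ordering rather than a drop-in script.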

02 · Retrieval

Current vector authority

Metric                          Value
Total Gemini vectors            119,912
O*NET task vectors              100,000
O*NET occupation vectors        2,828
Top 1,000 vector coverage       1,000
Top 500 vector coverage         500
Stale vectors                   0
Known-pair cosine gap           0.190463
Unvectorized chunks overall     347,048
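
For readers new to the retrieval metrics, here is a minimal pure-Python sketch of exact (brute-force) cosine retrieval and of one plausible reading of the known-pair cosine gap. This note does not define the gap, so the definition used here (mean cosine of known-related pairs minus mean cosine of known-unrelated pairs) is an assumption, as are the names cosine, top_k, and known_pair_gap.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Exact cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def top_k(query: list[float], corpus: dict[str, list[float]], k: int) -> list[str]:
    """Score every corpus vector against the query and return the k best keys."""
    scored = sorted(corpus, key=lambda key: cosine(query, corpus[key]), reverse=True)
    return scored[:k]


def known_pair_gap(related, unrelated) -> float:
    """Assumed definition: mean cosine of related pairs minus mean of unrelated pairs."""
    mean_related = sum(cosine(a, b) for a, b in related) / len(related)
    mean_unrelated = sum(cosine(a, b) for a, b in unrelated) / len(unrelated)
    return mean_related - mean_unrelated
```

Under this reading, a positive gap such as 0.190463 would mean known-related pairs score higher on average than unrelated ones. Exact scoring compares the query against every vector, so it is simple to audit but scales linearly with corpus size.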

Runtime embedding authority remains unpromoted. Public runtime, real-user data, external writes, and production telemetry remain off.

03 · Commands

Proof commands run

uv run aina-data-engine --root /srv/aina/aina-data-engine-room gemini-embedding-run --source-family onet_task_evidence --include-repaired --dry-run --max-new 25000 --selection-mode progressive
uv run aina-data-engine --root /srv/aina/aina-data-engine-room gemini-embedding-run --source-family onet_task_evidence --include-repaired --max-new 25000 --selection-mode progressive --allow-live-gemini --confirm-paid-api --workers 16 --timeout-seconds 120 --max-retries 5 --write-every 2500
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-embedding-semantic-qa --source-family onet_task_evidence --include-repaired --limit 50
uv run aina-data-engine --root /srv/aina/aina-data-engine-room ain-510-retrieval-promotion-gate
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-embedding-repaired-corpus --source-family onet_task_evidence --remaining-only --shard-size 5000
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-chunk-vector-reconciliation
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-runtime-readiness
uv run aina-data-engine --root /srv/aina/aina-data-engine-room source-authority-registry-v2
uv run aina-data-engine --root /srv/aina/aina-data-engine-room ain-506-p0-gate

04 · Next

Resume from here

There are 31,095 repaired O*NET task candidate chunks left. Run one more guarded 25,000-row tranche, then refresh the remaining-only manifest and decide whether the final remainder (6,095 chunks, if the whole tranche embeds cleanly) should run in the foreground or as a batch.

uv run aina-data-engine --root /srv/aina/aina-data-engine-room gemini-embedding-run --source-family onet_task_evidence --include-repaired --dry-run --max-new 25000 --selection-mode progressive
uv run aina-data-engine --root /srv/aina/aina-data-engine-room gemini-embedding-run --source-family onet_task_evidence --include-repaired --max-new 25000 --selection-mode progressive --allow-live-gemini --confirm-paid-api --workers 16 --timeout-seconds 120 --max-retries 5 --write-every 2500

Where to start

Continue the same clean-before-embed ladder; the next target is 125,000 O*NET task vectors, not a blind batch submission.
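
The remaining rungs of the ladder follow from the counts in this handoff. The sketch below is illustrative arithmetic only: it assumes every selected candidate embeds successfully, with no quality exclusions or failed rows, and the helper name plan_tranches is hypothetical.

```python
def plan_tranches(current: int, remaining: int, tranche_size: int) -> list[tuple[int, int, int]]:
    """Return (rows_in_tranche, vectors_after, rows_left) for each tranche.

    Assumption: each tranche embeds all of its rows, with nothing excluded or failed.
    """
    if tranche_size <= 0:
        raise ValueError("tranche_size must be > 0")
    steps = []
    while remaining > 0:
        take = min(tranche_size, remaining)
        current += take
        remaining -= take
        steps.append((take, current, remaining))
    return steps


for rows, vectors_after, rows_left in plan_tranches(100_000, 31_095, 25_000):
    print(rows, vectors_after, rows_left)
# 25000 125000 6095
# 6095 131095 0
```

The first step reproduces the 125,000 target; the second is the smaller remainder whose foreground-or-batch decision is still open.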