AINA Data Engine Room · local handoff · 2026-06-15

O*NET Task Family Complete Embedding Checkpoint

The repaired O*NET task evidence family is fully embedded through Gemini Embedding 2.

The Single Idea

The repaired O*NET task evidence family is fully embedded. The final 6,095-row foreground tranche completed through Vertex ADC with zero failed rows, and the remaining-only surface now reports zero O*NET task candidates left.

01 · Progress

What changed

Selector: 6,095 clean final candidates, all from onet_task_evidence, with zero quality exclusions.
Live run: 6,095 new Gemini vectors through Vertex ADC on aina-495702.
Remainder: 0 repaired O*NET task chunks remain; no remaining shards were emitted.

The O*NET task family no longer needs batch submission. The older full repaired manifest remains explicitly out of bounds.
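
The zero-shard result above matches what plain ceiling division would give for an empty remainder. The sketch below is illustrative only: `shard_count` is a hypothetical helper, not part of aina-data-engine, and it assumes shards are fixed-size slices of the remaining rows.

```python
def shard_count(remaining_rows: int, shard_size: int) -> int:
    """Shards needed to cover remaining_rows, assuming fixed-size shards (hypothetical helper)."""
    if shard_size <= 0:
        raise ValueError("shard_size must be positive")
    if remaining_rows < 0:
        raise ValueError("remaining_rows must be non-negative")
    # Integer ceiling division: a partial final shard still counts as one shard.
    return (remaining_rows + shard_size - 1) // shard_size


# Before the final tranche: 6,095 candidates at --shard-size 5000.
shards_before = shard_count(6095, 5000)
# After the final tranche: nothing remains, so nothing is emitted.
shards_after = shard_count(0, 5000)
print(shards_before, shards_after)
```

Under that assumption the final tranche would have spanned two shards before the run and none after it.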

02 · Retrieval

Current vector authority

Metric                            Value
Total Gemini vectors              151,007
O*NET task vectors                131,095
O*NET occupation vectors          2,828
Top 1,000 vector coverage         1,000
Top 500 vector coverage           500
Stale vectors                     0
Known-pair cosine gap             0.190463
Combined vector coverage ratio    0.323383
Unvectorized chunks overall       315,953

Runtime embedding authority remains unpromoted. Public runtime, real-user data, external writes, and production telemetry remain off.
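
As a sanity check on the metrics above, the reported combined coverage ratio is numerically consistent with total vectors divided by total vectors plus unvectorized chunks. That formula is an inference from the reported numbers, not a definition stated in this handoff, and `coverage_ratio` is a hypothetical helper.

```python
# Values copied from the "Current vector authority" metrics.
total_vectors = 151_007
unvectorized_chunks = 315_953
reported_ratio = 0.323383


def coverage_ratio(vectorized: int, unvectorized: int) -> float:
    """Share of chunks that have a vector (assumed definition, inferred from the metrics)."""
    total = vectorized + unvectorized
    if total == 0:
        raise ValueError("no chunks to measure coverage over")
    return vectorized / total


ratio = coverage_ratio(total_vectors, unvectorized_chunks)
# Compare at the six decimal places the handoff reports.
matches_report = round(ratio, 6) == reported_ratio
print(f"{ratio:.6f}", matches_report)
```

If the ratio is in fact computed differently, the agreement to six decimal places would be a coincidence worth investigating before relying on it.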

03 · Commands

Proof commands run

uv run aina-data-engine --root /srv/aina/aina-data-engine-room ain-506-p0-gate
uv run aina-data-engine --root /srv/aina/aina-data-engine-room gemini-embedding-run --source-family onet_task_evidence --include-repaired --dry-run --max-new 10000 --selection-mode progressive
uv run aina-data-engine --root /srv/aina/aina-data-engine-room gemini-embedding-run --source-family onet_task_evidence --include-repaired --max-new 10000 --selection-mode progressive --allow-live-gemini --confirm-paid-api --workers 16 --timeout-seconds 120 --max-retries 5 --write-every 1000
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-embedding-semantic-qa --source-family onet_task_evidence --include-repaired --limit 50
uv run aina-data-engine --root /srv/aina/aina-data-engine-room ain-510-retrieval-promotion-gate
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-embedding-repaired-corpus --source-family onet_task_evidence --remaining-only --shard-size 5000
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-chunk-vector-reconciliation
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-runtime-readiness
uv run aina-data-engine --root /srv/aina/aina-data-engine-room source-authority-registry-v2

04 · Next

Resume from here

O*NET task evidence is complete. The next embedding milestone should move to the next clean source family from the eligibility ledger, using a 50-row semantic QA check and a 500-row live start.

uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-embedding-semantic-qa --source-family <next_family> --include-repaired --limit 50
uv run aina-data-engine --root /srv/aina/aina-data-engine-room gemini-embedding-run --source-family <next_family> --include-repaired --dry-run --max-new 500 --selection-mode progressive

Boundary

Do not promote runtime embedding authority or public production behavior from this checkpoint alone. AIN-510 remains a local exact-cosine proof, with rollback required before production promotion.
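
For orientation, an exact (brute-force) cosine comparison can be sketched as follows. This handoff does not say how the known-pair cosine gap is computed. The sketch assumes it is the mean cosine over known-related pairs minus the mean cosine over unrelated pairs, and both function names are hypothetical.

```python
import math
from statistics import fmean


def cosine(a: list[float], b: list[float]) -> float:
    """Exact cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def known_pair_gap(known_pairs, unrelated_pairs) -> float:
    """Assumed definition: mean cosine of known pairs minus mean cosine of unrelated pairs."""
    known = [cosine(a, b) for a, b in known_pairs]
    unrelated = [cosine(a, b) for a, b in unrelated_pairs]
    return fmean(known) - fmean(unrelated)


# Toy 2-D vectors, not real embeddings: identical vectors vs orthogonal vectors.
toy_gap = known_pair_gap(
    known_pairs=[([1.0, 0.0], [1.0, 0.0])],
    unrelated_pairs=[([1.0, 0.0], [0.0, 1.0])],
)
print(toy_gap)
```

A local check of this kind only shows that the vectors separate known pairs from unrelated ones. It says nothing about runtime behavior, which is why the boundary above still applies.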