O*NET Task 125k Embedding Checkpoint
The task-evidence semantic layer is almost family-complete, with only 6,095 repaired chunks left.
The O*NET task evidence family has crossed 125,000 live Gemini vectors. The guarded tranche after the 100k checkpoint completed through Vertex ADC with zero failed rows, and AIN-510 still reports promotion-ready local exact-cosine retrieval with zero stale vectors.
What changed
onet_task_evidence, with zero quality exclusions.
aina-495702.
The operational rule is unchanged: use the guarded foreground selector for the small remainder. Do not submit the older full repaired manifest.
Proof commands run
uv run aina-data-engine --root /srv/aina/aina-data-engine-room ain-506-p0-gate
uv run aina-data-engine --root /srv/aina/aina-data-engine-room gemini-embedding-run --source-family onet_task_evidence --include-repaired --dry-run --max-new 25000 --selection-mode progressive
uv run aina-data-engine --root /srv/aina/aina-data-engine-room gemini-embedding-run --source-family onet_task_evidence --include-repaired --max-new 25000 --selection-mode progressive --allow-live-gemini --confirm-paid-api --workers 16 --timeout-seconds 120 --max-retries 5 --write-every 2500
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-embedding-semantic-qa --source-family onet_task_evidence --include-repaired --limit 50
uv run aina-data-engine --root /srv/aina/aina-data-engine-room ain-510-retrieval-promotion-gate
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-embedding-repaired-corpus --source-family onet_task_evidence --remaining-only --shard-size 5000
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-chunk-vector-reconciliation
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-runtime-readiness
uv run aina-data-engine --root /srv/aina/aina-data-engine-room source-authority-registry-v2
Resume from here
There are 6,095 repaired O*NET task candidate chunks left. Run the final smaller foreground tranche, then refresh remaining-only again and record the family-complete checkpoint.
uv run aina-data-engine --root /srv/aina/aina-data-engine-room gemini-embedding-run --source-family onet_task_evidence --include-repaired --dry-run --max-new 10000 --selection-mode progressive
uv run aina-data-engine --root /srv/aina/aina-data-engine-room gemini-embedding-run --source-family onet_task_evidence --include-repaired --max-new 10000 --selection-mode progressive --allow-live-gemini --confirm-paid-api --workers 16 --timeout-seconds 120 --max-retries 5 --write-every 1000
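The resume steps above can be chained so that the paid live run only starts after the dry run succeeds, and the remaining-only refresh only after the live run. The following is a minimal sketch, not part of the engine itself. The RUNNER switch and the function names (engine, embed, final_tranche) are hypothetical helpers introduced here for illustration. The flags are copied from the commands in this checkpoint, and the refresh step reuses the production-embedding-repaired-corpus invocation from the proof commands. The cap check assumes --max-new is an upper bound on newly embedded rows.

```shell
#!/bin/sh
# Sketch only: sequences the resume steps recorded in this checkpoint.
set -eu

ROOT="/srv/aina/aina-data-engine-room"
REMAINING=6095   # repaired chunks left, per this checkpoint
MAX_NEW=10000    # value passed to --max-new (assumed to be an upper bound)

# Hypothetical switch, not a CLI feature: defaults to "echo" so the commands
# are only printed. Export RUNNER="" to execute them for real.
RUNNER="${RUNNER-echo}"

# Print or run one engine subcommand against the shared root.
engine() {
  $RUNNER uv run aina-data-engine --root "$ROOT" "$@"
}

# Flags shared by the dry run and the live embedding run.
embed() {
  engine gemini-embedding-run --source-family onet_task_evidence \
    --include-repaired "$@"
}

final_tranche() {
  # Refuse to start if one tranche could not cover the whole remainder.
  if [ "$MAX_NEW" -lt "$REMAINING" ]; then
    echo "max-new $MAX_NEW is below the remainder $REMAINING" >&2
    return 1
  fi
  # Each step runs only if the previous one exited with status 0.
  embed --dry-run --max-new "$MAX_NEW" --selection-mode progressive &&
  embed --max-new "$MAX_NEW" --selection-mode progressive \
    --allow-live-gemini --confirm-paid-api \
    --workers 16 --timeout-seconds 120 --max-retries 5 --write-every 1000 &&
  engine production-embedding-repaired-corpus \
    --source-family onet_task_evidence --remaining-only --shard-size 5000
}

final_tranche
```

Run as written, the script only prints the three commands in order, which makes it easy to compare them against the lines above before spending on the live call. Running it with RUNNER set to an empty string in the environment executes them instead.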
Finish the final 6,095-row O*NET task foreground run; batch is unnecessary for this small remainder.