O*NET Task 100k Embedding Checkpoint
The task-evidence semantic layer crossed six figures, with gates still green.
The O*NET task evidence family has crossed 100,000 live Gemini vectors. The second 25,000-row tranche after the 75k checkpoint completed through Vertex ADC with zero failed rows, and AIN-510 still reports promotion-ready local exact-cosine retrieval with zero stale vectors.
What changed
onet_task_evidence, with zero quality exclusions.
aina-495702.
The operational rule is unchanged: use the guarded foreground selector or the remaining-only manifest. Do not submit the older full repaired manifest.
Proof commands run
uv run aina-data-engine --root /srv/aina/aina-data-engine-room gemini-embedding-run --source-family onet_task_evidence --include-repaired --dry-run --max-new 25000 --selection-mode progressive
uv run aina-data-engine --root /srv/aina/aina-data-engine-room gemini-embedding-run --source-family onet_task_evidence --include-repaired --max-new 25000 --selection-mode progressive --allow-live-gemini --confirm-paid-api --workers 16 --timeout-seconds 120 --max-retries 5 --write-every 2500
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-embedding-semantic-qa --source-family onet_task_evidence --include-repaired --limit 50
uv run aina-data-engine --root /srv/aina/aina-data-engine-room ain-510-retrieval-promotion-gate
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-embedding-repaired-corpus --source-family onet_task_evidence --remaining-only --shard-size 5000
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-chunk-vector-reconciliation
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-runtime-readiness
uv run aina-data-engine --root /srv/aina/aina-data-engine-room source-authority-registry-v2
uv run aina-data-engine --root /srv/aina/aina-data-engine-room ain-506-p0-gate
Resume from here
There are 31,095 repaired O*NET task candidate chunks left. Run one more guarded 25,000 tranche, then refresh remaining-only and decide whether the final smaller remainder should be foreground or batch.
uv run aina-data-engine --root /srv/aina/aina-data-engine-room gemini-embedding-run --source-family onet_task_evidence --include-repaired --dry-run --max-new 25000 --selection-mode progressive
uv run aina-data-engine --root /srv/aina/aina-data-engine-room gemini-embedding-run --source-family onet_task_evidence --include-repaired --max-new 25000 --selection-mode progressive --allow-live-gemini --confirm-paid-api --workers 16 --timeout-seconds 120 --max-retries 5 --write-every 2500
Continue the same clean-before-embed ladder; the next target is 125,000 O*NET task vectors, not a blind batch submission.
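The arithmetic behind the resume plan can be sketched in a few lines of shell. This is only an illustration: the variable names are hypothetical, the starting count is taken as exactly 100,000, and it assumes the next tranche again completes with zero failed rows and that the refreshed remaining-only count does not shift.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Figures quoted in this checkpoint; variable names are illustrative only.
current_vectors=100000   # assumed exact; the note says the family "crossed 100,000"
remaining=31095          # repaired O*NET task candidate chunks left
tranche=25000            # value passed to --max-new

# The next guarded run embeds at most one tranche.
next_run=$(( remaining < tranche ? remaining : tranche ))
next_total=$(( current_vectors + next_run ))
leftover=$(( remaining - next_run ))

echo "next run: ${next_run} rows"
echo "expected total after run: ${next_total}"
echo "expected remainder: ${leftover}"
```

Under those assumptions the next run lands on the 125,000 target and leaves 6,095 chunks, which is the smaller remainder to route to foreground or batch after the remaining-only refresh.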