Serviceable-Title Embedding: 2,000-Vector Checkpoint
A third controlled Gemini Embedding 2 slice moved serviceable-title coverage to 2,000 vectors while the next batch sampler stayed clean.
The serviceable-title Gemini Embedding 2 lane advanced by another controlled 500 rows after the 1,500-vector checkpoint. The serviceable-title vector count is now 2,000, total Gemini vectors stand at 4,094, the live quality gates still pass, and the next 500 candidates pass the compact semantic sampler.
Another Conservative Slice, No Batch
The repo started clean at commit 772daee, with the 1,500 checkpoint receipt already proving the next 500 candidates were clean. A third conservative live slice was run against serviceable_title only. No batch job was submitted.
uv run aina-data-engine --root /srv/aina/aina-data-engine-room gemini-embedding-run --source-family serviceable_title --include-repaired --max-new 500 --allow-live-gemini --confirm-paid-api --workers 8 --write-every 50 --timeout-seconds 60
The Serviceable-Title Index Advanced To 2,000
| Metric | Value |
|---|---|
| Status | pass |
| New vectors embedded | 500 |
| Failed rows | 0 |
| Existing serviceable-title vectors before run | 1,500 |
| Serviceable-title vectors after run (per post-run dry run) | 2,000 |
| Total Gemini vectors | 4,094 |
| Known-pair cosine gap | 0.225806 |
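
This checkpoint does not define the known-pair cosine gap. One plausible reading, stated here as an assumption, is the mean cosine similarity of known-matching title pairs minus the mean similarity of known non-matching pairs. The sketch below illustrates that reading with plain Python lists; the function names `cosine` and `known_pair_gap` are hypothetical and are not taken from the engine.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def known_pair_gap(positive_pairs, negative_pairs):
    """Assumed definition: mean similarity of matching pairs minus
    mean similarity of non-matching pairs. Not confirmed by the source."""
    if not positive_pairs or not negative_pairs:
        raise ValueError("both pair lists must be non-empty")
    pos = [cosine(a, b) for a, b in positive_pairs]
    neg = [cosine(a, b) for a, b in negative_pairs]
    return sum(pos) / len(pos) - sum(neg) / len(neg)
```

Under this reading, a larger gap means matching titles sit measurably closer together than unrelated ones, which is what lets a fixed threshold act as a quality gate.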
The Next 500 Are Clean To Evaluate
The dry run selected another 500 serviceable-title candidates without invoking Gemini. The compact semantic sampler checked 50 of those rows and found no posting, location, or company noise flags and no label leakage into weak or missing-authority text.
| Check | Value |
|---|---|
| Next candidate count | 500 |
| Rows sampled | 50 |
| Noise flags | 0 |
| Label leaks | 0 |
| Orphan vectors pruned | 0 |
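
The sampler's implementation is not shown in this checkpoint. As a rough illustration of the idea only, the sketch below draws a seeded random sample of 50 rows from a candidate list and flags titles that look like posting, location, or company noise. The row shape (a dict with a `title` key), the regex patterns, and the function name `sample_and_flag` are all hypothetical stand-ins, not the engine's real rules.

```python
import random
import re

# Hypothetical noise patterns; the real sampler's rules are not documented here.
NOISE_PATTERNS = {
    "posting": re.compile(r"\b(hiring|apply now|urgent)\b", re.IGNORECASE),
    "location": re.compile(r"\b(remote|onsite|hybrid)\b", re.IGNORECASE),
    "company": re.compile(r"\b(inc|llc|ltd|gmbh)\b", re.IGNORECASE),
}


def sample_and_flag(candidates, sample_size=50, seed=0):
    """Sample rows deterministically and report which ones trip a noise pattern."""
    rng = random.Random(seed)
    sample = rng.sample(candidates, min(sample_size, len(candidates)))
    flagged = []
    for row in sample:
        hits = [name for name, pattern in NOISE_PATTERNS.items()
                if pattern.search(row["title"])]
        if hits:
            flagged.append({"title": row["title"], "flags": hits})
    return {
        "rows_sampled": len(sample),
        "noise_flags": len(flagged),
        "flagged": flagged,
    }
```

A fixed seed keeps the sample reproducible, so a rerun before the next slice inspects the same rows unless the candidate list itself has changed.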
This Still Does Not Promote Public Runtime Authority
This checkpoint is build-time semantic-layer progress. The P0 gate still reports runtime_embedding_authority_promoted: false; exact cosine remains the source of truth, and AIN-510 still owns runtime retrieval promotion.
The Local Checkpoint Passed
| Check | Result |
|---|---|
| Focused tests | 38 passed |
| Ruff | All checks passed |
| P0 gate | pass |
| Full validation | pass |
Continue From The 2,000 Receipt
cd /srv/aina/aina-data-engine-room
git status --short --branch
jq '{status, valid, live_metrics: .live_run.metrics, dry_run_metrics: .post_run_dry_run.metrics, next_sampler: .next_500_semantic_sampler}' artifacts/validation/ain_506_serviceable_title_progressive_live_2000_v1.json
The next sensible move is another 500-row serviceable-title live slice, and only after rerunning the compact sampler. Do not batch this source family yet: the repair queue is still substantial, and the goal is clean progressive scale, not fast junk embedding.
Start with ain_506_serviceable_title_progressive_live_2000_v1.json; it is the compact proof that this slice passed and the next slice is ready to evaluate.
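
For scripted checks, the same receipt fields can be read from Python instead of jq. The sketch below selects only the keys named in the jq filter above. It assumes `status` is the string "pass" and `valid` is a JSON boolean, neither of which this checkpoint confirms, and the helper names `summarize_receipt` and `receipt_is_ready` are hypothetical.

```python
import json
from pathlib import Path

RECEIPT = Path(
    "artifacts/validation/ain_506_serviceable_title_progressive_live_2000_v1.json"
)


def summarize_receipt(receipt):
    """Select the same fields as the jq filter; missing keys become None."""
    return {
        "status": receipt.get("status"),
        "valid": receipt.get("valid"),
        "live_metrics": (receipt.get("live_run") or {}).get("metrics"),
        "dry_run_metrics": (receipt.get("post_run_dry_run") or {}).get("metrics"),
        "next_sampler": receipt.get("next_500_semantic_sampler"),
    }


def receipt_is_ready(summary):
    # Assumption: a passing receipt has status "pass", valid true,
    # and a sampler section present for the next 500 candidates.
    return (
        summary["status"] == "pass"
        and summary["valid"] is True
        and summary["next_sampler"] is not None
    )


if __name__ == "__main__":
    if RECEIPT.exists():
        summary = summarize_receipt(json.loads(RECEIPT.read_text()))
        print(json.dumps(summary, indent=2))
        print("ready for next slice:", receipt_is_ready(summary))
    else:
        print(f"receipt not found: {RECEIPT}")
```

Run it from the repo root so the relative receipt path resolves. It only reads the receipt and does not replace rerunning the compact sampler before the next live slice.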