AINA Data Engine Room · IWA source evidence · 2026-06-15

IWA Evidence Family Complete Embedding Checkpoint

A small, clean source-evidence family moved from repair-first to fully embedded without retrieval regression.

Ali Mehdi Mukadam · co-authored with Codex · 4 minute read

The Single Idea

The iwa_evidence source family is now fully embedded in the local Gemini vector authority. It passed repair, repaired-input semantic QA, dry-run selection, live Gemini embedding, AIN-510, vector reconciliation, source authority registry, and full validation.

01 · What ran

The full family moved through the clean-before-embed lane.

Repair: 476 rows repaired with placeholder noise cleanup.
QA: 50 of 50 repaired sample rows passed.
Dry-run: 476 candidates, no existing vectors.
Live: 476 vectors added, 0 provider failures.

Step             Result
Repair queue     476 rows ready
Repaired corpus  476 chunks, 0 skipped
Semantic QA      50/50 pass, 0 raw JD hits
Live Gemini      476 vectors added, 0 failed

02 · Current authority

The vector authority stayed promotion-ready.

Receipt                Current value
Total Gemini vectors   151,983
iwa_evidence vectors   476
Known-pair cosine gap  0.190303
Stale vectors          0
Combined chunks        467,436
Unvectorized chunks    315,453
AIN-510 status         promotion_ready
validate               pass
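
The receipts can be cross-checked with simple arithmetic. The sketch below assumes that every combined chunk is either vectorized or unvectorized, so the two counts should sum to the combined total; the variable names are illustrative and are not taken from the engine's code.

```python
# Receipt values copied from the current-authority table.
total_vectors = 151_983
combined_chunks = 467_436
unvectorized_chunks = 315_453
family_vectors = 476

# Assumption: vectorized + unvectorized chunks account for every combined chunk.
reconciles = total_vectors + unvectorized_chunks == combined_chunks

# Share of the combined corpus that already has a Gemini vector.
coverage = total_vectors / combined_chunks

# Share of the vector authority contributed by iwa_evidence.
family_share = family_vectors / total_vectors

print(f"reconciles: {reconciles}")
print(f"coverage: {coverage:.1%}")
print(f"iwa_evidence share of vectors: {family_share:.2%}")
```

Under that assumption the counts reconcile exactly, and coverage comes to roughly 32.5% of combined chunks.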

03 · Why this was safe

It was small, source-backed, and did not move the quality floor.

This was a compact source-evidence family with deterministic repair and a passing repaired-input semantic QA sample. Unlike the broad mixed semantic_review 5k expansion, this family did not reduce known-pair separation, did not introduce provider failures, and did not require rollback.
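
To make "known-pair separation" concrete, here is a minimal sketch of one way such a gap could be computed. It assumes the gap is the mean cosine similarity of known-related pairs minus the mean cosine similarity of known-unrelated pairs; the actual AIN-510 definition is not given in this report, and the function names and toy vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def known_pair_gap(related_pairs, unrelated_pairs):
    """Assumed definition: mean related similarity minus mean unrelated similarity."""
    related = [cosine(a, b) for a, b in related_pairs]
    unrelated = [cosine(a, b) for a, b in unrelated_pairs]
    return sum(related) / len(related) - sum(unrelated) / len(unrelated)


def gap_held(before, after, tolerance=0.0):
    """True when the gap after a run is no smaller than before, within a tolerance."""
    return after >= before - tolerance


# Toy 2-d vectors for illustration only; these are not real embeddings.
related_pairs = [([1.0, 0.0], [1.0, 0.0])]
unrelated_pairs = [([1.0, 0.0], [0.0, 1.0])]
toy_gap = known_pair_gap(related_pairs, unrelated_pairs)
print(toy_gap)
```

A check like `gap_held` is the kind of comparison that would flag a family that pulls related and unrelated pairs closer together.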

04 · Mission mapping

This completes another M2 source family.

Mission slice                    Status
M2.S1 source-family eligibility  Completed for iwa_evidence
M2.S2 progressive Gemini runs    Completed family in one 476-row live run after dry-run proof
AIN-510 retrieval proof          Still promotion_ready
Runtime boundary                 Still local-only and unpromoted

05 · Next actions

Keep harvesting small clean families first.

Continue with small, high-signal families before revisiting broad mixed families. Good candidates should have a passing repaired-input QA sample, enough text signal, no label-only rows, and a clean live proof of 500 rows or fewer. Keep jobs_research_role blocked until richer context repair exists, and do not retry semantic_review at 5k as one mixed family until it is partitioned or the quality-pair suite is improved.
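
The candidate criteria above can be written as a small eligibility check. This is a sketch under stated assumptions: `FamilyCandidate`, its fields, and the `MIN_TEXT_CHARS` threshold are hypothetical, since the report does not quantify "enough text signal"; only the 500-row cap and the 50/50 QA result come from the text.

```python
from dataclasses import dataclass

MAX_LIVE_ROWS = 500    # "500 rows or fewer" from the criteria above
MIN_TEXT_CHARS = 200   # hypothetical proxy for "enough text signal"


@dataclass
class FamilyCandidate:
    name: str
    qa_passed: int
    qa_sampled: int
    label_only_rows: int
    median_text_chars: int  # hypothetical measurement
    candidate_rows: int


def blockers(c: FamilyCandidate) -> list[str]:
    """Return the reasons a family is not ready for a live run (empty means eligible)."""
    reasons = []
    if c.qa_sampled == 0 or c.qa_passed < c.qa_sampled:
        reasons.append("repaired-input QA not fully passing")
    if c.median_text_chars < MIN_TEXT_CHARS:
        reasons.append("not enough text signal")
    if c.label_only_rows > 0:
        reasons.append("contains label-only rows")
    if c.candidate_rows > MAX_LIVE_ROWS:
        reasons.append("too large for a single live proof")
    return reasons


# iwa_evidence: QA and row counts are from this report; the text-length and
# label-only values are illustrative placeholders.
iwa = FamilyCandidate("iwa_evidence", 50, 50, 0, 400, 476)
# A made-up oversized family, for contrast.
big = FamilyCandidate("example_large_family", 50, 50, 0, 400, 5000)

print(blockers(iwa))
print(blockers(big))
```

Returning a list of reasons rather than a boolean keeps the blocked families explainable, which matches how jobs_research_role is held back for a named cause.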

06 · Resume commands

Restart from the 151,983-vector authority.

cd /srv/aina/aina-data-engine-room
git status --short --branch
uv run aina-data-engine --root /srv/aina/aina-data-engine-room ain-510-retrieval-promotion-gate
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-chunk-vector-reconciliation
uv run aina-data-engine --root /srv/aina/aina-data-engine-room source-authority-registry-v2
uv run aina-data-engine --root /srv/aina/aina-data-engine-room validate
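
If it helps to run the four engine commands as one unit, a minimal wrapper might look like the sketch below. The subcommand names and root path are copied from the commands above; the helper names are hypothetical, and stopping at the first non-zero exit code is a design assumption of this sketch, not documented engine behavior.

```python
import subprocess

ROOT = "/srv/aina/aina-data-engine-room"

# Subcommands in the same order as the resume commands.
SUBCOMMANDS = [
    "ain-510-retrieval-promotion-gate",
    "production-chunk-vector-reconciliation",
    "source-authority-registry-v2",
    "validate",
]


def resume_commands(root=ROOT):
    """Build the argv list for each resume command."""
    return [
        ["uv", "run", "aina-data-engine", "--root", root, sub]
        for sub in SUBCOMMANDS
    ]


def run_until_failure(commands, cwd=ROOT, runner=subprocess.run):
    """Run commands in order; return the first failing argv, or None if all pass."""
    for argv in commands:
        result = runner(argv, cwd=cwd)
        if result.returncode != 0:
            return argv
    return None


# Usage (requires uv and the repository to be present):
#   failed = run_until_failure(resume_commands())
#   print("all passed" if failed is None else f"stopped at: {' '.join(failed)}")
```

The `runner` parameter exists so the sequence can be exercised with a stand-in function before spending time on a real run.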

Where to start

Start from the 151,983-vector authority and keep choosing source families that pass repaired-input QA before live spend.