AIN-506 Anti-Loop Source Authority And Embedding Repair Handoff
The data was not missing. The missing link was an executable start-here surface that forces agents to reuse the source maps, ledgers, exports, and vector receipts before spending tokens rediscovering them.
- Total Gemini vectors: 5,534
- Serviceable title vectors: 3,440
- Top worked vectors: 1,000
- Top 500 coverage: 500
This pass added source-authority-start-here, tightened title eligibility and repair, pruned stale vectors, restored top-band completeness, and left the repo with passing P0 and validation gates.
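Pruning stale vectors and checking top-band completeness amount to two set operations. Below is a minimal sketch. It assumes vectors are keyed by a title ID and that the ranked worked-title list is available as plain Python data. `prune_and_check_coverage` and its arguments are hypothetical, not the engine's API.

```python
def prune_and_check_coverage(
    vectors: dict[str, list[float]],
    live_title_ids: set[str],
    ranked_title_ids: list[str],
    top_n: int,
) -> tuple[dict[str, list[float]], int, list[str]]:
    """Hypothetical helper: drop stale vectors, then report top-N coverage."""
    # Pruning: keep only vectors whose title is still in the live title set.
    pruned = {tid: vec for tid, vec in vectors.items() if tid in live_title_ids}
    # Coverage: count how many of the top-N ranked titles still have a vector.
    top = ranked_title_ids[:top_n]
    missing = [tid for tid in top if tid not in pruned]
    return pruned, len(top) - len(missing), missing
```

Under this reading, "Top 500 coverage: 500" corresponds to a covered count of 500 with an empty `missing` list for `top_n=500`.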
Hidden Gems
| Anchor | Why it matters |
|---|---|
| `artifacts/reports/production_source_authority_registry_v1.md` | Build-time source authority and title-cleaning route map. |
| `docs/TITLE-LEDGER.md` | Title precedence, count reconciliation, and export contract. |
| `docs/MAPPING-CHAIN-LEDGER.md` | Title to role to workflow to capability join topology. |
| `/home/ali/conductor/aina-consolidated/20-references/linear/doc__cross-repo-salvage-map.md` | Cross-repo prior-work map; adapt before rebuilding. |
| `/home/ali/conductor/repos/aina-jobs-research/project-summary-package/file-index/SOURCE_MAP.md` | Jobs-research lineage and output map. |
| `/home/ali/conductor/repos/aina-core/evidence/canonical` | Stitched evidence atlas parquets. |
| `/home/ali/ALIPE/data/jobs/linkedin_indeed_clusters_v1/chunks_250k` | Market substrate; use only through clean derived chunks. |
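The start-here surface exists to force reuse of these anchors, so a cheap pre-flight is to confirm they resolve before doing any new work. This sketch assumes anchors are plain filesystem paths, with relative ones resolved against the repo root. `missing_anchors` and `HIDDEN_GEM_ANCHORS` are hypothetical, not part of `aina-data-engine`.

```python
from pathlib import Path

# Hypothetical list mirroring the Hidden Gems table above.
HIDDEN_GEM_ANCHORS = [
    "artifacts/reports/production_source_authority_registry_v1.md",
    "docs/TITLE-LEDGER.md",
    "docs/MAPPING-CHAIN-LEDGER.md",
    "/home/ali/conductor/aina-consolidated/20-references/linear/doc__cross-repo-salvage-map.md",
    "/home/ali/conductor/repos/aina-jobs-research/project-summary-package/file-index/SOURCE_MAP.md",
    "/home/ali/conductor/repos/aina-core/evidence/canonical",
    "/home/ali/ALIPE/data/jobs/linkedin_indeed_clusters_v1/chunks_250k",
]


def missing_anchors(root: Path, anchors: list[str]) -> list[str]:
    """Return the anchors that do not exist on this machine."""
    missing = []
    for anchor in anchors:
        path = Path(anchor)
        # Absolute anchors are checked as-is; relative ones are resolved against the repo root.
        resolved = path if path.is_absolute() else root / path
        if not resolved.exists():
            missing.append(anchor)
    return missing
```

For example, `missing_anchors(Path("/srv/aina/aina-data-engine-room"), HIDDEN_GEM_ANCHORS)` returns an empty list only when every anchor is reachable. The `/home/ali/...` entries resolve only where those checkouts exist.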
What Changed
- Added `source-authority-start-here` plus JSON, markdown, and HTML receipts.
- Added stricter repair gates for posting IDs, salaries, schedules, school years, location suffixes, comma context suffixes, store/brand fragments, and generic posting phrases.
- Preserved real roles like `support associate`, `seasonal sales associate`, and `anti-money laundering subject matter expert (sme)`.
- Repaired wrappers like `teller part time 30 hours`, `sales 16 and 17 years old`, and `merchandising part time days`.
- Embedded 500 serviceable-title vectors and 29 top-worked refill vectors with zero failed rows.
- Pruned stale vectors and restored complete top-1,000 and top-500 coverage.
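The repair gates above can be illustrated with ordered regex substitutions that strip wrapper text while leaving genuine role titles intact. This is a minimal sketch, and `WRAPPER_PATTERNS` and `repair_title` are hypothetical. It covers only schedules, age wrappers, salaries, posting IDs, and comma suffixes. The engine's real gates, including location, store/brand, and generic-phrase handling, are not reproduced here.

```python
import re

# Hypothetical wrapper patterns, applied in order.
WRAPPER_PATTERNS = [
    # Schedules, e.g. "part time 30 hours" or "part time days".
    re.compile(r"\b(?:part|full)[\s-]time\b(?:\s+\d+\s+hours?)?(?:\s+(?:days|nights|evenings|weekends))?"),
    # Age wrappers, e.g. "16 and 17 years old".
    re.compile(r"\b\d{1,2}\s+and\s+\d{1,2}\s+years?\s+old\b"),
    # Salaries, e.g. "$15.50/hr".
    re.compile(r"\$\s?\d[\d,]*(?:\.\d+)?(?:\s*(?:/|per)\s*(?:hr|hour|year|yr))?"),
    # Posting IDs (assumed here to be runs of 5+ digits, optionally prefixed by '#').
    re.compile(r"#?\b\d{5,}\b"),
    # Comma context suffixes, e.g. ", downtown store".
    re.compile(r",.*$"),
]


def repair_title(raw: str) -> str:
    """Strip wrapper text from a raw posting title and normalize whitespace."""
    title = raw.lower().strip()
    for pattern in WRAPPER_PATTERNS:
        title = pattern.sub(" ", title)
    # Collapse the gaps left behind by the substitutions.
    return " ".join(title.split())
```

With these patterns, `teller part time 30 hours` becomes `teller`, while `seasonal sales associate` passes through unchanged. Ordering matters: the comma pattern runs last so earlier patterns still see the full title.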
Proof
- `uv run pytest tests/test_production_embeddings.py tests/test_source_authority_start_here.py -q`: 34 passed.
- `uv run aina-data-engine --root /srv/aina/aina-data-engine-room ain-506-p0-gate`: status: pass, valid: true, model: gemini-embedding-2, dimensions: 768.
- `uv run aina-data-engine --root /srv/aina/aina-data-engine-room validate`: status: pass, occupations: 110184, linkedin_jobs: 129165, HF files verified: 15.
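The P0 gate output fixes the expected model and vector width. Below is a sketch of an equivalent local check. It assumes a receipt dict whose keys mirror the printed fields (`status`, `valid`, `model`, `dimensions`) and vectors stored as lists of floats. The receipt shape and `gate_problems` are assumptions, not the engine's actual receipt schema.

```python
EXPECTED_MODEL = "gemini-embedding-2"  # taken from the ain-506-p0-gate output
EXPECTED_DIMENSIONS = 768


def gate_problems(receipt: dict, vectors: dict[str, list[float]]) -> list[str]:
    """Return human-readable problems; an empty list means these checks pass."""
    problems = []
    if receipt.get("status") != "pass":
        problems.append(f"status is {receipt.get('status')!r}, expected 'pass'")
    if receipt.get("valid") is not True:
        problems.append("receipt is not marked valid")
    if receipt.get("model") != EXPECTED_MODEL:
        problems.append(f"model is {receipt.get('model')!r}, expected {EXPECTED_MODEL!r}")
    if receipt.get("dimensions") != EXPECTED_DIMENSIONS:
        problems.append(f"dimensions is {receipt.get('dimensions')!r}, expected {EXPECTED_DIMENSIONS}")
    # Every stored vector must match the declared width.
    wrong = sorted(tid for tid, vec in vectors.items() if len(vec) != EXPECTED_DIMENSIONS)
    if wrong:
        problems.append("vectors with wrong dimension: " + ", ".join(wrong))
    return problems
```

Checking the vectors themselves, and not just the receipt, catches a receipt that claims 768 dimensions while a stale or truncated vector slipped through.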
Resume
- `cd /srv/aina/aina-data-engine-room`
- `git status --short --branch`
- `uv run aina-data-engine --root /srv/aina/aina-data-engine-room source-authority-start-here`
- `uv run aina-data-engine --root /srv/aina/aina-data-engine-room gemini-embedding-run --source-family serviceable_title --include-repaired --max-new 500 --dry-run`
Before the next live call, sample 50 actual candidate titles and proceed only if they are semantically clean.
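The sampling step can be made reproducible so that two reviewers see the same 50 titles. This sketch assumes candidates are available as a list of strings, for example exported from the dry run. `sample_for_review`, `flagged`, the seed, and the red-flag pattern are hypothetical. The regex only pre-screens for leftover wrappers, since semantic cleanliness still needs a human read.

```python
import random
import re

# Hypothetical red flags for leftover posting wrappers.
RED_FLAGS = re.compile(r"\d|\$|\bpart[\s-]time\b|\bfull[\s-]time\b|\byears?\s+old\b|,")


def sample_for_review(candidates: list[str], k: int = 50, seed: int = 506) -> list[str]:
    """Draw a reproducible sample of up to k candidate titles."""
    rng = random.Random(seed)  # fixed seed so the same sample can be re-drawn
    return rng.sample(candidates, min(k, len(candidates)))


def flagged(sample: list[str]) -> list[str]:
    """Return sampled titles that still look like they carry wrapper text."""
    return [title for title in sample if RED_FLAGS.search(title.lower())]
```

Under this sketch, the live `gemini-embedding-run` would go ahead only when `flagged(...)` returns an empty list and a manual read of the same sample agrees.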
Ali Mehdi Mukadam - co-authored with Codex - 2026-06-12