JD-Aware Repair And Live Embedding Checkpoint
A clean-before-embed repair slice that fixed glued JD responsibility text and embedded one newly eligible chunk as a Gemini vector.
This checkpoint repairs part of the JD-aware role-context evidence layer before embedding, then embeds exactly one newly eligible JD-aware semantic chunk with Gemini Embedding 2 through the paid Vertex ADC path. Rows that had real job-description text but failed snippet extraction because of glued headings like ResponsibilitiesApply are now parsed deterministically; rows with no JD/job context remain blocked.
What Changed
The JD-aware responsibility extractor now handles common job-posting formatting defects: punctuation without following spaces, glued responsibility headings, and section headings such as Typical Tasks, Daily Tasks, What You Will Do, and Key Accountabilities.
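The repairs described above can be sketched in a few lines. This is a minimal illustration, not the extractor's actual code: the names `HEADINGS`, `repair_spacing`, and `extract_responsibility_snippet`, the `max_chars` limit, and the exact regular expressions are all assumptions made for the example.

```python
import re
from typing import Optional

# Hypothetical heading list: the four headings named in the text plus "Responsibilities".
HEADINGS = (
    "Key Accountabilities",
    "What You Will Do",
    "Typical Tasks",
    "Daily Tasks",
    "Responsibilities",
)

# Match a heading even when it is glued to the text around it
# (e.g. "ResponsibilitiesApply"), with an optional colon after it.
HEADING_RE = re.compile(
    "(?:" + "|".join(re.escape(h) for h in HEADINGS) + r")\s*:?\s*"
)

# Punctuation sitting directly between a lowercase and an uppercase letter,
# as in "standards.Assist". This narrow pattern leaves "e.g." and URLs alone.
MISSING_SPACE_RE = re.compile(r"(?<=[a-z])([.;:!?,])(?=[A-Z])")


def repair_spacing(text: str) -> str:
    """Insert the missing space after sentence punctuation."""
    return MISSING_SPACE_RE.sub(r"\1 ", text)


def extract_responsibility_snippet(jd_text: str, max_chars: int = 300) -> Optional[str]:
    """Return the text after the first known heading, or None if no heading is found."""
    match = HEADING_RE.search(jd_text)
    if match is None:
        return None  # no recognizable section: the row is not graduated
    body = repair_spacing(jd_text[match.end():]).strip()
    return body[:max_chars] or None


glued = "About the roleResponsibilitiesApply brand standards.Assist customers on the floor."
snippet = extract_responsibility_snippet(glued)
```

Because the rules are plain regular expressions with no model call, the same input always yields the same snippet, which is what makes the repair deterministic. Text without a recognizable heading returns `None` rather than a guessed snippet.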
Data Movement
| Measure | Before | After |
|---|---|---|
| Rows with JD summaries | 954 | 962 |
| Rows with responsibility snippets | 954 | 962 |
| Evidence-layer repair_first rows | 16 | 8 |
| Evidence-layer blocked rows | 34 | 34 |
| JD-aware clean corpus chunks | 288 | 289 |
| JD-aware vectors | 288 | 289 |
| Total Gemini vectors | 6506 | 6507 |
| Stale vectors | 0 | 0 |
Live Embedding Proof
Temporary Sales Associate Tommy Hilfiger
gemini-embedding-2
768 dimensions
paid Vertex ADC path
0 failed
cosine gap 0.196032
uv run aina-data-engine --root /srv/aina/aina-data-engine-room gemini-embedding-run --source-family jd_aware_role_context --max-new 1 --selection-mode progressive --allow-live-gemini --confirm-paid-api --timeout-seconds 120 --max-retries 2
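This checkpoint reports a cosine gap without defining it. One plausible reading, stated here purely as an assumption, is the difference between the new vector's cosine similarity to a related probe and its similarity to an unrelated probe. The sketch below uses toy two-dimensional vectors and hypothetical names (`cosine_similarity`, `cosine_gap`); it does not reproduce the 0.196032 figure.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_gap(
    vector: Sequence[float],
    related: Sequence[float],
    unrelated: Sequence[float],
) -> float:
    """Assumed definition: similarity to a related probe minus similarity to an unrelated one."""
    return cosine_similarity(vector, related) - cosine_similarity(vector, unrelated)


# Toy vectors for illustration only; real embeddings here have 768 dimensions.
gap = cosine_gap([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
```

Under this reading, a positive gap means the vector sits closer to the related probe than to the unrelated one.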
Authority Sync
AIN-510 caught a temporary authority mismatch after the live embedding run: the new vector existed before the global authoritative chunk table had been refreshed. The fix was to refresh the full production semantic corpus with raw market rows still excluded, then rerun reconciliation and downstream receipts.
| Final Measure | Count |
|---|---|
| Base chunks | 294672 |
| Repaired overlay chunks | 27844 |
| Combined chunks | 322516 |
| Vector rows | 6507 |
| Matched vectors | 6507 |
| Stale vectors | 0 |
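The mismatch and its fix can be illustrated with a small reconciliation sketch. It assumes that a vector counts as matched when its chunk id appears in the authoritative chunk table and as stale otherwise; the function name `reconcile_vectors` and the chunk ids are hypothetical and do not reflect the AIN-510 implementation.

```python
from typing import Dict, Iterable, List


def reconcile_vectors(
    vector_chunk_ids: Iterable[str],
    authoritative_chunk_ids: Iterable[str],
) -> Dict[str, object]:
    """Count vectors whose chunk id is present in (or missing from) the authoritative table."""
    authoritative = set(authoritative_chunk_ids)
    vector_ids: List[str] = list(vector_chunk_ids)
    stale = sorted(v for v in vector_ids if v not in authoritative)
    return {
        "vector_rows": len(vector_ids),
        "matched_vectors": len(vector_ids) - len(stale),
        "stale_chunk_ids": stale,
    }


# Hypothetical ids: the new vector exists, but the table has not been refreshed yet.
table_before_refresh = {"chunk-a", "chunk-b"}
vectors = ["chunk-a", "chunk-b", "chunk-new"]

before = reconcile_vectors(vectors, table_before_refresh)
after = reconcile_vectors(vectors, table_before_refresh | {"chunk-new"})

# The combined count in the table above is the sum of base and repaired overlay chunks.
combined_chunks = 294672 + 27844
```

In this toy run, `before` reports one unmatched vector and `after` reports none, which mirrors the sequence described above: embed, detect the mismatch, refresh the table, then rerun reconciliation.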
Verification
Focused tests, ruff, semantic QA, chunk/vector reconciliation, AIN-506, AIN-510, production runtime readiness, source-authority registry v2, runtime contracts, source-authority start-here, and full validation all pass. AIN-510 is promotion_ready, while public runtime, real-user data, external writes, and production telemetry remain disabled.
What Remains
The full production goal is not complete. Next, inspect the remaining 8 JD-aware repair-first rows, preserve the 34 blocked rows unless source refs can be found, use the 50 JD-aware fixtures to harden AI Fluency maps, and continue embeddings only by source family after semantic QA and eligibility gates.
Start with the remaining 8 JD-aware repair-first rows; graduate only rows with clean compressed JD context, and keep everything else out until source authority improves.