AINA Data Engine Room handoff · 2026-06-13 · branch ali/ain-506-p0-gate-2026-06-12

Source Authority And Runtime Reconciliation Checkpoint

The production spine now reconciles chunk/vector authority, source registry, runtime contracts, and JD-aware role context from live repo state.

The Single Idea

This checkpoint proves the next serial slice of the production spine after the terminology cleanup: the current chunk/vector snapshot, the source-authority registry, the promoted runtime contracts, and the JD-aware role-context evidence all reconcile from live repo state. The key product correction is now backed by receipts: when role context is available, titles are no longer treated in isolation. The engine has a JD-aware evidence layer that joins title, job context, company reference, responsibility snippets, tools, source refs, and explicit gaps.
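The following is a minimal sketch of what one evidence row could look like. The class name, field names, and gap labels are hypothetical. They mirror the prose, not the engine's real schema. The sketch shows the design choice described above: a title that arrives without JD context produces named gaps instead of a silent title-only guess.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class RoleContextEvidence:
    # Hypothetical field names chosen to mirror the prose, not the real schema.
    title: str
    job_context: Optional[str]
    company_ref: Optional[str]
    responsibility_snippets: Tuple[str, ...]
    tools: Tuple[str, ...]
    source_refs: Tuple[str, ...]
    gaps: Tuple[str, ...]

    @property
    def has_role_context(self) -> bool:
        # Context only counts when it can be traced to at least one source row.
        return self.job_context is not None and len(self.source_refs) > 0


def build_evidence(
    title: str,
    job_context: Optional[str] = None,
    company_ref: Optional[str] = None,
    responsibility_snippets: Sequence[str] = (),
    tools: Sequence[str] = (),
    source_refs: Sequence[str] = (),
) -> RoleContextEvidence:
    """Join title and JD-derived fields, recording each missing piece as a named gap."""
    gaps = []
    if job_context is None:
        gaps.append("no_job_context")
    if company_ref is None:
        gaps.append("no_company_ref")
    if not responsibility_snippets:
        gaps.append("no_responsibility_snippets")
    if not tools:
        gaps.append("no_tool_mentions")
    if not source_refs:
        gaps.append("no_source_refs")
    return RoleContextEvidence(
        title=title,
        job_context=job_context,
        company_ref=company_ref,
        responsibility_snippets=tuple(responsibility_snippets),
        tools=tuple(tools),
        source_refs=tuple(source_refs),
        gaps=tuple(gaps),
    )
```

If you call `build_evidence("Data Analyst")` with only a title, the result reports `has_role_context` as false and carries five explicit gaps. Downstream consumers can see exactly what is missing.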

01 · Receipts

Receipts Regenerated

Receipt · Status · Why It Matters
source_authority_start_here_v1 · pass · Confirms the clean-before-embed stance and source-authority prerequisites.
production_chunk_vector_reconciliation_v1 · pass · Reconciles base chunks, repaired overlays, vector rows, and stale-vector state.
source_authority_registry_v2 · pass · Classifies all chunk families and keeps labels as metadata, not truth.
production_runtime_contracts_v1 · pass · Promotes product-facing contracts above raw warehouse tables.
jd_aware_role_context_evidence_v1 · pass · Builds JD-aware role-context evidence and 50 real-row E2E fixtures.
production_embedding_semantic_qa_v1__source_family=jd_aware_role_context · pass · Spot-checks 50 JD-aware chunks before any embedding scale-up.
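One common way to reconcile vectors against a chunk table is to store the content hash each vector was built from. You then compare that hash with the hash of the current chunk text. This is an assumption, not the reconciliation receipt's documented method. `content_hash`, `reconcile`, and the `Reconciliation` fields are hypothetical names.

```python
import hashlib
from typing import Dict, NamedTuple


def content_hash(text: str) -> str:
    # Stable digest of chunk text; any change to the text changes the hash.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class Reconciliation(NamedTuple):
    fresh: int         # vector built from the chunk's current text
    stale: int         # chunk text changed after the vector was built
    orphaned: int      # vector points at a chunk id that no longer exists
    unvectorized: int  # chunk with no vector at all


def reconcile(chunks: Dict[str, str], vector_hashes: Dict[str, str]) -> Reconciliation:
    """chunks: chunk_id -> current text (base chunks plus repaired overlays).
    vector_hashes: chunk_id -> hash of the text each vector was built from."""
    fresh = stale = orphaned = 0
    for chunk_id, built_from in vector_hashes.items():
        if chunk_id not in chunks:
            orphaned += 1
        elif content_hash(chunks[chunk_id]) != built_from:
            stale += 1
        else:
            fresh += 1
    unvectorized = sum(1 for chunk_id in chunks if chunk_id not in vector_hashes)
    return Reconciliation(fresh, stale, orphaned, unvectorized)
```

Under this scheme, "0 stale vectors" means every stored hash still matches its chunk.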
02 · Current Snapshot

Key Counts

Corpus: 294,671 base chunks · 27,844 repaired chunks · 322,515 combined chunks
Vectors: 6,506 Gemini vectors · 0 stale vectors · 316,009 unvectorized chunks
Authority: 35 registry rows · 44,440 clean candidates · 15,104 trusted jobs-research titles
Runtime: 1,004 payload contracts · 1,004 role-context rows · 1,004 resolution decisions
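The snapshot figures are internally consistent, and you can check that mechanically. The sketch below assumes two things that the snapshot does not state. First, every vector belongs to exactly one combined chunk and none is orphaned. Second, a chunk with a stale vector counts as unvectorized. The stale count is 0 here, so the second assumption does not affect the result. `snapshot_mismatches` is a hypothetical helper.

```python
# Figures copied from the Key Counts snapshot above.
SNAPSHOT = {
    "base_chunks": 294_671,
    "repaired_chunks": 27_844,
    "combined_chunks": 322_515,
    "vectors": 6_506,
    "stale_vectors": 0,
    "unvectorized_chunks": 316_009,
    "payload_contracts": 1_004,
    "role_context_rows": 1_004,
    "resolution_decisions": 1_004,
}


def snapshot_mismatches(s: dict) -> list:
    """Return the names of identities that do not hold for snapshot `s`."""
    problems = []
    if s["base_chunks"] + s["repaired_chunks"] != s["combined_chunks"]:
        problems.append("combined != base + repaired")
    # Assumption: one fresh vector per chunk, no orphans; stale vectors do not count.
    fresh_vectors = s["vectors"] - s["stale_vectors"]
    if s["combined_chunks"] - fresh_vectors != s["unvectorized_chunks"]:
        problems.append("unvectorized != combined - fresh vectors")
    runtime = {s["payload_contracts"], s["role_context_rows"], s["resolution_decisions"]}
    if len(runtime) != 1:
        problems.append("runtime row counts disagree")
    return problems


coverage = SNAPSHOT["vectors"] / SNAPSHOT["combined_chunks"]
print(snapshot_mismatches(SNAPSHOT), f"{coverage:.2%}")  # [] 2.02%
```

The ratio also shows how early the embedding ladder still is: roughly 2% of combined chunks carry a vector.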
The LinkedIn jobs source count is verified at 129,165, which matters because the JD-aware fixtures trace back to real job rows rather than title strings alone.
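Two claims in this receipt can be expressed as cheap checks: fixtures trace to real rows, and source rows are not mutated. The sketch below is hypothetical. The `fixture_id`/`job_id` keys and both helper names are assumptions. It flags fixtures whose job id is absent from the job rows. It also fingerprints the rows, so a comparison taken before and after a run reveals any mutation.

```python
import hashlib
import json
from typing import Dict, Iterable, List


def fingerprint(rows: Dict[str, dict]) -> str:
    """Order-independent digest of job rows (row id -> row dict)."""
    digest = hashlib.sha256()
    for row_id in sorted(rows):
        # Serialise id and row together so field boundaries cannot blur.
        record = json.dumps([row_id, rows[row_id]], sort_keys=True)
        digest.update(record.encode("utf-8") + b"\n")
    return digest.hexdigest()


def untraceable_fixtures(fixtures: Iterable[dict], job_rows: Dict[str, dict]) -> List[str]:
    """Return ids of fixtures whose source job id is missing from the job rows."""
    return [f["fixture_id"] for f in fixtures if f["job_id"] not in job_rows]
```

In use, you would take `fingerprint(rows)` before building evidence and again afterwards, and require the two digests to be equal.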
03 · Product Correction

JD-Aware Role Context Proof

The JD-aware receipt is the direct answer to the title-only trap. It proves three things. First, 50 E2E fixtures trace to real linkedin_jobs rows. Second, the top 500 and top 1,000 titles have either role context or an explicit gap. Third, source rows were not mutated.

Measure · Count
Top 500 with role context · 485
Top 500 explicit gaps · 14
Top 1,000 with role context · 946
Top 1,000 explicit gaps · 50
Rows with job context · 970
Rows with JD summaries · 954
Rows with responsibility snippets · 954
Rows with tool mentions · 715
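As printed, the context and gap columns do not quite sum to the band sizes: 485 + 14 = 499 and 946 + 50 = 996. The table leaves 1 and 4 titles unclassified, though the receipt may record them under another status. A small reconciliation check surfaces any such remainder explicitly. `band_coverage` is a hypothetical helper.

```python
def band_coverage(band_size: int, with_context: int, explicit_gaps: int) -> dict:
    """Summarise how much of a top-N title band is accounted for."""
    accounted = with_context + explicit_gaps
    return {
        "context_rate": with_context / band_size,
        "accounted": accounted,
        "unaccounted": band_size - accounted,  # titles with neither context nor a gap
    }


top_500 = band_coverage(500, 485, 14)
top_1000 = band_coverage(1_000, 946, 50)
print(top_500["unaccounted"], top_1000["unaccounted"])  # 1 4
```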

Embedding eligibility is intentionally conservative: 660 reference-only, 294 progressive-only, 34 blocked, and 16 repair-first rows, 1,004 in total.
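The four eligibility buckets sum to 1,004, the same figure as the runtime role-context rows. The sketch below treats eligibility as a partition: every row carries exactly one known label, and no row is missing. The label strings come from the receipt. `check_eligibility` and the row ids used with it are hypothetical.

```python
from collections import Counter
from typing import Dict

KNOWN_LABELS = {"reference-only", "progressive-only", "blocked", "repair-first"}
REPORTED = {"reference-only": 660, "progressive-only": 294, "blocked": 34, "repair-first": 16}


def check_eligibility(labels_by_row: Dict[str, str], expected_rows: int) -> Counter:
    """Verify that each row has one known label and that the row count matches."""
    unknown = set(labels_by_row.values()) - KNOWN_LABELS
    if unknown:
        raise ValueError(f"unknown eligibility labels: {sorted(unknown)}")
    if len(labels_by_row) != expected_rows:
        raise ValueError(f"expected {expected_rows} labelled rows, got {len(labels_by_row)}")
    return Counter(labels_by_row.values())


# The reported buckets cover the same number of rows as the runtime counts.
assert sum(REPORTED.values()) == 1_004
```

A mapping from row id to a single label rules out double-labelling by construction. That leaves two failure modes to check: unknown labels and missing rows.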

04 · Contracts

Runtime Contract Proof

The runtime contract receipt confirms product consumers read promoted contracts, not raw warehouse tables. Runtime payload contracts, route contracts, role-context evidence, and role-resolution decisions are present; ambiguous roles can abstain; assessment seed remains an onboarding seed only; and runtime boundaries stay local-only.
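The receipt only says that ambiguous roles can abstain. One way to implement abstention is to resolve a title only when the best candidate is both strong and clearly ahead of the runner-up. The following is a minimal sketch of that idea. The thresholds, `ResolutionDecision`, and `resolve_role` are hypothetical and are not the engine's settings.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ResolutionDecision:
    title: str
    resolved_role: Optional[str]  # None means the resolver abstained
    reason: str


def resolve_role(title: str, candidate_scores: Dict[str, float],
                 min_score: float = 0.7, min_margin: float = 0.15) -> ResolutionDecision:
    """Pick a role only when one candidate is strong and clearly ahead; otherwise abstain."""
    if not candidate_scores:
        return ResolutionDecision(title, None, "abstain: no candidates")
    ranked = sorted(candidate_scores.items(), key=lambda kv: kv[1], reverse=True)
    best_role, best = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    if best < min_score:
        return ResolutionDecision(title, None, "abstain: low confidence")
    if best - runner_up < min_margin:
        return ResolutionDecision(title, None, "abstain: ambiguous")
    return ResolutionDecision(title, best_role, "resolved")
```

With these thresholds, a bare "Associate" title scoring 0.72 vs 0.70 across two roles abstains as ambiguous, and the product never receives a confident guess. Recording the reason keeps abstentions auditable alongside resolved decisions.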

05 · Verification

Commands And Results

uv run aina-data-engine --root /srv/aina/aina-data-engine-room source-authority-start-here
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-chunk-vector-reconciliation
uv run aina-data-engine --root /srv/aina/aina-data-engine-room source-authority-registry-v2
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-runtime-contracts
uv run aina-data-engine --root /srv/aina/aina-data-engine-room jd-aware-role-context-evidence --fixture-limit 50
uv run pytest tests/test_chunk_vector_reconciliation.py tests/test_source_authority_registry_v2.py tests/test_source_authority_start_here.py tests/test_production_runtime_contracts.py tests/test_runtime_source_authority_repair.py tests/test_jd_aware_role_context.py -q
uv run aina-data-engine --root /srv/aina/aina-data-engine-room validate
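You can wrap the verification sequence in a fail-fast runner, so that a partial pass is never mistaken for a full one. This is a hypothetical convenience script, not part of the aina-data-engine CLI. It re-issues only the commands listed above, in the same order. `build_commands` and `run_all` are illustrative names.

```python
import subprocess
from typing import List, Sequence

ROOT = "/srv/aina/aina-data-engine-room"

# Receipt subcommands in the order listed above.
RECEIPT_STEPS: List[List[str]] = [
    ["source-authority-start-here"],
    ["production-chunk-vector-reconciliation"],
    ["source-authority-registry-v2"],
    ["production-runtime-contracts"],
    ["jd-aware-role-context-evidence", "--fixture-limit", "50"],
]

FOCUSED_TESTS = [
    "tests/test_chunk_vector_reconciliation.py",
    "tests/test_source_authority_registry_v2.py",
    "tests/test_source_authority_start_here.py",
    "tests/test_production_runtime_contracts.py",
    "tests/test_runtime_source_authority_repair.py",
    "tests/test_jd_aware_role_context.py",
]


def build_commands(root: str = ROOT) -> List[List[str]]:
    """Assemble receipts, then focused pytest, then validate, as argv lists."""
    cli = ["uv", "run", "aina-data-engine", "--root", root]
    commands = [cli + step for step in RECEIPT_STEPS]
    commands.append(["uv", "run", "pytest", *FOCUSED_TESTS, "-q"])
    commands.append(cli + ["validate"])
    return commands


def run_all(commands: Sequence[Sequence[str]]) -> None:
    """Run each command in turn; check=True raises on the first non-zero exit."""
    for argv in commands:
        subprocess.run(list(argv), check=True)
```

To run it, call `run_all(build_commands())` from the repo root. A failing step raises `subprocess.CalledProcessError` and skips everything after it.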

All receipt commands passed. JD-aware semantic QA sampled 50 rows with 50 passing, 0 failing, 0 raw-JD key hits, and 0 legacy review-gate hits. Focused pytest reported 13 passed in 1.18s. Validate passed.
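The QA result has a simple shape: sampled, passing, failing, and two hit counters. The following is a minimal sketch of how such a summary could be computed. It rests on an assumption the handoff does not confirm. A "raw-JD key hit" is taken to be an unprocessed JD field left in chunk metadata. A "legacy review-gate hit" is taken to be an outdated gate term appearing in chunk text. The key and term lists and `qa_sample` itself are hypothetical.

```python
from typing import Dict, Iterable

# Hypothetical lists; the receipt's real key and term lists are not shown here.
RAW_JD_KEYS = {"raw_description", "raw_jd_text"}
LEGACY_REVIEW_GATE_TERMS = {"legacy_review_gate"}


def qa_sample(chunks: Iterable[Dict]) -> Dict[str, int]:
    """Summarise a semantic QA sample in the same shape the receipt reports."""
    summary = {"sampled": 0, "passing": 0, "failing": 0,
               "raw_jd_key_hits": 0, "legacy_review_gate_hits": 0}
    for chunk in chunks:
        # Raw-JD key hit (assumed meaning): unprocessed JD field left in metadata.
        raw_hits = len(RAW_JD_KEYS & set(chunk.get("metadata", {})))
        # Legacy review-gate hit (assumed meaning): outdated gate term in the text.
        text = chunk.get("text", "")
        legacy_hits = sum(1 for term in LEGACY_REVIEW_GATE_TERMS if term in text)
        summary["sampled"] += 1
        summary["raw_jd_key_hits"] += raw_hits
        summary["legacy_review_gate_hits"] += legacy_hits
        summary["passing" if raw_hits == 0 and legacy_hits == 0 else "failing"] += 1
    return summary
```

On this reading, a clean 50-chunk sample reports 50 passing with every other counter at zero, which is the result quoted above.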

06 · Next Work

What Remains

This checkpoint does not finish the full Personalization Engine production goal. Next, use the JD-aware evidence and runtime contracts to harden AI Fluency maps for the real-row fixture set, semantically inspect 50 real rows across risk categories, repair the blocked and repair-first rows, and only then move clean chunks into the next Gemini embedding ladder.
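The ordering above can be encoded as a gate: no new embedding rung starts while any blocked or repair-first row is still unrepaired. The receipt does not say which of the other labels count as clean for embedding, so the caller must pass them explicitly. `next_embedding_rung` and `rung_size` are hypothetical names, sketched under that assumption.

```python
from typing import Dict, List, Set

# Labels that must be repaired before the next rung starts, per the plan above.
HELD_BACK = {"blocked", "repair-first"}


def next_embedding_rung(eligibility: Dict[str, str], repaired: Set[str],
                        admissible: Set[str], rung_size: int) -> List[str]:
    """Return chunk ids for the next embedding rung, or refuse while repairs are outstanding."""
    outstanding = sorted(cid for cid, label in eligibility.items()
                         if label in HELD_BACK and cid not in repaired)
    if outstanding:
        raise RuntimeError(f"{len(outstanding)} rows still need repair, e.g. {outstanding[:3]}")
    # Repaired rows are not admitted directly; they would be re-labelled and re-checked first.
    clean = sorted(cid for cid, label in eligibility.items()
                   if label in admissible and label not in HELD_BACK)
    return clean[:rung_size]
```

Raising an error instead of silently skipping held-back rows makes the dependency on repair work visible to whoever starts the next rung.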

Where To Start Next

Start by semantically spot-checking the 50 JD-aware fixtures and top-band gap rows; that is the bridge between structural proof and user-facing quality.
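To keep a 50-row spot check from being dominated by the largest group, draw the sample round-robin across categories, such as fixtures, top-band gap rows, and blocked rows. The following is a minimal sketch with a fixed seed so the sample is reproducible. The category names and `stratified_sample` are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List


def stratified_sample(row_category: Dict[str, str], total: int, seed: int = 0) -> List[str]:
    """Draw up to `total` row ids, spreading picks evenly across categories."""
    rng = random.Random(seed)
    buckets: Dict[str, List[str]] = defaultdict(list)
    for row_id, category in sorted(row_category.items()):
        buckets[category].append(row_id)
    for category in sorted(buckets):
        rng.shuffle(buckets[category])  # seeded shuffle keeps the draw reproducible
    picked: List[str] = []
    # Round-robin over categories so small categories are not crowded out.
    while len(picked) < total and any(buckets.values()):
        for category in sorted(buckets):
            if buckets[category] and len(picked) < total:
                picked.append(buckets[category].pop())
    return picked
```

Take 40 fixture rows, 5 gap rows, and 3 blocked rows, and ask for 10. This yields 4 fixtures, 3 gap rows, and all 3 blocked rows, where a uniform draw would usually pick mostly fixtures.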