AINA Data Engine Room handoff · 2026-06-13 · branch ali/ain-506-p0-gate-2026-06-12

Source Authority And Runtime Reconciliation Checkpoint

The production spine now reconciles chunk/vector authority, source registry, runtime contracts, and JD-aware role context from live repo state.

The Single Idea

This checkpoint proves the next serial slice of the production spine after the terminology cleanup: the current chunk/vector snapshot, the source-authority registry, the promoted runtime contracts, and the JD-aware role-context evidence all reconcile from live repo state. The key product correction is now backed by receipts. Titles are not treated in isolation when role context is available: the engine has a JD-aware evidence layer that joins title, job context, company reference, responsibility snippets, tools, source refs, and explicit gaps.
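As a minimal sketch of what one joined evidence record could look like, assuming a flat per-title shape: the class name, field names, and sample values below are hypothetical illustrations, not the engine's actual schema. It shows the idea that missing context is recorded as an explicit gap instead of being silently dropped.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RoleContextEvidence:
    """Hypothetical evidence record; field names are assumptions, not the real schema."""

    title: str
    job_context: str | None = None
    company_ref: str | None = None
    responsibility_snippets: list[str] = field(default_factory=list)
    tools: list[str] = field(default_factory=list)
    source_refs: list[str] = field(default_factory=list)

    def explicit_gaps(self) -> list[str]:
        """Return the name of every evidence field that is empty."""
        checks = {
            "job_context": self.job_context,
            "company_ref": self.company_ref,
            "responsibility_snippets": self.responsibility_snippets,
            "tools": self.tools,
            "source_refs": self.source_refs,
        }
        # None, "" and [] are all falsy, so each counts as a gap.
        return [name for name, value in checks.items() if not value]


# Made-up sample values, for illustration only.
record = RoleContextEvidence(
    title="Data Analyst",
    job_context="Analytics team, reporting focus",
    tools=["SQL"],
    source_refs=["linkedin_jobs:example-row"],
)
print(record.explicit_gaps())
```

With these sample values the record reports `company_ref` and `responsibility_snippets` as gaps, which is the kind of statement a downstream consumer can act on.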

01 · Receipts

Receipts Regenerated

Receipt	Status	Why It Matters
`source_authority_start_here_v1`	pass	Confirms clean-before-embed stance and source-authority prerequisites.
`production_chunk_vector_reconciliation_v1`	pass	Reconciles base chunks, repaired overlays, vector rows, and stale-vector state.
`source_authority_registry_v2`	pass	Classifies all chunk families and keeps labels as metadata, not truth.
`production_runtime_contracts_v1`	pass	Promotes product-facing contracts above raw warehouse tables.
`jd_aware_role_context_evidence_v1`	pass	Builds JD-aware role-context evidence and 50 real-row E2E fixtures.
`production_embedding_semantic_qa_v1__source_family=jd_aware_role_context`	pass	Spot-checks 50 JD-aware chunks before any embedding scale-up.

02 · Current Snapshot

Key Counts

Corpus
294,671 base chunks
27,844 repaired chunks
322,515 combined chunks

Vectors
6,506 Gemini vectors
0 stale vectors
316,009 unvectorized chunks

Authority
35 registry rows
44,440 clean candidates
15,104 trusted jobs-research titles

Runtime
1,004 payload contracts
1,004 role-context rows
1,004 resolution decisions

The LinkedIn jobs source count is verified at 129,165, which matters because the JD-aware fixtures trace back to real job rows rather than title strings alone.
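The corpus and vector counts in this snapshot can be cross-checked with plain arithmetic. The sketch below copies the figures from this section; the variable names are illustrative, and the second check assumes each chunk has at most one vector.

```python
# Counts copied from the Key Counts snapshot.
base_chunks = 294_671
repaired_chunks = 27_844
combined_chunks = 322_515
gemini_vectors = 6_506
unvectorized_chunks = 316_009

# Base chunks plus repaired overlays should equal the combined corpus.
assert base_chunks + repaired_chunks == combined_chunks

# Assumption: one vector per chunk, so combined minus vectors is the unvectorized set.
assert combined_chunks - gemini_vectors == unvectorized_chunks

vectorized_share = gemini_vectors / combined_chunks
print(f"{vectorized_share:.2%} of combined chunks are vectorized")
```

Both assertions hold for the reported figures, and the share works out to roughly 2% of the combined corpus.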

03 · Product Correction

JD-Aware Role Context Proof

The JD-aware receipt is the direct answer to the title-only trap. It proves three things: the 50 E2E fixtures trace to real `linkedin_jobs` rows, titles in the top 500 and top 1,000 have either role context or an explicit gap, and source rows were not mutated.

Measure	Count
Top 500 with role context	485
Top 500 explicit gaps	14
Top 1,000 with role context	946
Top 1,000 explicit gaps	50
Rows with job context	970
Rows with JD summaries	954
Rows with responsibility snippets	954
Rows with tool mentions	715

Embedding eligibility is intentionally conservative: 660 reference-only, 294 progressive-only, 34 blocked, and 16 repair-first.
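A short sketch for tallying the figures in this section. The dictionary keys are normalized names chosen for illustration; the numbers are copied from the table and the eligibility sentence above. The receipt itself remains the authority on how any title outside the two coverage measures is classified.

```python
# Embedding eligibility buckets, copied from the text above.
eligibility = {
    "reference_only": 660,
    "progressive_only": 294,
    "blocked": 34,
    "repair_first": 16,
}
role_context_rows = 1_004  # from the Runtime counts in the snapshot

# The four buckets should partition the role-context rows.
assert sum(eligibility.values()) == role_context_rows

# Rows that need repair work before they can be embedded.
needs_repair = eligibility["blocked"] + eligibility["repair_first"]

# Coverage per title band: role context plus explicit gaps.
coverage = {
    500: {"with_context": 485, "explicit_gaps": 14},
    1_000: {"with_context": 946, "explicit_gaps": 50},
}
remaining = {
    band: band - counts["with_context"] - counts["explicit_gaps"]
    for band, counts in coverage.items()
}

print("needs repair:", needs_repair)
print("titles outside both measures:", remaining)
```

On the figures as recorded here, the eligibility buckets sum to 1,004, 50 rows need repair, and the two coverage measures account for 499 of the top 500 and 996 of the top 1,000 titles.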

04 · Contracts

Runtime Contract Proof

The runtime contract receipt confirms product consumers read promoted contracts, not raw warehouse tables. Runtime payload contracts, route contracts, role-context evidence, and role-resolution decisions are present; ambiguous roles can abstain; assessment seed remains an onboarding seed only; and runtime boundaries stay local-only.
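To illustrate what "ambiguous roles can abstain" might mean in practice, here is a minimal sketch of a resolution step that declines to pick a role when the evidence is weak or the top candidates are too close. Everything in it is an assumption for illustration: the `ResolutionDecision` shape, the `resolve_role` function, the candidate scores, and both thresholds are hypothetical and do not describe the engine's actual contract.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ResolutionDecision:
    """Hypothetical decision record; not the real runtime contract."""

    title: str
    resolved_role: str | None
    abstained: bool
    reason: str


def resolve_role(
    title: str,
    candidates: dict[str, float],
    min_score: float = 0.7,    # assumed threshold
    min_margin: float = 0.15,  # assumed gap required between top two candidates
) -> ResolutionDecision:
    """Pick the best-scoring role, or abstain when the choice is not clear."""
    ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
    if not ranked or ranked[0][1] < min_score:
        return ResolutionDecision(title, None, True, "no candidate above threshold")
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] < min_margin:
        return ResolutionDecision(title, None, True, "top candidates too close")
    return ResolutionDecision(title, ranked[0][0], False, "clear winner")


# Made-up scores: the first title is ambiguous, the second is not.
ambiguous = resolve_role("Analyst", {"data_analyst": 0.74, "business_analyst": 0.71})
clear = resolve_role("ML Engineer", {"ml_engineer": 0.92, "data_engineer": 0.40})
print(ambiguous)
print(clear)
```

The design point is that abstention is a first-class outcome with a stated reason, so a consumer reading the decision never has to guess whether a missing role means "unknown" or "not yet processed".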

05 · Verification

Commands And Results

uv run aina-data-engine --root /srv/aina/aina-data-engine-room source-authority-start-here
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-chunk-vector-reconciliation
uv run aina-data-engine --root /srv/aina/aina-data-engine-room source-authority-registry-v2
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-runtime-contracts
uv run aina-data-engine --root /srv/aina/aina-data-engine-room jd-aware-role-context-evidence --fixture-limit 50
uv run pytest tests/test_chunk_vector_reconciliation.py tests/test_source_authority_registry_v2.py tests/test_source_authority_start_here.py tests/test_production_runtime_contracts.py tests/test_runtime_source_authority_repair.py tests/test_jd_aware_role_context.py -q
uv run aina-data-engine --root /srv/aina/aina-data-engine-room validate

All receipt commands passed. JD-aware semantic QA sampled 50 rows with 50 passing, 0 failing, 0 raw-JD key hits, and 0 legacy review-gate hits. Focused pytest reported 13 passed in 1.18s. Validate passed.
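The command sequence above could be wrapped in a small runner that stops at the first failure. This is a sketch, not part of the repo: `build_commands` and `run_all` are hypothetical helper names, and it assumes each command signals failure through a non-zero exit code. The subcommands, flags, paths, and test files are copied from the list above.

```python
from __future__ import annotations

import subprocess

ROOT = "/srv/aina/aina-data-engine-room"

# Receipt subcommands, in the order listed above.
RECEIPT_STEPS = [
    ["source-authority-start-here"],
    ["production-chunk-vector-reconciliation"],
    ["source-authority-registry-v2"],
    ["production-runtime-contracts"],
    ["jd-aware-role-context-evidence", "--fixture-limit", "50"],
]

TEST_FILES = [
    "tests/test_chunk_vector_reconciliation.py",
    "tests/test_source_authority_registry_v2.py",
    "tests/test_source_authority_start_here.py",
    "tests/test_production_runtime_contracts.py",
    "tests/test_runtime_source_authority_repair.py",
    "tests/test_jd_aware_role_context.py",
]


def build_commands(root: str = ROOT) -> list[list[str]]:
    """Return every verification command as an argv list, in document order."""
    engine = ["uv", "run", "aina-data-engine", "--root", root]
    receipts = [engine + step for step in RECEIPT_STEPS]
    pytest_cmd = ["uv", "run", "pytest", *TEST_FILES, "-q"]
    return receipts + [pytest_cmd, engine + ["validate"]]


def run_all(root: str = ROOT) -> None:
    """Run each command in turn; check=True raises on the first non-zero exit."""
    for argv in build_commands(root):
        print("running:", " ".join(argv))
        subprocess.run(argv, check=True)
```

Calling `run_all()` from the repo checkout would execute the seven commands in order; `build_commands()` alone can be used to inspect the sequence without running anything.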

06 · Next Work

What Remains

This checkpoint does not finish the full Personalization Engine production goal. Next, use the JD-aware evidence and runtime contracts to harden AI Fluency maps for the real-row fixture set, semantically inspect 50 real rows across risk categories, repair the blocked and repair-first rows, and only then move clean chunks into the next Gemini embedding ladder.

Where To Start Next

Start by semantically spot-checking the 50 JD-aware fixtures and top-band gap rows; that is the bridge between structural proof and user-facing quality.

Ali Mehdi Mukadam · co-authored with Codex · 2026-06-13

topics:
  - personalization-engine
  - data-engine-room
  - production-readiness
subtopics:
  - source-authority
  - runtime-contracts
  - jd-aware-role-context
  - clean-before-embed
  - ai-fluency
