AINA Data Engine Room · Local checkpoint · 2026-06-13

AI Fluency Raw JD Proof-Tail Repair Handoff

A source-authority repair pass that uses real LinkedIn job descriptions without turning raw rows into runtime or embedding authority.

Ali Mehdi Mukadam · co-authored with Codex · 2026-06-13

The Single Idea

This checkpoint applies Ali's correction: proof-tail titles should not be repaired in isolation when the source job row already carries JD, company, industry, function, SOC, risk tags, and source refs. The top-500 proof proxy band is now complete, the top-1,000 band is 998/1,000, and the two remaining rows stay blocked because their source context is still too broad.

01 · What changed

Raw rows became evidence, not authority by default

The repair code now loads linkedin_jobs read-only from local DuckDB and applies a narrow raw-JD authority path. A row must have real source support, enough matching job rows, and an allowed authority kind before it can move from blocked to repaired.

Exact: Strong title and function context can repair.
Subject: Teacher-subject rows can repair when education context is clear.
Context only: Evidence is preserved, but authority stays blocked.
Unknown kind: Future typos cannot promote rows.
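
The gate described above can be sketched as a small allowlist check. This is a minimal illustration, not the repair code itself: the class, field names, the `context_only` label, and the minimum-row threshold are all assumptions. Only the two authority mode strings come from this checkpoint.

```python
from dataclasses import dataclass

# Authority modes that appear in this checkpoint's repair table.
ALLOWED_AUTHORITY_KINDS = frozenset({"linkedin_jd_exact", "linkedin_jd_subject"})

# Assumed threshold for "enough matching job rows"; the real value is not stated.
MIN_MATCHING_ROWS = 3


@dataclass(frozen=True)
class RawJdEvidence:
    """Hypothetical summary of the raw-JD evidence gathered for one blocked title."""
    title: str
    authority_kind: str    # e.g. "linkedin_jd_exact"; "context_only" is a made-up label
    matching_rows: int     # number of linkedin_jobs rows that match the title
    has_source_refs: bool  # True when the rows carry real source refs


def can_repair(evidence: RawJdEvidence) -> bool:
    """Return True only when every gate condition holds."""
    if evidence.authority_kind not in ALLOWED_AUTHORITY_KINDS:
        # Context-only and unknown kinds (including typos) stay blocked.
        return False
    if not evidence.has_source_refs:
        return False
    return evidence.matching_rows >= MIN_MATCHING_ROWS
```

Using an allowlist rather than a blocklist is what makes the "unknown kind" case safe: a misspelled kind such as `linkedin_jd_excat` is simply not in the set, so it cannot promote a row.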

The code also records a SOC disagreement flag when the rule SOC and the dominant source SOC differ. That keeps single-table LinkedIn evidence visible as evidence with caveats rather than letting it pass as settled truth.
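
One way to picture the SOC disagreement flag is a comparison between the rule's SOC and the most frequent SOC among the matching source rows. The function names below are hypothetical, and "dominant" is assumed here to mean "most frequent"; the actual repair code may define it differently.

```python
from collections import Counter
from typing import Optional, Sequence


def dominant_soc(source_socs: Sequence[str]) -> Optional[str]:
    """Most frequent non-empty SOC code among the source rows, or None if there is none."""
    counts = Counter(code for code in source_socs if code)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def soc_disagreement(rule_soc: str, source_socs: Sequence[str]) -> bool:
    """True when a dominant source SOC exists and differs from the rule SOC."""
    dominant = dominant_soc(source_socs)
    return dominant is not None and dominant != rule_soc
```

In this sketch the flag is only recorded alongside the repair. It does not block the row, which matches the idea of keeping the evidence visible with a caveat attached.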

02 · Current proof

The proof-tail is now 18 ready and 2 blocked

Metric | Value
Input blocked rows | 20
Cumulative ready repairs | 18
Raw LinkedIn JD ready repairs | 7
Still blocked rows | 2
Top-500 local proof proxies | 500 / 500
Top-1,000 local proof proxies | 998 / 1,000
Embedding candidates | 0
Batch candidates | 0
Production unlocks | 0
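
The figures in this table are internally consistent, and that can be checked mechanically. The sketch below copies the values from the table; the dictionary keys and the `check_metrics` helper are illustrative names, not fields from the real validation artifact.

```python
# Values copied from the metrics table; key names are hypothetical.
metrics = {
    "input_blocked_rows": 20,
    "cumulative_ready_repairs": 18,
    "raw_linkedin_jd_ready_repairs": 7,
    "still_blocked_rows": 2,
    "top_500_ready": 500,
    "top_1000_ready": 998,
    "embedding_candidates": 0,
    "batch_candidates": 0,
    "production_unlocks": 0,
}


def check_metrics(m: dict) -> list:
    """Return a list of human-readable consistency problems (empty when all hold)."""
    problems = []
    if m["cumulative_ready_repairs"] + m["still_blocked_rows"] != m["input_blocked_rows"]:
        problems.append("ready + blocked does not equal input blocked rows")
    if m["raw_linkedin_jd_ready_repairs"] > m["cumulative_ready_repairs"]:
        problems.append("raw-JD repairs exceed cumulative repairs")
    if 1000 - m["top_1000_ready"] != m["still_blocked_rows"]:
        problems.append("top-1,000 gap does not match still-blocked rows")
    if m["top_500_ready"] != 500:
        problems.append("top-500 band is not complete")
    for key in ("embedding_candidates", "batch_candidates", "production_unlocks"):
        if m[key] != 0:
            problems.append(f"{key} should be zero in this checkpoint")
    return problems


print(check_metrics(metrics))
```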

AIN-510 remains promotion_ready for local exact cosine with 6,506 valid vectors, complete top-500/top-1,000 vector coverage, and zero stale vectors. Runtime embedding authority is still not promoted.

03 · Raw JD repairs

Seven rows moved because source rows disambiguated them

Title | Function | Authority mode
teacher-science | education | linkedin_jd_subject
court judicial assistant teller | legal_compliance | linkedin_jd_exact
teacher-social studies | education | linkedin_jd_subject
landscape designer | design_creative | linkedin_jd_exact
teacher-computer | education | linkedin_jd_subject
AML SME - SIU | legal_compliance | linkedin_jd_exact
food and beverage lead auditor | legal_compliance | linkedin_jd_exact

court judicial assistant teller is now correctly interpreted as a Colorado Judicial Branch role in Teller County, not a bank teller role. It remains documented as single-table LinkedIn evidence with SOC caveats.

04 · Still blocked

The last two rows need stronger disambiguation

Title | Reason
lab assistant | Real JD evidence exists, but the role is mixed across clinical, research, materials, admin, and healthcare-adjacent contexts.
ecommerce manager | Real JD evidence exists, but the role remains multifunctional across marketing, sales, category, operations, and marketplace work.
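
A "too broad" title can be thought of as one where no single context dominates the matching job rows. The sketch below is one possible way to express that, assuming a simple share-of-rows rule. The threshold, function names, and context labels are hypothetical and are not taken from the repair code.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple

# Assumed cut-off: the top context must cover at least this share of rows.
DOMINANCE_THRESHOLD = 0.6


def dominant_share(contexts: Sequence[str]) -> Tuple[Optional[str], float]:
    """Return the most common context label and its share of all rows."""
    if not contexts:
        return None, 0.0
    label, count = Counter(contexts).most_common(1)[0]
    return label, count / len(contexts)


def is_too_broad(contexts: Sequence[str], threshold: float = DOMINANCE_THRESHOLD) -> bool:
    """True when no single context reaches the dominance threshold."""
    _, share = dominant_share(contexts)
    return share < threshold
```

Under this rule a title whose rows split across clinical, research, materials, and admin contexts would stay blocked, while a title whose rows sit almost entirely in one context would not. Whether a share threshold is the right test is a design choice that needs to be checked against the real source rows.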

No new human_review fields or gates were introduced. Older generated receipts still contain legacy wording, but this slice used source and semantic quality gates only.

05 · Validation

The checkpoint passed tests, lint, gates, and full validation

Check | Result
Focused pytest | 8 passed
Ruff | pass
AIN-506 gate | pass
AIN-510 gate | promotion_ready
Production runtime readiness | ready_to_harden_headless_production_runtime
Full validation | pass
Git diff whitespace check | pass

Claude CLI reviewed the diff read-only. Verdict: conditional pass, with no must-fix before local commit. The two implementable hardening notes were handled before final validation.

Linear writeback was attempted from this VDS session and returned auth_revoked / HTTP 401. The Markdown handoff contains the exact payload to post to AIN-510, AIN-520, and AIN-527 once auth is refreshed.

06 · Resume

Exact restart commands

Codex · Resume checkpoint · verify before continuing
cd /srv/aina/aina-data-engine-room
git status --short --branch
git log -5 --oneline
uv run aina-data-engine --root /srv/aina/aina-data-engine-room ain-506-p0-gate
uv run aina-data-engine --root /srv/aina/aina-data-engine-room ain-510-retrieval-promotion-gate
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-runtime-readiness
uv run aina-data-engine --root /srv/aina/aina-data-engine-room validate
jq '{status, valid, metrics, checks, scope}' artifacts/validation/ai_fluency_proof_tail_authority_repair_v1.json
jq '{status, valid, metrics, checks}' artifacts/validation/ai_fluency_top_band_capability_coverage_v1.json
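
The two jq checks above can also be mirrored in Python when jq is not available. This is a minimal sketch: the helper names are hypothetical, and it assumes the artifacts are JSON objects whose `valid` field is a boolean, which this handoff does not confirm.

```python
import json
from pathlib import Path

# Top-level keys selected by the jq filters in the restart commands.
SUMMARY_KEYS = ("status", "valid", "metrics", "checks", "scope")


def summarize_artifact(path) -> dict:
    """Load a validation artifact and keep only the summary keys (missing keys become None)."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return {key: data.get(key) for key in SUMMARY_KEYS}


def artifact_is_valid(summary: dict) -> bool:
    """Treat the artifact as valid only when `valid` is literally True (assumed boolean)."""
    return summary.get("valid") is True
```

Run from the repository root, `summarize_artifact("artifacts/validation/ai_fluency_proof_tail_authority_repair_v1.json")` would return the same fields the first jq command prints. The coverage artifact has no `scope` in its jq filter, so that key would be expected to come back as `None`.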

Watch-out: do not turn context-only raw rows into embedding or runtime authority.

Where to start

Close lab assistant and ecommerce manager only if stronger source context can disambiguate them; otherwise move back to the clean-before-embed production spine.