AINA data engine room - founder report - 2026-06-11

Founder Report

The local data engine is now a real foundation for personalization, with broader title coverage, evidence packs, and proof artifacts.

The Single Idea

The engine can now recognize 110,184 local occupations and attach evidence packs to 14,114 eligible roles. It is not production yet, but it is no longer just a title map.

01 - What Changed

From Title Coverage Toward Runtime Intelligence

Before	Now
About 74k title rows.	110,184 local occupations.
Evidence packs reached a narrow slice.	14,114 eligible occupations now have evidence packs.
Some technically passing answers sounded wrong.	Warehouse/shipping now resolves to `warehouse manager`, not `warehouse i`.
Evidence matching was mostly exact title or small SOC fallback.	It now has exact title, explicit SOC, title-derived SOC, and guarded SOC-family paths.
Excluded roles could inherit evidence.	Excluded roles now skip evidence packs.
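
The matching order in the table above can be sketched roughly as follows. This is a minimal illustration, not the engine's actual code: the `Occupation` fields, the lookup dictionaries, and the idea that the SOC-family "guard" is an allowlist of major groups are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Occupation:
    """Hypothetical record shape; field names are assumptions."""
    title: str
    soc: Optional[str] = None          # explicit SOC code, e.g. "11-3071"
    derived_soc: Optional[str] = None  # SOC inferred from the title text
    excluded: bool = False


def match_evidence(
    occ: Occupation,
    packs_by_title: Dict[str, str],
    packs_by_soc: Dict[str, str],
    guarded_families: Dict[str, str],
) -> Tuple[Optional[str], str]:
    """Return (pack_id, path_label); pack_id is None when nothing is attached."""
    # Excluded roles skip evidence packs entirely.
    if occ.excluded:
        return None, "excluded"

    # 1. Exact title match (case-insensitive).
    key = occ.title.strip().lower()
    if key in packs_by_title:
        return packs_by_title[key], "exact_title"

    # 2. Explicit SOC code on the record.
    if occ.soc and occ.soc in packs_by_soc:
        return packs_by_soc[occ.soc], "explicit_soc"

    # 3. SOC code derived from the title.
    if occ.derived_soc and occ.derived_soc in packs_by_soc:
        return packs_by_soc[occ.derived_soc], "title_derived_soc"

    # 4. Guarded SOC-family fallback: only allowlisted major groups qualify.
    soc = occ.soc or occ.derived_soc
    if soc:
        family = soc.split("-")[0]  # major group, e.g. "11"
        if family in guarded_families:
            return guarded_families[family], "soc_family"

    # Missing evidence stays visible instead of being papered over.
    return None, "no_evidence"
```

Returning the path label alongside the pack is what lets a weak match (for example `soc_family`) be labeled as weak downstream.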

02 - Why It Matters

This Is Becoming The Personalization Engine

A learner can give a job title, the engine can resolve it, and for a growing set of roles it can attach grounded guidance about where AI helps, where AI should not be trusted, and which workflows matter.

The important improvement is honesty: weak matches are labeled, excluded roles stay excluded, and missing evidence stays visible.

03 - What To Review

Use The Human Surfaces

Artifact	What to look for
`docs/handoff/2026-06-11-session-closeout-data-engine-room-handoff.md`	Full technical handoff and repo map.
`artifacts/reports/evidence_fanout_probe_v1.md`	Evidence coverage measurement.
`artifacts/reports/serving_probe_v1.md`	Serving behavior on exact, messy, and OOD examples.
`docs/handoff/2026-06-11-title-expansion-runtime-semantic-replay-handoff.md`	Cumulative handoff for this lane.

You do not need to review raw JSON, parquet files, or Python internals unless you want to inspect the machinery.

04 - Gemini Embeddings

Worth Testing, Not A Runtime Shortcut

Gemini embeddings are worth looking at next. They could help compare job titles, responsibilities, workflows, and evidence packs semantically instead of relying only on deterministic token overlap.

The next move should be a sidecar evaluation, not a direct runtime switch. The question is whether embeddings find better matches without creating confident nonsense.
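
A sidecar comparison of this kind might look like the sketch below. It is illustrative only: the embedding vectors are passed in by the caller (no Gemini API call is shown or assumed), Jaccard token overlap stands in for the deterministic matcher, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase whitespace tokens (stand-in baseline)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def sidecar_compare(
    query: str,
    candidates: List[str],
    vectors: Dict[str, Sequence[float]],
) -> dict:
    """Record both picks side by side; the runtime answer stays deterministic."""
    det = max(candidates, key=lambda c: token_overlap(query, c))
    emb = max(candidates, key=lambda c: cosine(vectors[query], vectors[c]))
    return {
        "query": query,
        "deterministic": det,
        "embedding": emb,
        "agree": det == emb,
    }
```

Rows where `agree` is false are the ones worth reading by hand, since they are where embeddings either find a better match or produce confident nonsense.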

05 - Pending Work

What Is Still Open

Pending area	Plain-English meaning
More evidence coverage	Most eligible titles still do not have promoted evidence packs.
Embedding evaluation	Gemini should be tested against the deterministic baseline.
More semantic replay	Only the first runtime semantic batch has been processed in this lane.
Quarantine decision	`responsibility_registry_v2` remains off-limits until explicitly lifted.
Runtime packaging	This is still a local repo engine, not a product API/UI.

06 - Next Milestone

Run A Gemini Embeddings Evaluation Lane

Embed title/evidence/workflow text, compare embedding matches against `evidence_fanout_probe_v1`, inspect 50 to 100 real examples, and only promote embedding behavior if it beats the deterministic baseline.
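
The promotion rule for this milestone could be expressed as a small gate like the one below. This is a sketch under assumptions: each reviewed example is reduced to a pair of human judgments (baseline correct, embedding correct), "beats" is read as strictly more correct answers, and the 50-example minimum is taken from the lower bound of the range above.

```python
from typing import List, Tuple


def should_promote(
    reviews: List[Tuple[bool, bool]],
    min_examples: int = 50,
) -> bool:
    """Decide whether embedding behavior may be promoted.

    reviews: (baseline_correct, embedding_correct) per inspected example.
    """
    # Too few inspected examples: do not promote on thin evidence.
    if len(reviews) < min_examples:
        return False

    baseline_hits = sum(1 for base_ok, _ in reviews if base_ok)
    embedding_hits = sum(1 for _, emb_ok in reviews if emb_ok)

    # A tie is not a win: embeddings must strictly beat the baseline.
    return embedding_hits > baseline_hits
```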

Founder takeaway

The engine is useful enough to evaluate seriously, but not ready to connect to real learners without the next evidence-quality pass.

Ali Mehdi Mukadam - co-authored with Codex - 2026-06-11

topics:
  - founder-report
  - aina-data-engine-room
  - personalization-engine
  - evidence-coverage
subtopics:
  - gemini-embeddings
  - local-vds-run
  - title-coverage
  - affordance-packs
