Founder Report
The local data engine is now a real foundation for personalization, with broader title coverage, evidence packs, and proof artifacts.
The engine can now recognize 110,184 local occupations and attach evidence packs to 14,114 eligible roles. It is not production-ready yet, but it is no longer just a title map.
From Title Coverage Toward Runtime Intelligence
| Before | Now |
|---|---|
| About 74k title rows. | 110,184 local occupations. |
| Evidence packs reached a narrow slice. | 14,114 eligible occupations now have evidence packs. |
| Some technically passing answers sounded wrong. | Warehouse/shipping now resolves to warehouse manager, not warehouse i. |
| Evidence matching was mostly exact title or small SOC fallback. | It now has exact title, explicit SOC, title-derived SOC, and guarded SOC-family paths. |
| Excluded roles could inherit evidence. | Excluded roles now skip evidence packs. |
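The matching order in the table can be sketched as a short fallback chain. This is a minimal illustration, not the engine's actual code: the `Role` type, the lookup tables, the pack identifiers, the SOC values, and the reading of "guarded" as an allowlist of SOC families are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Role:
    title: str
    soc_code: Optional[str] = None  # explicit SOC code, if the row carries one
    excluded: bool = False


# Hypothetical placeholder data; the real tables live in the repo.
PACKS_BY_TITLE = {"warehouse manager": "pack-warehouse-manager"}
PACKS_BY_SOC = {"11-3071": "pack-logistics-managers"}
TITLE_TO_SOC = {"shipping supervisor": "11-3071"}
PACKS_BY_SOC_FAMILY = {"11": "pack-management-family"}
GUARDED_FAMILIES = {"11"}  # assumption: only allowlisted families may fall back


def match_evidence(role: Role) -> Optional[Tuple[str, str]]:
    """Return (match_path, pack_id), or None when no pack should attach."""
    if role.excluded:
        return None  # excluded roles skip evidence packs entirely

    key = role.title.strip().lower()

    # 1. Exact title match.
    if key in PACKS_BY_TITLE:
        return ("exact_title", PACKS_BY_TITLE[key])

    # 2. Explicit SOC code carried on the role.
    if role.soc_code in PACKS_BY_SOC:
        return ("explicit_soc", PACKS_BY_SOC[role.soc_code])

    # 3. SOC code derived from the title.
    derived = TITLE_TO_SOC.get(key)
    if derived in PACKS_BY_SOC:
        return ("title_derived_soc", PACKS_BY_SOC[derived])

    # 4. Guarded SOC-family fallback.
    soc = role.soc_code or derived
    if soc:
        family = soc.split("-")[0]
        if family in GUARDED_FAMILIES and family in PACKS_BY_SOC_FAMILY:
            return ("soc_family", PACKS_BY_SOC_FAMILY[family])

    return None
```

Returning the match path alongside the pack makes it easy to report how much coverage comes from each path, and to tighten the weaker paths first if they produce poor matches.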
This Is Becoming The Personalization Engine
A learner can give a job title, the engine can resolve it, and for a growing set of roles it can attach grounded guidance about where AI helps, where AI should not be trusted, and which workflows matter.
Use The Human Surfaces
| Artifact | What to look for |
|---|---|
| docs/handoff/2026-06-11-session-closeout-data-engine-room-handoff.md | Full technical handoff and repo map. |
| artifacts/reports/evidence_fanout_probe_v1.md | Evidence coverage measurement. |
| artifacts/reports/serving_probe_v1.md | Serving behavior on exact, messy, and OOD examples. |
| docs/handoff/2026-06-11-title-expansion-runtime-semantic-replay-handoff.md | Cumulative handoff for this lane. |
You do not need to review raw JSON, parquet files, or Python internals unless you want to inspect the machinery.
Worth Testing, Not A Runtime Shortcut
Gemini embeddings are worth looking at next. They could help compare job titles, responsibilities, workflows, and evidence packs semantically instead of relying only on deterministic token overlap.
The next move should be a sidecar evaluation, not a direct runtime switch. The question is whether embeddings find better matches without creating confident nonsense.
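A sidecar comparison of this kind might look like the sketch below. It scores the same candidates two ways and records where the two disagree, without changing what the runtime serves. The function names are hypothetical, Jaccard overlap stands in for whatever token-overlap rule the engine actually uses, and the embedding vectors are assumed to be precomputed elsewhere; no embedding API is called here.

```python
import math
from typing import Dict, List, Sequence


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase whitespace tokens (stand-in baseline)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def sidecar_compare(
    query: str,
    candidates: List[str],
    vectors: Dict[str, Sequence[float]],
) -> dict:
    """Pick the best candidate by each method and report whether they agree.

    `vectors` maps each text (query and candidates) to a precomputed
    embedding. Nothing here feeds back into serving.
    """
    baseline = max(candidates, key=lambda c: token_overlap(query, c))
    embedded = max(candidates, key=lambda c: cosine(vectors[query], vectors[c]))
    return {
        "query": query,
        "baseline": baseline,
        "embedding": embedded,
        "agree": baseline == embedded,
    }
```

The disagreement rows are the ones worth reading by hand: they show both where embeddings rescue a match that token overlap misses and where they produce a confident but wrong answer.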
What Is Still Open
| Pending area | Plain-English meaning |
|---|---|
| More evidence coverage | Most eligible titles still do not have promoted evidence packs. |
| Embedding evaluation | Gemini embeddings should be tested against the deterministic baseline. |
| More semantic replay | Only the first runtime semantic batch has been processed in this lane. |
| Quarantine decision | responsibility_registry_v2 remains off-limits until explicitly lifted. |
| Runtime packaging | This is still a local repo engine, not a product API/UI. |
Run A Gemini Embeddings Evaluation Lane
Embed title/evidence/workflow text, compare embedding matches against evidence_fanout_probe_v1, inspect 50 to 100 real examples, and only promote embedding behavior if it beats the deterministic baseline.
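The promotion rule above can be written down as a simple gate, sketched here under stated assumptions: reviewers have labeled the correct match for each inspected example, both matchers have produced a prediction per example, and "beats the baseline" is read as strictly higher accuracy on that reviewed sample. The function names and the 50-example floor (taken from the low end of the 50 to 100 range) are illustrative choices, not existing repo code.

```python
from typing import Dict


def accuracy(predictions: Dict[str, str], gold: Dict[str, str]) -> float:
    """Share of reviewed examples where the prediction equals the gold label."""
    if not gold:
        return 0.0
    correct = sum(1 for key, label in gold.items() if predictions.get(key) == label)
    return correct / len(gold)


def should_promote(
    baseline_preds: Dict[str, str],
    embedding_preds: Dict[str, str],
    gold: Dict[str, str],
    min_examples: int = 50,  # assumed floor on the reviewed sample size
) -> bool:
    """Promote embedding behavior only if it strictly beats the baseline."""
    if len(gold) < min_examples:
        return False  # not enough human-reviewed examples to decide
    return accuracy(embedding_preds, gold) > accuracy(baseline_preds, gold)
```

A tie keeps the deterministic baseline, which matches the intent that embeddings must earn their place rather than replace the current behavior by default.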
The engine is useful enough to evaluate seriously, but not ready to connect to real learners without the next evidence-quality pass.