AINA Data Engine Room · Handoff · 2026-06-11

Harvest Source + Semantic Gate Handoff

A restart surface for turning scattered personalization assets into usable runtime data.

Ali Mehdi Mukadam · co-authored with Codex · branch ali/personalization-engine-mission-2026-06-09

The Single Idea

The data engine room now has an executable local harvest map and a semantic serving gate. The map turns repo archaeology into a structured source inventory, and the gate checks whether title rows make sense from runtime, learner, platform, and evaluator perspectives before packet hardening.

01 · What Changed

New Executable Lanes

The repo now has a source-harvest module, a semantic serving adapter, a semantic harvest gate, and focused tests. The CLI exposes these as harvest-source-map and semantic-harvest-gate.

Area              Files
Source map        src/aina_data_engine/harvest_sources.py
Semantic serving  src/aina_data_engine/semantic_serving.py
Gate              src/aina_data_engine/semantic_harvest_gate.py
Tests             tests/test_semantic_harvest.py
CLI               src/aina_data_engine/cli.py

02 · Live VDS Artifacts

Local Outputs

Generated outputs are local VDS artifacts. They live under artifacts/, which git ignores. The code is the durable git surface; the artifact files are the current runtime evidence on disk.

/srv/aina/aina-data-engine-room/artifacts/validation/harvest_source_map_v1.json
/srv/aina/aina-data-engine-room/artifacts/validation/harvest_source_map_v1.jsonl
/srv/aina/aina-data-engine-room/artifacts/validation/semantic_harvest_gate_v1.json
/srv/aina/aina-data-engine-room/artifacts/validation/semantic_harvest_gate_v1.jsonl
/srv/aina/aina-data-engine-room/artifacts/provenance.jsonl
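
A quick way to inspect one of the .jsonl artifacts before relying on it is sketched below. It assumes only that each non-empty line is one JSON object; the helper names read_jsonl and summarize are hypothetical and are not part of the aina_data_engine package. No field names are assumed, so the summary reports whatever top-level keys actually appear.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Return one dict per non-empty line of a JSONL file."""
    rows = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                rows.append(json.loads(line))
    return rows


def summarize(rows):
    """Report the record count and the union of top-level keys."""
    keys = set()
    for row in rows:
        keys.update(row)
    return {"records": len(rows), "keys": sorted(keys)}
```

For example, summarize(read_jsonl("/srv/aina/aina-data-engine-room/artifacts/validation/semantic_harvest_gate_v1.jsonl")) would show how many rows the gate wrote and which fields each row carries.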
03 · Harvest Result

Sixteen Roots Found

The source map found 16 local source roots. All priority-1 roots were present: the current engine room, ALIPE, AINA core, Evidence Atlas, Hugging Face AINA, Jobs Research, and the old personalization engine semantic donor.

[Diagram: ALIPE + archives, HF + Evidence Atlas → Harvest source map (16 local roots) → Semantic gate (1,000 title rows)]

One source gap remains visible: aina-jobs-research is missing project-summary-package/exports/source_intelligence_v1/responsibilities.jsonl. Treat this as a repair/import target, not as a reason to ignore the rest of the harvest.
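
A gap like this one can be surfaced with a small presence check. The sketch below is illustrative, not the logic in harvest_sources.py: the EXPECTED manifest and the find_gaps helper are hypothetical, and it assumes the source roots sit side by side under one base directory.

```python
from pathlib import Path

# Hypothetical manifest: source root name -> relative paths expected inside it.
EXPECTED = {
    "aina-jobs-research": [
        "project-summary-package/exports/source_intelligence_v1/responsibilities.jsonl",
    ],
}


def find_gaps(base_dir, expected=EXPECTED):
    """Return (root, relative_path) pairs whose file is absent on disk."""
    base = Path(base_dir)
    gaps = []
    for root, rel_paths in expected.items():
        for rel in rel_paths:
            # Assumption: each root is a directory directly under base_dir.
            if not (base / root / rel).is_file():
                gaps.append((root, rel))
    return gaps
```

Reporting gaps as data, rather than raising on the first missing file, matches the guidance above: the missing export becomes a repair target while the rest of the harvest proceeds.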
04 · Semantic Result

Reality Check Across 1,000 Rows

Metric                      Count
Sampled rows                1,000
Source-backed rows            965
Role-native rows              309
Use for packet hardening      247
Use with caveat               485
Semantic repair or review     268

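
The three action counts sum to the sample size, which suggests every sampled row receives exactly one action. That reading is an inference from the numbers, not a documented guarantee. A short arithmetic check, using only the counts reported above:

```python
SAMPLED = 1000

# Counts copied from the metric table; labels are the table's own wording.
ACTIONS = {
    "Use for packet hardening": 247,
    "Use with caveat": 485,
    "Semantic repair or review": 268,
}

# The action buckets should account for every sampled row.
assert sum(ACTIONS.values()) == SAMPLED

shares = {label: count / SAMPLED for label, count in ACTIONS.items()}
for label, share in shares.items():
    print(f"{label}: {share:.1%}")
# Use for packet hardening: 24.7%
# Use with caveat: 48.5%
# Semantic repair or review: 26.8%

# Rows in the sample that are not source-backed.
not_source_backed = SAMPLED - 965  # 35
```

Roughly a quarter of the sample is ready for packet hardening, about half is usable with a caveat, and just over a quarter needs repair or review.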
Title                     Action                     Meaning
support associate - soma  use_for_packet_hardening   Source-backed and customer-success native.
seasonal sales associate  use_for_packet_hardening   Source-backed and sales-native.
sales manager             use_for_packet_hardening   Obvious title/function fit counts before full workflow hydration.
family law attorney       semantic_repair_or_review  Sensitive domain with missing source refs.

05 · Validation

Green Checks

The final checks were run after the semantic scoring and display-title refinements.

cd /srv/aina/aina-data-engine-room
.venv/bin/python -m ruff check src tests
.venv/bin/python -m pytest -q

Result: ruff reported "All checks passed." and pytest reported "190 passed".

06 · Next Slices

Where to Continue

  1. Mine or repair the missing aina-jobs-research responsibilities export.
  2. Run deterministic repair over the 268 semantic_repair_or_review rows before using multi-LLM review.
  3. Turn the source map into import recipes for Evidence Atlas, Hugging Face runtime SQLite, ALIPE chunks, and old personalization semantic donors.
  4. Promote the 247 packet-hardening candidates, especially support, sales, retail, customer service, operations, and administration roles.
  5. Expand from 1,000 sampled rows to the full serviceable title universe once repair output exists.

cd /srv/aina/aina-data-engine-room
.venv/bin/aina-data-engine --root /srv/aina/aina-data-engine-room harvest-source-map
.venv/bin/aina-data-engine --root /srv/aina/aina-data-engine-room semantic-harvest-gate --sample-limit 1000

Where to start

Start with the 268 repair rows, because that is the shortest path from broad title coverage to runtime confidence.
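
One way to pull those rows into a stable work queue is sketched below. The field names "action" and "title" are assumptions about the gate's JSONL schema and should be checked against the actual artifact. The repair_queue helper is hypothetical. The action value itself is the one shown in the results table.

```python
REPAIR_ACTION = "semantic_repair_or_review"  # action value from the results table


def repair_queue(rows, action_key="action", title_key="title"):
    """Select repair rows and sort them by title so reruns give the same order.

    action_key and title_key are assumed field names, not confirmed schema.
    """
    selected = [row for row in rows if row.get(action_key) == REPAIR_ACTION]
    return sorted(selected, key=lambda row: str(row.get(title_key, "")))
```

Sorting by title keeps the queue order independent of how rows were sampled or written, which makes a deterministic repair pass easier to diff between runs.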