Harvest Source + Semantic Gate Handoff
A restart surface for turning scattered personalization assets into usable runtime data.
The data engine room now has an executable local harvest map and a semantic serving gate. The map turns repo archaeology into a structured source inventory, and the gate checks whether title rows make sense from runtime, learner, platform, and evaluator perspectives before packet hardening.
New Executable Lanes
The repo now has a source-harvest module, a semantic serving adapter, a semantic harvest gate, and focused tests. The CLI exposes the source map and the gate as the harvest-source-map and semantic-harvest-gate subcommands.
| Area | Files |
|---|---|
| Source map | src/aina_data_engine/harvest_sources.py |
| Semantic serving | src/aina_data_engine/semantic_serving.py |
| Gate | src/aina_data_engine/semantic_harvest_gate.py |
| Tests | tests/test_semantic_harvest.py |
| CLI | src/aina_data_engine/cli.py |
Local Outputs
Generated outputs are local VDS artifacts and remain ignored by git under artifacts/. The code is the durable git surface; the artifact files are the current runtime evidence on disk.
- /srv/aina/aina-data-engine-room/artifacts/validation/harvest_source_map_v1.json
- /srv/aina/aina-data-engine-room/artifacts/validation/harvest_source_map_v1.jsonl
- /srv/aina/aina-data-engine-room/artifacts/validation/semantic_harvest_gate_v1.json
- /srv/aina/aina-data-engine-room/artifacts/validation/semantic_harvest_gate_v1.jsonl
- /srv/aina/aina-data-engine-room/artifacts/provenance.jsonl
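Since the artifacts are untracked, it can help to confirm they exist on disk and parse cleanly before relying on them. The sketch below is a minimal illustration, not code from the repo: the helper name `count_jsonl_rows` is hypothetical, it assumes only that each `.jsonl` file holds one JSON object per line, and it makes no assumption about the row schema.

```python
import json
from pathlib import Path
from typing import Optional


def count_jsonl_rows(path: Path) -> Optional[int]:
    """Return the number of JSON records in a JSONL file, or None if it is absent."""
    if not path.is_file():
        return None
    count = 0
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():  # ignore blank lines
                json.loads(line)  # raises ValueError on a malformed record
                count += 1
    return count


if __name__ == "__main__":
    # Paths taken from the artifact list in this handoff.
    validation = Path("/srv/aina/aina-data-engine-room/artifacts/validation")
    for name in ("harvest_source_map_v1.jsonl", "semantic_harvest_gate_v1.jsonl"):
        print(name, count_jsonl_rows(validation / name))
```

A result of `None` means the artifact has not been generated on this machine yet, which is expected on a fresh checkout because artifacts/ is ignored by git.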
Sixteen Roots Found
The source map found 16 local source roots. All priority-1 roots were present: the current engine room, ALIPE, AINA core, Evidence Atlas, Hugging Face AINA, Jobs Research, and the old personalization engine semantic donor.
aina-jobs-research is missing project-summary-package/exports/source_intelligence_v1/responsibilities.jsonl. Treat this as a repair/import target, not as a reason to ignore the rest of the harvest.
Reality Check Across 1,000 Rows
| Metric | Count |
|---|---|
| Sampled rows | 1,000 |
| Source-backed rows | 965 |
| Role-native rows | 309 |
| Use for packet hardening | 247 |
| Use with caveat | 485 |
| Semantic repair or review | 268 |
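The three action counts sum to the 1,000 sampled rows, so they appear to partition the sample. The short sketch below restates the table as percentages; the helper name `shares` is hypothetical and the numbers are copied from the table, not read from the artifacts.

```python
# Counts copied from the metrics table; labels are the table's own row names.
SAMPLED = 1_000
ACTIONS = {
    "Use for packet hardening": 247,
    "Use with caveat": 485,
    "Semantic repair or review": 268,
}


def shares(counts: dict, total: int) -> dict:
    """Return each count as a percentage of total, rounded to one decimal."""
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# The action buckets account for every sampled row.
assert sum(ACTIONS.values()) == SAMPLED

for label, pct in shares(ACTIONS, SAMPLED).items():
    print(f"{label}: {pct}%")
# Use for packet hardening: 24.7%
# Use with caveat: 48.5%
# Semantic repair or review: 26.8%
```

Read this way, roughly a quarter of the sample is ready for packet hardening, about half is usable with caveats, and just over a quarter needs repair or review.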
| Title | Action | Meaning |
|---|---|---|
| support associate - soma | use_for_packet_hardening | Source-backed and customer-success native. |
| seasonal sales associate | use_for_packet_hardening | Source-backed and sales-native. |
| sales manager | use_for_packet_hardening | Obvious title/function fit counts before full workflow hydration. |
| family law attorney | semantic_repair_or_review | Sensitive domain with missing source refs. |
Green Checks
The final checks were run after the semantic scoring and display-title refinements.
cd /srv/aina/aina-data-engine-room
.venv/bin/python -m ruff check src tests
.venv/bin/python -m pytest -q
Result: ruff reported "All checks passed" and pytest reported 190 passed.
Where to Continue
- Mine or repair the missing aina-jobs-research responsibilities export.
- Run deterministic repair over the 268 semantic_repair_or_review rows before using multi-LLM review.
- Turn the source map into import recipes for Evidence Atlas, Hugging Face runtime SQLite, ALIPE chunks, and old personalization semantic donors.
- Promote the 247 packet-hardening candidates, especially support, sales, retail, customer service, operations, and administration roles.
- Expand from 1,000 sampled rows to the full serviceable title universe once repair output exists.
cd /srv/aina/aina-data-engine-room
.venv/bin/aina-data-engine --root /srv/aina/aina-data-engine-room harvest-source-map
.venv/bin/aina-data-engine --root /srv/aina/aina-data-engine-room semantic-harvest-gate --sample-limit 1000
Start with the 268 repair rows, because that is the shortest path from broad title coverage to runtime confidence.
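One way to pull that repair worklist out of the gate output is sketched below. This is an illustration under assumptions, not the repo's own tooling: it assumes each row in semantic_harvest_gate_v1.jsonl is a JSON object carrying its decision under an `action` key (a guess based on the Action column above), and the helper names `iter_rows` and `repair_rows` are hypothetical. Adjust `action_key` if the real schema differs.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator, List

REPAIR_ACTION = "semantic_repair_or_review"


def iter_rows(lines: Iterable[str]) -> Iterator[dict]:
    """Parse JSONL lines into dicts, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


def repair_rows(rows: Iterable[dict], action_key: str = "action") -> List[dict]:
    """Keep only rows flagged for repair. `action_key` is an assumed field name."""
    return [row for row in rows if row.get(action_key) == REPAIR_ACTION]


if __name__ == "__main__":
    gate = Path(
        "/srv/aina/aina-data-engine-room/artifacts/validation/"
        "semantic_harvest_gate_v1.jsonl"
    )
    if gate.is_file():
        with gate.open(encoding="utf-8") as handle:
            todo = repair_rows(iter_rows(handle))
        print(f"{len(todo)} rows queued for repair")
```

If the assumed field name is right, the printed count should match the 268 rows reported for the 1,000-row sample.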