Draft planning artifact - AINA data engine room - 2026-06-09

AINA Personalization Engine Mission, Milestones, Slices, and Tasks

A source-bound synthesis of the personalization engine documents into a buildable product and data-engine mission.

Ali Mehdi Mukadam - co-authored with Codex - reading time 18 minutes

The Single Idea

AINA should be built as a personalized AI capability transformation engine: it turns a learner's real role, workflows, readiness, goals, capacity, constraints, and evidence into a structured learning path, realistic practice, evaluated progression, and a durable record of what changed.

This is not a CSV-to-prompt app and not a generic curriculum generator. The source documents converge on one architecture: a data factory builds a trusted work-intelligence map; the personalization engine uses that map to choose the next best learning move; the evaluator checks proof of capability; learner events improve the route.

00 - Advisory recall

Wiki recall: AINA personalization engine

Past attempts. None are canonical. (I checked: Wiki.)

What I found

1. [previous try] AINA Content Library - Wiki - updated 2026-05-23

This page is useful as a reminder that authored curriculum exists on disk, but it is not the same thing as runtime-generated personalization. It warns against confusing inventory with capability: existing lessons can seed the engine, but the engine still needs contracts, eligibility rules, packet generation, and evaluation surfaces.

-> /wiki/topics/aina-content-library.html

2. [previous try] Onboarding & Assessment - Wiki - updated 2026-05-23

This page supports the same learner loop found in the uploaded docs: intake must capture structured role, readiness, current AI use, goals, pain points, and constraints before the system can personalize. What remains useful is the first-mile emphasis: AINA does not begin with content selection; it begins with learner state capture.

-> /wiki/topics/onboarding-and-assessment.html

3. [previous try] AINA Risks & Blockers - Wiki - updated 2026-05-23

This page reinforces the need for evidence-bound claims, data integrity checks, test coverage, and clear risk handling. For this task, the still-relevant warning is that exposure scores, automation claims, and public-source joins must stay provenance-backed and reviewable.

-> /wiki/topics/aina-risks-and-blockers.html

What seems still relevant

The wiki hits line up with the uploaded source docs, but they are advisory. The current repo has progressed since those wiki snapshots: the source foundation docs are mirrored locally, public Hugging Face datasets are registered in the source registry, and the local engine already has schemas, runtime, packet, scoring, provenance, and validation modules. The major context change is that the engine-room now needs to move from "source foundation and local packet factory" toward real dataset ingestion, role/workflow contract hardening, and evaluator-backed runtime proof.

My recommendation for THIS task

Use the uploaded documents as the product source of truth, use the current repo as implementation truth, and turn the synthesis into a milestone plan that starts with data/source contracts before UI. The first production-quality range should prove this vertical path: assessment input -> canonical role/workflow packet -> CurriculumInputPacket -> personalized path -> practice/evaluation event -> quality report.

-> Ready to build, or do you want to redirect?

01 - Source basis

Source Basis

The source documents agree on the core idea but describe it at different maturity levels. I treated them as founder/product source truth, with the current repo as live implementation truth.

Source | Used for | Notes
AI Native Academy - personalization engine operating system.md | Product loop, CurriculumInputPacket, runtime sequence, learner events, MVP wedge | Strongest operating-system document.
Learning Graph June 2026.md | AINA learning ladder, work intelligence graph, assessment fields, learner loop | Strongest learning model and graph document.
Personalization Engine - 21 May 2025.md | Data/scoring engine framing, dataset roles, build sequence, risk language | Strongest data-engine sequencing document.
Personalized AI Education Platform_ The Personalization Engine.md | HF ingestion, API endpoints, HITL matrix, sandbox shape | Strongest implementation prompt document.
Architecting a Personalized AI Education Platform_ From Raw Data to Runtime Engine (2).md | DuckDB/HF/GDPval architecture, sandbox/evaluator rationale, deployment split | Strongest raw-data-to-runtime architecture document.
AINative Academy - Algorithms, Sauce & Structure - Details.md | Personalization examples and dynamic planner logic | Duplicate of ANA Sauce .md; counted once.

Current repo truth

Hugging Face now runs as bounded VDS engine input, and O*NET 30.3 now backs the local canonical occupation snapshot. Selected Economic Index CSVs, legacy Economic Index wage/employment/task signal files, GDPval train parquet, and O*NET occupation/task files are downloaded or cached, fingerprinted, processed, and mapped into runtime packets.
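
The "downloaded or cached, fingerprinted" step can be pictured as a content hash recorded per file. The following is a minimal sketch, assuming SHA-256 and a made-up manifest shape; fingerprint_file and build_manifest are illustrative names, not the repo's actual functions.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> list:
    """Hypothetical manifest: one entry per file with relative path, size, and hash."""
    entries = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        entries.append({
            "path": path.relative_to(root).as_posix(),
            "bytes": path.stat().st_size,
            "sha256": fingerprint_file(path),
        })
    return entries


if __name__ == "__main__":
    # Demo against a throwaway directory; the file content is invented.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "sample.csv").write_bytes(b"soc_code,task\n")
        for entry in build_manifest(root):
            print(entry)
```

Re-running the manifest against an unchanged cache should yield identical hashes, which is the property an offline rerun check would rely on.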

Remaining boundary

Bulk GDPval reference/deliverable folders remain intentionally excluded. GDPval file-availability blockers are closed by selecting reference-backed tasks. The remaining large-rubric source task has:

  1. A reviewer-ready calibration packet.
  2. A recorded keep-held decision with replacement required.
  3. A simplified replacement path.
  4. A reviewer-banned-term gate with 0 stale terms.
  5. A Claude reviewer payload that approves that replacement for local internal synthetic practice with 0 required edits.
  6. A domain-review receipt that applies that payload as domain_review_approved_internal_only.
  7. A deterministic replacement-practice receipt that passes the approved local fixture.
  8. A replacement-replay bridge that keeps a five-case synthetic replay/feedback matrix in a separate review lane with 0 canonical event-log writes.
  9. An approved-intake receipt that inventories only the reviewed chain while creating 0 new approvals.
  10. A candidate-pack receipt that selects five additional HF-backed candidates for reviewer/domain review while creating 0 approvals.
  11. A candidate-backlog receipt that inventories 64 additional HF-backed, reference-backed, deliverable-backed candidates for reviewer/domain evidence only while creating 0 approvals or unlocks.
  12. A candidate-review-batch receipt that selects 10 of those candidates into backlog_batch_001.
  13. A candidate-batch-review receipt that records Claude decisions as 3 future-gate-only approvals, 5 revision requests, and 2 keep-held decisions with 0 practice/intake/runtime unlocks.
  14. A candidate-batch-outcome receipt that routes those decisions into 3 confirmation-gate candidates, 5 revision-redraft candidates, and 2 held-out candidates while carrying forward 27 open edits and 0 practice/intake/runtime unlocks.
  15. A candidate-batch-resolution receipt that closes 12/27 locally provable edits, admits only 2 fully closed redrafts to local-only practice, leaves 3 confirmation candidates and 3 redraft candidates pending source-file review, keeps 2 held candidates out, and creates 0 intake/runtime unlocks.
  16. A candidate-batch-practice receipt that runs those 2 closed redrafts with 2/2 deterministic evaluator passes while preserving 6 source-file-review blockers, 2 held candidates, 15 open edits, and 0 approved-intake/runtime unlocks.
  17. A candidate-review receipt that approves two only for local practice while requesting revision on three.
  18. A candidate-revision-closeout receipt that closes all three revisions and 6/6 required edits.
  19. A candidate-practice receipt that proves all five reviewed/closeout-approved candidates pass deterministic local practice/replay/intake inventory with 0 remaining revision holds.

Production approval now keeps public runtime, external writes, real-user data, production telemetry, and deployment promotion blocked unless explicit approval evidence exists.

BLS OEWS wage/employment fact import is cached-only: official rows load only when oe.data.0.Current is present or another official access path succeeds, and the live VDS currently records 0 fact rows.
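
The cached-only BLS rule reads as a guard: no cache file means no rows, and the gap is recorded rather than filled in. The following is a minimal sketch, assuming the cache is a local tab-separated file named oe.data.0.Current with a header row. The function name, status labels, and return shape are hypothetical, not the repo's API.

```python
import csv
import tempfile
from pathlib import Path

BLS_CACHE_FILENAME = "oe.data.0.Current"  # file name taken from the text above


def load_bls_facts_cached_only(cache_dir: Path) -> dict:
    """Load wage/employment rows from a local cache only; never fetch or invent rows."""
    cache_path = cache_dir / BLS_CACHE_FILENAME
    if not cache_path.is_file():
        # Record the access gap explicitly, with zero fact rows.
        return {"status": "cache_missing", "fact_rows": 0, "rows": []}
    with cache_path.open(newline="", encoding="utf-8") as handle:
        # Assumption: tab-separated values with a header row.
        reader = csv.DictReader(handle, delimiter="\t")
        rows = [
            {key.strip(): (value or "").strip()
             for key, value in row.items() if key is not None}
            for row in reader
        ]
    return {"status": "loaded_from_cache", "fact_rows": len(rows), "rows": rows}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # An empty directory stands in for a VDS without the official cache file.
        print(load_bls_facts_cached_only(Path(tmp)))
```

Returning an explicit status alongside the zero count keeps the "0 fact rows" state distinguishable from a successful load of an empty file.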

Current ICP title-coverage status: artifacts/wedge_report.csv contains 74,225 title rows. artifacts/semantic_review/semantic_review_summary.json proves 45,564 generated title packets were reviewed, 45,564 passed deterministic semantic consistency, and 0 failed. artifacts/semantic_review/multi_llm_review.md records a Claude CLI adversarial PASS_WITH_NOTES verdict: safe to land locally, while still noting that semantic consistency is not deep packet-quality proof for every role. The next milestone is a title-coverage engine that classifies ICP titles as serve_now, serve_with_fallback, multi_llm_adjudication_required, or exclude_or_not_icp.

Evidence from the repo:

  1. pyproject.toml includes datasets and huggingface-hub.
  2. src/aina_data_engine/sources.py registers Anthropic/EconomicIndex and openai/gdpval.
  3. src/aina_data_engine/public_source_snapshots.py parses 1,016 O*NET occupation rows and 18,796 task rows, writes O*NET parquet plus canonical_occupation_snapshot.parquet, supports cached-only BLS OEWS oe.data.0.Current wage/employment fact import, and records the live VDS BLS access/cache gap with 0 wage/employment rows instead of inventing rows.
  4. src/aina_data_engine/huggingface_ingest.py downloads the selected byte-capped files and writes derived maps.
  5. artifacts/derived/huggingface/hf_ingest_summary.json records 16 processed files including the local manifest, 186,743,670 bytes, 756 Economic Index SOC rows, 821 legacy wage/employment/task signal rows, 220 GDPval tasks, 907 mapped SOC entries, 821 SOC entries with legacy signals, and 44 GDPval-linked SOC groups.
  6. src/aina_data_engine/huggingface_mapping.py writes hf_runtime_map_receipt_v1, which verifies 15 selected HF files on disk, checks all 45,564 packets, and proves 43,936 packets with HF refs, 7,906 packets with GDPval refs, 126,588 workflows with HF refs, 821 SOC entries with legacy signals, 4 curriculum modules, 4 exercise links, 4 rubric links, 17 curriculum HF refs, 11 curriculum GDPval refs, 220 sandbox scenarios, and 12 review-dashboard source-backed items.
  7. src/aina_data_engine/source_authority.py writes source_authority_beta_wedge_audit_v1, which classifies seven source layers, verifies 15 selected HF files, includes 821 legacy Economic Index signal rows, includes public-source snapshot metrics, audits 11 representative beta families, and keeps HF research/benchmark refs separate from job-market signal provenance.
  8. src/aina_data_engine/beta_admission.py writes beta_admission_v1, which approves 5 families for internal synthetic beta, holds 6 as review-required, supports blocked status, discloses the BLS gap, and blocks public release, external writes, and real-user data.
  9. src/aina_data_engine/quality.py writes rubric_depth_gate_v1, which checks 48 modules across 12 families for actionable rubric criteria, source refs, locally resolved refs, HF context, GDPval enrichment, and 1 GDPval review hold.
  10. src/aina_data_engine/deployment_readiness.py writes deployment_readiness_v1, which routes the remaining held module to calibration and proves no public runtime, no external writes, no real-user data, external beta blocked, and public release blocked.
  11. src/aina_data_engine/gdpval_hold_closeout.py writes gdpval_hold_closeout_v1, which proves 9 selected GDPval tasks have reference and deliverable evidence, 0 file-availability blockers remain, and 1 calibration hold remains.
  12. src/aina_data_engine/gdpval_calibration_packet.py writes gdpval_calibration_packet_v1, which maps that remaining hold to 15 reference file URIs, 2 deliverable example URIs, 67 raw rubric criteria, 121 points, redacted sample criteria, reviewer actions, and a blocked external/public/unattended boundary.
  13. src/aina_data_engine/gdpval_calibration_decision.py writes gdpval_calibration_decision_v1, which records the original task decision as keep-held with replacement required and 0 external/public/unattended unblocks.
  14. src/aina_data_engine/gdpval_replacement_closeout.py writes gdpval_replacement_closeout_v1, which records a rewritten simplified replacement traced to processed HF GDPval data and proves 0 reviewer-banned terms.
  15. src/aina_data_engine/gdpval_replacement_reviewer_payload.py writes gdpval_replacement_reviewer_payload_v1, which records reviewer payload 3b6a4073d9cfa3b9 as approve_internal with 0 required edits.
  16. src/aina_data_engine/gdpval_replacement_domain_review.py writes gdpval_replacement_domain_review_v1, which applies that payload as domain_review_approved_internal_only.
  17. src/aina_data_engine/gdpval_replacement_practice.py writes gdpval_replacement_practice_v1, which exercises the approved replacement as local internal synthetic practice and records a deterministic evaluator pass.
  18. src/aina_data_engine/gdpval_replacement_replay_bridge.py writes gdpval_replacement_replay_bridge_v1, which converts that practice receipt into 5 separate synthetic replay/feedback-preview cases with 0 canonical event-log writes.
  19. src/aina_data_engine/gdpval_replacement_approved_intake.py writes gdpval_replacement_approved_intake_v1, which inventories only the reviewer/domain-approved chain and creates 0 new approvals.
  20. src/aina_data_engine/gdpval_replacement_candidate_pack.py writes gdpval_replacement_candidate_pack_v1, which selects 5 additional review-only candidates from 69 eligible HF-backed tasks while creating 0 approvals and 0 unlocks.
  21. src/aina_data_engine/gdpval_replacement_candidate_review.py writes gdpval_replacement_candidate_review_v1, which reviews all 5 selected candidates, advances 2 only toward local practice, requests revision on 3, records 6 required edits, and creates 0 approved-intake/runtime unlocks.
  22. src/aina_data_engine/gdpval_replacement_candidate_revision_closeout.py writes gdpval_replacement_candidate_revision_closeout_v1, which closes those 3 revisions and 6/6 required edits against the processed HF GDPval task map and downloaded GDPval parquet.
  23. src/aina_data_engine/gdpval_replacement_candidate_practice.py writes gdpval_replacement_candidate_practice_v1, which proves all 5 reviewed/closeout-approved candidates pass deterministic local practice, 25 replay/feedback previews, and approved-intake inventory while leaving 0 revision holds and creating 0 runtime unlocks.
  24. src/aina_data_engine/production_deployment_approval.py writes production_deployment_approval_v1, which keeps 5 approval domains blocked with 0 approved unlocks.

src/aina_data_engine/gdpval_replacement_candidate_batch_outcome.py now writes gdpval_replacement_candidate_batch_outcome_v1, which routes the validated backlog_batch_001 decisions into 3 confirmation-gate candidates, 5 revision-redraft candidates, and 2 held-out candidates while carrying forward 27 open edits, 220 processed GDPval task-map rows, 186,743,670 downloaded HF bytes, and 0 practice/intake/runtime/external/production unlocks.

src/aina_data_engine/gdpval_replacement_candidate_batch_resolution.py now writes gdpval_replacement_candidate_batch_resolution_v1, which resolves that routed batch without pretending source-file contents were inspected: 10 candidates resolved, 12/27 required edits closed, 15 left open, 2 fully closed redrafts admitted to the batch-practice gate, 3 confirmation candidates plus 3 redraft candidates held for source-file review, 2 held candidates kept out of practice, and 0 practice executions inside the resolution gate itself, approved-intake-ready candidates, runtime approvals, external/public/unattended unlocks, or production unlocks.

src/aina_data_engine/gdpval_replacement_candidate_batch_practice.py now writes gdpval_replacement_candidate_batch_practice_v1, which practices only those two fully closed backlog_batch_001 redrafts on top of the processed HF/GDPval evidence: 2 practice candidates, 2 practice items, 2/2 deterministic evaluator passes, 6 source-file-review blockers preserved, 2 held candidates preserved, 15 open edits preserved, 220 processed GDPval task-map rows, 186,743,670 downloaded HF bytes, and 0 approved-intake/runtime/external/production unlocks.
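
The backlog_batch_001 counts quoted in the batch-review, batch-outcome, batch-resolution, and batch-practice receipts are meant to reconcile, and that is cheap to check. The following is a minimal sketch using only numbers stated in this section; the dictionary keys and the reconcile function are illustrative, not the receipt schema.

```python
# Counts quoted in this section for backlog_batch_001.
BATCH = {
    "candidates": 10,
    "review": {"future_gate_only": 3, "revision_requested": 5, "keep_held": 2},
    "resolution": {
        "admitted_to_practice": 2,
        "confirmation_pending_source_review": 3,
        "redraft_pending_source_review": 3,
        "held_out": 2,
    },
    "edits": {"required": 27, "closed": 12, "open": 15},
    "practice": {"items": 2, "evaluator_passes": 2},
    "source_file_review_blockers": 6,
}


def reconcile(batch: dict) -> list:
    """Return human-readable mismatches; an empty list means the counts agree."""
    problems = []
    review, res, edits = batch["review"], batch["resolution"], batch["edits"]
    if sum(review.values()) != batch["candidates"]:
        problems.append("review decisions do not cover the batch")
    if sum(res.values()) != batch["candidates"]:
        problems.append("resolution routes do not cover the batch")
    if edits["closed"] + edits["open"] != edits["required"]:
        problems.append("closed + open edits do not equal required edits")
    pending = (res["confirmation_pending_source_review"]
               + res["redraft_pending_source_review"])
    if pending != batch["source_file_review_blockers"]:
        problems.append("pending candidates do not match source-file-review blockers")
    redrafts = res["admitted_to_practice"] + res["redraft_pending_source_review"]
    if redrafts != review["revision_requested"]:
        problems.append("redraft routes do not match revision requests")
    if batch["practice"]["items"] != res["admitted_to_practice"]:
        problems.append("practice items do not match admitted redrafts")
    return problems


if __name__ == "__main__":
    print(reconcile(BATCH) or "all batch counts reconcile")
```

With the quoted figures, 3 + 5 + 2 = 10 decisions, 2 + 3 + 3 + 2 = 10 routes, 12 + 15 = 27 edits, and 3 + 3 = 6 blockers, so the check returns no mismatches.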

HF and public-source status: Hugging Face is now real bounded engine input, and O*NET 30.3 is now the local canonical occupation snapshot. The runtime maps Anthropic Economic Index tasks, legacy Economic Index wage/employment/task signals, and OpenAI GDPval scenarios onto role/workflow packets, then validates those links through:

  1. The HF runtime map receipt, source-authority beta-wedge audit, and beta admission policy.
  2. Packet quality, sandbox, API runtime, beta UI shell, review dashboard, and content coverage.
  3. The rubric-depth gate and deployment-readiness gate.
  4. The GDPval hold-closeout gate, GDPval calibration-packet gate, and GDPval calibration-decision gate.
  5. The GDPval replacement-closeout gate, GDPval replacement-reviewer-payload gate, GDPval replacement-domain-review gate, GDPval replacement-practice gate, GDPval replacement-replay-bridge gate, and GDPval replacement-approved-intake gate.
  6. The GDPval replacement-candidate-pack gate, GDPval replacement-candidate-backlog gate, GDPval replacement-candidate-review-batch gate, GDPval replacement-candidate-batch-review gate, GDPval replacement-candidate-batch-outcome gate, GDPval replacement-candidate-batch-resolution gate, and GDPval replacement-candidate-batch-practice gate.
  7. The GDPval replacement-candidate-review gate, GDPval replacement-candidate-revision-closeout gate, and GDPval replacement-candidate-practice gate.
  8. The production-deployment-approval gate and full validation artifacts.

BLS OEWS wage/employment fact import is cached-only and currently records 0 fact rows on the VDS. GDPval file-availability blockers are closed. The remaining large-rubric hold is packaged, recorded as keep-held with replacement required, and now has:

  1. A simplified authored internal-practice replacement path.
  2. An approve_internal reviewer payload.
  3. An applied internal-only domain-review receipt.
  4. A deterministic practice receipt traced to the processed HF GDPval row and downloaded GDPval parquet.
  5. A separate five-case synthetic replay/feedback-preview bridge with 0 canonical learner-history writes.
  6. An inventory-only approved-intake receipt with 0 new approvals.
  7. A candidate-pack receipt that selects five additional review-only candidates with 0 approvals or unlocks.
  8. A candidate-backlog receipt that queues 64 additional review-only candidates with 0 approvals or unlocks.
  9. A candidate-review-batch receipt that selects 10 candidates into backlog_batch_001.
  10. A candidate-batch-review receipt that records Claude decisions as 3 future-gate-only approvals, 5 revision requests, and 2 keep-held decisions with 0 practice/intake/runtime unlocks.
  11. A candidate-batch-outcome receipt that routes those decisions into 3 confirmation-gate candidates, 5 revision-redraft candidates, and 2 held-out candidates with 27 open edits and 0 practice/intake/runtime unlocks.
  12. A candidate-batch-resolution receipt that admits only 2 fully closed redrafts to local-only practice while preserving 6 source-file-review blockers and 2 held candidates.
  13. A candidate-batch-practice receipt that records those 2 practice items with 2/2 evaluator passes and 0 approved-intake/runtime unlocks.
  14. A candidate-review receipt that approves two only for local practice while requesting revision on three.
  15. A candidate-revision-closeout receipt that closes all three revision candidates and 6/6 required edits.
  16. A candidate-practice receipt that keeps all five reviewed/closeout-approved candidates in local-only deterministic practice/replay/intake inventory.

Production approval still keeps all five deployment domains blocked.

02 - Mission

Mission

AINA's mission is to help knowledge workers become AI-native by learning against their actual work, not generic examples.

Build the data and runtime engine that maps a person's role, workflows, readiness, goals, capacity, and proof artifacts into the next best AI capability path, then evaluates whether the person can actually perform the work with AI.

The decision matrix is the product moat:

Role + Workflow + Readiness + Goal + Capacity + Evidence

That matrix should drive every learner-facing decision: what work problem the person is trying to improve, what workflows are recurring, what AI can safely automate or augment, what remains human-owned, what capability level the learner is ready for, what artifact proves progress, and what the engine should do next when evidence is missing.

Data Factory creates the map -> Personalization drives the car -> Evaluator checks proof -> Events improve
The operating doctrine from the source docs: data builds the map, runtime chooses the route, evaluation proves capability, events improve future routing.

Non-goals

03 - Product reality

Product Reality

The intended learner loop is assessment -> readiness snapshot -> preview lesson -> practice -> evaluation -> upgrade or personalized path.

Step 1

Learner gives role, industry, goals, current AI use, tool stack, constraints, confidence, and painful workflows.

Step 2

The engine normalizes the role into family, function, seniority, workflow cluster, and fallback tier.

Step 3

The planner builds a packet with tasks, AI affordances, risks, candidate modules, exercises, and rubrics.

Step 4

The learner practices, submits evidence, receives evaluation, and future recommendations adjust.
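
Step 2's fallback tier can be made explicit in code so an unknown title never receives false precision. The following is a minimal sketch of a tiered resolver, assuming the ladder named in the slice table (exact role, alias, function, archetype, LLM fallback). The lookup tables, confidence values, and resolve_role itself are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup tables; real ones would come from the role/workflow graph.
EXACT = {"growth marketing manager": "Growth Marketing Manager"}
ALIASES = {"growth marketer": "Growth Marketing Manager"}
FUNCTION_KEYWORDS = {"marketing": "Marketing Generalist", "sales": "Sales Generalist"}
ARCHETYPE_KEYWORDS = {"founder": "Founder/Operator"}


@dataclass(frozen=True)
class RoleResolution:
    raw_title: str
    canonical_title: Optional[str]
    fallback_tier: str
    confidence: float  # placeholder values, not calibrated


def resolve_role(raw_title: str) -> RoleResolution:
    """Walk the fallback ladder in order and report which tier matched."""
    key = " ".join(raw_title.lower().split())
    words = key.split()
    if key in EXACT:
        return RoleResolution(raw_title, EXACT[key], "exact", 0.95)
    if key in ALIASES:
        return RoleResolution(raw_title, ALIASES[key], "alias", 0.85)
    for word, title in FUNCTION_KEYWORDS.items():
        if word in words:
            return RoleResolution(raw_title, title, "function", 0.6)
    for word, title in ARCHETYPE_KEYWORDS.items():
        if word in words:
            return RoleResolution(raw_title, title, "archetype", 0.4)
    # Nothing matched: hand off explicitly instead of guessing a role.
    return RoleResolution(raw_title, None, "llm_fallback", 0.0)


if __name__ == "__main__":
    for title in ("Growth Marketer", "Head of Sales", "Astronaut"):
        print(resolve_role(title))
```

Carrying the tier on the result is what makes the decision auditable: downstream packet code can refuse or flag anything below a chosen tier.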

The intended engine loop is profile hardening -> role mapping -> packet retrieval -> exposure scoring -> candidate selection -> sequencing -> deterministic hydration -> validation -> persistence -> event tracking.

04 - Contracts

Canonical Contracts

These contracts should become the core product/data boundary. They are intentionally small enough to ship in slices but strong enough to prevent prompt soup.

Contract | Purpose | Minimum fields
LearnerProfile | Captures who the learner is and what they need next. | Role title, industry, seniority, goals, AI usage, confidence, constraints, capacity, tool stack, evidence history.
RoleProfile | Normalizes a title into an engine-readable role. | Canonical title, role family, function, seniority, SOC/O*NET mapping, aliases, confidence, fallback tier.
WorkflowProfile | Describes recurring work the learner actually performs. | Workflow cluster, tasks, pain points, frequency, importance, deliverables, human accountability.
AIAffordancePack | States where AI helps and where it should not be trusted. | AI grants, AI blocks, risk level, HITL checkpoints, tool patterns, failure modes.
CurriculumInputPacket | The packet the planner consumes. | Learner profile, role profile, top workflows, AI opportunities, boundaries, candidate modules, coverage requirements, risk rules, fallback path, quality flags.
PersonalizationDecision | Explains the path chosen by the engine. | Chosen path, candidate scores, excluded options, rationale, confidence, required events.
LearnerEvent | Records what happened so the loop can improve. | Event type, actor, learner id, packet id, module id, artifact refs, scores, timestamps.
SandboxScenario | Turns real work into practice. | Task id, prompt, reference files, expected deliverable, rubric, allowed tools, HITL checkpoints.
EvaluationResult | Turns practice into progression evidence. | Rubric scores, passed gates, failed gates, feedback, next recommendation, evaluator provenance.
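
To show how small these contracts can stay, the following sketches three of them as Python dataclasses, with a check that turns missing pieces into quality flags. Field names follow the minimum-fields column; the types, defaults, flag strings, and check_packet are assumptions, not the repo's schemas.py.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LearnerProfile:
    role_title: str
    industry: str
    seniority: str
    goals: List[str]
    ai_usage: str
    confidence: int  # assumed to be a 1-5 self-rating
    constraints: List[str] = field(default_factory=list)
    capacity_hours_per_week: float = 0.0  # assumed unit for "capacity"
    tool_stack: List[str] = field(default_factory=list)
    evidence_history: List[str] = field(default_factory=list)


@dataclass
class RoleProfile:
    canonical_title: str
    role_family: str
    function: str
    seniority: str
    soc_code: Optional[str]
    aliases: List[str]
    confidence: float
    fallback_tier: str


@dataclass
class CurriculumInputPacket:
    learner: LearnerProfile
    role: RoleProfile
    top_workflows: List[str]
    ai_opportunities: List[str]
    boundaries: List[str]
    candidate_modules: List[str]
    coverage_requirements: List[str]
    risk_rules: List[str]
    fallback_path: str
    quality_flags: List[str] = field(default_factory=list)


def check_packet(packet: CurriculumInputPacket) -> List[str]:
    """Return quality flags instead of silently accepting a thin packet."""
    flags = list(packet.quality_flags)
    if not packet.top_workflows:
        flags.append("no_workflows")
    if not packet.candidate_modules:
        flags.append("no_candidate_modules")
    if packet.role.soc_code is None:
        flags.append("no_soc_mapping")
    if packet.role.fallback_tier != "exact":
        flags.append("fallback_tier:" + packet.role.fallback_tier)
    return flags
```

Keeping the check separate from the dataclasses means a planner can still receive an incomplete packet, but only with its gaps named.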

Recommended event vocabulary: assessment_started, assessment_completed, readiness_report_viewed, curriculum_plan_created, module_exposed, lesson_started, practice_submitted, evaluator_completed, mastery_gate_passed, mastery_gate_failed, confidence_updated, goal_updated, tool_stack_updated.
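
The vocabulary above pairs naturally with an append-only ledger whose replay rebuilds learner state. The following is a minimal in-memory sketch: the event names come from the recommended vocabulary, while LearnerEvent carries only a subset of the contract's minimum fields, and EventLedger and the replayed state shape are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

EVENT_TYPES = frozenset({
    "assessment_started", "assessment_completed", "readiness_report_viewed",
    "curriculum_plan_created", "module_exposed", "lesson_started",
    "practice_submitted", "evaluator_completed", "mastery_gate_passed",
    "mastery_gate_failed", "confidence_updated", "goal_updated",
    "tool_stack_updated",
})


@dataclass(frozen=True)
class LearnerEvent:
    event_type: str
    learner_id: str
    module_id: Optional[str] = None
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self) -> None:
        # Reject anything outside the agreed vocabulary.
        if self.event_type not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {self.event_type}")


class EventLedger:
    """Append-only: events are added and read, never edited or removed."""

    def __init__(self) -> None:
        self._events: List[LearnerEvent] = []

    def append(self, event: LearnerEvent) -> None:
        self._events.append(event)

    def replay(self, learner_id: str) -> dict:
        """Rebuild one learner's state by folding over events in append order."""
        state = {"assessment_completed": False, "passed": [], "needs_retry": []}
        for event in self._events:
            if event.learner_id != learner_id:
                continue
            if event.event_type == "assessment_completed":
                state["assessment_completed"] = True
            elif event.event_type == "mastery_gate_failed":
                known = state["passed"] + state["needs_retry"]
                if event.module_id not in known:
                    state["needs_retry"].append(event.module_id)
            elif event.event_type == "mastery_gate_passed":
                if event.module_id in state["needs_retry"]:
                    state["needs_retry"].remove(event.module_id)
                if event.module_id not in state["passed"]:
                    state["passed"].append(event.module_id)
        return state
```

Because state is derived only by replay, a changed recommendation after new evidence can be tested by appending events and replaying again.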

05 - Milestones

Milestones

Milestone | Mission | Exit proof
M0. Mission and source lock | Convert the uploaded docs into one product/data charter and identify current repo gaps. | This artifact exists as markdown + HTML, with source docs listed and HF current status explicit.
M1. Canonical source warehouse v1 | Move from source registry to reproducible, validated dataset snapshots. | O*NET/BLS/local docs plus HF EconomicIndex/GDPval snapshots have manifests, licenses, schema checks, and provenance.
M2. Work Intelligence Graph v1 | Convert role/task/workflow/source data into reusable packets. | RoleProfile, WorkflowProfile, AIAffordancePack, and fallback mappings exist for beta role families.
M3. CurriculumInputPacket and planner v0 | Generate deterministic packets and choose next best learning paths. | Assessment input produces a valid packet, planner decision, and candidate module sequence for beta roles.
M4. Practice and evaluator loop v0 | Connect learning to proof artifacts and rubric-backed feedback. | Preview lesson, exercise, submission, evaluator result, and learner events work end to end.
M5. GDPval sandbox and HF proof | Use GDPval where it is strongest: realistic tasks, reference files, rubrics, and evaluation scenarios. | At least three role/workflow scenarios link to GDPval task IDs or explain why no match exists.
M6. Beta API and product surface | Expose the loop through API endpoints and a simple beta UI. | /assess, /curriculum, /workflow/{id}/sandbox, and /submit work with telemetry and error handling.
M7. ICP title coverage and scale path | Make the system explicit about how many ICP titles it can serve today, which need fallback, which need multi-LLM adjudication, and which are outside the wedge. | Title coverage map, deterministic service-tier receipt, multi-LLM adjudication report, packet quality audit, source-provenance report, and beta feedback dashboard exist.

06 - Slices

Slices and Tasks

Each slice should be one Linear issue or one small issue range. The right first milestone is not "build everything." It is M1 -> M3: source snapshots, graph contracts, beta packets, and a planner path that can be evaluated.

Slice | Demoable behavior | Primary paths | Validation | Owner | Risk
1. Source truth ledger | Every source has license, owner, version, allowed use, and runtime eligibility. | sources.py, artifacts/sources, docs/planning | Registry diff, schema check, license/status report. | Data/product | Medium
2. Hugging Face ingestion smoke | EconomicIndex and GDPval can be downloaded or skipped with explicit reason and cached manifest. | ingest.py, artifacts/raw/huggingface | Offline rerun proves cache; missing files warn clearly. | Data engineering | High
3. O*NET/BLS canonical loader | Occupational backbone loads into DuckDB/Parquet with stable keys. | public_source_snapshots.py, ingest.py, artifacts/warehouse | Row counts, key uniqueness, SOC/O*NET join report, BLS access/cached-import status. | Data engineering | Medium
4. Job/title intake guardrails | LinkedIn/Kaggle/job-description data is classified as market signal, not canonical truth. | raw_linkedin.py, artifacts/raw | Forbidden columns and licensing checks pass. | Data/product | High
5. Role normalization contract | Raw title maps to role family, function, seniority, aliases, and fallback tier. | normalize.py, schemas.py | Golden title matrix across beta roles. | Product/data | Medium
6. Workflow extraction contract | A role produces ranked workflows with task, importance, frequency, and deliverable metadata. | packets.py, scoring.py | Snapshot tests for 10 beta role profiles. | Product/data | Medium
7. AIAffordancePack v1 | Each task says what AI grants, blocks, requires from humans, and risks. | schemas.py, artifacts/derived | Risk tiers and HITL checkpoints present for every top workflow. | Curriculum/review | High
8. Beta role wedge | Marketing/growth, paid media, sales, ops-adjacent, and founder roles have packet coverage. | artifacts/packets, docs/source_foundations | Packet quality report by role family. | Product | Medium
9. CurriculumInputPacket generator | LearnerProfile + RoleProfile + workflows become one planner-ready packet. | packets.py, schemas.py | JSON schema validation and fixture snapshots. | Data/runtime | Medium
10. Fallback resolver | Exact role, alias, function, archetype, and LLM fallback are explicit and auditable. | runtime.py, normalize.py | No unknown role silently gets fake precision. | Runtime/product | High
11. Readiness assessment schema | Intake captures role, AI usage, goals, pains, confidence, constraints, and capacity. | schemas.py, future API | Required/optional field tests. | Product | Medium
12. Planner scoring v0 | Planner ranks candidate paths using match, importance, applicability, value, affordability, ease, confidence, and risk penalty. | scoring.py, runtime.py | Golden beginner, operator, founder, and retired-professional cases. | Product/data | High
13. Lesson/exercise/rubric linker | Candidate workflows map to modules, exercises, and evaluator rubrics. | artifacts/derived, packets.py | Selected module has exercise/rubric or explicit gap flag. | Curriculum | Medium
14. Learner event ledger | Runtime emits append-only events for assessment, plan, lesson, practice, evaluator, and mastery gates. | runtime.py, artifacts/events | Event replay reconstructs learner state. | Runtime | Medium
15. Evaluator rubric engine | Practice submissions produce structured pass/fail and feedback. | review.py, future evaluator module | Pass, partial, fail, unsafe fixtures. | Curriculum/review | High
16. GDPval scenario linker | Workflows link to GDPval task IDs, prompts, reference files, and rubric JSON where available. | HF cache, artifacts/sandbox | Three beta scenarios with GDPval provenance or no-match rationale. | Data/curriculum | High
17. Sandbox payload API | A workflow can return setup, prompt, deliverable, tools, HITL checkpoints, and failure modes. | Future API, runtime.py | API contract tests. | Product/runtime | Medium
18. Public API endpoints | /assess, /curriculum, /workflow/{id}/sandbox, and /submit exist behind beta constraints. | API layer TBD | Smoke tests and error-shape tests. | Runtime/product | High
19. Packet quality audit | Representative role packets are audited for evidence, specificity, risk, fallback, and curriculum coverage. | reports.py, artifacts/validation | HTML/JSON report with pass/fail thresholds. | Data/product | Medium
20. Telemetry and observability | PostHog/Sentry-ready events exist for beta loop visibility. | API/UI layer TBD | Local event contract tests; no PII leakage in logs. | Ops/product | Medium
21. Beta UI path | A learner can complete assessment, view snapshot, preview lesson, submit practice, and see evaluation. | Frontend app TBD | Browser smoke and mobile layout checks. | Product/design | High
22. Feedback learning loop | Evaluator outcomes and user behavior influence next recommendation. | Runtime + events | Replay tests show changed recommendation after new evidence. | Product/runtime | High
23. ICP title coverage gate | Every available title row is routed into serve_now, serve_with_fallback, multi_llm_adjudication_required, or exclude_or_not_icp, with reasons and evidence refs. | artifacts/semantic_review, artifacts/validation, review.py, future coverage module | Coverage report shows counts, examples, deterministic reasons, adjudication inputs, and no production unlocks. | Product/data + multi-LLM review | High
24. Deployment split runbook | Heavy data builds, Cloudflare runtime, HF storage, and local VDS processing have clear ownership. | docs/runbooks | Cold-start agent can reproduce build path from docs. | Ops | Medium

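
Slice 12 names the planner's scoring factors without fixing a formula. One plausible reading is a weighted average of the positive factors minus a risk penalty, sketched below. The equal default weights, the 0-to-1 signal scale, and the function names are assumptions; the source documents do not specify them.

```python
from typing import Dict, List, Optional, Tuple

# Positive factors named in slice 12; risk is handled as a separate penalty.
POSITIVE_FACTORS = (
    "match", "importance", "applicability", "value",
    "affordability", "ease", "confidence",
)


def score_candidate(
    signals: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
    risk_weight: float = 1.0,
) -> float:
    """Weighted mean of positive factors minus a weighted risk penalty."""
    missing = [name for name in (*POSITIVE_FACTORS, "risk") if name not in signals]
    if missing:
        # Refuse to score on partial evidence rather than default silently.
        raise ValueError(f"missing signals: {missing}")
    weights = weights or {name: 1.0 for name in POSITIVE_FACTORS}
    total_weight = sum(weights[name] for name in POSITIVE_FACTORS)
    positive = sum(weights[name] * signals[name] for name in POSITIVE_FACTORS)
    return positive / total_weight - risk_weight * signals["risk"]


def rank_candidates(candidates: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Rank by score, highest first; ties break on name for determinism."""
    scored = [(name, round(score_candidate(sig), 4)) for name, sig in candidates.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Invented signal values for two hypothetical candidate paths.
    safe = dict({name: 0.8 for name in POSITIVE_FACTORS}, risk=0.1)
    risky = dict({name: 0.9 for name in POSITIVE_FACTORS}, risk=0.5)
    print(rank_candidates({"risky_path": risky, "safe_path": safe}))
```

The deterministic tie-break matters for golden-case tests: the same inputs should always produce the same ordering.
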
07 - First execution

First Execution Range

M1 -> M3: real source snapshots, Work Intelligence Graph contracts, beta role packets, and a deterministic CurriculumInputPacket path.

That range was the smallest useful production step because it proved the engine can make personalized decisions from evidence-backed data before product UI complexity arrives. It has now been executed locally on the VDS through public source snapshots, O*NET-backed warehouse ingest, HF ingest, role/workflow packet generation, deterministic planning, GDPval sandbox linking, evaluator fixtures, local API runtime, beta learner shell, static local learner wrapper, content coverage, rubric-depth validation, deployment-readiness proof, telemetry, title-coverage and semantic-review gates, review-dashboard action states, source-authority beta-wedge audit, beta admission policy, and full validation.

Latest hardening range: M2 -> M7 authored rubric depth, authored lesson depth, cached-only BLS official-cache operations, deployment-readiness proof, GDPval hold closeout, GDPval calibration packet, GDPval calibration decision, GDPval replacement closeout, GDPval replacement reviewer payload, GDPval replacement domain review, GDPval replacement practice, GDPval replacement replay bridge, GDPval replacement approved intake, GDPval replacement candidate pack, GDPval replacement candidate backlog, GDPval replacement candidate review batch, GDPval replacement candidate batch review, GDPval replacement candidate batch outcome, GDPval replacement candidate batch resolution, GDPval replacement candidate batch practice, GDPval replacement candidate review, GDPval replacement candidate revision closeout, GDPval replacement candidate practice, and production deployment approval are now executed locally.

Current status by gate:

  1. The simplified replacement rubric is reviewer-approved for local internal synthetic practice only, has passed the deterministic evaluator fixture, now has a five-case review-only synthetic replay/feedback matrix with 0 canonical event-log writes, and is inventoried by an approved-intake gate that creates 0 approvals.
  2. The candidate-backlog gate now inventories 64 additional HF-backed candidates for reviewer/domain evidence only with 0 approvals.
  3. The candidate-review-batch gate selects 10 of those backlog candidates into backlog_batch_001, leaves 54 remaining, writes a Claude-ready reviewer/domain prompt, and creates 0 approvals.
  4. The candidate-batch-review gate records Claude decisions for those 10 as 3 future-gate-only approvals, 5 revision requests, 2 keep-held decisions, 27 required edits, and 0 practice/intake/runtime unlocks.
  5. The candidate-batch-outcome gate routes those decisions into 3 confirmation-gate candidates, 5 revision-redraft candidates, and 2 held-out candidates with 27 open edits and 0 practice/intake/runtime unlocks.
  6. The candidate-batch-resolution gate closes 12/27 locally provable edits, admits only 2 fully closed redrafts to local-only practice, preserves 6 source-file-review blockers and 2 held candidates, and creates 0 intake/runtime unlocks.
  7. The candidate-batch-practice gate records those 2 practice items with 2/2 deterministic evaluator passes while preserving 15 open edits, 6 source-file-review blockers, 2 held candidates, and 0 approved-intake/runtime/external/public/production unlocks.
  8. The candidate-review gate reviewed five additional HF-backed candidates, advanced two only toward local synthetic practice, requested revision on three, recorded six required edits, and created 0 approved-intake/runtime/external/public/production unlocks.
  9. The candidate-revision-closeout gate closed all three revisions and 6/6 required edits.
  10. The candidate-practice gate now proves all five reviewed/closeout-approved candidates pass deterministic evaluator coverage, replay/feedback preview, and intake inventory with 0 revision holds remaining.

Recommended continuing issues

  1. BLS OEWS official-cache follow-through. The cached-only import path now exists. Keep O*NET 30.3 as the current canonical occupation snapshot and load official BLS wage/employment rows only when oe.data.0.Current is present or another official access path succeeds.
  2. ICP title coverage and service-tier routing. Build the next slice around the title universe. Use artifacts/wedge_report.csv as the broad title-row input and artifacts/semantic_review/semantic_review_all.jsonl as the deterministic evidence floor. The output should classify every available title row into serve_now, serve_with_fallback, multi_llm_adjudication_required, or exclude_or_not_icp, with reasons, source refs, and what AINA can safely do today.
  3. Multi-LLM adjudication and production boundary. Treat deterministic checks as the first pass and multi-LLM review as the escalation path for ambiguous titles, weak-evidence packets, high-risk automation claims, and generic workflow/teaching templates. Preserve all GDPval/HF safety receipts and keep production-deployment-approval green, but do not make a single-reviewer bottleneck the center of the title-coverage milestone.
  4. HF cache and license policy hardening. Preserve capped selected-file ingest, locked revisions, manifests, and runtime eligibility while making broader pulls explicit.
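
The routing described in issues 2 and 3 can be sketched as a pure rule cascade. This is a minimal illustration, not the repo's classifier: the four tier names come from the plan above, while `TitleEvidence`, `TierDecision`, `classify_title`, and every field name are hypothetical, and the real columns of artifacts/wedge_report.csv may differ.

```python
from dataclasses import dataclass
from enum import Enum


class ServiceTier(str, Enum):
    SERVE_NOW = "serve_now"
    SERVE_WITH_FALLBACK = "serve_with_fallback"
    MULTI_LLM_ADJUDICATION_REQUIRED = "multi_llm_adjudication_required"
    EXCLUDE_OR_NOT_ICP = "exclude_or_not_icp"


@dataclass(frozen=True)
class TitleEvidence:
    # Hypothetical fields: the real title-row and semantic-review
    # columns may be named and shaped differently.
    title: str
    in_icp: bool
    deterministic_pass: bool
    exact_role_match: bool
    ambiguous: bool
    source_refs: tuple[str, ...]


@dataclass(frozen=True)
class TierDecision:
    title: str
    tier: ServiceTier
    reasons: tuple[str, ...]
    source_refs: tuple[str, ...]


def classify_title(row: TitleEvidence) -> TierDecision:
    """Pure rule cascade: the same row always yields the same decision."""
    if not row.in_icp:
        tier, reasons = ServiceTier.EXCLUDE_OR_NOT_ICP, ["title is outside the ICP"]
    else:
        doubts = []
        if row.ambiguous:
            doubts.append("title is ambiguous")
        if not row.deterministic_pass:
            doubts.append("deterministic semantic check did not pass")
        if not row.source_refs:
            doubts.append("no source refs attached")
        if doubts:
            # Weak or ambiguous evidence escalates; it is never auto-approved.
            tier, reasons = ServiceTier.MULTI_LLM_ADJUDICATION_REQUIRED, doubts
        elif row.exact_role_match:
            tier, reasons = ServiceTier.SERVE_NOW, ["deterministic pass with exact role match"]
        else:
            tier, reasons = ServiceTier.SERVE_WITH_FALLBACK, ["deterministic pass, fuzzy role match"]
    return TierDecision(row.title, tier, tuple(reasons), row.source_refs)


row = TitleEvidence("Financial Analyst", True, True, False, False, ("artifacts/wedge_report.csv",))
print(classify_title(row).tier.value)  # serve_with_fallback
```

Keeping the classifier free of I/O and randomness is what makes the determinism tests cheap: two calls on the same row can simply be compared for equality.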

Codex / Claude Code - Implementation kickoff - ICP title coverage

You are implementing the next AINA data-engine milestone in /srv/aina/aina-data-engine-room.

Read:
- docs/planning/aina-personalization-engine-mission-2026-06-09.md
- docs/source_foundations/ainpe-files-shared/AI Native Academy - personalization engine operating system.md
- docs/source_foundations/ainpe-files-shared/Learning Graph June 2026.md
- src/aina_data_engine/sources.py
- src/aina_data_engine/schemas.py
- src/aina_data_engine/packets.py
- src/aina_data_engine/runtime.py

Goal:
Create an ICP title coverage map that routes as many available title rows as possible through deterministic service tiers, then escalates only ambiguous or weak-evidence cases to multi-LLM adjudication.

Source truth:
- artifacts/wedge_report.csv
- artifacts/semantic_review/semantic_review_all.jsonl
- artifacts/semantic_review/semantic_review_summary.json
- artifacts/semantic_review/multi_llm_review.md
- artifacts/validation/full_validation.json
- Hugging Face, O*NET, GDPval, beta admission, and production approval receipts already in artifacts/validation.

Constraints:
- Treat current repo files and validation artifacts as implementation truth.
- Treat the planning docs as draft product truth, not magic authority.
- Hugging Face datasets are ingested through the bounded VDS path. Preserve byte caps, locked revisions, manifests, and validation checks.
- Do not download GDPval reference/deliverable folders or other huge payloads by default. Add explicit future flags only after license and cache review.
- Do not invent missing GDPval reference files or silently approve held rubrics. Keep GDPval receipts green as safety constraints, but do not block the title-coverage milestone on single-reviewer ownership.
- Do not call a title deeply production-ready merely because it passes deterministic semantic consistency. Deterministic pass means structurally serviceable; deeper quality moves through multi-LLM adjudication, packet-quality evidence, or future production approval.
- Do not create a single-reviewer gate. If evidence is ambiguous, route to multi_llm_adjudication_required with machine-review inputs and a clear promotion rule.
- Keep production deployment approval green: public runtime, external writes, real-user data, production telemetry, and deployment promotion stay blocked unless explicit approval evidence exists.
- Keep claims about automation, replacement, and economic impact evidence-bound.
- Add tests and validation artifacts.
- Commit work on a branch and leave no orphan state.

Expected output:
- Preserve the BLS OEWS cached-only import guard: current VDS should stay valid with bls_wage_employment_rows=0 until official rows are cached and schema/row checks pass.
- New ICP title coverage artifacts: JSON/JSONL/Markdown/HTML reports with total input rows, deduped titles, service-tier counts, representative examples, deterministic reasons, exclusion reasons, source refs, and explicit "who AINA can serve today / with fallback / not yet" language.
- Tests proving the service-tier classifier is deterministic, does not fake precision, preserves ICP exclusions, and routes ambiguous cases to multi-LLM adjudication instead of auto-approval.
- Full validation receipt still includes HF ingest, runtime mapping, source-authority audit, beta admission, content coverage, authored lesson depth, rubric depth, deployment readiness, GDPval safety receipts, production deployment approval, beta UI shell, telemetry, semantic review, multi-LLM review, and review-dashboard checks.

Watch out: This prompt is designed to prevent the agent from expanding UI or dataset scope before preserving HF caps, runtime proof, title coverage service tiers, and production-blocked boundaries.

08 - Risks

Risks and Guardrails

Dataset and licensing risk

Some source candidates are canonical public taxonomies, some are research datasets, some are job-market signals, and some may be license-constrained. The engine must never flatten those into one truth layer. Every row used at runtime needs source, version, license/use status, transform lineage, and confidence.
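
One way to keep sources from being flattened is to attach a provenance record to every row and refuse runtime use when any field is missing. The sketch below assumes a hypothetical `Provenance` shape and a hypothetical `"cleared"` status label; the five fields mirror the list in the paragraph above, and the sample values are illustrative, not actual license findings.

```python
from dataclasses import dataclass

# Hypothetical policy: which license/use statuses may be served at runtime.
RUNTIME_CLEARED_STATUSES = frozenset({"cleared"})


@dataclass(frozen=True)
class Provenance:
    source: str
    version: str
    license_status: str
    transform_lineage: tuple[str, ...]
    confidence: float


def runtime_blockers(p: Provenance) -> list[str]:
    """Return every reason this row must stay out of the runtime layer."""
    blockers = []
    if not p.source.strip():
        blockers.append("missing source")
    if not p.version.strip():
        blockers.append("missing version")
    if p.license_status not in RUNTIME_CLEARED_STATUSES:
        blockers.append(f"license/use status {p.license_status!r} is not cleared")
    if not p.transform_lineage:
        blockers.append("missing transform lineage")
    if not 0.0 <= p.confidence <= 1.0:
        blockers.append("confidence outside [0, 1]")
    return blockers


# Illustrative values only.
onet_row = Provenance("O*NET", "30.3", "cleared", ("import", "normalize_titles"), 0.9)
research_row = Provenance("example-research-dataset", "", "under_review", (), 0.6)
print(runtime_blockers(onet_row))           # []
print(len(runtime_blockers(research_row)))  # 3
```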

False precision risk

The product will lose trust if it tells a learner "this is your exact workflow" when it has only a fuzzy title match. The fallback tier must be visible inside the packet and used by the planner.
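
As a rough illustration of "visible inside the packet and used by the planner", the match tier can travel with the packet and decide how strongly the product is allowed to phrase its claim. The tier names, `PacketMatch`, and `learner_facing_claim` below are all hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class MatchTier(IntEnum):
    # Hypothetical tiers, ordered from strongest to weakest match.
    EXACT_ROLE = 0
    ROLE_FAMILY = 1
    GENERIC = 2


@dataclass(frozen=True)
class PacketMatch:
    learner_title: str
    matched_role: str
    tier: MatchTier


def learner_facing_claim(match: PacketMatch) -> str:
    """Weaker matches get weaker wording; only an exact match says 'your role'."""
    if match.tier is MatchTier.EXACT_ROLE:
        return f"These workflows are mapped to your role: {match.matched_role}."
    if match.tier is MatchTier.ROLE_FAMILY:
        return (
            f"These workflows come from a related role family ({match.matched_role}); "
            "confirm which ones apply to you."
        )
    return "We could not match your title confidently, so this path starts from general workflows."


match = PacketMatch("Growth Ops Lead", "Marketing Manager", MatchTier.ROLE_FAMILY)
print(learner_facing_claim(match))
```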

Automation-claim risk

The docs are consistent: do not frame AINA as "AI will replace your job." Anthropic-style and Microsoft-style exposure measures and GDPval-style benchmarks should inform where AI may help, where humans remain accountable, and what evidence is needed. They do not prove replacement.

Prompt-soup risk

If raw source rows are pasted into prompts, the system will be brittle. The correct boundary is structured contracts first, bounded LLM tasks second, deterministic validation always.
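
The boundary above can be sketched in three small functions: bound the structured contract, hand the model only that bounded view, and validate the reply deterministically. Everything named here (`bound_contract`, `validate_output`, `run_bounded_task`, the caps, the difficulty labels, and the reply shape) is an assumption for illustration, and the model is a stub callable rather than a real client.

```python
import json
from typing import Callable

MAX_WORKFLOWS = 5                                    # hypothetical cap
ALLOWED_DIFFICULTIES = ("intro", "core", "stretch")  # hypothetical labels


def bound_contract(contract: dict) -> dict:
    """Keep only whitelisted, length-capped fields from the structured contract."""
    return {
        "role": str(contract["role"])[:80],
        "workflows": [str(w)[:120] for w in contract["workflows"][:MAX_WORKFLOWS]],
    }


def validate_output(raw: str, bounded: dict) -> dict:
    """Deterministic check of the model's reply; raises ValueError on any violation."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict) or set(data) != {"workflow", "difficulty"}:
        raise ValueError("reply must be an object with exactly 'workflow' and 'difficulty'")
    if data["workflow"] not in bounded["workflows"]:
        raise ValueError("reply names a workflow that was not in the contract")
    if data["difficulty"] not in ALLOWED_DIFFICULTIES:
        raise ValueError("reply uses an unknown difficulty label")
    return data


def run_bounded_task(contract: dict, llm: Callable[[str], str]) -> dict:
    bounded = bound_contract(contract)
    prompt = "Pick one workflow to practice. Reply as JSON.\n" + json.dumps(bounded, sort_keys=True)
    return validate_output(llm(prompt), bounded)


def fake_llm(prompt: str) -> str:
    # Stub standing in for a real model call.
    return '{"workflow": "forecast refresh", "difficulty": "core"}'


contract = {
    "role": "Financial Analyst",
    "workflows": ["variance commentary", "forecast refresh"],
    "raw_rows": ["never reaches the prompt"],
}
print(run_bounded_task(contract, fake_llm))  # {'workflow': 'forecast refresh', 'difficulty': 'core'}
```

Because the validator only accepts workflows that were in the bounded contract, a reply that invents a workflow fails loudly instead of flowing into the planner.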

Scope risk

The tempting wrong move is to ingest every dataset and build a full UI before the packet boundary is stable. The right wedge is beta roles, traceable sources, deterministic packets, and evaluation evidence.

09 - Acceptance

Acceptance Criteria

10 - Questions

Open Questions

11 - Build path

Plain-English Build Path

  1. A learner answers assessment questions about their role, work, AI usage, goals, pain points, and capacity.
  2. The engine normalizes the role and chooses a confidence-rated fallback path.
  3. The data engine retrieves role/workflow intelligence with source provenance.
  4. The planner builds a CurriculumInputPacket.
  5. The planner chooses a first learning path and explains why.
  6. The learner completes a practice task tied to their work.
  7. The evaluator grades the artifact against a rubric.
  8. Events update the learner profile so the next recommendation is earned by evidence.
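
The eight steps above can be walked end to end in a toy, in-memory form. Only the name CurriculumInputPacket comes from the plan; its fields, the `ROLE_MAP` lookup, the keyword-matching evaluator, and every other name are hypothetical stand-ins for the real data engine, planner, and rubric evaluator.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the data engine's role/workflow map.
ROLE_MAP = {
    "financial analyst": {
        "workflows": ("variance commentary", "forecast refresh"),
        "source_ref": "example-source@v1",
    },
}
GENERIC = {"workflows": ("summarize a long document",), "source_ref": "generic-fallback"}


@dataclass
class LearnerProfile:  # step 1: captured by the assessment
    title: str
    goals: tuple[str, ...]
    weekly_hours: int
    events: list = field(default_factory=list)


@dataclass(frozen=True)
class CurriculumInputPacket:  # fields are illustrative, not the real contract
    normalized_role: str
    match: str  # "exact" or "fallback"
    workflows: tuple[str, ...]
    source_refs: tuple[str, ...]
    goals: tuple[str, ...]
    weekly_hours: int


def build_packet(profile: LearnerProfile) -> CurriculumInputPacket:
    key = " ".join(profile.title.lower().split())  # step 2: normalize the role
    match = "exact" if key in ROLE_MAP else "fallback"
    entry = ROLE_MAP.get(key, GENERIC)              # step 3: retrieve with provenance
    role = key if match == "exact" else "general knowledge worker"
    return CurriculumInputPacket(                   # step 4: build the packet
        role, match, entry["workflows"], (entry["source_ref"],),
        profile.goals, profile.weekly_hours,
    )


def choose_first_step(packet: CurriculumInputPacket) -> dict:  # step 5
    why = f"{packet.match} match for {packet.normalized_role}, sourced from {packet.source_refs[0]}"
    return {"workflow": packet.workflows[0], "why": why}


def evaluate(artifact: str, rubric: tuple[str, ...]) -> dict:  # step 7 (toy keyword rubric)
    met = [c for c in rubric if c.lower() in artifact.lower()]
    return {"score": len(met) / len(rubric), "met": met}


def record_event(profile: LearnerProfile, step: dict, result: dict) -> None:  # step 8
    profile.events.append({"workflow": step["workflow"], "score": result["score"]})


profile = LearnerProfile("  Financial   Analyst ", ("faster month-end close",), 3)
step = choose_first_step(build_packet(profile))
# Step 6: the learner's practice artifact, here a hard-coded string.
result = evaluate("Main driver: FX. Recommendation: hedge.", ("driver", "recommendation"))
record_event(profile, step, result)
print(step["workflow"], result["score"], len(profile.events))  # variance commentary 1.0 1
```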

Where to start

Continue by building the ICP title coverage map: classify all available title rows into serve_now, serve_with_fallback, multi_llm_adjudication_required, or exclude_or_not_icp, preserve Hugging Face/O*NET/GDPval provenance, and keep all production approval boundaries closed until explicit evidence exists.