AINA Personalization Engine Mission, Milestones, Slices, and Tasks
A source-bound synthesis of the personalization engine documents into a buildable product and data-engine mission.
AINA should be built as a personalized AI capability transformation engine: it turns a learner's real role, workflows, readiness, goals, capacity, constraints, and evidence into a structured learning path, realistic practice, evaluated progression, and a durable record of what changed.
This is not a CSV-to-prompt app and not a generic curriculum generator. The source documents converge on one architecture: a data factory builds a trusted work-intelligence map; the personalization engine uses that map to choose the next best learning move; the evaluator checks proof of capability; learner events improve the route.
Wiki recall: AINA personalization engine
Past attempts. None are canonical. (I checked: Wiki.)
What I found
1. [previous try] AINA Content Library - Wiki - updated 2026-05-23
This page is useful as a reminder that authored curriculum exists on disk, but it is not the same thing as runtime-generated personalization. It warns against confusing inventory with capability: existing lessons can seed the engine, but the engine still needs contracts, eligibility rules, packet generation, and evaluation surfaces.
-> /wiki/topics/aina-content-library.html
2. [previous try] Onboarding & Assessment - Wiki - updated 2026-05-23
This page supports the same learner loop found in the uploaded docs: intake must capture structured role, readiness, current AI use, goals, pain points, and constraints before the system can personalize. What remains useful is the first-mile emphasis: AINA does not begin with content selection; it begins with learner state capture.
-> /wiki/topics/onboarding-and-assessment.html
3. [previous try] AINA Risks & Blockers - Wiki - updated 2026-05-23
This page reinforces the need for evidence-bound claims, data integrity checks, test coverage, and clear risk handling. For this task, the still-relevant warning is that exposure scores, automation claims, and public-source joins must stay provenance-backed and reviewable.
-> /wiki/topics/aina-risks-and-blockers.html
What seems still relevant
The wiki hits line up with the uploaded source docs, but they are advisory. The current repo has progressed since those wiki snapshots: the source foundation docs are mirrored locally, public Hugging Face datasets are registered in the source registry, and the local engine already has schemas, runtime, packet, scoring, provenance, and validation modules. The major context change is that the engine-room now needs to move from "source foundation and local packet factory" toward real dataset ingestion, role/workflow contract hardening, and evaluator-backed runtime proof.
My recommendation for THIS task
Use the uploaded documents as the product source of truth, use the current repo as implementation truth, and turn the synthesis into a milestone plan that starts with data/source contracts before UI. The first production-quality range should prove this vertical path: assessment input -> canonical role/workflow packet -> CurriculumInputPacket -> personalized path -> practice/evaluation event -> quality report.
Source Basis
The source documents agree on the core idea but describe it at different maturity levels. I treated them as founder/product source truth, with this current repo as live implementation truth.
| Source | Used for | Notes |
|---|---|---|
| AI Native Academy - personalization engine operating system.md | Product loop, CurriculumInputPacket, runtime sequence, learner events, MVP wedge | Strongest operating-system document. |
| Learning Graph June 2026.md | AINA learning ladder, work intelligence graph, assessment fields, learner loop | Strongest learning model and graph document. |
| Personalization Engine - 21 May 2025.md | Data/scoring engine framing, dataset roles, build sequence, risk language | Strongest data-engine sequencing document. |
| Personalized AI Education Platform_ The Personalization Engine.md | HF ingestion, API endpoints, HITL matrix, sandbox shape | Strongest implementation prompt document. |
| Architecting a Personalized AI Education Platform_ From Raw Data to Runtime Engine (2).md | DuckDB/HF/GDPval architecture, sandbox/evaluator rationale, deployment split | Strongest raw-data-to-runtime architecture document. |
| AINative Academy - Algorithms, Sauce & Structure - Details.md | Personalization examples and dynamic planner logic | Duplicate of ANA Sauce .md; counted once. |
Hugging Face now runs as bounded VDS engine input, and O*NET 30.3 now backs the local canonical occupation snapshot. Selected Economic Index CSVs, legacy Economic Index wage/employment/task signal files, GDPval train parquet, and O*NET occupation/task files are downloaded or cached, fingerprinted, processed, and mapped into runtime packets.
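As an illustration of the "downloaded or cached, fingerprinted" step, the sketch below records a byte size and SHA-256 digest per cached file. It is a minimal sketch, not the repo's implementation: the function names `fingerprint_file` and `build_manifest` and the manifest keys are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def fingerprint_file(path: Path, chunk_size: int = 1 << 20) -> dict:
    """Return a manifest entry (file name, byte size, SHA-256) for one cached file.

    Hypothetical helper: the key names are illustrative, not the repo's schema.
    """
    digest = hashlib.sha256()
    size = 0
    with path.open("rb") as handle:
        # Read in chunks so large parquet/CSV files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return {"file": path.name, "bytes": size, "sha256": digest.hexdigest()}


def build_manifest(paths: list[Path]) -> list[dict]:
    """Fingerprint every file, sorted by name so reruns yield identical manifests."""
    return [fingerprint_file(p) for p in sorted(paths, key=lambda p: p.name)]
```

Sorting by name keeps the manifest stable across reruns, which makes a diff between two ingest runs meaningful.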
Bulk GDPval reference/deliverable folders remain intentionally excluded. GDPval file-availability blockers are closed by selecting reference-backed tasks. The remaining large-rubric source task has the following in place:
- A reviewer-ready calibration packet.
- A recorded keep-held decision with replacement required.
- A simplified replacement path.
- A reviewer-banned-term gate with 0 stale terms.
- A Claude reviewer payload that approves that replacement for local internal synthetic practice with 0 required edits.
- A domain-review receipt that applies that payload as domain_review_approved_internal_only.
- A deterministic replacement-practice receipt that passes the approved local fixture.
- A replacement-replay bridge that keeps a five-case synthetic replay/feedback matrix in a separate review lane with 0 canonical event-log writes.
- An approved-intake receipt that inventories only the reviewed chain while creating 0 new approvals.
- A candidate-pack receipt that selects five additional HF-backed candidates for reviewer/domain review while creating 0 approvals.
- A candidate-backlog receipt that inventories 64 additional HF-backed, reference-backed, deliverable-backed candidates for reviewer/domain evidence only while creating 0 approvals or unlocks.
- A candidate-review-batch receipt that selects 10 of those candidates into backlog_batch_001.
- A candidate-batch-review receipt that records Claude decisions as 3 future-gate-only approvals, 5 revision requests, and 2 keep-held decisions with 0 practice/intake/runtime unlocks.
- A candidate-batch-outcome receipt that routes those decisions into 3 confirmation-gate candidates, 5 revision-redraft candidates, and 2 held-out candidates while carrying forward 27 open edits and 0 practice/intake/runtime unlocks.
- A candidate-batch-resolution receipt that closes 12/27 locally provable edits, admits only 2 fully closed redrafts to local-only practice, leaves 3 confirmation candidates and 3 redraft candidates pending source-file review, keeps 2 held candidates out, and creates 0 intake/runtime unlocks.
- A candidate-batch-practice receipt that runs those 2 closed redrafts with 2/2 deterministic evaluator passes while preserving 6 source-file-review blockers, 2 held candidates, 15 open edits, and 0 approved-intake/runtime unlocks.
- A candidate-review receipt that approves two only for local practice while requesting revision on three.
- A candidate-revision-closeout receipt that closes all three revisions and 6/6 required edits.
- A candidate-practice receipt that proves all five reviewed/closeout-approved candidates pass deterministic local practice/replay/intake inventory with 0 remaining revision holds.
Production approval now keeps public runtime, external writes, real-user data, production telemetry, and deployment promotion blocked unless explicit approval evidence exists.
BLS OEWS wage/employment fact import is cached-only: official rows load only when oe.data.0.Current is present or another official access path succeeds, and the live VDS currently records 0 fact rows.
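The cached-only BLS rule can be sketched as a loader that never invents rows: it reports a gap with 0 fact rows when the official file is absent and counts rows only when the cached file exists. This is a hedged sketch, not the code in public_source_snapshots.py. The names `BlsImportStatus` and `import_bls_cached_only`, the status strings, and the assumption that the first non-empty line of the file is a header are all illustrative.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class BlsImportStatus:
    """Hypothetical receipt for one cached-only import attempt."""
    status: str      # "loaded" or "cache_missing" (illustrative values)
    fact_rows: int   # number of data rows read from the official cache
    reason: str      # human-readable explanation for reviewers


def import_bls_cached_only(cache_dir: Path, filename: str = "oe.data.0.Current") -> BlsImportStatus:
    """Count fact rows only if the official file is already cached locally."""
    cached = cache_dir / filename
    if not cached.is_file():
        # Record the access/cache gap explicitly instead of fabricating rows.
        return BlsImportStatus("cache_missing", 0, f"{filename} not cached; 0 rows recorded")
    with cached.open("r", encoding="utf-8") as handle:
        lines = [line for line in handle if line.strip()]
    # Assumption: the first non-empty line is a header, so it is not a fact row.
    rows = max(len(lines) - 1, 0)
    return BlsImportStatus("loaded", rows, "rows read from cached official file")
```

Returning a status object in both branches keeps the gap visible in downstream receipts rather than hiding it behind an empty table.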
artifacts/wedge_report.csv contains 74,225 title rows. artifacts/semantic_review/semantic_review_summary.json proves 45,564 generated title packets were reviewed, 45,564 passed deterministic semantic consistency, and 0 failed. artifacts/semantic_review/multi_llm_review.md records a Claude CLI adversarial PASS_WITH_NOTES verdict: safe to land locally, while still noting that semantic consistency is not deep packet-quality proof for every role. The next milestone is a title-coverage engine that classifies ICP titles as serve_now, serve_with_fallback, multi_llm_adjudication_required, or exclude_or_not_icp.
Evidence from the repo:
- pyproject.toml includes datasets and huggingface-hub.
- src/aina_data_engine/sources.py registers Anthropic/EconomicIndex and openai/gdpval.
- src/aina_data_engine/public_source_snapshots.py parses 1,016 O*NET occupation rows and 18,796 task rows, writes O*NET parquet plus canonical_occupation_snapshot.parquet, supports cached-only BLS OEWS oe.data.0.Current wage/employment fact import, and records the live VDS BLS access/cache gap with 0 wage/employment rows instead of inventing rows.
- src/aina_data_engine/huggingface_ingest.py downloads the selected byte-capped files and writes derived maps.
- artifacts/derived/huggingface/hf_ingest_summary.json records 16 processed files including the local manifest, 186,743,670 bytes, 756 Economic Index SOC rows, 821 legacy wage/employment/task signal rows, 220 GDPval tasks, 907 mapped SOC entries, 821 SOC entries with legacy signals, and 44 GDPval-linked SOC groups.
- src/aina_data_engine/huggingface_mapping.py writes hf_runtime_map_receipt_v1, which verifies 15 selected HF files on disk, checks all 45,564 packets, and proves 43,936 packets with HF refs, 7,906 packets with GDPval refs, 126,588 workflows with HF refs, 821 SOC entries with legacy signals, 4 curriculum modules, 4 exercise links, 4 rubric links, 17 curriculum HF refs, 11 curriculum GDPval refs, 220 sandbox scenarios, and 12 review-dashboard source-backed items.
- src/aina_data_engine/source_authority.py writes source_authority_beta_wedge_audit_v1, which classifies seven source layers, verifies 15 selected HF files, includes 821 legacy Economic Index signal rows, includes public-source snapshot metrics, audits 11 representative beta families, and keeps HF research/benchmark refs separate from job-market signal provenance.
- src/aina_data_engine/beta_admission.py writes beta_admission_v1, which approves 5 families for internal synthetic beta, holds 6 as review-required, supports blocked status, discloses the BLS gap, and blocks public release, external writes, and real-user data.
- src/aina_data_engine/quality.py writes rubric_depth_gate_v1, which checks 48 modules across 12 families for actionable rubric criteria, source refs, locally resolved refs, HF context, GDPval enrichment, and 1 GDPval review hold.
- src/aina_data_engine/deployment_readiness.py writes deployment_readiness_v1, which routes the remaining held module to calibration and proves no public runtime, no external writes, no real-user data, external beta blocked, and public release blocked.
- src/aina_data_engine/gdpval_hold_closeout.py writes gdpval_hold_closeout_v1, which proves 9 selected GDPval tasks have reference and deliverable evidence, 0 file-availability blockers remain, and 1 calibration hold remains.
- src/aina_data_engine/gdpval_calibration_packet.py writes gdpval_calibration_packet_v1, which maps that remaining hold to 15 reference file URIs, 2 deliverable example URIs, 67 raw rubric criteria, 121 points, redacted sample criteria, reviewer actions, and a blocked external/public/unattended boundary.
- src/aina_data_engine/gdpval_calibration_decision.py writes gdpval_calibration_decision_v1, which records the original task decision as keep-held with replacement required and 0 external/public/unattended unblocks.
- src/aina_data_engine/gdpval_replacement_closeout.py writes gdpval_replacement_closeout_v1, which records a rewritten simplified replacement traced to processed HF GDPval data and proves 0 reviewer-banned terms.
- src/aina_data_engine/gdpval_replacement_reviewer_payload.py writes gdpval_replacement_reviewer_payload_v1, which records reviewer payload 3b6a4073d9cfa3b9 as approve_internal with 0 required edits.
- src/aina_data_engine/gdpval_replacement_domain_review.py writes gdpval_replacement_domain_review_v1, which applies that payload as domain_review_approved_internal_only.
- src/aina_data_engine/gdpval_replacement_practice.py writes gdpval_replacement_practice_v1, which exercises the approved replacement as local internal synthetic practice and records a deterministic evaluator pass.
- src/aina_data_engine/gdpval_replacement_replay_bridge.py writes gdpval_replacement_replay_bridge_v1, which converts that practice receipt into 5 separate synthetic replay/feedback-preview cases with 0 canonical event-log writes.
- src/aina_data_engine/gdpval_replacement_approved_intake.py writes gdpval_replacement_approved_intake_v1, which inventories only the reviewer/domain-approved chain and creates 0 new approvals.
- src/aina_data_engine/gdpval_replacement_candidate_pack.py writes gdpval_replacement_candidate_pack_v1, which selects 5 additional review-only candidates from 69 eligible HF-backed tasks while creating 0 approvals and 0 unlocks.
- src/aina_data_engine/gdpval_replacement_candidate_review.py writes gdpval_replacement_candidate_review_v1, which reviews all 5 selected candidates, advances 2 only toward local practice, requests revision on 3, records 6 required edits, and creates 0 approved-intake/runtime unlocks.
- src/aina_data_engine/gdpval_replacement_candidate_revision_closeout.py writes gdpval_replacement_candidate_revision_closeout_v1, which closes those 3 revisions and 6/6 required edits against the processed HF GDPval task map and downloaded GDPval parquet.
- src/aina_data_engine/gdpval_replacement_candidate_practice.py writes gdpval_replacement_candidate_practice_v1, which proves all 5 reviewed/closeout-approved candidates pass deterministic local practice, 25 replay/feedback previews, and approved-intake inventory while leaving 0 revision holds and creating 0 runtime unlocks.
- src/aina_data_engine/production_deployment_approval.py writes production_deployment_approval_v1, which keeps 5 approval domains blocked with 0 approved unlocks.
src/aina_data_engine/gdpval_replacement_candidate_batch_outcome.py now writes gdpval_replacement_candidate_batch_outcome_v1, which routes the validated backlog_batch_001 decisions into 3 confirmation-gate candidates, 5 revision-redraft candidates, and 2 held-out candidates while carrying forward 27 open edits, 220 processed GDPval task-map rows, 186,743,670 downloaded HF bytes, and 0 practice/intake/runtime/external/production unlocks.
src/aina_data_engine/gdpval_replacement_candidate_batch_resolution.py now writes gdpval_replacement_candidate_batch_resolution_v1, which resolves that routed batch without pretending source-file contents were inspected: 10 candidates resolved, 12/27 required edits closed, 15 left open, 2 fully closed redrafts admitted to the batch-practice gate, 3 confirmation candidates plus 3 redraft candidates held for source-file review, 2 held candidates kept out of practice, and 0 practice executions inside the resolution gate itself, approved-intake-ready candidates, runtime approvals, external/public/unattended unlocks, or production unlocks.
src/aina_data_engine/gdpval_replacement_candidate_batch_practice.py now writes gdpval_replacement_candidate_batch_practice_v1, which practices only those two fully closed backlog_batch_001 redrafts on top of the processed HF/GDPval evidence: 2 practice candidates, 2 practice items, 2/2 deterministic evaluator passes, 6 source-file-review blockers preserved, 2 held candidates preserved, 15 open edits preserved, 220 processed GDPval task-map rows, 186,743,670 downloaded HF bytes, and 0 approved-intake/runtime/external/production unlocks.
approve_internal reviewer payload, an applied internal-only domain-review receipt, a deterministic practice receipt traced to the processed HF GDPval row and downloaded GDPval parquet, a separate five-case synthetic replay/feedback-preview bridge with 0 canonical learner-history writes, an inventory-only approved-intake receipt with 0 new approvals, a candidate-pack receipt that selects five additional review-only candidates with 0 approvals or unlocks, a candidate-backlog receipt that queues 64 additional review-only candidates with 0 approvals or unlocks, a candidate-review-batch receipt that selects 10 candidates into backlog_batch_001, a candidate-batch-review receipt that records Claude decisions as 3 future-gate-only approvals, 5 revision requests, and 2 keep-held decisions with 0 practice/intake/runtime unlocks, a candidate-batch-outcome receipt that routes those decisions into 3 confirmation-gate candidates, 5 revision-redraft candidates, and 2 held-out candidates with 27 open edits and 0 practice/intake/runtime unlocks, a candidate-batch-resolution receipt that admits only 2 fully closed redrafts to local-only practice while preserving 6 source-file-review blockers and 2 held candidates, a candidate-batch-practice receipt that records those 2 practice items with 2/2 evaluator passes and 0 approved-intake/runtime unlocks, a candidate-review receipt that approves two only for local practice while requesting revision on three, a candidate-revision-closeout receipt that closes all three revision candidates and 6/6 required edits, and a candidate-practice receipt that keeps all five reviewed/closeout-approved candidates in local-only deterministic practice/replay/intake inventory. Production approval still keeps all five deployment domains blocked.
Mission
AINA's mission is to help knowledge workers become AI-native by learning against their actual work, not generic examples.
Build the data and runtime engine that maps a person's role, workflows, readiness, goals, capacity, and proof artifacts into the next best AI capability path, then evaluates whether the person can actually perform the work with AI.
The decision matrix is the product moat:
Role + Workflow + Readiness + Goal + Capacity + Evidence
That matrix should drive every learner-facing decision: what work problem the person is trying to improve, what workflows are recurring, what AI can safely automate or augment, what remains human-owned, what capability level the learner is ready for, what artifact proves progress, and what the engine should do next when evidence is missing.
Non-goals
- Do not build a generic AI course recommender.
- Do not infer job replacement or automation claims without strong evidence.
- Do not regenerate gold curriculum IP casually when curated lesson/rubric assets already exist.
- Do not start by trying to cover every occupation. Start with a beta wedge and a fallback path.
- Do not make Hugging Face PRO, MCP, or a live sandbox the first dependency for the whole product. Use them where they prove value.
Product Reality
The intended learner loop is assessment -> readiness snapshot -> preview lesson -> practice -> evaluation -> upgrade or personalized path.
Learner gives role, industry, goals, current AI use, tool stack, constraints, confidence, and painful workflows.
The engine normalizes the role into family, function, seniority, workflow cluster, and fallback tier.
The planner builds a packet with tasks, AI affordances, risks, candidate modules, exercises, and rubrics.
The learner practices, submits evidence, receives evaluation, and future recommendations adjust.
The intended engine loop is profile hardening -> role mapping -> packet retrieval -> exposure scoring -> candidate selection -> sequencing -> deterministic hydration -> validation -> persistence -> event tracking.
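The engine loop above can be sketched as an ordered list of named stages, each a function from context to context, with a trace of what ran. The stage names come from the loop as written; everything else (the `run_engine_loop` function, the handler mapping, the dict-based context) is an assumption for illustration and not the shape of runtime.py.

```python
from typing import Callable

Context = dict
Stage = Callable[[Context], Context]

# Stage order taken from the intended engine loop.
ENGINE_STAGES = (
    "profile_hardening",
    "role_mapping",
    "packet_retrieval",
    "exposure_scoring",
    "candidate_selection",
    "sequencing",
    "deterministic_hydration",
    "validation",
    "persistence",
    "event_tracking",
)


def run_engine_loop(context: Context, handlers: dict) -> Context:
    """Run every stage in order; fail loudly if any stage has no handler."""
    missing = [name for name in ENGINE_STAGES if name not in handlers]
    if missing:
        raise KeyError(f"no handler registered for stages: {missing}")
    state = dict(context)  # copy so the caller's input is not mutated
    trace = []
    for name in ENGINE_STAGES:
        state = handlers[name](state)
        trace.append(name)
    # Attach the trace last so a handler cannot drop it by returning a new dict.
    return {**state, "trace": trace}
```

Refusing to run with a missing handler keeps a partially wired loop from silently skipping validation or persistence.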
Canonical Contracts
These contracts should become the core product/data boundary. They are intentionally small enough to ship in slices but strong enough to prevent prompt soup.
| Contract | Purpose | Minimum fields |
|---|---|---|
| LearnerProfile | Captures who the learner is and what they need next. | Role title, industry, seniority, goals, AI usage, confidence, constraints, capacity, tool stack, evidence history. |
| RoleProfile | Normalizes a title into an engine-readable role. | Canonical title, role family, function, seniority, SOC/O*NET mapping, aliases, confidence, fallback tier. |
| WorkflowProfile | Describes recurring work the learner actually performs. | Workflow cluster, tasks, pain points, frequency, importance, deliverables, human accountability. |
| AIAffordancePack | States where AI helps and where it should not be trusted. | AI grants, AI blocks, risk level, HITL checkpoints, tool patterns, failure modes. |
| CurriculumInputPacket | The packet the planner consumes. | Learner profile, role profile, top workflows, AI opportunities, boundaries, candidate modules, coverage requirements, risk rules, fallback path, quality flags. |
| PersonalizationDecision | Explains the path chosen by the engine. | Chosen path, candidate scores, excluded options, rationale, confidence, required events. |
| LearnerEvent | Records what happened so the loop can improve. | Event type, actor, learner id, packet id, module id, artifact refs, scores, timestamps. |
| SandboxScenario | Turns real work into practice. | Task id, prompt, reference files, expected deliverable, rubric, allowed tools, HITL checkpoints. |
| EvaluationResult | Turns practice into progression evidence. | Rubric scores, passed gates, failed gates, feedback, next recommendation, evaluator provenance. |
Recommended event vocabulary: assessment_started, assessment_completed, readiness_report_viewed, curriculum_plan_created, module_exposed, lesson_started, practice_submitted, evaluator_completed, mastery_gate_passed, mastery_gate_failed, confidence_updated, goal_updated, tool_stack_updated.
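A minimal sketch of how the LearnerEvent contract and the recommended vocabulary could be enforced together: an immutable event that rejects unknown event types, plus a replay function that rebuilds a small learner state from the event list. The field names follow the contract's minimum fields; the class layout, the `replay` function, and the shape of the reconstructed state are assumptions, not the repo's schemas.py.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Vocabulary copied from the recommended event list.
EVENT_TYPES = frozenset({
    "assessment_started", "assessment_completed", "readiness_report_viewed",
    "curriculum_plan_created", "module_exposed", "lesson_started",
    "practice_submitted", "evaluator_completed", "mastery_gate_passed",
    "mastery_gate_failed", "confidence_updated", "goal_updated",
    "tool_stack_updated",
})


@dataclass(frozen=True)
class LearnerEvent:
    event_type: str
    actor: str
    learner_id: str
    packet_id: str
    module_id: Optional[str] = None
    artifact_refs: tuple = ()
    scores: dict = field(default_factory=dict)
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def __post_init__(self) -> None:
        # Reject anything outside the agreed vocabulary at construction time.
        if self.event_type not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {self.event_type}")


def replay(events: list) -> dict:
    """Rebuild a (hypothetical) learner state by folding over events in order."""
    state = {"assessment_completed": False, "modules_passed": [], "modules_failed": []}
    for event in events:
        if event.event_type == "assessment_completed":
            state["assessment_completed"] = True
        elif event.event_type == "mastery_gate_passed":
            state["modules_passed"].append(event.module_id)
        elif event.event_type == "mastery_gate_failed":
            state["modules_failed"].append(event.module_id)
    return state
```

Because state is derived only from the event list, the same list always replays to the same state, which is what makes an append-only ledger auditable.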
Milestones
| Milestone | Mission | Exit proof |
|---|---|---|
| M0. Mission and source lock | Convert the uploaded docs into one product/data charter and identify current repo gaps. | This artifact exists as markdown + HTML, with source docs listed and HF current status explicit. |
| M1. Canonical source warehouse v1 | Move from source registry to reproducible, validated dataset snapshots. | O*NET/BLS/local docs plus HF EconomicIndex/GDPval snapshots have manifests, licenses, schema checks, and provenance. |
| M2. Work Intelligence Graph v1 | Convert role/task/workflow/source data into reusable packets. | RoleProfile, WorkflowProfile, AIAffordancePack, and fallback mappings exist for beta role families. |
| M3. CurriculumInputPacket and planner v0 | Generate deterministic packets and choose next best learning paths. | Assessment input produces a valid packet, planner decision, and candidate module sequence for beta roles. |
| M4. Practice and evaluator loop v0 | Connect learning to proof artifacts and rubric-backed feedback. | Preview lesson, exercise, submission, evaluator result, and learner events work end to end. |
| M5. GDPval sandbox and HF proof | Use GDPval where it is strongest: realistic tasks, reference files, rubrics, and evaluation scenarios. | At least three role/workflow scenarios link to GDPval task IDs or explain why no match exists. |
| M6. Beta API and product surface | Expose the loop through API endpoints and a simple beta UI. | /assess, /curriculum, /workflow/{id}/sandbox, and /submit work with telemetry and error handling. |
| M7. ICP title coverage and scale path | Make the system explicit about how many ICP titles it can serve today, which need fallback, which need multi-LLM adjudication, and which are outside the wedge. | Title coverage map, deterministic service-tier receipt, multi-LLM adjudication report, packet quality audit, source-provenance report, and beta feedback dashboard exist. |
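The four service tiers named in M7 could be assigned by a deterministic router that always returns a tier together with a reason. The sketch below is illustrative only: the tier names come from the plan, but the `TitleRow` fields, the rule order, and the reason strings are assumptions, and the real coverage module may use different evidence.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TitleRow:
    """Hypothetical per-title evidence flags; not the wedge_report.csv schema."""
    title: str
    in_icp: bool
    has_exact_packet: bool
    semantic_check_passed: bool
    has_fallback_mapping: bool


def route_title(row: TitleRow) -> tuple:
    """Return (tier, reason) so every routing decision is explainable."""
    if not row.in_icp:
        return "exclude_or_not_icp", "title is outside the ICP wedge"
    if row.has_exact_packet and row.semantic_check_passed:
        return "serve_now", "exact packet passed the deterministic semantic check"
    if row.has_fallback_mapping and row.semantic_check_passed:
        return "serve_with_fallback", "no exact packet, but a fallback mapping passed the check"
    # Anything left is in the ICP but lacks deterministic evidence.
    return "multi_llm_adjudication_required", "deterministic evidence is insufficient"


def coverage_counts(rows: list) -> Counter:
    """Count titles per tier for a coverage report."""
    return Counter(route_title(row)[0] for row in rows)
```

Putting the exclusion check first means an out-of-wedge title can never be escalated to adjudication by accident.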
Slices and Tasks
Each slice should be one Linear issue or one small issue range. The right first milestone is not "build everything." It is M1 -> M3: source snapshots, graph contracts, beta packets, and a planner path that can be evaluated.
| Slice | Demoable behavior | Primary paths | Validation | Owner | Risk |
|---|---|---|---|---|---|
| 1. Source truth ledger | Every source has license, owner, version, allowed use, and runtime eligibility. | sources.py, artifacts/sources, docs/planning | Registry diff, schema check, license/status report. | Data/product | Medium |
| 2. Hugging Face ingestion smoke | EconomicIndex and GDPval can be downloaded or skipped with explicit reason and cached manifest. | ingest.py, artifacts/raw/huggingface | Offline rerun proves cache; missing files warn clearly. | Data engineering | High |
| 3. O*NET/BLS canonical loader | Occupational backbone loads into DuckDB/Parquet with stable keys. | public_source_snapshots.py, ingest.py, artifacts/warehouse | Row counts, key uniqueness, SOC/O*NET join report, BLS access/cached-import status. | Data engineering | Medium |
| 4. Job/title intake guardrails | LinkedIn/Kaggle/job-description data is classified as market signal, not canonical truth. | raw_linkedin.py, artifacts/raw | Forbidden columns and licensing checks pass. | Data/product | High |
| 5. Role normalization contract | Raw title maps to role family, function, seniority, aliases, and fallback tier. | normalize.py, schemas.py | Golden title matrix across beta roles. | Product/data | Medium |
| 6. Workflow extraction contract | A role produces ranked workflows with task, importance, frequency, and deliverable metadata. | packets.py, scoring.py | Snapshot tests for 10 beta role profiles. | Product/data | Medium |
| 7. AIAffordancePack v1 | Each task says what AI grants, blocks, requires from humans, and risks. | schemas.py, artifacts/derived | Risk tiers and HITL checkpoints present for every top workflow. | Curriculum/review | High |
| 8. Beta role wedge | Marketing/growth, paid media, sales, ops-adjacent, and founder roles have packet coverage. | artifacts/packets, docs/source_foundations | Packet quality report by role family. | Product | Medium |
| 9. CurriculumInputPacket generator | LearnerProfile + RoleProfile + workflows become one planner-ready packet. | packets.py, schemas.py | JSON schema validation and fixture snapshots. | Data/runtime | Medium |
| 10. Fallback resolver | Exact role, alias, function, archetype, and LLM fallback are explicit and auditable. | runtime.py, normalize.py | No unknown role silently gets fake precision. | Runtime/product | High |
| 11. Readiness assessment schema | Intake captures role, AI usage, goals, pains, confidence, constraints, and capacity. | schemas.py, future API | Required/optional field tests. | Product | Medium |
| 12. Planner scoring v0 | Planner ranks candidate paths using match, importance, applicability, value, affordability, ease, confidence, and risk penalty. | scoring.py, runtime.py | Golden beginner, operator, founder, and retired-professional cases. | Product/data | High |
| 13. Lesson/exercise/rubric linker | Candidate workflows map to modules, exercises, and evaluator rubrics. | artifacts/derived, packets.py | Selected module has exercise/rubric or explicit gap flag. | Curriculum | Medium |
| 14. Learner event ledger | Runtime emits append-only events for assessment, plan, lesson, practice, evaluator, and mastery gates. | runtime.py, artifacts/events | Event replay reconstructs learner state. | Runtime | Medium |
| 15. Evaluator rubric engine | Practice submissions produce structured pass/fail and feedback. | review.py, future evaluator module | Pass, partial, fail, unsafe fixtures. | Curriculum/review | High |
| 16. GDPval scenario linker | Workflows link to GDPval task IDs, prompts, reference files, and rubric JSON where available. | HF cache, artifacts/sandbox | Three beta scenarios with GDPval provenance or no-match rationale. | Data/curriculum | High |
| 17. Sandbox payload API | A workflow can return setup, prompt, deliverable, tools, HITL checkpoints, and failure modes. | Future API, runtime.py | API contract tests. | Product/runtime | Medium |
| 18. Public API endpoints | /assess, /curriculum, /workflow/{id}/sandbox, and /submit exist behind beta constraints. | API layer TBD | Smoke tests and error-shape tests. | Runtime/product | High |
| 19. Packet quality audit | Representative role packets are audited for evidence, specificity, risk, fallback, and curriculum coverage. | reports.py, artifacts/validation | HTML/JSON report with pass/fail thresholds. | Data/product | Medium |
| 20. Telemetry and observability | PostHog/Sentry-ready events exist for beta loop visibility. | API/UI layer TBD | Local event contract tests; no PII leakage in logs. | Ops/product | Medium |
| 21. Beta UI path | A learner can complete assessment, view snapshot, preview lesson, submit practice, and see evaluation. | Frontend app TBD | Browser smoke and mobile layout checks. | Product/design | High |
| 22. Feedback learning loop | Evaluator outcomes and user behavior influence next recommendation. | Runtime + events | Replay tests show changed recommendation after new evidence. | Product/runtime | High |
| 23. ICP title coverage gate | Every available title row is routed into serve_now, serve_with_fallback, multi_llm_adjudication_required, or exclude_or_not_icp, with reasons and evidence refs. | artifacts/semantic_review, artifacts/validation, review.py, future coverage module | Coverage report shows counts, examples, deterministic reasons, adjudication inputs, and no production unlocks. | Product/data + multi-LLM review | High |
| 24. Deployment split runbook | Heavy data builds, Cloudflare runtime, HF storage, and local VDS processing have clear ownership. | docs/runbooks | Cold-start agent can reproduce build path from docs. | Ops | Medium |
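Slice 12 lists the planner's scoring factors without fixing a formula. One plausible shape, shown here purely as a sketch, is a weighted mean of the positive factors minus a risk penalty. The factor names come from the slice; the equal default weights, the 0-to-1 scale, the `risk` key, and the function names are assumptions and not the contents of scoring.py.

```python
# Positive factors named in slice 12; each value is assumed to lie in [0, 1].
POSITIVE_FACTORS = (
    "match", "importance", "applicability", "value",
    "affordability", "ease", "confidence",
)


def score_candidate(factors: dict, weights: dict = None, risk_weight: float = 1.0) -> float:
    """Weighted mean of positive factors minus a weighted risk penalty."""
    if weights is None:
        # Assumption: equal weights until real calibration data exists.
        weights = {name: 1.0 for name in POSITIVE_FACTORS}
    total_weight = sum(weights[name] for name in POSITIVE_FACTORS)
    positive = sum(weights[name] * factors[name] for name in POSITIVE_FACTORS) / total_weight
    return positive - risk_weight * factors.get("risk", 0.0)


def rank_candidates(candidates: dict) -> list:
    """Return (name, score) pairs, best first; ties break alphabetically."""
    scored = [(name, score_candidate(factors)) for name, factors in candidates.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

A missing positive factor raises a KeyError rather than defaulting to zero, so a candidate cannot be ranked on incomplete evidence without someone noticing.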
First Execution Range
M1 -> M3: real source snapshots, Work Intelligence Graph contracts, beta role packets, and a deterministic CurriculumInputPacket path.
That range was the smallest useful production step because it proved the engine can make personalized decisions from evidence-backed data before product UI complexity arrives. It has now been executed locally on the VDS through public source snapshots, O*NET-backed warehouse ingest, HF ingest, role/workflow packet generation, deterministic planning, GDPval sandbox linking, evaluator fixtures, local API runtime, beta learner shell, static local learner wrapper, content coverage, rubric-depth validation, deployment-readiness proof, telemetry, title-coverage and semantic-review gates, review-dashboard action states, source-authority beta-wedge audit, beta admission policy, and full validation.
Latest hardening range: M2 -> M7. The following are now executed locally: authored rubric depth, authored lesson depth, cached-only BLS official-cache operations, deployment-readiness proof, GDPval hold closeout, GDPval calibration packet, GDPval calibration decision, GDPval replacement closeout, GDPval replacement reviewer payload, GDPval replacement domain review, GDPval replacement practice, GDPval replacement replay bridge, GDPval replacement approved intake, GDPval replacement candidate pack, GDPval replacement candidate backlog, GDPval replacement candidate review batch, GDPval replacement candidate batch review, GDPval replacement candidate batch outcome, GDPval replacement candidate batch resolution, GDPval replacement candidate batch practice, GDPval replacement candidate review, GDPval replacement candidate revision closeout, GDPval replacement candidate practice, and production deployment approval.
- The simplified replacement rubric is reviewer-approved for local internal synthetic practice only, has passed the deterministic evaluator fixture, now has a five-case review-only synthetic replay/feedback matrix with 0 canonical event-log writes, and is inventoried by an approved-intake gate that creates 0 approvals.
- The candidate-backlog gate now inventories 64 additional HF-backed candidates for reviewer/domain evidence only, with 0 approvals.
- The candidate-review-batch gate selects 10 of those backlog candidates into backlog_batch_001, leaves 54 remaining, writes a Claude-ready reviewer/domain prompt, and creates 0 approvals.
- The candidate-batch-review gate records Claude decisions for those 10 as 3 future-gate-only approvals, 5 revision requests, 2 keep-held decisions, 27 required edits, and 0 practice/intake/runtime unlocks.
- The candidate-batch-outcome gate routes those decisions into 3 confirmation-gate candidates, 5 revision-redraft candidates, and 2 held-out candidates, with 27 open edits and 0 practice/intake/runtime unlocks.
- The candidate-batch-resolution gate closes 12/27 locally provable edits, admits only 2 fully closed redrafts to local-only practice, preserves 6 source-file-review blockers and 2 held candidates, and creates 0 intake/runtime unlocks.
- The candidate-batch-practice gate records those 2 practice items with 2/2 deterministic evaluator passes while preserving 15 open edits, 6 source-file-review blockers, 2 held candidates, and 0 approved-intake/runtime/external/public/production unlocks.
- The candidate-review gate reviewed five additional HF-backed candidates, advanced two only toward local synthetic practice, requested revision on three, recorded six required edits, and created 0 approved-intake/runtime/external/public/production unlocks.
- The candidate-revision-closeout gate closed all three revisions and 6/6 required edits.
- The candidate-practice gate now proves all five reviewed/closeout-approved candidates pass deterministic evaluator coverage, replay/feedback preview, and intake inventory, with 0 revision holds remaining.
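The counts reported for backlog_batch_001 can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in this document; the variable names are illustrative and are not taken from the repo.

```python
# Counts copied from the gate summaries in this document.
# Variable names are hypothetical, chosen only for readability.
backlog_total = 64
batch_selected = 10
backlog_remaining = 54

future_gate_approvals = 3
revision_requests = 5
keep_held = 2

required_edits = 27
edits_closed_locally = 12
edits_still_open = 15

redrafts_admitted_to_practice = 2
redrafts_pending_source_review = 3
confirmations_pending_source_review = 3
source_file_review_blockers = 6

checks = {
    "selected + remaining == backlog":
        batch_selected + backlog_remaining == backlog_total,
    "decisions cover the whole batch":
        future_gate_approvals + revision_requests + keep_held == batch_selected,
    "closed + open == required edits":
        edits_closed_locally + edits_still_open == required_edits,
    "admitted + pending redrafts == revision requests":
        redrafts_admitted_to_practice + redrafts_pending_source_review
        == revision_requests,
    "pending confirmations + pending redrafts == blockers":
        confirmations_pending_source_review + redrafts_pending_source_review
        == source_file_review_blockers,
}

for name, ok in checks.items():
    print(f"{name}: {'ok' if ok else 'MISMATCH'}")
```

A check like this could sit next to the validation receipts so that a later edit to one gate summary cannot silently drift from the others.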
Recommended continuing issues
- BLS OEWS official-cache follow-through. The cached-only import path now exists. Keep O*NET 30.3 as the current canonical occupation snapshot and load official BLS wage/employment rows only when `oe.data.0.Current` is present or another official access path succeeds.
- ICP title coverage and service-tier routing. Build the next slice around the title universe. Use `artifacts/wedge_report.csv` as the broad title-row input and `artifacts/semantic_review/semantic_review_all.jsonl` as the deterministic evidence floor. The output should classify every available title row into `serve_now`, `serve_with_fallback`, `multi_llm_adjudication_required`, or `exclude_or_not_icp`, with reasons, source refs, and what AINA can safely do today.
- Multi-LLM adjudication and production boundary. Treat deterministic checks as the first pass and multi-LLM review as the escalation path for ambiguous titles, weak-evidence packets, high-risk automation claims, and generic workflow/teaching templates. Preserve all GDPval/HF safety receipts and keep `production-deployment-approval` green, but do not make a single-reviewer bottleneck the center of the title-coverage milestone.
- HF cache and license policy hardening. Preserve capped selected-file ingest, locked revisions, manifests, and runtime eligibility while making broader pulls explicit.
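The cached-only rule from the BLS item above can be sketched as a guard that reads a local file and never touches the network. This is a minimal sketch under stated assumptions: the helper name `load_bls_rows_cached_only`, the cache directory layout, and the tab-delimited format with a header row are illustrative and not confirmed by the repo. Only the file name comes from the text.

```python
import csv
import tempfile
from pathlib import Path

CACHE_FILENAME = "oe.data.0.Current"  # file name taken from the text above


def load_bls_rows_cached_only(cache_dir: Path) -> list[dict[str, str]]:
    """Return cached rows, or an empty list when the cache file is absent.

    Hypothetical helper: it never downloads anything, so a missing file
    is a valid state that simply yields 0 rows.
    """
    cache_file = cache_dir / CACHE_FILENAME
    if not cache_file.is_file():
        return []
    # Assumption: tab-delimited text with a header row.
    with cache_file.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp)

    # No cache yet: the guard reports zero rows instead of failing.
    empty_rows = load_bls_rows_cached_only(cache_dir)
    print(len(empty_rows))

    # Synthetic placeholder row, not real BLS data.
    (cache_dir / CACHE_FILENAME).write_text(
        "series_id\tyear\tvalue\nEXAMPLE\t2024\t1\n", encoding="utf-8"
    )
    cached_rows = load_bls_rows_cached_only(cache_dir)
    print(len(cached_rows))
```

Returning an empty list rather than raising keeps the VDS valid while the official cache is absent; schema and row checks would run only once rows exist.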
You are implementing the next AINA data-engine milestone in /srv/aina/aina-data-engine-room.
Read:
- docs/planning/aina-personalization-engine-mission-2026-06-09.md
- docs/source_foundations/ainpe-files-shared/AI Native Academy - personalization engine operating system.md
- docs/source_foundations/ainpe-files-shared/Learning Graph June 2026.md
- src/aina_data_engine/sources.py
- src/aina_data_engine/schemas.py
- src/aina_data_engine/packets.py
- src/aina_data_engine/runtime.py
Goal: Create an ICP title coverage map that routes as many available title rows as possible through deterministic service tiers, then escalates only ambiguous or weak-evidence cases to multi-LLM adjudication.
Source truth:
- artifacts/wedge_report.csv
- artifacts/semantic_review/semantic_review_all.jsonl
- artifacts/semantic_review/semantic_review_summary.json
- artifacts/semantic_review/multi_llm_review.md
- artifacts/validation/full_validation.json
- Hugging Face, O*NET, GDPval, beta admission, and production approval receipts already in artifacts/validation.
Constraints:
- Treat current repo files and validation artifacts as implementation truth.
- Treat the planning docs as draft product truth, not magic authority.
- Hugging Face datasets are ingested through the bounded VDS path. Preserve byte caps, locked revisions, manifests, and validation checks.
- Do not download GDPval reference/deliverable folders or other huge payloads by default. Add explicit future flags only after license and cache review.
- Do not invent missing GDPval reference files or silently approve held rubrics. Keep GDPval receipts green as safety constraints, but do not block the title-coverage milestone on single-reviewer ownership.
- Do not call a title deeply production-ready merely because it passes deterministic semantic consistency. Deterministic pass means structurally serviceable; deeper quality moves through multi-LLM adjudication, packet-quality evidence, or future production approval.
- Do not create a single-reviewer gate. If evidence is ambiguous, route to multi_llm_adjudication_required with machine-review inputs and a clear promotion rule.
- Keep production deployment approval green: public runtime, external writes, real-user data, production telemetry, and deployment promotion stay blocked unless explicit approval evidence exists.
- Keep claims about automation, replacement, and economic impact evidence-bound.
- Add tests and validation artifacts.
- Commit work on a branch and leave no orphan state.
Expected output:
- Preserve the BLS OEWS cached-only import guard: current VDS should stay valid with bls_wage_employment_rows=0 until official rows are cached and schema/row checks pass.
- New ICP title coverage artifacts: JSON/JSONL/Markdown/HTML reports with total input rows, deduped titles, service-tier counts, representative examples, deterministic reasons, exclusion reasons, source refs, and explicit "who AINA can serve today / with fallback / not yet" language.
- Tests proving the service-tier classifier is deterministic, does not fake precision, preserves ICP exclusions, and routes ambiguous cases to multi-LLM adjudication instead of auto-approval.
- Full validation receipt still includes HF ingest, runtime mapping, source-authority audit, beta admission, content coverage, authored lesson depth, rubric depth, deployment readiness, GDPval safety receipts, production deployment approval, beta UI shell, telemetry, semantic review, multi-LLM review, and review-dashboard checks.
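The classifier behavior requested in the prompt above (deterministic, exclusions preserved, ambiguity escalated rather than auto-approved) could look roughly like the following. This is a sketch, not the repo's implementation: the `TitleEvidence` fields, the `classify` function, and both thresholds are hypothetical. The promotion threshold in particular is still an open question in this document, so it is exposed as a parameter.

```python
from dataclasses import dataclass
from enum import Enum


class ServiceTier(str, Enum):
    SERVE_NOW = "serve_now"
    SERVE_WITH_FALLBACK = "serve_with_fallback"
    MULTI_LLM = "multi_llm_adjudication_required"
    EXCLUDE = "exclude_or_not_icp"


@dataclass(frozen=True)
class TitleEvidence:
    title: str
    in_icp: bool             # hypothetical flag from the ICP exclusion rules
    semantic_pass: bool      # deterministic semantic-consistency result
    match_confidence: float  # hypothetical 0..1 title-match score
    high_risk_claim: bool    # e.g. the packet carries an automation claim


def classify(
    ev: TitleEvidence,
    serve_now_min: float = 0.9,  # placeholder threshold, not a decided value
    fallback_min: float = 0.6,   # placeholder threshold, not a decided value
) -> tuple[ServiceTier, str]:
    """Pure function of its inputs, so the same evidence gives the same tier."""
    if not ev.in_icp:
        return ServiceTier.EXCLUDE, "title is outside the ICP"
    if not ev.semantic_pass:
        return ServiceTier.MULTI_LLM, "failed deterministic semantic check"
    if ev.high_risk_claim:
        return ServiceTier.MULTI_LLM, "high-risk claim needs adjudication"
    if ev.match_confidence >= serve_now_min:
        return ServiceTier.SERVE_NOW, "strong deterministic match"
    if ev.match_confidence >= fallback_min:
        return ServiceTier.SERVE_WITH_FALLBACK, "usable match, fallback shown"
    return ServiceTier.MULTI_LLM, "weak evidence, never auto-approved"


examples = [
    TitleEvidence("Growth Marketing Manager", True, True, 0.95, False),
    TitleEvidence("Operations Lead", True, True, 0.70, False),
    TitleEvidence("Chief Vibes Officer", True, True, 0.30, False),
    TitleEvidence("Long-Haul Truck Driver", False, True, 0.99, False),
]
for ev in examples:
    tier, reason = classify(ev)
    print(f"{ev.title}: {tier.value} ({reason})")
```

The ICP check runs first so that a confident match can never override an exclusion, and every low-confidence path ends in adjudication rather than approval.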
Risks and Guardrails
Dataset and licensing risk
Some source candidates are canonical public taxonomies, some are research datasets, some are job-market signals, and some may be license-constrained. The engine must never flatten those into one truth layer. Every row used at runtime needs source, version, license/use status, transform lineage, and confidence.
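One way to keep those five attributes attached to every runtime row is a small provenance record with a deterministic gap check. The sketch below is an assumption about shape only: `RowProvenance`, `provenance_gaps`, and the example values are hypothetical and do not describe the existing schemas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowProvenance:
    source: str                        # e.g. the dataset or taxonomy name
    version: str                       # snapshot or revision identifier
    license_status: str                # license/use status label
    transform_lineage: tuple[str, ...]  # ordered transform step names
    confidence: float                  # 0..1


def provenance_gaps(p: RowProvenance) -> list[str]:
    """List the provenance fields that are missing or out of range."""
    gaps = []
    for name in ("source", "version", "license_status"):
        if not getattr(p, name).strip():
            gaps.append(name)
    if not p.transform_lineage:
        gaps.append("transform_lineage")
    if not 0.0 <= p.confidence <= 1.0:
        gaps.append("confidence")
    return gaps


# Placeholder values for illustration; the license label is not a real ruling.
complete = RowProvenance(
    source="O*NET",
    version="30.3",
    license_status="placeholder_status",
    transform_lineage=("normalize_titles",),
    confidence=0.8,
)
incomplete = RowProvenance("research_dataset", "", "", (), 1.4)

print(provenance_gaps(complete))
print(provenance_gaps(incomplete))
```

A row with any gap would stay out of the runtime layer, which keeps canonical taxonomies, research datasets, and job-market signals from being merged into one undifferentiated truth layer.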
False precision risk
The product will lose trust if it tells a learner "this is your exact workflow" when it only has a fuzzy title match. The fallback tier must be visible inside the packet and used by the planner, so that a fuzzy match is never presented as an exact one.
Automation-claim risk
The docs are consistent: do not frame AINA as "AI will replace your job." Anthropic, Microsoft-style exposure, and GDPval-style benchmarks should inform where AI may help, where humans remain accountable, and what evidence is needed. They do not prove replacement.
Prompt-soup risk
If raw source rows are pasted into prompts, the system will be brittle. The correct boundary is structured contracts first, bounded LLM tasks second, deterministic validation always.
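The boundary described above can be illustrated with three small steps: reject anything outside the contract, hand the model a bounded task, and validate its output deterministically. Everything named here is hypothetical, including the contract fields, the task name, and `fake_llm`, which stands in for a real model call.

```python
import json
from typing import Callable

ALLOWED_KEYS = {"role", "workflow", "fallback_tier"}  # hypothetical contract


def build_bounded_task(packet: dict[str, str]) -> str:
    """Structured contract first: refuse fields outside the contract."""
    unknown = set(packet) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"fields outside the contract: {sorted(unknown)}")
    task = {"task": "suggest_one_practice_title", "input": packet}
    return json.dumps(task, sort_keys=True)


def validate_output(raw: str, max_len: int = 80) -> str:
    """Deterministic validation always: check shape, type, and length."""
    data = json.loads(raw)
    title = data.get("practice_title") if isinstance(data, dict) else None
    if not isinstance(title, str) or not title.strip() or len(title) > max_len:
        raise ValueError("model output failed deterministic validation")
    return title.strip()


def run(packet: dict[str, str], llm: Callable[[str], str]) -> str:
    """Bounded LLM task second: the model only sees the contract fields."""
    return validate_output(llm(build_bounded_task(packet)))


def fake_llm(prompt: str) -> str:
    # Stand-in for a model call so the sketch runs offline.
    role = json.loads(prompt)["input"]["role"]
    return json.dumps({"practice_title": f"Draft a weekly report as a {role}"})


packet = {
    "role": "growth marketer",
    "workflow": "reporting",
    "fallback_tier": "exact_role",
}
result = run(packet, fake_llm)
print(result)
```

Passing a raw source row with extra columns raises before any prompt is built, which is the point of putting the contract ahead of the model.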
Scope risk
The tempting wrong move is to ingest every dataset and build a full UI before the packet boundary is stable. The right wedge is beta roles, traceable sources, deterministic packets, and evaluation evidence.
Acceptance Criteria
- The mission statement can be read by product, data, curriculum, and engineering without contradiction.
- The current HF status is explicit: bounded VDS ingestion is active, selected files are processed into runtime maps, the HF runtime map receipt proves those maps reach packets/workflows/API/sandbox/beta/review layers, and bulk reference/deliverable downloads remain intentionally excluded.
- The current public-source status is explicit: O*NET 30.3 backs the local canonical occupation snapshot, while BLS OEWS wage/employment fact import is cached-only and currently records 0 fact rows on the VDS because official BLS access/cache is absent.
- The current ICP title-coverage status is explicit: 74,225 wedge-report title rows exist, 45,564 generated title packets pass deterministic semantic consistency with 0 failures, the current multi-LLM review verdict is `PASS_WITH_NOTES`, and the next acceptance artifact must classify titles into `serve_now`, `serve_with_fallback`, `multi_llm_adjudication_required`, and `exclude_or_not_icp`.
- The current authored-rubric and lesson-depth status is explicit: `authored-lesson-depth` validates 48/48 reviewed modules with resolved authored nodes, role fit, level fit, signal density, pedagogy signal, workflow/AI-affordance grounding, practice/rubric links, source-path quality, and 0 gaps; `rubric-depth-gate` validates 48/48 reviewed modules with actionable criteria, resolved source refs, and HF context; `gdpval-hold-closeout` proves 9 selected tasks with reference/deliverable evidence, 0 file-availability blockers, and 1 remaining calibration hold; `gdpval-calibration-packet` packages that hold without auto-approval; `gdpval-calibration-decision` records keep-held with replacement required and no external/public/unattended unblocks.
- The current deployment-readiness and approval status is explicit: local internal synthetic beta is ready with review holds, the remaining GDPval held module is routed to a reviewer-ready calibration packet and recorded as keep-held with replacement required, missing-reference/file-availability holds are closed, the simplified replacement path is approved only for local internal synthetic practice, production approval requires explicit evidence, public runtime is off, external writes are off, real-user data is off, production telemetry is off, deployment promotion is off, and external beta/public release stay blocked.
- The current replacement-practice replay and intake status is explicit: the approved local substitute can be translated into five separate synthetic replay and feedback-preview cases, approved intake inventories only evidence-approved items, and the lane produces 0 new approvals, 0 canonical learner-history writes, and 0 progression unlocks until explicit production approval evidence exists.
- The current replacement-candidate status is explicit: the candidate pack selects five additional HF-backed, reference-backed, deliverable-backed GDPval task candidates for reviewer/domain review; the candidate backlog inventories the remaining 64 eligible HF-backed candidates for reviewer/domain evidence only; the candidate-review-batch gate selects 10 backlog candidates into `backlog_batch_001` with 54 remaining and creates only a reviewer/domain prompt; the candidate-batch-review gate records Claude decisions on that batch as 3 future-gate-only approvals, 5 revision requests, 2 keep-held decisions, and 27 required edits with 0 practice/intake/runtime unlocks; the candidate-batch-outcome gate routes those same decisions into 3 confirmation-gate candidates, 5 revision-redraft candidates, and 2 held-out candidates while carrying forward 27 open edits and 0 practice/intake/runtime unlocks; the candidate-batch-resolution gate closes only locally provable edits, admits 2 fully closed redrafts to local-only practice, leaves 3 confirmation candidates and 3 redraft candidates pending source-file review, keeps 2 held candidates out, and creates 0 intake/runtime unlocks; the candidate-batch-practice gate executes exactly those 2 closed redrafts with 2/2 deterministic evaluator passes, preserves 6 source-file-review blockers, 2 held candidates, 15 open edits, and 0 approved-intake/runtime unlocks; the candidate-review gate approves two only for local practice and requests revision on three; the candidate-revision-closeout gate closes all three revision candidates and 6/6 required edits; the candidate-practice gate proves all five reviewed/closeout-approved candidates pass deterministic local practice, replay/feedback preview, and intake inventory; and the whole lane creates 0 approved-intake runtime unlocks and 0 external/public/unattended/production unlocks.
- Milestones move from source truth to graph contracts to planner to evaluator to API/UI.
- Slices are vertical and demoable, not vague horizontal layers.
- The first execution range is small enough for one agent lane to start without another strategy meeting.
- Every automation/economic-impact claim is evidence-bound or flagged as future work.
Open Questions
- Which beta wedge should be first: marketing/growth or founder/operator?
- What is the minimum acceptable learner assessment for the first beta: 10 questions, 20 questions, or adaptive intake?
- Which sources are allowed for production claims versus internal recommendation hints?
- Should GDPval be used only for advanced labs, or can simplified GDPval-inspired tasks appear earlier?
- Where will the first runtime API live: this repo, `aina-platform`, or a new service boundary?
- What model panel and promotion rule should multi-LLM adjudication use for ambiguous or high-risk title packets?
- What threshold moves a title from serve_with_fallback to serve_now?
Plain-English Build Path
- A learner answers assessment questions about their role, work, AI usage, goals, pain points, and capacity.
- The engine normalizes the role and chooses a confidence-rated fallback path.
- The data engine retrieves role/workflow intelligence with source provenance.
- The planner builds a CurriculumInputPacket.
- The planner chooses a first learning path and explains why.
- The learner completes a practice task tied to their work.
- The evaluator grades the artifact against a rubric.
- Events update the learner profile so the next recommendation is earned by evidence.
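The steps above can be sketched end to end as a toy loop. Only the name `CurriculumInputPacket` comes from this document; its fields, the role lookup table, the tier labels, and the keyword-based evaluator are hypothetical stand-ins chosen to keep the example runnable, not a description of the real planner or evaluator.

```python
from dataclasses import dataclass, field

# Hypothetical role map; a real one would come from the data engine.
KNOWN_ROLES = {
    "growth marketer": "marketing_growth",
    "founder": "founder_operator",
}


@dataclass
class CurriculumInputPacket:
    role_id: str
    fallback_tier: str       # kept visible so the planner can hedge
    goals: list[str]
    source_refs: list[str]   # provenance for the role/workflow intelligence


@dataclass
class LearnerProfile:
    events: list[dict] = field(default_factory=list)


def normalize_role(raw_title: str) -> tuple[str, str]:
    key = raw_title.strip().lower()
    if key in KNOWN_ROLES:
        return KNOWN_ROLES[key], "exact_role"
    return "generic_knowledge_worker", "generic_fallback"


def build_packet(raw_title: str, goals: list[str]) -> CurriculumInputPacket:
    role_id, tier = normalize_role(raw_title)
    return CurriculumInputPacket(role_id, tier, goals, [f"role_map:{role_id}"])


def choose_first_path(packet: CurriculumInputPacket) -> str:
    goal = packet.goals[0] if packet.goals else "baseline AI fluency"
    return f"Start with '{goal}' for {packet.role_id} ({packet.fallback_tier})"


def evaluate(artifact: str, rubric: list[str]) -> dict:
    # Toy evaluator: a criterion counts as met if its keyword appears.
    met = [c for c in rubric if c.lower() in artifact.lower()]
    return {"met": met, "passed": len(met) == len(rubric)}


def record_event(profile: LearnerProfile, packet, result: dict) -> None:
    profile.events.append({"role_id": packet.role_id, **result})


profile = LearnerProfile()
packet = build_packet("Growth Marketer", ["faster campaign briefs"])
plan = choose_first_path(packet)
result = evaluate("Brief with audience and metric", ["audience", "metric"])
record_event(profile, packet, result)
print(plan)
print(profile.events)
```

The fallback tier travels inside the packet and shows up in the plan explanation, so a learner whose title did not match exactly would see that stated rather than implied.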
Continue by building the ICP title coverage map: classify all available title rows into serve_now, serve_with_fallback, multi_llm_adjudication_required, or exclude_or_not_icp, preserve Hugging Face/O*NET/GDPval provenance, and keep all production approval boundaries closed until explicit evidence exists.