AINA Data Engine And Academy Alignment Board
A read-first map so future agents advance the same product spine instead of rediscovering old repos.
AINA's production path is aina-data-engine-room -> versioned platform-safe exports -> aina-academy -> live learner experience. aina-platform remains the live/front-door and design-pattern donor until Ali decides the convergence path. aina-core is useful, but it is a read-only donor/consolidation snapshot rather than a new center of gravity.
Failure mode this board prevents: each repo becomes its own truth, titles get cleaned in isolation, embeddings run against noisy labels, and platform integration keeps waiting for one more data pass.
Target state: the data engine produces trusted exports, Academy consumes them through explicit contracts, and Platform remains the live/design donor until the cutover decision.
Current Repo Roles
| Repo | Role now | What belongs there | What must not happen |
|---|---|---|---|
| /srv/aina/aina-data-engine-room | Current data/build authority | Source authority, cleaned corpora, title/role/context spine, AI Fluency Capability Map contracts, embeddings, exact-cosine retrieval proof, local runtime contracts, release-boundary receipts | Do not make public runtime or real-user claims from local proof alone |
| /srv/repos/aina-academy | Future product/runtime platform | Cloudflare Worker, D1/Drizzle schema, Zod contracts, learner loop, Practice Arena, tutor/evaluator, recommendations, payments test mode, admin, staging | Do not query VDS-local DuckDB/Python live; consume versioned exports only |
| /srv/repos/aina-platform | Current live surface and donor | Existing live site/front door, design language, auth/payment/runtime patterns, production UX/copy references | Do not fork product truth away from Academy without a convergence decision |
| /srv/repos/aina-core | Reference/archive/experiment donor | Stitch ledger, title expansion evidence, serving probes, evidence enrichment patterns, verified missing assets | Do not build new canonical work here by default |
| Donor repos and archives | Read-only quarries | Cleaned titles, jobs-research, evidence atlas, HF signals, ALIPE vision/context, Fusion task shape | Do not mutate donors, embed raw junk, or treat labels as truth |
Product Mission
The product mission is learner/role -> AI Fluency Capability Map -> simulation/practice -> evaluation -> proof -> next curriculum move.
From AI anxiety to AI fluency: assessment, simulation, personalized curriculum, and proof.
Title coverage, embeddings, BLS/O*NET/HF data, jobs-research assets, JD-aware role context, tool maps, evaluator rubrics, and source authority are all inputs. The product object is a learner-facing AI Fluency path that can be trusted, explained, evaluated, and improved.
Operating Goal
The active goal is reconciliation-gated: preserve the current state, fix stale metadata, finish the Fusion/core import ledger, define the export manifest, prove Academy-safe consumption, and only then resume clean data expansion.
Autonomously preserve and reconcile the AINA Data Engine Room production spine before new data expansion: create verified backup/tag proof for current head 11bf3c0, refresh the private GitHub backup if needed, fix stale board/metadata and mark embeddings parked; then complete the Fusion/core import-decision ledger, manually port only accepted diffs with gates green, consolidate local main into one branch truth, define engine_room_export_manifest_v1, and prove one platform-safe top-500/top-1,000 export can be consumed by aina-academy without live VDS/DuckDB/Python coupling, while preserving source authority, AI Fluency capability-map boundaries, exact-cosine retrieval proof, and keeping public runtime, real-user data, external writes, production telemetry, runtime embedding authority, donor mutation, and deletion blocked until explicit release receipts exist.
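The ordering in the operating goal can be read as a chain of gates, where each step needs a receipt from every earlier step before it may start. A minimal Python sketch, assuming hypothetical gate names that paraphrase the goal (they are not identifiers from the repo):

```python
from typing import Optional

# Hypothetical gate names paraphrasing the operating goal; not repo identifiers.
GATE_ORDER = [
    "preservation_proof",         # backup/tag proof for head 11bf3c0
    "stale_metadata_fixed",       # board/metadata refreshed, embeddings marked parked
    "fusion_core_ledger",         # import-or-decline decision per Fusion/core branch
    "accepted_diffs_ported",      # manual ports with gates green
    "main_consolidated",          # one branch truth on local main
    "export_manifest_v1",         # engine_room_export_manifest_v1 defined
    "academy_consumption_proof",  # pinned top-500/top-1,000 export consumed locally
    "data_expansion",             # only now: resume clean data expansion
]


def next_gate(receipts: set[str]) -> Optional[str]:
    """Return the first gate that has no receipt yet, or None if all are done."""
    for gate in GATE_ORDER:
        if gate not in receipts:
            return gate
    return None


def is_unlocked(gate: str, receipts: set[str]) -> bool:
    """A gate may start only when every earlier gate already has a receipt."""
    index = GATE_ORDER.index(gate)
    return all(earlier in receipts for earlier in GATE_ORDER[:index])
```

Under this reading, `next_gate({"preservation_proof"})` returns `"stale_metadata_fixed"`, and `data_expansion` stays locked until every earlier receipt exists.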
Current Execution Truth
| Surface | State |
|---|---|
| Active branch | codex/aina-prod-readiness-2026-06-14 |
| Execution base head | 11bf3c0 (Add data engine academy alignment board), the head as it stood before this board-refresh checkpoint commit. |
| Main status | Current branch is 26 commits ahead of local main; main is not yet the final truth. |
| Remote | No git remote is configured in this checkout. |
| Pre-edit preservation | Archive tag archive/2026-06-15/prod-spine-board-refresh-11bf3c0; verified bundle /srv/aina/checkpoints/aina-data-engine-room/2026-06-15-production-spine-board-refresh/aina-data-engine-room-11bf3c0-prod-spine-board-refresh.bundle; SHA256 df9677cbf79ecb68730168a085c029b4c209c3f578124cea44216d2b0759f35b. |
| Fusion branches | 24 unmerged branch labels require import-or-decline decisions before local main consolidation. |
| Embeddings | Parked until reconciliation and engine_room_export_manifest_v1 identify product-consumed source families. |
| Release boundary | Public runtime, real-user data, external writes, production telemetry, runtime embedding authority, donor mutation, and deletion remain blocked. |
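The pre-edit preservation row pins the bundle by SHA256. A sketch of re-checking that digest with the Python standard library, assuming the bundle is readable locally; `git bundle verify <file>` remains the git-native check that the bundle is well-formed and applies to the current repository, and this sketch only compares the recorded hash.

```python
import hashlib
from pathlib import Path

# Digest recorded in the "Pre-edit preservation" row of the table above.
EXPECTED_SHA256 = "df9677cbf79ecb68730168a085c029b4c209c3f578124cea44216d2b0759f35b"


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large bundles are never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bundle_matches(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True only when the on-disk bundle still hashes to the recorded digest."""
    return sha256_file(path) == expected.lower()
```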
Source Truth Rules
| Rule | Meaning |
|---|---|
| Current receipts win | Current repo receipts beat old handoffs and donor labels. |
| Data engine owns source intelligence | aina-data-engine-room is the data authority unless a newer founder decision says otherwise. |
| Academy owns learner runtime | Academy owns learner state, auth, D1, lessons, practice, evaluator routing, payments, and UI. |
| Core is not center of gravity | aina-core is reference/archive/experiment material by default. |
| Labels are metadata | Good text with doubtful labels can be preserved only with label authority downgraded. |
| Release receipts unlock production | Public runtime, real-user data, external writes, telemetry, donor deletion, and runtime embedding authority require explicit receipts. |
| Embeddings wait for export proof | Live Gemini work resumes only after reconciliation/export gates name clean, product-consumed source families. |
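The precedence rules can be expressed as a resolver that picks one winning claim per subject. A minimal sketch: the claim kinds, their ranks, and the `Claim` shape are hypothetical, and only the ordering "current repo receipts beat old handoffs and donor labels" comes from the table.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical ranks (higher wins); mirrors "current receipts win".
PRECEDENCE = {"current_receipt": 3, "old_handoff": 2, "donor_label": 1}


@dataclass(frozen=True)
class Claim:
    subject: str    # e.g. a title, role, or source family
    value: str
    kind: str       # one of PRECEDENCE's keys
    recorded: date


def resolve(claims: list[Claim]) -> dict[str, Claim]:
    """Pick one claim per subject: highest precedence first, then most recent."""
    winners: dict[str, Claim] = {}
    for claim in claims:
        current = winners.get(claim.subject)
        key = (PRECEDENCE[claim.kind], claim.recorded)
        if current is None or key > (PRECEDENCE[current.kind], current.recorded):
            winners[claim.subject] = claim
    return winners
```

Because rank is compared before recency, a donor label recorded yesterday still loses to an older current receipt.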
Milestones, Slices, And Tasks
The first implementation move is M0/M1: preservation proof and the Fusion/core import-decision ledger. The next product move is M2/M3: define engine_room_export_manifest_v1 and prove Academy can consume a pinned, platform-safe top-500/top-1,000 export without live coupling.
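engine_room_export_manifest_v1 is not yet defined, so the following is only a sketch of the kind of contract M2/M3 could pin: a manifest carrying a version, top-N size, source families, row count, and payload hash, plus a consumer that rejects any payload that does not match. Every field name is an assumption, the payload is assumed to be JSON Lines, and Academy would express the same check in TypeScript/Zod; Python is used here only for illustration.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ExportManifest:
    # All field names are assumptions; the real manifest v1 is still to be defined.
    manifest_version: str            # expected "engine_room_export_manifest_v1"
    export_id: str                   # pinned, immutable export identifier
    top_n: int                       # 500 or 1000
    source_families: tuple[str, ...]
    payload_sha256: str
    row_count: int


def load_pinned_export(manifest: ExportManifest, payload: bytes) -> list[dict]:
    """Academy-side check: accept a static payload only if it matches its manifest."""
    if manifest.manifest_version != "engine_room_export_manifest_v1":
        raise ValueError("unsupported manifest version")
    if manifest.top_n not in (500, 1000):
        raise ValueError("top_n must be 500 or 1000")
    if hashlib.sha256(payload).hexdigest() != manifest.payload_sha256:
        raise ValueError("payload hash does not match manifest")
    # JSON Lines: one record per non-empty line.
    rows = [json.loads(line) for line in payload.decode("utf-8").splitlines() if line]
    if len(rows) != manifest.row_count or len(rows) > manifest.top_n:
        raise ValueError("row count does not match manifest")
    return rows
```

Because the consumer only reads a static payload and its manifest, nothing in it touches VDS-local DuckDB or a Python service at request time.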
Embedding Policy In The New Map
Gemini embeddings remain important, but they are parked until M1-M3 prove what Academy actually consumes. After that, use gemini-embedding-2 at 768 dimensions through paid Vertex AI with Application Default Credentials (ADC) on project aina-495702. Exact cosine similarity remains the source-of-truth retrieval method until VSS/RuVector acceleration has parity proof.
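Exact cosine retrieval is a brute-force scan: score every stored vector against the query and keep the top k. Those exact results then serve as the reference for a parity check on any accelerated index. A pure-Python sketch (real vectors would be 768-dimensional; using recall@k as the parity metric is an assumption, not a documented threshold):

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_top_k(query: list[float], corpus: dict[str, list[float]], k: int) -> list[str]:
    """Score every vector, sort by descending similarity (ties by id), keep k ids."""
    scored = sorted(corpus.items(), key=lambda item: (-cosine(query, item[1]), item[0]))
    return [doc_id for doc_id, _ in scored[:k]]


def recall_at_k(exact: list[str], approx: list[str]) -> float:
    """Share of the exact top-k ids that the accelerated index also returned."""
    return len(set(exact) & set(approx)) / len(exact) if exact else 1.0
```

A parity proof under this sketch would require `recall_at_k(exact, approx)` to reach 1.0 (or an agreed threshold) across a fixed query set before VSS/RuVector results are trusted.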
| Allowed | Blocked |
|---|---|
| Clean, source-authoritative semantic chunks; repaired text whose doubtful labels are kept as metadata only; progressive foreground embedding; batch embedding after proof | Raw market/posting dumps, raw learner artifacts, bad labels in embedding text, malformed CSV rows, quarantined rows, batch runs while repair queues are unresolved |
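The Allowed/Blocked table can be applied row by row before anything reaches the embedding model. A minimal sketch with a hypothetical `Row` shape and hypothetical function names; the rule that non-doubtful labels may be prefixed into the embedded text is an assumption, while doubtful labels stay metadata-only as the table requires.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Row:
    # Hypothetical row shape; field names are illustrative only.
    row_id: str
    text: str
    source_authoritative: bool
    quarantined: bool = False
    malformed: bool = False
    raw_dump: bool = False           # raw market/posting dump or raw learner artifact
    label: Optional[str] = None
    label_doubtful: bool = False


def to_embedding_item(row: Row) -> Optional[dict]:
    """Apply the Allowed/Blocked table to one row; None means the row is blocked."""
    if row.quarantined or row.malformed or row.raw_dump or not row.source_authoritative:
        return None
    text = row.text
    metadata = {"row_id": row.row_id}
    if row.label is not None:
        if row.label_doubtful:
            # Doubtful labels never enter the embedded text; keep them as metadata only.
            metadata["label"] = row.label
            metadata["label_authority"] = "downgraded"
        else:
            text = f"{row.label}: {text}"  # assumption: trusted labels may prefix text
    return {"text": text, "metadata": metadata}


def build_batch(rows: list[Row], open_repair_items: int, proof_receipt: bool) -> list[dict]:
    """Batch embedding needs a proof receipt and an empty repair queue."""
    if not proof_receipt:
        raise RuntimeError("no proof receipt; batch embedding blocked")
    if open_repair_items:
        raise RuntimeError("repair queue not empty; batch embedding blocked")
    return [item for item in map(to_embedding_item, rows) if item is not None]
```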
No-Write Zones And Release Boundaries
Donor repos, raw source files, real learner data, secrets, env files, credentials, billing settings, and production Cloudflare/telemetry/payment writes remain no-write zones unless explicitly scoped. Data engine builds, embedding receipts, retrieval proof, headless runtime fixtures, and Academy import dry runs remain local-only by default.
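For the file-level part of the no-write rule, a write guard can refuse donor paths and secret files unless the exact path was explicitly scoped. A sketch only: the donor root `/srv/donors`, the secret-file patterns, and `write_allowed` are hypothetical, the guard does not resolve `..` segments or symlinks, and it does not cover Cloudflare, telemetry, or payment writes.

```python
from pathlib import PurePosixPath
from typing import Iterable

# Hypothetical donor root; this board does not list donor paths.
NO_WRITE_ROOTS = (PurePosixPath("/srv/donors"),)
SECRET_NAMES = {".env", "credentials.json"}
SECRET_SUFFIXES = {".pem", ".key"}


def write_allowed(target: str, scoped: Iterable[str] = ()) -> bool:
    """Refuse writes into donor roots or secret files unless explicitly scoped."""
    path = PurePosixPath(target)
    if str(path) in set(scoped):
        return True  # an explicit scope for this exact path overrides the zones
    if any(path == root or root in path.parents for root in NO_WRITE_ROOTS):
        return False
    name = path.name
    if name in SECRET_NAMES or name.startswith(".env.") or path.suffix in SECRET_SUFFIXES:
        return False
    return True
```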
Validation Stack
    cd /srv/aina/aina-data-engine-room
    git status --short --branch
    uv run aina-data-engine --root /srv/aina/aina-data-engine-room source-authority-start-here
    uv run aina-data-engine --root /srv/aina/aina-data-engine-room ain-506-p0-gate
    uv run aina-data-engine --root /srv/aina/aina-data-engine-room ain-510-retrieval-promotion-gate
    uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-runtime-readiness
    uv run aina-data-engine --root /srv/aina/aina-data-engine-room platform-live-boundary
    uv run aina-data-engine --root /srv/aina/aina-data-engine-room validate

    cd /srv/repos/aina-academy
    pnpm typecheck
    pnpm test
    bash ops/smoke/core-loop.sh
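The validation stack can be run fail-fast so that "gates green" means every command exited 0. A minimal sketch using `subprocess.run`; the command lists copy the stack above, while `run_gates`, `ENGINE_GATES`, and `ACADEMY_GATES` are hypothetical names.

```python
import subprocess
from typing import Optional, Sequence

ENGINE_ROOT = "/srv/aina/aina-data-engine-room"
ACADEMY_ROOT = "/srv/repos/aina-academy"

ENGINE_GATES = [
    ["git", "status", "--short", "--branch"],  # informational; exits 0 inside a repo
    *(
        ["uv", "run", "aina-data-engine", "--root", ENGINE_ROOT, sub]
        for sub in (
            "source-authority-start-here",
            "ain-506-p0-gate",
            "ain-510-retrieval-promotion-gate",
            "production-runtime-readiness",
            "platform-live-boundary",
            "validate",
        )
    ),
]
ACADEMY_GATES = [["pnpm", "typecheck"], ["pnpm", "test"], ["bash", "ops/smoke/core-loop.sh"]]


def run_gates(gates: Sequence[Sequence[str]], cwd: Optional[str] = None) -> Optional[int]:
    """Run gates in order; return the index of the first failing gate, or None if all pass."""
    for index, argv in enumerate(gates):
        result = subprocess.run(list(argv), cwd=cwd)
        if result.returncode != 0:
            return index  # fail fast: later gates are not run
    return None
```

For example, `run_gates(ENGINE_GATES, cwd=ENGINE_ROOT)` returns None only when every engine gate passes; `ACADEMY_GATES` runs the same way with `cwd=ACADEMY_ROOT`.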
Done Means For This Mission
The next cold agent can pick up without guessing. The repo roles are recorded, milestones are recorded, older boards point to this alignment, the active goal matches the reconciliation-gated goal, and next execution starts with Fusion/core reconciliation plus the Academy export contract instead of isolated embedding or title cleanup.
Next Best Action
Finish the Fusion/import decision ledger, manually port accepted diffs with gates green, consolidate local main, define engine_room_export_manifest_v1, build a top-500/top-1,000 platform-safe export, and prove Academy can consume it locally.