AINA Data Engine Room · Personalization Engine · 2026-06-15

Workflow Intelligence 500 Embedding Checkpoint

A clean next-family proof, plus an explicit stop on title-only jobs_research_role rows.

Ali Mehdi Mukadam · co-authored with Codex · 2026-06-15

The Single Idea

After completing workflow_seed, Codex tested the next lane. jobs_research_role failed semantic QA because its repaired rows were still store-number/title artifacts, so Codex did not embed them. workflow_intelligence passed semantic QA and now has a clean 500-vector Gemini proof.

01 · Decision

Source-family decision

Blocked: jobs_research_role passed 0 / 50 semantic QA samples. The sample text was title-only: rows in the style of "00299 store manager", with no workflow substance.
Proceed: workflow_intelligence passed 50 / 50 semantic QA samples. The repaired text includes workflow name, risk domains, usage policy, and repo-relative provenance.
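
The block/proceed call above can be sketched as a small gate over the semantic QA sample. This is an illustration under stated assumptions, not the engine's actual logic: the names `QaSample` and `family_decision` are hypothetical, and the requirement that every sampled row pass is an assumed threshold. The report only records the two observed outcomes (0 / 50 blocked, 50 / 50 proceed).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QaSample:
    """Semantic QA outcome for one source family (hypothetical structure)."""
    source_family: str
    passed: int
    sampled: int


def family_decision(sample: QaSample, required_pass_rate: float = 1.0) -> str:
    """Return "proceed" or "blocked" for a source family.

    required_pass_rate=1.0 is an assumed threshold. The report does not
    state the real one, only that 0/50 was blocked and 50/50 proceeded.
    """
    if sample.sampled <= 0:
        return "blocked"  # no QA evidence, so do not embed
    rate = sample.passed / sample.sampled
    return "proceed" if rate >= required_pass_rate else "blocked"


# QA counts taken from the decision summary in this report.
decisions = {
    s.source_family: family_decision(s)
    for s in (
        QaSample("jobs_research_role", passed=0, sampled=50),
        QaSample("workflow_intelligence", passed=50, sampled=50),
    )
}
print(decisions)
```

Defaulting to "blocked" when nothing was sampled keeps the gate fail-closed, which matches the report's stance of not embedding rows that have not earned it.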

02 · Live Run

Workflow intelligence result

Item                                      Result
Source family                             workflow_intelligence
Total family rows in eligibility ledger   3,152
Repaired chunks materialized              500
Repair action                             placeholder_noise_cleanup
Semantic QA sample                        50 / 50 pass
Dry-run candidates                        500
Live embedded new vectors                 500
Failed Gemini rows                        0
Total Gemini vectors                      14,432
Workflow intelligence vectors             500
Workflow vectors overall                  7,465
Stale vectors                             0

03 · Gates

Validation state

Gate or receipt                          Result
Workflow intelligence semantic QA        50 / 50 pass
ain-506-p0-gate                          pass
ain-510-retrieval-promotion-gate         promotion_ready
production-chunk-vector-reconciliation   pass
source-authority-registry-v2             pass
artifact-exposure-scan                   0 active findings
validate                                 pass

Counter                    Value
Combined chunk authority   330,385
Vector rows                14,432
Matched vectors            14,432
Unvectorized chunks        315,953
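
The four counters above are mutually consistent, which a few lines of arithmetic can confirm. This is a sketch of the check, not the implementation of production-chunk-vector-reconciliation; the variable names are illustrative, and the values are copied from the counters in this report.

```python
# Counters copied from the validation state in this report.
combined_chunk_authority = 330_385
vector_rows = 14_432
matched_vectors = 14_432
unvectorized_chunks = 315_953

# Every vector row should match a chunk, leaving no orphaned vectors.
orphan_vectors = vector_rows - matched_vectors

# Chunks without a matched vector should equal the reported unvectorized count.
expected_unvectorized = combined_chunk_authority - matched_vectors

# Share of the combined chunk authority that currently has a vector.
coverage = matched_vectors / combined_chunk_authority

print(orphan_vectors, expected_unvectorized, f"{coverage:.2%}")
```

The subtraction reproduces the reported 315,953 unvectorized chunks, and the zero orphan count agrees with the "Stale vectors 0" row of the live-run result.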

04 · Boundary

Product boundary

Runtime embedding authority remains off. Public runtime, real-user data, external writes, production telemetry, donor repo mutation, raw market dumps, malformed rows, and learner-linked/private payloads remain blocked.

The jobs_research_role failure is a useful guardrail: do not embed title-only repaired rows just because the schema can carry them.
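
One way to picture this guardrail is a cheap pre-check that flags rows consisting of a numeric code plus a short title. The sketch below is a hypothetical heuristic, not the semantic QA that the engine runs: `looks_title_only`, the leading-digits pattern, and the `min_words` cutoff are all assumptions, and the second sample row is invented for illustration.

```python
import re

# Hypothetical heuristic: drop a leading numeric code (such as a store number)
# and treat the row as title-only when only a few words remain.
LEADING_CODE = re.compile(r"^\s*\d+\s*")


def looks_title_only(text: str, min_words: int = 8) -> bool:
    """Flag text that is just an optional numeric prefix plus a short title.

    min_words is an assumed cutoff, not a value taken from the engine.
    """
    body = LEADING_CODE.sub("", text)
    return len(body.split()) < min_words


rows = [
    # Style of the failing jobs_research_role sample quoted in this report.
    "00299 store manager",
    # Invented example of a row that carries workflow substance.
    "Workflow: invoice triage. Risk domains: finance, privacy. "
    "Usage policy: internal only. Provenance: docs/workflows/invoice_triage.md",
]
flags = [looks_title_only(r) for r in rows]
print(flags)  # [True, False]
```

A word-count check like this would only be a first filter. It cannot judge whether longer text is meaningful, so it would sit in front of semantic QA rather than replace it.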

05 · Verification

Commands run

uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-embedding-eligibility --source-family jobs_research_role
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-embedding-repaired-corpus --source-family jobs_research_role --limit 500 --shard-size 250
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-embedding-semantic-qa --source-family jobs_research_role --include-repaired --limit 50
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-embedding-eligibility --source-family workflow_intelligence
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-embedding-repaired-corpus --source-family workflow_intelligence --limit 500 --shard-size 250
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-embedding-semantic-qa --source-family workflow_intelligence --include-repaired --limit 50
uv run aina-data-engine --root /srv/aina/aina-data-engine-room gemini-embedding-run --source-family workflow_intelligence --include-repaired --dry-run --max-new 500 --selection-mode progressive
uv run aina-data-engine --root /srv/aina/aina-data-engine-room ain-506-p0-gate
uv run aina-data-engine --root /srv/aina/aina-data-engine-room gemini-embedding-run --source-family workflow_intelligence --include-repaired --max-new 500 --selection-mode progressive --allow-live-gemini --confirm-paid-api --workers 8 --timeout-seconds 60 --write-every 250
uv run aina-data-engine --root /srv/aina/aina-data-engine-room ain-510-retrieval-promotion-gate
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-chunk-vector-reconciliation
uv run aina-data-engine --root /srv/aina/aina-data-engine-room source-authority-registry-v2
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-runtime-readiness
uv run aina-data-engine --root /srv/aina/aina-data-engine-room artifact-exposure-scan
uv run aina-data-engine --root /srv/aina/aina-data-engine-room docs-frontmatter-check
uv run aina-data-engine --root /srv/aina/aina-data-engine-room validate

06 · Resume

What remains

The next logical step is to scale workflow_intelligence from 500 to family-complete. The eligibility ledger has 3,152 rows, so the next dry run should likely show 2,652 remaining candidates if the repaired corpus is expanded cleanly.
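
The expected candidate count is plain subtraction, and the same numbers show that the 5,000 limit used in the resume commands is large enough to cover the family. A minimal sketch, assuming every remaining ledger row survives repair and that the 500 existing vectors are excluded from the next dry run:

```python
family_rows = 3_152      # workflow_intelligence rows in the eligibility ledger
already_embedded = 500   # vectors written at this checkpoint
planned_limit = 5_000    # --limit / --max-new value in the resume commands

# Candidates the next dry run should report if the repair expands cleanly.
remaining = family_rows - already_embedded

# The limit must be at least the family size, or the run stops short.
limit_covers_family = planned_limit >= family_rows

print(remaining, limit_covers_family)  # 2652 True
```

A dry-run count below 2,652 would suggest that some rows dropped out during repair or eligibility, which is worth investigating before the live run.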

cd /srv/aina/aina-data-engine-room
git status --short --branch
uv run aina-data-engine --root /srv/aina/aina-data-engine-room validate
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-embedding-repaired-corpus --source-family workflow_intelligence --limit 5000 --shard-size 1000
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-embedding-semantic-qa --source-family workflow_intelligence --include-repaired --limit 50
uv run aina-data-engine --root /srv/aina/aina-data-engine-room gemini-embedding-run --source-family workflow_intelligence --include-repaired --dry-run --max-new 5000 --selection-mode progressive

Where to start

Continue with workflow_intelligence family completion. Keep jobs_research_role blocked until it is repaired with richer context.