Workflow Intelligence 500 Embedding Checkpoint
A clean next-family proof, plus an explicit stop on title-only jobs_research_role rows.
After completing workflow_seed, Codex tested the next lane. jobs_research_role failed semantic QA because its repaired rows were still store-number/title artifacts, so Codex did not embed them. workflow_intelligence passed semantic QA and now has a clean 500-vector Gemini proof.
Source-family decision
jobs_research_role passed 0 / 50 semantic QA samples. Sample text was title-only: rows in the style of "00299 store manager" with no workflow substance.
workflow_intelligence passed 50 / 50 semantic QA samples. Repaired text includes workflow name, risk domains, usage policy, and repo-relative provenance.
Workflow intelligence result
| Item | Result |
|---|---|
| Source family | workflow_intelligence |
| Total family rows in eligibility ledger | 3,152 |
| Repaired chunks materialized | 500 |
| Repair action | placeholder_noise_cleanup |
| Semantic QA sample | 50 / 50 pass |
| Dry-run candidates | 500 |
| Live embedded new vectors | 500 |
| Failed Gemini rows | 0 |
| Total Gemini vectors | 14,432 |
| Workflow intelligence vectors | 500 |
| Workflow vectors overall | 7,465 |
| Stale vectors | 0 |
Validation state
| Gate or receipt | Result |
|---|---|
| Workflow intelligence semantic QA | 50 / 50 pass |
| ain-506-p0-gate | pass |
| ain-510-retrieval-promotion-gate | promotion_ready |
| production-chunk-vector-reconciliation | pass |
| source-authority-registry-v2 | pass |
| artifact-exposure-scan | 0 active findings |
| validate | pass |

| Counter | Value |
|---|---|
| Combined chunk authority | 330,385 |
| Vector rows | 14,432 |
| Matched vectors | 14,432 |
| Unvectorized chunks | 315,953 |
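The counters above can be cross-checked by hand. The sketch below is only an illustration: it assumes each matched vector corresponds to exactly one chunk, so that unvectorized chunks equal combined chunk authority minus matched vectors. The report does not state how production-chunk-vector-reconciliation computes its result, and the `reconcile` helper is hypothetical, not part of aina-data-engine.

```python
# Counter values copied from the table above.
combined_chunk_authority = 330_385
vector_rows = 14_432
matched_vectors = 14_432
unvectorized_chunks = 315_953


def reconcile(chunks: int, vectors: int, matched: int, unvectorized: int) -> list[str]:
    """Return human-readable mismatches; an empty list means the counters agree.

    Hypothetical helper. Assumes one matched vector per chunk.
    """
    problems = []
    if vectors != matched:
        problems.append(f"{vectors - matched} vector rows have no matching chunk")
    if chunks - matched != unvectorized:
        problems.append(
            f"expected {chunks - matched} unvectorized chunks, table says {unvectorized}"
        )
    return problems


problems = reconcile(
    combined_chunk_authority, vector_rows, matched_vectors, unvectorized_chunks
)
print(problems or "counters reconcile")
```

Under that assumption the four values are mutually consistent: 330,385 minus 14,432 is 315,953, and every vector row is matched.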
Product boundary
Runtime embedding authority remains off. Public runtime, real-user data, external writes, production telemetry, donor repo mutation, raw market dumps, malformed rows, and learner-linked/private payloads remain blocked.
Commands run
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-embedding-eligibility --source-family jobs_research_role
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-embedding-repaired-corpus --source-family jobs_research_role --limit 500 --shard-size 250
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-embedding-semantic-qa --source-family jobs_research_role --include-repaired --limit 50
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-embedding-eligibility --source-family workflow_intelligence
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-embedding-repaired-corpus --source-family workflow_intelligence --limit 500 --shard-size 250
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-embedding-semantic-qa --source-family workflow_intelligence --include-repaired --limit 50
uv run aina-data-engine --root /srv/aina/aina-data-engine-room gemini-embedding-run --source-family workflow_intelligence --include-repaired --dry-run --max-new 500 --selection-mode progressive
uv run aina-data-engine --root /srv/aina/aina-data-engine-room ain-506-p0-gate
uv run aina-data-engine --root /srv/aina/aina-data-engine-room gemini-embedding-run --source-family workflow_intelligence --include-repaired --max-new 500 --selection-mode progressive --allow-live-gemini --confirm-paid-api --workers 8 --timeout-seconds 60 --write-every 250
uv run aina-data-engine --root /srv/aina/aina-data-engine-room ain-510-retrieval-promotion-gate
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-chunk-vector-reconciliation
uv run aina-data-engine --root /srv/aina/aina-data-engine-room source-authority-registry-v2
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-runtime-readiness
uv run aina-data-engine --root /srv/aina/aina-data-engine-room artifact-exposure-scan
uv run aina-data-engine --root /srv/aina/aina-data-engine-room docs-frontmatter-check
uv run aina-data-engine --root /srv/aina/aina-data-engine-room validate
What remains
The next logical step is to scale workflow_intelligence from 500 to family-complete. The eligibility ledger has 3,152 rows, so the next dry run should likely show 2,652 remaining candidates if the repaired corpus is expanded cleanly.
cd /srv/aina/aina-data-engine-room
git status --short --branch
uv run aina-data-engine --root /srv/aina/aina-data-engine-room validate
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-embedding-repaired-corpus --source-family workflow_intelligence --limit 5000 --shard-size 1000
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-embedding-semantic-qa --source-family workflow_intelligence --include-repaired --limit 50
uv run aina-data-engine --root /srv/aina/aina-data-engine-room gemini-embedding-run --source-family workflow_intelligence --include-repaired --dry-run --max-new 5000 --selection-mode progressive
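The expected dry-run count can be worked out before running anything. This is a rough sketch, not tool behavior: it assumes every eligible ledger row yields exactly one repaired chunk and that progressive selection skips the 500 rows already embedded. Neither assumption is confirmed by the report, and the variable names are illustrative only.

```python
eligible_rows = 3_152     # total family rows in the eligibility ledger
already_embedded = 500    # workflow_intelligence vectors from this checkpoint
planned_limit = 5_000     # value passed to --limit and --max-new in the next run

# Candidates the next dry run should report if the corpus expands cleanly.
remaining = eligible_rows - already_embedded

# The planned limit should not truncate the family.
limit_covers_family = planned_limit >= eligible_rows

print(f"expected remaining candidates: {remaining}")
print(f"limit covers the whole family: {limit_covers_family}")
```

A dry-run count noticeably below 2,652 would suggest that some ledger rows did not survive repair, which is worth investigating before any live embedding call.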
Continue with workflow_intelligence family completion. Keep jobs_research_role blocked until it is repaired with richer context.