Local handoff - AINA Data Engine Room - 2026-06-15

M2 Workflow Embedding Checkpoint

A small clean-before-embed run completed the jobs-research workflow vector family.

Ali Mehdi Mukadam - co-authored with Codex - 2026-06-15

The Single Idea

M2 now has one more clean source-authoritative family embedded: the remaining jobs_research_workflow repaired chunks. This was a small progressive live run, not a batch run and not a broad title expansion.

01

What changed

The workflow family moved through the clean-before-embed ladder: repaired corpus confirmation, deterministic semantic QA, dry-run candidate selection, small live Vertex ADC embedding, AIN-510, vector reconciliation, source-authority registry refresh, exposure scan, runtime readiness, AIN-506, and full validation.
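
The ladder is strictly ordered: a rung runs only after the one before it passes. Below is a minimal sketch of that control flow. It assumes each rung can be modeled as a zero-argument callable that returns True on pass. `run_ladder` and the callable interface are hypothetical; only the rung names come from this note.

```python
from typing import Callable, Sequence

# Rung names mirror the clean-before-embed ladder described above.
LADDER = (
    "repaired corpus confirmation",
    "deterministic semantic QA",
    "dry-run candidate selection",
    "small live Vertex ADC embedding",
    "AIN-510",
    "vector reconciliation",
    "source-authority registry refresh",
    "exposure scan",
    "runtime readiness",
    "AIN-506",
    "full validation",
)


def run_ladder(stages: Sequence[tuple[str, Callable[[], bool]]]) -> list[str]:
    """Run rungs in order and stop at the first failure (hypothetical helper).

    Returns the names of the rungs that passed; rungs after a failure are
    never called.
    """
    passed: list[str] = []
    for name, check in stages:
        if not check():
            break
        passed.append(name)
    return passed


if __name__ == "__main__":
    # Simulate a failed exposure scan: later rungs never run.
    stages = [(name, lambda n=name: n != "exposure scan") for name in LADDER]
    print(run_ladder(stages))
```

Stopping at the first failed rung is the point of the design. A failed exposure scan, for example, means runtime readiness and AIN-506 are never reported as passing for that run.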

  1. Confirmed the repaired corpus exists for jobs_research_workflow.
  2. Ran deterministic semantic QA over 50 workflow rows.
  3. Dry-ran Gemini candidate selection with repaired chunks included.
  4. Embedded the 56 missing workflow vectors.
  5. Refreshed the receipts that make the vector snapshot authoritative locally.
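
Steps 3 and 4 amount to set arithmetic. The run selects repaired chunks that lack an accepted vector, caps the selection at `--max-new 100`, and writes results every `--write-every 25`. This sketch assumes that "progressive" selection means a stable order with already-embedded chunks skipped, and that `--write-every` groups results into flushes of that size. The function names and chunk ids are hypothetical.

```python
from typing import Iterable


def select_candidates(chunk_ids: Iterable[str], accepted: set[str], max_new: int) -> list[str]:
    """Pick chunks that still lack an accepted vector, capped at max_new.

    Hypothetical model of a progressive dry run: stable sorted order,
    skip anything already embedded, stop at the cap.
    """
    missing = sorted(c for c in set(chunk_ids) if c not in accepted)
    return missing[:max_new]


def checkpoint_batches(candidates: list[str], write_every: int) -> list[list[str]]:
    """Split candidates into groups that would be written every `write_every` vectors."""
    return [candidates[i:i + write_every] for i in range(0, len(candidates), write_every)]


if __name__ == "__main__":
    # Illustrative ids only: 89 workflow chunks, 33 already embedded.
    chunks = [f"wf-{i:03d}" for i in range(89)]
    picked = select_candidates(chunks, set(chunks[:33]), max_new=100)
    print(len(picked))  # 56
    print([len(b) for b in checkpoint_batches(picked, write_every=25)])  # [25, 25, 6]
```

The same selection over a family whose chunks all have accepted vectors returns an empty list. That is the "no new candidates" outcome reported for the adjacent families in the semantic QA section.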

02

Live embedding scope

89 workflow rows
56 new vectors
0 failed rows
7,066 accepted vectors
Item                                           Result
Source family                                  jobs_research_workflow
Existing vectors before run                    33
jobs_research_workflow vector count after run  89
Workflow-related vector count after run        99
Stale vectors after AIN-510                    0
Known-pair cosine gap                          0.190463
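
The scope numbers should reconcile: 33 existing plus 56 new vectors gives the 89 post-run vectors, one per workflow row, with zero failures. Below is a minimal check of that invariant. `FamilyRun` is a hypothetical shape, not the receipt format written by the reconciliation command.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FamilyRun:
    """Vector counts for one source family around a live run (hypothetical shape)."""
    rows: int
    before: int
    new: int
    failed: int

    def after(self) -> int:
        return self.before + self.new

    def reconciles(self) -> bool:
        # Post-run vector count must equal the row count, with no failed rows.
        return self.failed == 0 and self.after() == self.rows


run = FamilyRun(rows=89, before=33, new=56, failed=0)
print(run.after(), run.reconciles())  # 89 True
```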

The live call used Gemini Embedding 2 at 768 dimensions through Vertex ADC on project aina-495702. No Gemini Developer API key route was used.
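
This note does not define the known-pair cosine gap. The sketch below assumes it is the cosine similarity of a known-related pair minus that of a known-unrelated pair. It also guards that every returned vector has the 768 dimensions requested. No Vertex call is made here, and `known_pair_gap` is hypothetical.

```python
import math
from typing import Sequence

EXPECTED_DIM = 768  # output dimensionality requested for this run


def check_dim(vec: Sequence[float], dim: int = EXPECTED_DIM) -> None:
    """Reject a vector whose length differs from the requested dimensionality."""
    if len(vec) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(vec)}")


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return sum(x * y for x, y in zip(a, b)) / norm


def known_pair_gap(anchor: Sequence[float], related: Sequence[float],
                   unrelated: Sequence[float]) -> float:
    """Assumed definition: cos(anchor, related) - cos(anchor, unrelated).

    A larger gap means the known-related pair is separated more clearly
    from the known-unrelated pair.
    """
    for vec in (anchor, related, unrelated):
        check_dim(vec)
    return cosine(anchor, related) - cosine(anchor, unrelated)
```

Consider toy 768-dimensional vectors: anchor `[1, 0, ...]`, related `[1, 1, 0, ...]`, and unrelated `[0, 1, 0, ...]`. They give a gap of about 0.707. The run's 0.190463 came from real embeddings and is not reproduced here.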

03

Semantic QA proof

jobs_research_workflow semantic QA passed: 50 sampled rows, 50 passes, no failures, no raw JD key hits, and no legacy review gate hits. The adjacent jobs_research_tool and workflow_tool_evidence families also passed QA; their dry run found no new candidates because all 83 chunks already had accepted vectors.
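
The QA result reduces to per-row checks over a deterministic sample. The sketch below assumes rows are dicts with a `text` field. It also assumes a "raw JD key hit" means a raw job-description key survives in the row. The key names, the marker string, and `semantic_qa` are hypothetical, not the production-embedding-semantic-qa implementation.

```python
import random
from dataclasses import dataclass

# Hypothetical markers; the real QA rules live in aina-data-engine.
RAW_JD_KEYS = frozenset({"raw_jd", "raw_job_description"})
LEGACY_GATE_MARKER = "legacy_review_gate"


@dataclass
class QAReport:
    sampled: int
    passes: int
    raw_jd_hits: int
    legacy_gate_hits: int

    @property
    def passed(self) -> bool:
        return self.passes == self.sampled


def semantic_qa(rows: list[dict], limit: int = 50, seed: int = 0) -> QAReport:
    # A fixed seed keeps the sample identical across reruns.
    sample = random.Random(seed).sample(rows, min(limit, len(rows)))
    raw_hits = legacy_hits = passes = 0
    for row in sample:
        has_raw = any(key in row for key in RAW_JD_KEYS)
        has_legacy = LEGACY_GATE_MARKER in row.get("text", "")
        raw_hits += int(has_raw)
        legacy_hits += int(has_legacy)
        passes += int(not (has_raw or has_legacy))
    return QAReport(len(sample), passes, raw_hits, legacy_hits)
```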

04

Locked boundaries

The run did not change runtime or release posture. Public runtime, real-user data, external writes, production telemetry, runtime embedding authority, batch promotion, and donor repo mutation all remain off.
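
"All remain off" becomes checkable if the boundaries are recorded as a frozen snapshot and any switch that is on gets listed. This is a sketch only. `ReleasePosture` and its field names paraphrase the list above and are not a real aina-data-engine config.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ReleasePosture:
    """Hypothetical snapshot of the locked boundaries; every switch must stay off."""
    public_runtime: bool = False
    real_user_data: bool = False
    external_writes: bool = False
    production_telemetry: bool = False
    runtime_embedding_authority: bool = False
    batch_promotion: bool = False
    donor_repo_mutation: bool = False

    def violations(self) -> list[str]:
        """Names of any boundaries that are switched on."""
        return [f.name for f in fields(self) if getattr(self, f.name)]


assert ReleasePosture().violations() == []
```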

05

Verification

uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-embedding-semantic-qa --source-family jobs_research_tool --source-family workflow_tool_evidence --limit 50
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-embedding-semantic-qa --source-family jobs_research_workflow --limit 50
uv run aina-data-engine --root /srv/aina/aina-data-engine-room gemini-embedding-run --source-family jobs_research_workflow --include-repaired --max-new 100 --selection-mode progressive --workers 4 --timeout-seconds 60 --max-retries 4 --write-every 25 --allow-live-gemini --confirm-paid-api
uv run aina-data-engine --root /srv/aina/aina-data-engine-room ain-510-retrieval-promotion-gate
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-chunk-vector-reconciliation
uv run aina-data-engine --root /srv/aina/aina-data-engine-room source-authority-registry-v2
uv run aina-data-engine --root /srv/aina/aina-data-engine-room docs-frontmatter-check
uv run aina-data-engine --root /srv/aina/aina-data-engine-room artifact-exposure-scan
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-runtime-readiness
uv run aina-data-engine --root /srv/aina/aina-data-engine-room ain-506-p0-gate
uv run aina-data-engine --root /srv/aina/aina-data-engine-room validate

All commands passed once vector reconciliation was rerun serially. The first reconciliation attempt ran in parallel with AIN-510 and read the pre-run vector count; the serial rerun is the durable receipt.
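
The failure here came from ordering: the reconciliation that overlapped AIN-510 read the old vector count. Below is a sketch of running the post-embedding gates strictly one after another from Python. The subcommands and `--root` path are copied from the list above. The wrapper itself is hypothetical and only executes commands when `dry_run=False`.

```python
import subprocess

ROOT = "/srv/aina/aina-data-engine-room"
BASE = ["uv", "run", "aina-data-engine", "--root", ROOT]

# Post-embedding gates from the verification list, in order. Reconciliation
# starts only after AIN-510 has exited.
GATES = [
    "ain-510-retrieval-promotion-gate",
    "production-chunk-vector-reconciliation",
    "source-authority-registry-v2",
    "docs-frontmatter-check",
    "artifact-exposure-scan",
    "production-runtime-readiness",
    "ain-506-p0-gate",
    "validate",
]


def run_serially(dry_run: bool = True) -> list[list[str]]:
    """Run each gate to completion before starting the next.

    subprocess.run blocks until the command exits, and check=True raises
    CalledProcessError on a non-zero exit, so a failing gate stops the chain.
    With dry_run=True the commands are only returned, not executed.
    """
    plan = [BASE + [gate] for gate in GATES]
    if not dry_run:
        for cmd in plan:
            subprocess.run(cmd, check=True)
    return plan


if __name__ == "__main__":
    for cmd in run_serially(dry_run=True):
        print(" ".join(cmd))
```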

06

Next work

Continue M2 with another bounded source family. Recommended order: workflow_seed / workflow_intelligence, then jobs_research_responsibility, and only then serviceable_title after stronger source-authority repair proof.

Where to start

cd /srv/aina/aina-data-engine-room
git status --short --branch
git log -3 --oneline
uv run aina-data-engine --root /srv/aina/aina-data-engine-room validate
uv run aina-data-engine --root /srv/aina/aina-data-engine-room ain-510-retrieval-promotion-gate
uv run aina-data-engine --root /srv/aina/aina-data-engine-room production-chunk-vector-reconciliation

Do not return to broad title embedding until the source-family gate proves it will not re-embed stale labels or posting artifacts.
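
The recommended order and the title guard can be written as a small gate. The sketch below assumes a set of completed families and a boolean standing in for the source-authority repair proof. `next_batch`, `chunk_is_clean`, and the artifact markers are hypothetical. The marker check covers posting artifacts only, and stale-label detection is not modeled.

```python
from typing import Optional

# Recommended M2 order from this handoff. workflow_seed and
# workflow_intelligence are treated as one step here (an assumption).
NEXT_FAMILIES: list[tuple[str, ...]] = [
    ("workflow_seed", "workflow_intelligence"),
    ("jobs_research_responsibility",),
    ("serviceable_title",),
]

# Hypothetical examples of posting artifacts that must never be re-embedded.
ARTIFACT_MARKERS = ("apply now", "job id:", "posted on")


def chunk_is_clean(text: str) -> bool:
    """True if the chunk contains none of the posting-artifact markers."""
    lowered = text.lower()
    return not any(marker in lowered for marker in ARTIFACT_MARKERS)


def next_batch(done: set[str], title_repair_proven: bool) -> Optional[tuple[str, ...]]:
    """Return the next family group to embed, or None if blocked or finished."""
    for group in NEXT_FAMILIES:
        if all(family in done for family in group):
            continue
        if "serviceable_title" in group and not title_repair_proven:
            return None  # hold until source-authority repair is proven
        return group
    return None
```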