# Embeddings

Semantic search over attestations using sentence transformers (`all-MiniLM-L6-v2`) via ONNX Runtime.

## Architecture

## Configuration

Embeddings are configured via `am.toml` or the UI config API:

```toml
[embeddings]
enabled = true
path = "ats/embeddings/models/all-MiniLM-L6-v2/model.onnx"
name = "all-MiniLM-L6-v2"
cluster_threshold = 0.5
recluster_interval_seconds = 3600  # re-cluster every hour (0 = disabled)
min_cluster_size = 5
```

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `embeddings.enabled` | bool | `false` | Enable the embedding service on startup |
| `embeddings.path` | string | `ats/embeddings/models/all-MiniLM-L6-v2/model.onnx` | Path to ONNX model file |
| `embeddings.name` | string | `all-MiniLM-L6-v2` | Model identifier for metadata |
| `embeddings.cluster_threshold` | float | `0.5` | Minimum cosine similarity for incremental cluster assignment |
| `embeddings.recluster_interval_seconds` | int | `0` | Pulse schedule interval for HDBSCAN re-clustering (0 = disabled) |
| `embeddings.min_cluster_size` | int | `5` | Minimum cluster size for HDBSCAN |

When `enabled = false` (the default), `SetupEmbeddingService` skips initialization even if the binary was built with the `rustembeddings` tag. Enabling requires the model file to exist at the configured path.

When `recluster_interval_seconds > 0`, a Pulse scheduled job is auto-created on startup to periodically re-run HDBSCAN clustering. The schedule is idempotent — restarting the server won't duplicate it. Changing the interval in config updates the existing schedule.

## Cluster Labeling

An LLM labels unlabeled clusters by sampling member attestation texts. Labels are attested by `qntx@embeddings` — the label, evidence (sampled texts), and model become part of the attestation graph.

```toml
[embeddings]
cluster_label_interval_seconds = 300  # label every 5 min (0 = disabled)
cluster_label_min_size = 15           # skip small clusters
cluster_label_sample_size = 5         # random texts sent to LLM
cluster_label_max_per_cycle = 3       # don't label everything at once
cluster_label_cooldown_days = 7       # min days between re-labels per cluster
cluster_label_max_tokens = 2000       # LLM response budget
cluster_label_model = ""              # empty = system default provider/model
```

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `embeddings.cluster_label_interval_seconds` | int | `0` | Pulse schedule interval (0 = disabled) |
| `embeddings.cluster_label_min_size` | int | `15` | Minimum members for a cluster to be eligible |
| `embeddings.cluster_label_sample_size` | int | `5` | Random texts sampled from the cluster and sent to the LLM |
| `embeddings.cluster_label_max_per_cycle` | int | `3` | Max clusters labeled per scheduled run |
| `embeddings.cluster_label_cooldown_days` | int | `7` | Minimum days before re-labeling a cluster |
| `embeddings.cluster_label_max_tokens` | int | `2000` | LLM max_tokens for the labeling request |
| `embeddings.cluster_label_model` | string | `""` | Model override (empty = system default from OpenRouter/local config) |

Re-labeling is purely time-gated (per-cluster cooldown). Membership-change-ratio triggers are deferred until more usage data informs the heuristic.

Config can also be updated at runtime via the REST API:

```
PATCH /api/config
{"updates": {"embeddings.enabled": true, "embeddings.path": "/path/to/model.onnx"}}
```

## API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/search/semantic?q=<text>&limit=10&threshold=0.7` | Search stored embeddings by semantic similarity |
| POST | `/api/embeddings/generate` | Generate embedding for `{"text": "..."}` — returns 384-dim vector |
| POST | `/api/embeddings/batch` | Embed attestations by ID: `{"attestation_ids": ["..."]}` |
| POST | `/api/embeddings/cluster` | Run HDBSCAN clustering: `{"min_cluster_size": 5}` |
| POST | `/api/embeddings/project` | Run UMAP projection via reduce plugin, store 2D coords |
| GET | `/api/embeddings/projections` | Get `[{id, source_id, x, y, cluster_id}]` for visualization |
| GET | `/api/embeddings/info` | Embedding service status, counts, and cluster summary |

Without the `rustembeddings` build tag, all endpoints return `503`.

## 2D Projection (UMAP)

Embeddings are 384-dimensional — far too high-dimensional to visualize directly. The `qntx-reduce` plugin projects them to 2D via UMAP for canvas visualization.

See `qntx-reduce/README.md` for setup and API details.

Flow: `POST /api/embeddings/project` reads all embeddings, calls the reduce plugin's `/fit` endpoint, and writes `projection_x`/`projection_y` back to the embeddings table. New attestations are auto-projected via `/transform` if the model is fitted.

## Model Files

Located at `ats/embeddings/models/all-MiniLM-L6-v2/` (not in git). See `ats/embeddings/README.md` for download instructions.

## Completed

## Open Work

## Open Questions

### Design decision: embedding tests are local-only

Embedding tests (`ats/embeddings/embeddings/embeddings_test.go`) require the ONNX model files (~80 MB) and add ~3 s of inference per run. They're gated behind `//go:build cgo && rustembeddings` — CI doesn't pass this tag, so they run only locally.

This avoids burdening every PR with model download/caching and inference time. If the embedding surface area grows, a dedicated `ci-embeddings.yml` workflow (triggered only on changes to `ats/embeddings/`, `ats/storage/embedding_store*`, `server/embeddings_handlers*`) can be added without affecting the main pipeline.
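If that day comes, the path-filtered workflow could look roughly like this. The job steps, action versions, and Go version are placeholder assumptions; only the trigger paths and the `cgo rustembeddings` tags come from this document:

```yaml
# .github/workflows/ci-embeddings.yml (hypothetical sketch)
name: ci-embeddings
on:
  pull_request:
    paths:
      - "ats/embeddings/**"
      - "ats/storage/embedding_store*"
      - "server/embeddings_handlers*"
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: "1.22" # placeholder
      # The ~80 MB model files would need a download or cache step here.
      - run: go test -tags "cgo rustembeddings" ./ats/embeddings/...
```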

## Technical Debt

### Design decision: unconditional sqlite-vec

`db/connection.go` imports the sqlite-vec CGO bindings unconditionally — every Go build pays the CGO compilation cost, even builds that don't use embeddings. This is coupled to migration 024, which creates a `vec0` virtual table that requires the extension to be loaded. The migration runs unconditionally via `//go:embed sqlite/migrations/*.sql`.

Making this conditional requires solving both sides together: the sqlite-vec import would need to be gated behind a build tag, and migration 024 would need to be skipped or stubbed under the same condition.

Current choice: accept the universal CGO dependency. The `cli-nocgo` target (`CGO_ENABLED=0`) will fail at runtime on migration 024 if it encounters a database that hasn't run that migration yet.