Semantic search over attestations using sentence transformers (all-MiniLM-L6-v2) via ONNX Runtime.

- ats/embeddings/src/: ONNX Runtime 2.0 inference, HuggingFace tokenizer, mean pooling, L2 normalization → 384-dim unit vectors
- ats/embeddings/embeddings/: CGO bindings to Rust, model lifecycle, FLOAT32_BLOB serialization
- ats/storage/embedding_store.go: sqlite-vec L2 distance search, DELETE+INSERT for virtual table compatibility
- server/embeddings_handlers.go: conditional compilation via rustembeddings build tag (now default in make cli)
- 024_create_embeddings_table.sql: embeddings table + vec_embeddings virtual table

Embeddings are configured via am.toml or the UI config API:
```toml
[embeddings]
enabled = true
path = "ats/embeddings/models/all-MiniLM-L6-v2/model.onnx"
name = "all-MiniLM-L6-v2"
cluster_threshold = 0.5
recluster_interval_seconds = 3600 # re-cluster every hour (0 = disabled)
min_cluster_size = 5
```

| Key | Type | Default | Description |
|---|---|---|---|
| embeddings.enabled | bool | false | Enable the embedding service on startup |
| embeddings.path | string | ats/embeddings/models/all-MiniLM-L6-v2/model.onnx | Path to ONNX model file |
| embeddings.name | string | all-MiniLM-L6-v2 | Model identifier for metadata |
| embeddings.cluster_threshold | float | 0.5 | Minimum cosine similarity for incremental cluster assignment |
| embeddings.recluster_interval_seconds | int | 0 | Pulse schedule interval for HDBSCAN re-clustering (0 = disabled) |
| embeddings.min_cluster_size | int | 5 | Minimum cluster size for HDBSCAN |
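
To make cluster_threshold concrete, here is a minimal sketch of threshold-gated assignment. It assumes clusters are represented by centroid vectors and that -1 marks "unassigned"; both are illustrative choices, not confirmed details of the implementation. It also shows why a cosine threshold and an L2 distance search can coexist: for unit vectors, d = sqrt(2 - 2·cos).

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two equal-length vectors.
// For unit vectors this reduces to the dot product.
func cosine(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// assignCluster returns the index of the most similar centroid, or -1 when
// no centroid reaches the threshold. Centroids and the -1 marker are
// assumptions made for this sketch.
func assignCluster(vec []float32, centroids [][]float32, threshold float64) int {
	best, bestSim := -1, threshold
	for i, c := range centroids {
		if sim := cosine(vec, c); sim >= bestSim {
			best, bestSim = i, sim
		}
	}
	return best
}

// l2FromCosine converts a cosine similarity between unit vectors into the
// equivalent L2 distance.
func l2FromCosine(sim float64) float64 {
	return math.Sqrt(2 - 2*sim)
}

func main() {
	centroids := [][]float32{{1, 0}, {0, 1}}
	fmt.Println(assignCluster([]float32{0.9, 0.1}, centroids, 0.5)) // close to centroid 0
	fmt.Println(assignCluster([]float32{-1, 0}, centroids, 0.5))    // below threshold everywhere
	fmt.Printf("%.3f\n", l2FromCosine(0.5))                         // distance matching the default threshold
}
```

With the default threshold of 0.5, a unit vector is only assigned when it lies within L2 distance 1.0 of the matching unit-length reference vector.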
When enabled = false (default), SetupEmbeddingService skips initialization even if built with the rustembeddings tag. Enabling requires the model file to exist at the configured path.
When recluster_interval_seconds > 0, a Pulse scheduled job is auto-created on startup to periodically re-run HDBSCAN clustering. The schedule is idempotent — restarting the server won't duplicate it. Changing the interval in config updates the existing schedule.
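
The idempotent behaviour described above can be sketched as a create-or-update keyed by job name. Schedule, ScheduleStore, and the job name embeddings.recluster are hypothetical; Pulse's real types and storage are not shown in this document.

```go
package main

import (
	"fmt"
	"time"
)

// Schedule is a hypothetical stand-in for a Pulse scheduled job.
type Schedule struct {
	Name     string
	Interval time.Duration
}

// ScheduleStore is a hypothetical in-memory registry keyed by job name.
type ScheduleStore map[string]Schedule

// Ensure creates the named job if it is missing, updates its interval if the
// configured value changed, and otherwise leaves it alone. A non-positive
// interval means "disabled"; this sketch then does nothing, which is an
// assumption about how an already existing job is treated.
func (s ScheduleStore) Ensure(name string, intervalSeconds int) string {
	if intervalSeconds <= 0 {
		return "disabled"
	}
	want := time.Duration(intervalSeconds) * time.Second
	existing, ok := s[name]
	switch {
	case !ok:
		s[name] = Schedule{Name: name, Interval: want}
		return "created"
	case existing.Interval != want:
		existing.Interval = want
		s[name] = existing
		return "updated"
	default:
		return "unchanged"
	}
}

func main() {
	store := ScheduleStore{}
	fmt.Println(store.Ensure("embeddings.recluster", 3600)) // first startup
	fmt.Println(store.Ensure("embeddings.recluster", 3600)) // restart, same config
	fmt.Println(store.Ensure("embeddings.recluster", 1800)) // interval changed in config
	fmt.Println(len(store))                                 // still a single schedule
}
```

Keying on a stable job name is what keeps restarts from duplicating the schedule.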
An LLM labels unlabeled clusters by sampling member attestation texts. Labels are attested by qntx@embeddings — the label, evidence (sampled texts), and model become part of the attestation graph.
```toml
[embeddings]
cluster_label_interval_seconds = 300 # label every 5 min (0 = disabled)
cluster_label_min_size = 15 # skip small clusters
cluster_label_sample_size = 5 # random texts sent to LLM
cluster_label_max_per_cycle = 3 # don't label everything at once
cluster_label_cooldown_days = 7 # min days between re-labels per cluster
cluster_label_max_tokens = 2000 # LLM response budget
cluster_label_model = "" # empty = system default provider/model
```

| Key | Type | Default | Description |
|---|---|---|---|
| embeddings.cluster_label_interval_seconds | int | 0 | Pulse schedule interval (0 = disabled) |
| embeddings.cluster_label_min_size | int | 15 | Minimum members for a cluster to be eligible |
| embeddings.cluster_label_sample_size | int | 5 | Random texts sampled from the cluster and sent to the LLM |
| embeddings.cluster_label_max_per_cycle | int | 3 | Max clusters labeled per scheduled run |
| embeddings.cluster_label_cooldown_days | int | 7 | Minimum days before re-labeling a cluster |
| embeddings.cluster_label_max_tokens | int | 2000 | LLM max_tokens for the labeling request |
| embeddings.cluster_label_model | string | "" | Model override (empty = system default from OpenRouter/local config) |
Re-labeling is purely time-gated (per-cluster cooldown). Membership-change-ratio triggers are deferred until more usage data informs the heuristic.
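
As a rough illustration of how the size, cooldown, and per-cycle limits could combine, the sketch below filters clusters and samples member texts. Cluster, LabelConfig, pickForLabeling, and sampleTexts are hypothetical names, and the iteration order is an assumption; only the config semantics come from the table above.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// Cluster and LabelConfig are hypothetical types; the config fields mirror
// the cluster_label_* keys.
type Cluster struct {
	ID          int
	Size        int
	LastLabeled time.Time // zero value: never labeled
}

type LabelConfig struct {
	MinSize, SampleSize, MaxPerCycle, CooldownDays int
}

// pickForLabeling returns at most MaxPerCycle clusters that are large enough
// and either unlabeled or past their cooldown.
func pickForLabeling(clusters []Cluster, cfg LabelConfig, now time.Time) []Cluster {
	cooldown := time.Duration(cfg.CooldownDays) * 24 * time.Hour
	var picked []Cluster
	for _, c := range clusters {
		if len(picked) >= cfg.MaxPerCycle {
			break
		}
		if c.Size < cfg.MinSize {
			continue
		}
		if !c.LastLabeled.IsZero() && now.Sub(c.LastLabeled) < cooldown {
			continue
		}
		picked = append(picked, c)
	}
	return picked
}

// sampleTexts draws up to n member texts at random, without repeats.
func sampleTexts(texts []string, n int, rng *rand.Rand) []string {
	if n > len(texts) {
		n = len(texts)
	}
	if n < 0 {
		n = 0
	}
	out := make([]string, 0, n)
	for _, i := range rng.Perm(len(texts))[:n] {
		out = append(out, texts[i])
	}
	return out
}

// demo runs the selection on fixed data and returns the chosen cluster IDs.
func demo() []int {
	now := time.Date(2025, 1, 15, 0, 0, 0, 0, time.UTC)
	clusters := []Cluster{
		{ID: 1, Size: 20}, // unlabeled, big enough
		{ID: 2, Size: 10}, // below the minimum size
		{ID: 3, Size: 30, LastLabeled: now.AddDate(0, 0, -2)},  // still cooling down
		{ID: 4, Size: 40, LastLabeled: now.AddDate(0, 0, -10)}, // cooldown elapsed
	}
	cfg := LabelConfig{MinSize: 15, SampleSize: 5, MaxPerCycle: 3, CooldownDays: 7}
	var ids []int
	for _, c := range pickForLabeling(clusters, cfg, now) {
		ids = append(ids, c.ID)
	}
	return ids
}

func main() {
	fmt.Println(demo())
	rng := rand.New(rand.NewSource(1))
	fmt.Println(len(sampleTexts([]string{"a", "b", "c"}, 5, rng))) // capped at the number of texts
}
```

With the defaults, clusters 1 and 4 are selected: cluster 2 is too small and cluster 3 was labeled two days ago.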
Config can also be updated at runtime via the REST API:
```
PATCH /api/config
{"updates": {"embeddings.enabled": true, "embeddings.path": "/path/to/model.onnx"}}
```

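
For callers written in Go, the request above could be built roughly as follows. The base URL (http://localhost:8080) and the helper name newConfigPatch are assumptions; the path and payload shape are taken from the example. The response format is not documented here, so the sketch stops before sending.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// newConfigPatch builds (but does not send) a PATCH /api/config request.
// baseURL is an assumption: use whatever address the server listens on.
func newConfigPatch(baseURL string, updates map[string]any) (*http.Request, error) {
	body, err := json.Marshal(map[string]any{"updates": updates})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPatch, baseURL+"/api/config", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newConfigPatch("http://localhost:8080", map[string]any{
		"embeddings.enabled": true,
		"embeddings.path":    "/path/to/model.onnx",
	})
	if err != nil {
		panic(err)
	}
	payload, err := io.ReadAll(req.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path)
	fmt.Println(string(payload))
	// To send it, pass req to http.DefaultClient.Do and check the status code.
}
```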
| Method | Path | Description |
|---|---|---|
| GET | /api/search/semantic?q=<text>&limit=10&threshold=0.7 | Search stored embeddings by semantic similarity |
| POST | /api/embeddings/generate | Generate embedding for {"text": "..."} — returns 384-dim vector |
| POST | /api/embeddings/batch | Embed attestations by ID: {"attestation_ids": ["..."]} |
| POST | /api/embeddings/cluster | Run HDBSCAN clustering: {"min_cluster_size": 5} |
| POST | /api/embeddings/project | Run UMAP projection via reduce plugin, store 2D coords |
| GET | /api/embeddings/projections | Get [{id, source_id, x, y, cluster_id}] for visualization |
| GET | /api/embeddings/info | Embedding service status, counts, and cluster summary |
Without the rustembeddings build tag, all endpoints return 503.
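
One way to get this behaviour is a stub file that is compiled only when the tag is absent and answers every embeddings route with 503. The sketch below is an assumption about the shape of such a stub, not a copy of server/embeddings_handlers.go; the handler and function names are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Assumption: in a real tree a file like this would carry a build constraint
// that excludes it when the rustembeddings tag is set.

// embeddingsUnavailable is a hypothetical stub handler.
func embeddingsUnavailable(w http.ResponseWriter, r *http.Request) {
	http.Error(w, "embeddings unavailable: build with the rustembeddings tag",
		http.StatusServiceUnavailable)
}

// registerEmbeddingRoutes wires the stub to the endpoints listed in the table.
func registerEmbeddingRoutes(mux *http.ServeMux) {
	for _, path := range []string{
		"/api/search/semantic",
		"/api/embeddings/generate",
		"/api/embeddings/batch",
		"/api/embeddings/cluster",
		"/api/embeddings/project",
		"/api/embeddings/projections",
		"/api/embeddings/info",
	} {
		mux.HandleFunc(path, embeddingsUnavailable)
	}
}

// statusFor issues an in-process request and returns the status code.
func statusFor(method, target string) int {
	mux := http.NewServeMux()
	registerEmbeddingRoutes(mux)
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(method, target, nil))
	return rec.Code
}

func main() {
	fmt.Println(statusFor(http.MethodGet, "/api/embeddings/info"))
	fmt.Println(statusFor(http.MethodGet, "/api/search/semantic?q=test&limit=10"))
}
```

Clients can treat 503 from any of these routes as "feature not compiled in" rather than a transient failure.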
Embeddings are 384-dimensional — too high to visualize directly. The qntx-reduce plugin projects them to 2D via UMAP for canvas visualization.
See qntx-reduce/README.md for setup and API details.
Flow: POST /api/embeddings/project reads all embeddings, calls the reduce plugin's /fit endpoint, and writes projection_x/projection_y back to the embeddings table. New attestations are auto-projected via /transform if the model is fitted.
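
The two paths in this flow (bulk fit, then per-attestation transform) can be sketched against a small interface. Reducer, Row, and fakeReducer are hypothetical; the actual request and response shapes of /fit and /transform are in qntx-reduce/README.md and are not reproduced here.

```go
package main

import "fmt"

// Reducer is a hypothetical client for the reduce plugin. Fit and Transform
// stand in for its /fit and /transform endpoints.
type Reducer interface {
	Fit(vectors [][]float32) ([][2]float64, error)
	Transform(vectors [][]float32) ([][2]float64, error)
	Fitted() bool
}

// Row is a hypothetical view of one embeddings-table row.
type Row struct {
	ID        string
	Vector    []float32
	X, Y      float64
	Projected bool
}

// projectAll mirrors the bulk path: fit on every stored vector, then write
// the 2D coordinates back.
func projectAll(rows []Row, r Reducer) error {
	vectors := make([][]float32, len(rows))
	for i := range rows {
		vectors[i] = rows[i].Vector
	}
	coords, err := r.Fit(vectors)
	if err != nil {
		return err
	}
	if len(coords) != len(rows) {
		return fmt.Errorf("reducer returned %d points for %d rows", len(coords), len(rows))
	}
	for i, c := range coords {
		rows[i].X, rows[i].Y, rows[i].Projected = c[0], c[1], true
	}
	return nil
}

// projectNew mirrors auto-projection of a new attestation: it calls Transform
// only when a fitted model exists, and reports whether it did.
func projectNew(row *Row, r Reducer) (bool, error) {
	if !r.Fitted() {
		return false, nil
	}
	coords, err := r.Transform([][]float32{row.Vector})
	if err != nil {
		return false, err
	}
	if len(coords) != 1 {
		return false, fmt.Errorf("reducer returned %d points for 1 row", len(coords))
	}
	row.X, row.Y, row.Projected = coords[0][0], coords[0][1], true
	return true, nil
}

// fakeReducer keeps the first two dimensions; it only exercises the flow and
// is not UMAP.
type fakeReducer struct{ fitted bool }

func (f *fakeReducer) take(vs [][]float32) [][2]float64 {
	out := make([][2]float64, len(vs))
	for i, v := range vs {
		out[i] = [2]float64{float64(v[0]), float64(v[1])}
	}
	return out
}

func (f *fakeReducer) Fit(vs [][]float32) ([][2]float64, error) {
	f.fitted = true
	return f.take(vs), nil
}

func (f *fakeReducer) Transform(vs [][]float32) ([][2]float64, error) {
	return f.take(vs), nil
}

func (f *fakeReducer) Fitted() bool { return f.fitted }

func main() {
	r := &fakeReducer{}
	fresh := Row{ID: "c", Vector: []float32{5, 6, 7}}
	ok, _ := projectNew(&fresh, r)
	fmt.Println(ok) // nothing fitted yet, so the new row is skipped

	rows := []Row{{ID: "a", Vector: []float32{1, 2, 3}}, {ID: "b", Vector: []float32{3, 4, 5}}}
	if err := projectAll(rows, r); err != nil {
		panic(err)
	}
	ok, _ = projectNew(&fresh, r)
	fmt.Println(ok, fresh.X, fresh.Y)
}
```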
Located at ats/embeddings/models/all-MiniLM-L6-v2/ (not in git). See ats/embeddings/README.md for download instructions.
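
For orientation, the post-processing applied to the model's token outputs (mean pooling, then L2 normalization) and the float32 blob packing can be sketched in a few lines. The real pipeline lives in the Rust crate; this Go version is illustrative only, and the little-endian blob layout is an assumption about FLOAT32_BLOB rather than something stated in this document.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// meanPool averages token embeddings, counting only tokens whose attention
// mask is non-zero (padding is ignored).
func meanPool(tokens [][]float32, mask []int) []float32 {
	if len(tokens) == 0 {
		return nil
	}
	out := make([]float32, len(tokens[0]))
	var n float32
	for t, tok := range tokens {
		if mask[t] == 0 {
			continue
		}
		n++
		for i, v := range tok {
			out[i] += v
		}
	}
	if n == 0 {
		return out
	}
	for i := range out {
		out[i] /= n
	}
	return out
}

// l2Normalize scales v in place to unit length; a zero vector is left as is.
func l2Normalize(v []float32) {
	var sum float64
	for _, x := range v {
		sum += float64(x) * float64(x)
	}
	norm := math.Sqrt(sum)
	if norm == 0 {
		return
	}
	for i := range v {
		v[i] = float32(float64(v[i]) / norm)
	}
}

// toBlob packs a vector as little-endian float32 bytes, 4 bytes per
// dimension. Assumption: this matches the stored blob layout.
func toBlob(v []float32) []byte {
	buf := make([]byte, 4*len(v))
	for i, x := range v {
		binary.LittleEndian.PutUint32(buf[4*i:], math.Float32bits(x))
	}
	return buf
}

func main() {
	// Two real tokens and one padding token, in 2 dimensions for readability.
	tokens := [][]float32{{3, 0}, {0, 4}, {9, 9}}
	v := meanPool(tokens, []int{1, 1, 0})
	l2Normalize(v)
	fmt.Println(v, len(toBlob(v)))
	fmt.Println(len(toBlob(make([]float32, 384)))) // bytes for one 384-dim vector
}
```

At 4 bytes per dimension, a 384-dim embedding occupies 1536 bytes in this layout.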
- EmbeddingObserver embeds attestations with rich text on creation (#482)
- rich_string_fields from type definitions (#479)
- qntx@embeddings

Embedding tests (ats/embeddings/embeddings/embeddings_test.go) require the ONNX model files (~80MB) and add ~3s of inference per run. They're gated behind //go:build cgo && rustembeddings. CI doesn't pass this tag, so they only run locally.
This avoids burdening every PR with model download/caching and inference time. If the embedding surface area grows, a dedicated ci-embeddings.yml workflow (triggered only on changes to ats/embeddings/, ats/storage/embedding_store*, server/embeddings_handlers*) can be added without affecting the main pipeline.
db/connection.go imports sqlite-vec CGO bindings unconditionally — every Go build pays the CGO compilation cost, even builds that don't use embeddings. This is coupled to migration 024, which creates a vec0 virtual table that requires the extension to be loaded. The migration runs unconditionally via //go:embed sqlite/migrations/*.sql.
Making this conditional requires solving both sides together:

- sqlite_vec import behind a build tag
- Migration 024 (embeddings table stays universal, vec_embeddings virtual table becomes conditional)

Current choice: accept the universal CGO dependency. The cli-nocgo target (CGO_ENABLED=0) will fail on migration 024 at runtime if it encounters a database that hasn't run that migration yet.