Description
Diffo.Provider.Note already carries text — free-form annotation that parties write against instances. The next step is making that text searchable by meaning, not just by exact match or substring. Given a natural-language query, return the instances whose notes are most relevant, ranked by similarity, with the matching note excerpts surfaced as data.
Two layers:
- Note-level vector search: embed each note's text on create/update, vector-index the embeddings, and expose a read action that takes query text, embeds it, and returns notes ranked by similarity.
- Instance-level relevance: aggregate note relevance up to the annotated instance. An instance's relevance score is some combination (max? sum? weighted?) of the relevance of its notes, and possibly of notes on instances related to it. Query instances by their relevance to a natural-language question.
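Here is a minimal in-memory sketch of the two layers. It assumes each note is a map with an :embedding (a list of floats) and the :instance_id it annotates. Cosine similarity ranks the notes, and a pluggable strategy folds note scores up to instances. The module name, the map keys and the {:top_k_mean, k} strategy are hypothetical. In the proposed design, the note-level ranking would come from the Neo4j vector index rather than from a scan in Elixir.

```elixir
defmodule NoteRelevanceSketch do
  # Cosine similarity between two equal-length vectors (lists of floats).
  # Returns 0.0 when either vector has zero length.
  def cosine(a, b) do
    dot = Enum.zip_with(a, b, fn x, y -> x * y end) |> Enum.sum()
    denom = norm(a) * norm(b)
    if denom == 0, do: 0.0, else: dot / denom
  end

  defp norm(v), do: v |> Enum.map(&(&1 * &1)) |> Enum.sum() |> :math.sqrt()

  # Layer 1 (note level): score each note against the query vector and keep
  # the `limit` best, highest similarity first.
  def rank_notes(notes, query_vec, limit) do
    notes
    |> Enum.map(fn note -> Map.put(note, :score, cosine(note.embedding, query_vec)) end)
    |> Enum.sort_by(& &1.score, :desc)
    |> Enum.take(limit)
  end

  # Layer 2 (instance level): fold scored notes up to the instance they
  # annotate. :max favours one strong match, :sum favours many weaker ones,
  # {:top_k_mean, k} is one possible weighted middle ground.
  def rank_instances(scored_notes, strategy \\ :max) do
    scored_notes
    |> Enum.group_by(& &1.instance_id, & &1.score)
    |> Enum.map(fn {id, scores} -> {id, aggregate(scores, strategy)} end)
    |> Enum.sort_by(fn {_id, score} -> score end, :desc)
  end

  defp aggregate(scores, :max), do: Enum.max(scores)
  defp aggregate(scores, :sum), do: Enum.sum(scores)

  defp aggregate(scores, {:top_k_mean, k}) do
    top = scores |> Enum.sort(:desc) |> Enum.take(k)
    Enum.sum(top) / length(top)
  end
end
```

Take a query of [1.0, 0.0], one note on instance a with vector [0.8, 0.6] (similarity 0.8), and two notes on b with vector [0.6, 0.8] (0.6 each). :max ranks a first, while :sum ranks b first at about 1.2. So the max/sum choice changes the answer, not just the scores.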
Why it matters
Free-text notes are the most under-leveraged surface in any operational system. They're written casually by parties (engineers, support agents, installers, customers) and contain the lived experience of the domain — the things the schema doesn't say. Today they're searchable by contains/2 if you're lucky.
With vector search, the choreography gets a new alignment dimension: text-relevance. A customer reporting "intermittent dropouts after rain" can land on instances whose notes describe similar patterns, even when the wording is different. An engineer designing a new service can find prior services with similar constraints expressed naturally. A repairer can find prior repairs that resemble the symptom.
This also makes the system honest about what it knows — the rich free-text knowledge that parties carry becomes queryable as a first-class part of the graph, not a separate text-search silo. And it fits the choreography model: each instance carries its own annotations, search emerges from text relevance across the graph, no central index outside the framework.
What we'd find useful
```elixir
# Ash.Query.filter/2 is a macro, so Ash.Query must be required.
require Ash.Query

# note-level (find_similar is a proposed read action on Note)
{:ok, notes} =
  Diffo.Provider.Note
  |> Ash.Query.for_read(:find_similar, %{query: "intermittent packet loss after rain", limit: 20})
  |> Ash.read()

# instance-level: instances ranked by note relevance (proposed action)
{:ok, instances} =
  DslAccess
  |> Ash.Query.for_read(:find_by_note_relevance, %{
    query: "intermittent packet loss after rain",
    limit: 10
  })
  |> Ash.read()

# combined with other filters (customer_lat_lng_within/2 is a placeholder helper)
{:ok, instances} =
  DslAccess
  |> Ash.Query.filter(places_contains_geo: ^customer_lat_lng_within(:km, 5))
  |> Ash.Query.for_read(:find_by_note_relevance, %{query: "..."})
  |> Ash.read()
```
The relevance scores should be available alongside the records — collapsible to a ranked list of IDs (per #42), or kept rich (instances + scores + matching note excerpts) when the consumer wants the why.
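Below is one possible shape for the rich form and its collapse to ranked IDs. It rests on several assumptions. Scored notes arrive as maps with :instance_id, :score and :text. An instance's score is its best note's score. The excerpt length is arbitrary. None of these names come from #42 or from an existing Diffo API.

```elixir
defmodule RelevanceResultSketch do
  # Build rich hits (instance id + score + matching excerpts, i.e. "the why")
  # from scored notes. Assumed rule: an instance scores as its best note.
  def hits(scored_notes, excerpts_per_hit \\ 2) do
    scored_notes
    |> Enum.group_by(& &1.instance_id)
    |> Enum.map(fn {id, notes} ->
      top = Enum.sort_by(notes, & &1.score, :desc)

      %{
        instance_id: id,
        score: hd(top).score,
        excerpts: top |> Enum.take(excerpts_per_hit) |> Enum.map(&excerpt(&1.text))
      }
    end)
    |> Enum.sort_by(& &1.score, :desc)
  end

  # Collapse the rich form to a ranked list of IDs for the simple case.
  def ranked_ids(hits), do: Enum.map(hits, & &1.instance_id)

  # Trim long notes to a short excerpt.
  defp excerpt(text, max_len \\ 80) do
    if String.length(text) <= max_len,
      do: text,
      else: String.slice(text, 0, max_len) <> "..."
  end
end
```

Keeping the rich form as the primary result, with ranked_ids/1 as a cheap projection, means consumers that only want IDs pay nothing extra for the excerpts being available.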
A possible direction
Several pieces, each likely a separate yarn upstream:
- Embedding-on-write hook on Diffo.Provider.Note: an embedding attribute populated by a configurable embed function on :create and :update. Local model (Bumblebee/Ollama) or hosted (OpenAI/Cohere), the consumer's choice via config.
- Vector index in AshNeo4j: Neo4j 5.11+ has native vector indexes, queried via db.index.vector.queryNodes. AshNeo4j needs to expose this either as an index declaration on the resource or as a query primitive that produces a similarity-ordered result.
- Instance-level aggregate: a calculation or aggregate on Instance that scores it against a query by aggregating the relevance of its notes (and perhaps the notes of related instances). Probably a new Diffo extension, or just an Ash calculation pattern.
- A query primitive: a find_by_note_relevance(query, opts) action shape that takes the natural-language query and returns ranked records. Could be a Diffo-provided action that every BaseInstance resource gets.
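For orientation, the Cypher that such a primitive would ultimately emit might look like the following. db.index.vector.queryNodes(indexName, k, queryVector), with its YIELD node, score columns, is the actual Neo4j procedure. Several other pieces are assumptions about Diffo's graph, not AshNeo4j's API: the index name, the Instance label, the HAS_NOTE relationship, the text property and the {cypher, params} return shape. The vector index itself must already exist, with dimensions and a similarity function that match the embedding model.

```elixir
defmodule VectorQuerySketch do
  # Note level: queryNodes returns the k nearest nodes in the named vector
  # index, each with a similarity score.
  @note_cypher """
  CALL db.index.vector.queryNodes($index, $k, $embedding)
  YIELD node AS note, score
  RETURN note, score
  ORDER BY score DESC
  """

  # Instance level: the Instance label and HAS_NOTE relationship are
  # HYPOTHETICAL stand-ins for however Diffo links notes to instances.
  # max(score) is one aggregation choice; sum(score) would be another.
  @instance_cypher """
  CALL db.index.vector.queryNodes($index, $k, $embedding)
  YIELD node AS note, score
  MATCH (instance:Instance)-[:HAS_NOTE]-(note)
  RETURN instance, max(score) AS relevance, collect(note.text)[0..3] AS excerpts
  ORDER BY relevance DESC
  LIMIT $limit
  """

  # Each function returns {cypher, params}; executing it is left to
  # whatever driver AshNeo4j uses.
  def note_query(index, embedding, k)
      when is_list(embedding) and is_integer(k) and k > 0 do
    {@note_cypher, %{"index" => index, "k" => k, "embedding" => embedding}}
  end

  # k must be at least limit: only instances annotated by the k nearest
  # notes can appear in the ranking.
  def instance_query(index, embedding, k, limit)
      when is_list(embedding) and is_integer(k) and is_integer(limit) and k >= limit do
    {@instance_cypher,
     %{"index" => index, "k" => k, "embedding" => embedding, "limit" => limit}}
  end
end
```

Because queryNodes returns only the k nearest notes, the instance ranking is approximate. An instance with many moderately relevant notes can be missed if none of them makes the top k, so in practice k would likely need to be several times larger than limit.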
The exemplar work would be to wire one embedding model end-to-end on Note, prove the Neo4j vector index round-trips through AshNeo4j, and prototype an instance-relevance ranking. The deltas become yarns for AshNeo4j (vector index + query expressions) and Diffo (the embedding hook and the instance-aggregate pattern).
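For that first end-to-end wiring, a deterministic stand-in embedder lets the write hook, the index and the ranking be tested before committing to Bumblebee, Ollama or a hosted API. The Embedder behaviour, HashEmbedder and the :diffo_sketch config key are all hypothetical. The hashed bag-of-words vectors only match shared words ("dropouts" will not match "packet loss"), so this exercises the plumbing, not the semantics.

```elixir
# Hypothetical behaviour a configurable embed function could implement.
defmodule Embedder do
  @callback embed(String.t()) :: {:ok, [float()]} | {:error, term()}
end

defmodule HashEmbedder do
  @behaviour Embedder
  @dims 64

  # Toy bag-of-words embedding: each lowercase token is hashed into one of
  # @dims buckets, and the count vector is L2-normalised.
  @impl true
  def embed(text) do
    counts =
      text
      |> String.downcase()
      |> String.split(~r/[^a-z0-9]+/, trim: true)
      |> Enum.map(&:erlang.phash2(&1, @dims))
      |> Enum.frequencies()

    vec = for i <- 0..(@dims - 1), do: Map.get(counts, i, 0) * 1.0
    norm = :math.sqrt(Enum.sum(Enum.map(vec, &(&1 * &1))))
    {:ok, if(norm == 0, do: vec, else: Enum.map(vec, &(&1 / norm)))}
  end
end

defmodule NoteEmbedding do
  # The consumer picks the implementation via config (hypothetical key).
  def embedder, do: Application.get_env(:diffo_sketch, :embedder, HashEmbedder)
  def embed(text), do: embedder().embed(text)
end
```

Swapping in a real model would then mean providing another module that implements embed/1 and pointing the config at it. The Note write hook would not change.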
Related:
- #3, #41, #42 — the answer-shape family handles "found instances + scores" naturally; collapse to ranked-IDs for the simple case.
- #38, #39 — feasibility + note-relevance compose: "which servable shelves have notes suggesting they handle weather conditions like the customer's reported issue."
- #40 — note-relevance is a metric of value (a quality score, a fit score) that the preference machinery can weigh.