vector search over Notes — query instances by relevance to natural language #43

## Description

`Diffo.Provider.Note` already carries `text` — free-form annotation that parties write against instances. The next step is making that text *searchable by meaning*, not just by exact match or substring. Given a natural-language query, return the instances whose notes are most relevant, ranked by similarity, with the matching note excerpts surfaced as data.

Two layers:

**Note-level vector search** — embed each note's text on create/update; vector index the embeddings; a read action that takes a query text, embeds it, and returns notes ranked by similarity.
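For reference, the ranking that the note-level `:find_similar` read would produce can be written as brute-force cosine similarity over stored embeddings. This is a sketch that assumes embeddings are plain lists of floats and that the query text has already been embedded. `NoteSimilarity` is a hypothetical module. In practice the Neo4j vector index would run this search approximately and at scale.

```elixir
defmodule NoteSimilarity do
  @moduledoc """
  Hypothetical brute-force reference for note-level vector search
  (not a Diffo or AshNeo4j API). A vector index performs the same ranking
  at scale; this version only makes the ordering explicit.
  """

  @doc "Cosine similarity of two equal-length embedding vectors."
  def cosine(a, b) when length(a) == length(b) do
    dot = a |> Enum.zip_with(b, &(&1 * &2)) |> Enum.sum()
    norm_a = norm(a)
    norm_b = norm(b)

    # a zero vector has no direction, so treat it as unrelated to anything
    if norm_a == 0 or norm_b == 0, do: 0.0, else: dot / (norm_a * norm_b)
  end

  @doc """
  Ranks notes (maps with :id and :embedding) against an already-embedded
  query vector and returns the top `limit` as {note, score}, best first.
  """
  def find_similar(notes, query_embedding, limit) do
    notes
    |> Enum.map(fn note -> {note, cosine(note.embedding, query_embedding)} end)
    |> Enum.sort_by(fn {_note, score} -> score end, :desc)
    |> Enum.take(limit)
  end

  # Euclidean length of a vector
  defp norm(v), do: v |> Enum.map(&(&1 * &1)) |> Enum.sum() |> :math.sqrt()
end
```

Cosine is only one choice. The similarity function used for ranking should match the one the vector index is configured with, otherwise scores from the two will not line up.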

**Instance-level relevance** — aggregate note relevance up to the annotated instance. An instance's relevance score is some combination (max? sum? weighted) of the relevance of its notes (and possibly notes on instances related to it). Query instances by their relevance to a natural-language question.
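The description leaves the roll-up open (max? sum? weighted?). Here is a minimal sketch of three options. It assumes each note annotates exactly one instance and that the note-level search yields `{instance_id, score}` pairs. `InstanceRelevance` and the `{:decay, d}` weighting are illustrative choices. Relevance contributed by related instances is not handled.

```elixir
defmodule InstanceRelevance do
  @moduledoc """
  Hypothetical roll-up of note-level scores to instances.
  Input: a list of {instance_id, note_score} pairs, one per matching note.
  """

  @doc """
  Aggregates note scores per instance and returns {instance_id, score}
  pairs, best first. Strategies:

    * :max         - the single most relevant note decides
    * :sum         - every matching note adds to the score
    * {:decay, d}  - rank-weighted sum: best * d^0 + second * d^1 + ...
  """
  def rank(note_hits, strategy \\ :max) do
    note_hits
    |> Enum.group_by(fn {instance_id, _} -> instance_id end, fn {_, score} -> score end)
    |> Enum.map(fn {instance_id, scores} -> {instance_id, aggregate(scores, strategy)} end)
    |> Enum.sort_by(fn {_id, score} -> score end, :desc)
  end

  defp aggregate(scores, :max), do: Enum.max(scores)
  defp aggregate(scores, :sum), do: Enum.sum(scores)

  defp aggregate(scores, {:decay, d}) when d > 0 and d <= 1 do
    scores
    |> Enum.sort(:desc)
    |> Enum.with_index()
    |> Enum.map(fn {score, i} -> score * :math.pow(d, i) end)
    |> Enum.sum()
  end
end
```

`{:decay, d}` sits between the other two strategies. With `d = 1` it reproduces `:sum`, and as `d` approaches 0 it behaves like `:max`. This means an instance with many weakly matching notes cannot easily outrank one with a single strong match.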

## Why it matters

Free-text notes are often the most under-leveraged surface in an operational system. They're written casually by parties (engineers, support agents, installers, customers) and contain the lived experience of the domain — the things the schema doesn't say. Today they're searchable by `contains/2` if you're lucky.

With vector search, the choreography gets a new alignment dimension: text-relevance. A customer reporting "intermittent dropouts after rain" can land on instances whose notes describe similar patterns, even when the wording is different. An engineer designing a new service can find prior services with similar constraints expressed naturally. A repairer can find prior repairs that resemble the symptom.

This also makes the system honest about what it knows — the rich free-text knowledge that parties carry becomes queryable as a first-class part of the graph, not a separate text-search silo. And it fits the choreography model: each instance carries its own annotations, search emerges from text relevance across the graph, no central index outside the framework.

## What we'd find useful

```elixir
# Ash.Query.filter/2 is a macro
require Ash.Query

# note-level: notes ranked by similarity to the query text
# (:find_similar and :find_by_note_relevance are proposed actions, not existing ones)
{:ok, notes} =
  Diffo.Provider.Note
  |> Ash.Query.for_read(:find_similar, %{query: "intermittent packet loss after rain", limit: 20})
  |> Ash.read()

# instance-level: instances ranked by note relevance
{:ok, instances} =
  DslAccess
  |> Ash.Query.for_read(:find_by_note_relevance, %{
    query: "intermittent packet loss after rain",
    limit: 10
  })
  |> Ash.read()

# combined with other filters
# (places_contains_geo and customer_lat_lng_within/2 are illustrative placeholders)
{:ok, instances} =
  DslAccess
  |> Ash.Query.filter(places_contains_geo: ^customer_lat_lng_within(:km, 5))
  |> Ash.Query.for_read(:find_by_note_relevance, %{query: "..."})
  |> Ash.read()
```

The relevance scores should be available alongside the records — collapsible to a ranked list of IDs (per [#42](https://github.com/diffo-dev/diffo_example/issues/42)), or kept rich (instances + scores + matching note excerpts) when the consumer wants the why.
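The following sketches a hypothetical answer shape along those lines. Rich hits carry the score and the matching excerpts, and the simple case collapses to ranked IDs. The map keys (`:instance_id`, `:score`, `:excerpts`) and the `RelevanceAnswer` module are assumptions for illustration, not an existing Diffo API.

```elixir
defmodule RelevanceAnswer do
  @moduledoc """
  Hypothetical answer shape for note-relevance queries. A rich hit keeps
  the why, for example:

      %{instance_id: "i2", score: 0.91, excerpts: ["dropouts after heavy rain"]}

  and the simple case collapses to a ranked list of IDs.
  """

  @doc "Collapses rich hits to instance IDs, most relevant first."
  def to_ranked_ids(hits) do
    hits
    |> Enum.sort_by(& &1.score, :desc)
    |> Enum.map(& &1.instance_id)
  end

  @doc "Keeps only hits scoring at least `min_score`, preserving the rich shape."
  def above(hits, min_score) do
    Enum.filter(hits, &(&1.score >= min_score))
  end
end
```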

## A possible direction

Several pieces, each likely a separate yarn upstream:

1. **Embedding-on-write hook** on `Diffo.Provider.Note` — an `embedding` attribute populated by a configurable embed function on `:create` and `:update`. Local model (Bumblebee/Ollama) or hosted (OpenAI/Cohere) — consumer's choice via config.
2. **Vector index in AshNeo4j** — Neo4j 5.11+ has native vector indexes, queried via `db.index.vector.queryNodes`. AshNeo4j needs to expose this either as an index declaration on the resource or as a query primitive that produces a similarity-ordered result.
3. **Instance-level aggregate** — a calculation or aggregate on Instance that scores it against a query by aggregating the relevance of its (and related?) notes. Probably a new Diffo extension or just an Ash calculation pattern.
4. **A query primitive** — `find_by_note_relevance(query, opts)` action shape that takes the natural-language query and returns ranked records. Could be a Diffo-provided action that any BaseInstance resource gets.
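Item 1 could look roughly like the sketch below. It assumes the embed function is configured under a hypothetical `:diffo, :note_embedder` key. Inside Diffo this would more likely be an Ash change on Note's `:create` and `:update` actions, checking `Ash.Changeset.changing_attribute?/2` before calling the embedder. Here it is written as plain Elixir over a map of attribute changes so the logic stands alone. `NoteEmbedding` is a hypothetical module.

```elixir
defmodule NoteEmbedding do
  @moduledoc """
  Hypothetical embedding-on-write hook for notes. The embed function is
  configurable, e.g. (assumed config key):

      config :diffo, :note_embedder, &MyApp.Embeddings.embed/1

  where the function takes the note text and returns a list of floats.
  """

  @doc """
  Given the attribute changes for a note write, sets :embedding when :text
  is being written. Writes that don't touch :text are returned unchanged,
  so unrelated updates don't trigger a re-embed.
  """
  def put_embedding(changes, embed_fun \\ nil) do
    case Map.fetch(changes, :text) do
      {:ok, text} when is_binary(text) ->
        # resolve the configured embedder only when it is actually needed
        Map.put(changes, :embedding, (embed_fun || configured_embedder()).(text))

      {:ok, nil} ->
        # text cleared, so clear the stale embedding too
        Map.put(changes, :embedding, nil)

      :error ->
        changes
    end
  end

  defp configured_embedder do
    Application.get_env(:diffo, :note_embedder) ||
      raise "no :note_embedder configured for :diffo"
  end
end
```

Skipping the embed when `:text` is untouched matters most with hosted models, where every call adds latency and cost.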

The exemplar work would be to wire one embedding model end-to-end on Note, prove the Neo4j vector index round-trips through AshNeo4j, and prototype an instance-relevance ranking. The deltas become yarns for AshNeo4j (vector index + query expressions) and Diffo (the embedding hook and the instance-aggregate pattern).
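For item 2 and the round-trip proof, the Cypher that AshNeo4j would need to emit might look like the following, using Neo4j 5.x `CREATE VECTOR INDEX` syntax (recent 5.x releases). The `:Note` label, the `embedding` and `text` properties, the `ANNOTATES` relationship type and the `NoteVectorCypher` module are all assumptions about the graph shape, not AshNeo4j's actual mapping. The instance-level query rolls up by `max(score)`, one of the options from the description.

```elixir
defmodule NoteVectorCypher do
  @moduledoc """
  Hypothetical Cypher for note vector search. The :Note label, the
  embedding/text properties and the ANNOTATES relationship are assumptions.
  """

  @doc "DDL for the vector index. Name and options are validated, then inlined."
  def create_index(name, dimensions, similarity \\ "cosine")
      when is_binary(name) and is_integer(dimensions) and dimensions > 0 and
             similarity in ["cosine", "euclidean"] do
    unless name =~ ~r/\A[A-Za-z_][A-Za-z0-9_]*\z/, do: raise(ArgumentError, "bad index name")

    """
    CREATE VECTOR INDEX #{name} IF NOT EXISTS
    FOR (n:Note) ON n.embedding
    OPTIONS {indexConfig: {`vector.dimensions`: #{dimensions}, `vector.similarity_function`: '#{similarity}'}}
    """
  end

  @doc """
  Instance-level query: the k nearest notes, rolled up to the instances they
  annotate by max score. Returns {cypher, params} for a Bolt driver.
  """
  def find_by_note_relevance(index_name, query_embedding, k, limit) do
    cypher = """
    CALL db.index.vector.queryNodes($index_name, $k, $embedding)
    YIELD node, score
    MATCH (node)-[:ANNOTATES]->(instance)
    WITH instance, max(score) AS relevance, collect(node.text)[0..3] AS excerpts
    RETURN instance, relevance, excerpts
    ORDER BY relevance DESC
    LIMIT $limit
    """

    params = %{"index_name" => index_name, "k" => k, "embedding" => query_embedding, "limit" => limit}
    {cypher, params}
  end
end
```

`k` counts nearest notes, not instances, so it should be comfortably larger than `limit`, because several top notes may annotate the same instance. The excerpts come from `collect`, whose order Cypher does not guarantee, so treat them as a sample of matching notes rather than the best three.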

Related:
- [#3](https://github.com/diffo-dev/diffo_example/issues/3), [#41](https://github.com/diffo-dev/diffo_example/issues/41), [#42](https://github.com/diffo-dev/diffo_example/issues/42) — the answer-shape family handles "found instances + scores" naturally; collapse to ranked-IDs for the simple case.
- [#38](https://github.com/diffo-dev/diffo_example/issues/38), [#39](https://github.com/diffo-dev/diffo_example/issues/39) — feasibility + note-relevance compose: "which servable shelves have notes suggesting they handle weather conditions like the customer's reported issue."
- [#40](https://github.com/diffo-dev/diffo_example/issues/40) — note-relevance is a metric of value (a quality score, a fit score) that the preference machinery can weigh.
