
web/api: add fuzzy metadata search endpoints #18293

Draft
roidelapluie wants to merge 2 commits into prometheus:main from roidelapluie:roidelapluie/new_api_labels_values_discovery

@roidelapluie
Member

@roidelapluie roidelapluie commented Mar 13, 2026

Add search endpoints for metric names, label names, and label values, backed by new storage search interfaces and TSDB implementations.

The search API supports substring, subsequence, and Jaro-Winkler-based matching, optional relevance sorting, and NDJSON streaming responses. This also wires the new endpoints into the query UI and adds backend and utility tests.
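As a rough illustration of the first two matching modes, here is a minimal Go sketch. The function names and the length-ratio score are assumptions made for this example, not the PR's implementation; the only property borrowed from the PR is that scores fall in [0.0, 1.0] with 1.0 meaning a perfect match.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// score is an illustrative (assumed) relevance score in [0, 1]: the larger the
// share of the value covered by the needle, the higher the score.
func score(needle, value string) float64 {
	n, v := utf8.RuneCountInString(needle), utf8.RuneCountInString(value)
	if v == 0 {
		return 1
	}
	return float64(n) / float64(v)
}

// substringMatch accepts value if needle occurs in it contiguously.
func substringMatch(value, needle string, caseSensitive bool) (bool, float64) {
	if !caseSensitive {
		value, needle = strings.ToLower(value), strings.ToLower(needle)
	}
	if !strings.Contains(value, needle) {
		return false, 0
	}
	return true, score(needle, value)
}

// subsequenceMatch accepts value if all runes of needle appear in it in order,
// not necessarily adjacent (e.g. "ncst" in "node_cpu_seconds_total").
func subsequenceMatch(value, needle string, caseSensitive bool) (bool, float64) {
	if !caseSensitive {
		value, needle = strings.ToLower(value), strings.ToLower(needle)
	}
	want := []rune(needle)
	i := 0 // index of the next needle rune to find
	for _, r := range value {
		if i < len(want) && r == want[i] {
			i++
		}
	}
	if i < len(want) {
		return false, 0
	}
	return true, score(needle, value)
}

func main() {
	fmt.Println(substringMatch("node_cpu_seconds_total", "cpu", false))
	fmt.Println(subsequenceMatch("node_cpu_seconds_total", "ncst", false))
}
```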

Known limitation: frequency/cardinality enrichment currently relies on per-result query-time scans, so those paths may need follow-up work before they scale well.

prometheus/proposals#74

Which issue(s) does the PR fix:

Does this PR introduce a user-facing change?

[FEATURE] API: Add new search endpoints for metric names, label names, and label values.

@roidelapluie roidelapluie marked this pull request as draft March 13, 2026 17:02
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
@roidelapluie roidelapluie force-pushed the roidelapluie/new_api_labels_values_discovery branch from 216e79a to 9ae9c1b March 13, 2026 17:15
Comment thread docs/querying/api.md

- `match[]=<series_selector>`: Repeated series selector used to scope the
search. Optional.
- `search=<string>`: Search string matched against names or values. Optional.
Contributor

In the latest version of the proposal, the search allows multiple search strings to be set. They are treated as an OR condition.

We could also let the request specify an AND or OR operator to control how multiple search values are combined.
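A minimal sketch of how multiple search values could be combined, assuming a per-term matcher that returns (accepted, score). OR keeps the best score, matching the proposal's OR semantics; the AND branch, and giving it the worst per-term score, are hypothetical choices that illustrate the operator suggestion. `combineTerms` and `contains` are illustrative names.

```go
package main

import (
	"fmt"
	"strings"
)

// matchFunc is a hypothetical per-term matcher returning (accepted, score).
type matchFunc func(value, term string) (bool, float64)

// contains is a trivial matcher used for illustration.
func contains(value, term string) (bool, float64) {
	if strings.Contains(value, term) {
		return true, 1
	}
	return false, 0
}

// combineTerms evaluates value against all terms.
// OR (useAnd=false): accepted if any term matches; score is the best match.
// AND (useAnd=true, hypothetical): every term must match; score is the worst match.
func combineTerms(value string, terms []string, useAnd bool, match matchFunc) (bool, float64) {
	if len(terms) == 0 {
		return true, 0 // no search terms: everything passes, nothing to score
	}
	if useAnd {
		worst := 1.0
		for _, t := range terms {
			ok, s := match(value, t)
			if !ok {
				return false, 0
			}
			worst = min(worst, s)
		}
		return true, worst
	}
	accepted, best := false, 0.0
	for _, t := range terms {
		if ok, s := match(value, t); ok {
			accepted, best = true, max(best, s)
		}
	}
	return accepted, best
}

func main() {
	terms := []string{"cpu", "memory"}
	for _, v := range []string{"node_cpu_seconds_total", "node_memory_cpu_ratio"} {
		orOK, _ := combineTerms(v, terms, false, contains)
		andOK, _ := combineTerms(v, terms, true, contains)
		fmt.Printf("%s: OR=%v AND=%v\n", v, orOK, andOK)
	}
}
```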

prometheus/proposals#74

Comment thread docs/querying/api.md
- `fuzz_threshold=<number>`: Fuzzy threshold from 0 to 100. Optional.
- `fuzz_alg=<subsequence | jarowinkler>`: Matching algorithm. Optional.
- `case_sensitive=<bool>`: Toggle case-sensitive matching. Optional.
- `sort_by=<string>`: Sort mode. Supported values depend on the endpoint.
Contributor

@tcp13equals2 tcp13equals2 Mar 17, 2026

In the proposal, sort_by is optional and, if it is not set, the default is no sort order. It might be worth highlighting that if no sort is set, no sort is applied. This differs from the existing endpoints, which always return results in alphabetical order.

Comment thread docs/querying/api.md
- `sort_by=<string>`: Sort mode. Supported values depend on the endpoint.
- `sort_dir=<asc | dsc>`: Sort direction. Optional. Only valid when `sort_by`
is set.
- `start=<rfc3339 | unix_timestamp>`: Start timestamp. Optional.
Contributor

I think we discussed having a sane default start/end time if none is set.

Comment thread docs/querying/api.md
`application/x-ndjson`. Each response contains one or more result batches,
followed by a final trailer line:

```json
```
Contributor

An example with a warning annotation could be useful.

Comment thread storage/interface.go Outdated
type Searcher interface {
// SearchLabelNames returns label names matching the search criteria.
// Results include relevance scores based on the Filter.
SearchLabelNames(ctx context.Context, hints *SearchHints, matchers ...*labels.Matcher) ([]SearchResult, error)
Contributor

Can we use an iterator-style return type?

This would unlock more options for fully streaming results.

```go
type SearcherValueSet interface {
	Next() bool
	At() SearchResult
	Warnings() annotations.Annotations
	Err() error
	Close()
}
```
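A minimal slice-backed implementation of that iterator shape, roughly in the spirit of the `sliceSearchResultSet` helper mentioned in the follow-up commit. To keep the sketch self-contained, `Warnings()` returns `[]string` instead of `annotations.Annotations`, and `SearchResult` is reduced to the `Value`/`Score` fields visible in `storage/generic.go`; `sliceSet` and `collect` are hypothetical.

```go
package main

import "fmt"

// SearchResult is reduced to the fields visible in storage/generic.go.
type SearchResult struct {
	Value string
	Score float64
}

// SearchResultSet follows the proposed iterator shape; Warnings is simplified
// to []string here (the proposal uses annotations.Annotations).
type SearchResultSet interface {
	Next() bool
	At() SearchResult
	Warnings() []string
	Err() error
	Close()
}

// sliceSet is a hypothetical slice-backed implementation.
type sliceSet struct {
	results []SearchResult
	cur     int // index of the current result; -1 before the first Next
}

var _ SearchResultSet = (*sliceSet)(nil) // compile-time interface check

func newSliceSet(rs []SearchResult) *sliceSet { return &sliceSet{results: rs, cur: -1} }

func (s *sliceSet) Next() bool {
	if s.cur+1 >= len(s.results) {
		return false
	}
	s.cur++
	return true
}

func (s *sliceSet) At() SearchResult   { return s.results[s.cur] }
func (s *sliceSet) Warnings() []string { return nil }
func (s *sliceSet) Err() error         { return nil }
func (s *sliceSet) Close()             {}

// collect drains a set the way a caller (e.g. an HTTP handler) might.
func collect(set SearchResultSet) ([]string, error) {
	defer set.Close()
	var out []string
	for set.Next() {
		out = append(out, set.At().Value)
	}
	return out, set.Err()
}

func main() {
	set := newSliceSet([]SearchResult{{Value: "up", Score: 1}, {Value: "go_goroutines", Score: 0.4}})
	values, err := collect(set)
	fmt.Println(values, err)
}
```

A streaming handler could write each result as soon as `Next` returns true instead of collecting them, which is the benefit the comment points at.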

Comment thread docs/querying/api.md

- `match[]=<series_selector>`: Repeated series selector used to scope the
search. Optional.
- `search=<string>`: Search string matched against names or values. Optional.
Contributor

I'm updating the proposal to make this parameter search[] so it better matches match[].

Comment thread docs/querying/api.md
- `match[]=<series_selector>`: Repeated series selector used to scope the
search. Optional.
- `search=<string>`: Search string matched against names or values. Optional.
- `fuzz_threshold=<number>`: Fuzzy threshold from 0 to 100. Optional.
Contributor

Note: the default is 0 (no fuzzy matching).

Comment thread docs/querying/api.md
search. Optional.
- `search=<string>`: Search string matched against names or values. Optional.
- `fuzz_threshold=<number>`: Fuzzy threshold from 0 to 100. Optional.
- `fuzz_alg=<subsequence | jarowinkler>`: Matching algorithm. Optional.
Contributor

Note: the default is jarowinkler.
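A sketch of parameter parsing that applies both defaults noted in these comments (fuzz_threshold=0 and fuzz_alg=jarowinkler). Mapping the 0 to 100 threshold onto the Filter's [0.0, 1.0] score range, and accepting fractional thresholds, are assumptions; `parseFuzzParams` and `fuzzParams` are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// fuzzParams holds parsed fuzzy-matching options.
type fuzzParams struct {
	Threshold float64 // assumed: 0 disables fuzzy matching; otherwise the minimum score in [0, 1]
	Alg       string
}

// parseFuzzParams applies the defaults; empty strings mean "not set".
func parseFuzzParams(threshold, alg string) (fuzzParams, error) {
	p := fuzzParams{Alg: "jarowinkler"}
	if threshold != "" {
		t, err := strconv.ParseFloat(threshold, 64)
		// The negated range check also rejects NaN and Inf.
		if err != nil || !(t >= 0 && t <= 100) {
			return fuzzParams{}, fmt.Errorf("fuzz_threshold must be a number from 0 to 100, got %q", threshold)
		}
		p.Threshold = t / 100
	}
	switch alg {
	case "":
		// keep the default
	case "subsequence", "jarowinkler":
		p.Alg = alg
	default:
		return fuzzParams{}, errors.New("fuzz_alg must be subsequence or jarowinkler")
	}
	return p, nil
}

func main() {
	p, err := parseFuzzParams("", "")
	fmt.Println(p, err)
}
```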

Comment thread docs/querying/api.md

Additional parameters for `/api/v1/search/metric_names`:

- `include_cardinality=<bool>`: Include metric cardinality in each result.
Contributor

I have updated the proposal to remove cardinality and frequency. I have kept include_metadata for now.

Comment thread docs/querying/api.md

- `include_cardinality=<bool>`: Include metric cardinality in each result.
- `include_metadata=<bool>`: Include metric metadata in each result.
- `sort_by=<alpha | cardinality | score>`
Contributor

For score, it might be worth noting that sorting by score only makes sense when search[] and fuzz_threshold are set.
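A sketch of the request validation implied by the docs and the follow-up commit: `sort_dir` is only valid with `sort_by`, and `sort_by=score` requires at least one `search[]` value. The accepted `sort_by` values below (`alpha` and `score`) are an assumption; `validateSort` is hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// validateSort checks sort-related parameters before any search runs.
func validateSort(sortBy, sortDir string, searches []string) error {
	switch sortBy {
	case "", "alpha":
		// no extra requirements
	case "score":
		if len(searches) == 0 {
			return errors.New("sort_by=score requires at least one search[] parameter")
		}
	default:
		return fmt.Errorf("unsupported sort_by %q", sortBy)
	}
	switch sortDir {
	case "":
		// direction not given
	case "asc", "dsc":
		if sortBy == "" {
			return errors.New("sort_dir is only valid when sort_by is set")
		}
	default:
		return fmt.Errorf("unsupported sort_dir %q", sortDir)
	}
	return nil
}

func main() {
	fmt.Println(validateSort("score", "dsc", nil))
	fmt.Println(validateSort("score", "dsc", []string{"cpu"}))
}
```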

Comment thread storage/generic.go Outdated

// mergeSearchResults merges search results from multiple calls to fn, deduplicating
// by value and taking the maximum score for duplicates.
func mergeSearchResults(hints *SearchHints, fn func(Searcher) ([]SearchResult, error), searchers []Searcher) ([]SearchResult, error) {
Contributor

Per the comments on the interfaces, can we have the Searcher return a ValueSet-style iterator?
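A sketch of the merge semantics described in the `mergeSearchResults` doc comment (deduplicate by value, keep the maximum score). One related detail: the code in `storage/generic.go` builds `merged` by ranging over a `scores` map, and Go map iteration order is unspecified, so the output order varies between runs whenever no sort is applied. The hypothetical `mergeMaxScore` below keeps first-seen order instead.

```go
package main

import "fmt"

type SearchResult struct {
	Value string
	Score float64
}

// mergeMaxScore deduplicates results from several searchers by Value, keeping
// the highest score, and preserves first-seen order for deterministic output.
func mergeMaxScore(sets ...[]SearchResult) []SearchResult {
	idx := make(map[string]int) // value -> position in merged
	var merged []SearchResult
	for _, set := range sets {
		for _, r := range set {
			if i, ok := idx[r.Value]; ok {
				merged[i].Score = max(merged[i].Score, r.Score)
				continue
			}
			idx[r.Value] = len(merged)
			merged = append(merged, r)
		}
	}
	return merged
}

func main() {
	fmt.Println(mergeMaxScore(
		[]SearchResult{{Value: "up", Score: 0.5}, {Value: "cpu", Score: 0.9}},
		[]SearchResult{{Value: "up", Score: 0.8}, {Value: "mem", Score: 0.1}},
	))
}
```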

Comment thread storage/generic.go
for value, score := range scores {
merged = append(merged, SearchResult{Value: value, Score: score})
}
if hints != nil && hints.CompareFunc != nil {
Contributor

Make sure that we support the no-sort case. If sort_by is not set, no sort should be applied, i.e. do not default to alphabetical order.
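One way to honor the no-sort case is to have the comparator lookup return nil when `sort_by` is unset and to skip sorting entirely in that case. A sketch in which `compareFor`, `applySort`, the alphabetical tie-break, and treating an unset `sort_dir` as ascending are all assumptions:

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
	"strings"
)

type SearchResult struct {
	Value string
	Score float64
}

// compareFor returns a comparator for sort_by/sort_dir, or nil when sort_by
// is unset, meaning "keep the order the results arrived in".
func compareFor(sortBy, sortDir string) func(a, b SearchResult) int {
	var c func(a, b SearchResult) int
	switch sortBy {
	case "alpha":
		c = func(a, b SearchResult) int { return strings.Compare(a.Value, b.Value) }
	case "score":
		// Ascending by score; ties broken by value so output is stable.
		c = func(a, b SearchResult) int {
			if r := cmp.Compare(a.Score, b.Score); r != 0 {
				return r
			}
			return strings.Compare(a.Value, b.Value)
		}
	default:
		return nil
	}
	if sortDir == "dsc" {
		asc := c
		c = func(a, b SearchResult) int { return -asc(a, b) }
	}
	return c
}

// applySort sorts in place only when a comparator is present.
func applySort(rs []SearchResult, c func(a, b SearchResult) int) {
	if c != nil {
		slices.SortStableFunc(rs, c)
	}
}

func main() {
	rs := []SearchResult{{Value: "b", Score: 0.2}, {Value: "a", Score: 0.9}, {Value: "c", Score: 0.5}}
	applySort(rs, compareFor("", "")) // no sort_by: order untouched
	fmt.Println(rs)
	applySort(rs, compareFor("score", "dsc"))
	fmt.Println(rs)
}
```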

Comment thread storage/interface.go
// Returns (accepted, score) where score is used for relevance ranking.
// Score should be in range [0.0, 1.0] where 1.0 is perfect match.
type Filter interface {
Accept(value string) (accepted bool, score float64)
Contributor

One messy flaw we still have with this interface: consider a search where

  • search[] = cpu
  • fuzz_threshold is not set, so no fuzzy matching is applied and the Filter calculates no score
  • sort_by = score

This is a legitimate use case: autocomplete on labels/values containing cpu.

We need to either detect this early and require the Filter to still calculate the score, or lazily calculate the score before the comparator is applied. If we do the former, the filter chain differs from the one used when applying a search + fuzz filter.
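A sketch of the second option (lazy scoring), assuming results carry a flag recording whether the filter computed a score. The plain substring filter leaves scores unset, and scores are computed only when `sort_by=score` needs them. `ensureScores`, `searchSorted`, the `scored` flag, and the length-ratio scorer are hypothetical.

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
	"strings"
	"unicode/utf8"
)

// SearchResult carries a hypothetical scored flag so that a filter which only
// decides acceptance (plain substring, no fuzz_threshold) can leave Score unset.
type SearchResult struct {
	Value  string
	Score  float64
	scored bool
}

// ensureScores computes missing scores on demand, just before sorting.
func ensureScores(rs []SearchResult, score func(value string) float64) {
	for i := range rs {
		if !rs[i].scored {
			rs[i].Score = score(rs[i].Value)
			rs[i].scored = true
		}
	}
}

// searchSorted filters by plain substring without scoring, and pays for
// scoring only when sortBy == "score" (best matches first).
func searchSorted(values []string, term, sortBy string) []SearchResult {
	var rs []SearchResult
	for _, v := range values {
		if strings.Contains(v, term) {
			rs = append(rs, SearchResult{Value: v})
		}
	}
	if sortBy == "score" {
		// Illustrative scorer: share of the value covered by the term.
		ensureScores(rs, func(v string) float64 {
			return float64(utf8.RuneCountInString(term)) / float64(max(utf8.RuneCountInString(v), 1))
		})
		slices.SortStableFunc(rs, func(a, b SearchResult) int { return cmp.Compare(b.Score, a.Score) })
	}
	return rs
}

func main() {
	vals := []string{"node_cpu_seconds_total", "cpu", "process_cpu_seconds_total", "up"}
	fmt.Println(searchSorted(vals, "cpu", "score"))
}
```

Keeping the filter chain identical for both cases and deferring the score to one place before the comparator avoids the divergent chains the comment worries about, at the cost of a second pass over accepted results.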

Comment thread storage/interface.go Outdated
// CompareFunc is used for ordering results.
// It receives full SearchResult values, allowing comparison by value,
// score, or any combination.
// A nil value means alphabetical ordering by value.
Contributor

Per above, I think a nil CompareFunc should imply no sorting is applied.

Comment thread tsdb/querier.go
labelHints.Limit = hints.Limit
}

names, err := q.index.LabelNames(ctx, matchers...)
Contributor

Can we implement the Searcher interface inside q.index? That would push the filtering as low as possible and let it return a ValueSet-style iterator, which better facilitates streaming the results back.
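A sketch of what filtering close to the index could look like: a wrapper that pulls one name at a time from an underlying iterator and yields only accepted values, so nothing is materialized before streaming. `nameIterator`, `sliceNames`, `filteredSet`, and `containsFilter` are hypothetical stand-ins, not TSDB index types.

```go
package main

import (
	"fmt"
	"strings"
)

type SearchResult struct {
	Value string
	Score float64
}

// nameIterator is a hypothetical stand-in for an index-level iterator over
// label names or values.
type nameIterator interface {
	Next() bool
	At() string
	Err() error
}

// sliceNames is a toy nameIterator backed by a slice.
type sliceNames struct {
	names []string
	i     int
}

func (s *sliceNames) Next() bool {
	s.i++
	return s.i <= len(s.names)
}
func (s *sliceNames) At() string { return s.names[s.i-1] }
func (s *sliceNames) Err() error { return nil }

// filteredSet applies the filter lazily, one value per Next call.
type filteredSet struct {
	it     nameIterator
	accept func(string) (bool, float64)
	cur    SearchResult
}

func (f *filteredSet) Next() bool {
	for f.it.Next() {
		v := f.it.At()
		if ok, score := f.accept(v); ok {
			f.cur = SearchResult{Value: v, Score: score}
			return true
		}
	}
	return false
}
func (f *filteredSet) At() SearchResult { return f.cur }
func (f *filteredSet) Err() error       { return f.it.Err() }

// containsFilter is an illustrative substring filter.
func containsFilter(term string) func(string) (bool, float64) {
	return func(v string) (bool, float64) {
		if strings.Contains(v, term) {
			return true, 1
		}
		return false, 0
	}
}

func main() {
	set := &filteredSet{
		it:     &sliceNames{names: []string{"cpu", "instance", "job", "cpu_mode"}},
		accept: containsFilter("cpu"),
	}
	for set.Next() {
		fmt.Println(set.At().Value)
	}
	if err := set.Err(); err != nil {
		fmt.Println("error:", err)
	}
}
```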

Comment thread tsdb/querier.go
// Apply filter and collect scores.
var results []storage.SearchResult
for _, name := range names {
if hints.Filter != nil {
Contributor

Per comment above - we need to consider the use case where a search[] is set, fuzz_threshold is not set and sort_by=score is selected.


// Winkler modification: boost for common prefix up to 4 characters.
prefixLen := 0
maxPrefix := min(4, l1, l2)
Contributor

Does this prefix boosting value need to be pulled out into a configurable option?
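A self-contained Jaro-Winkler sketch with the prefix boost exposed as parameters, one possible way to make it configurable. The conventional values are a scale of 0.1 and a maximum prefix length of 4 (matching `min(4, l1, l2)` in the excerpt); `prefixScale * maxPrefix` must stay at or below 1 for scores to remain in [0, 1]. This is a textbook formulation, not the PR's code.

```go
package main

import "fmt"

// jaro computes the Jaro similarity of a and b in [0, 1].
func jaro(a, b string) float64 {
	s1, s2 := []rune(a), []rune(b)
	l1, l2 := len(s1), len(s2)
	if l1 == 0 && l2 == 0 {
		return 1
	}
	if l1 == 0 || l2 == 0 {
		return 0
	}
	window := max(max(l1, l2)/2-1, 0) // how far apart matching runes may be
	m1, m2 := make([]bool, l1), make([]bool, l2)
	matches := 0
	for i := 0; i < l1; i++ {
		for j := max(0, i-window); j <= min(l2-1, i+window); j++ {
			if !m2[j] && s1[i] == s2[j] {
				m1[i], m2[j] = true, true
				matches++
				break
			}
		}
	}
	if matches == 0 {
		return 0
	}
	// Count matched runes that appear in a different order; half of that
	// count is the number of transpositions.
	outOfOrder, k := 0, 0
	for i := 0; i < l1; i++ {
		if !m1[i] {
			continue
		}
		for !m2[k] {
			k++
		}
		if s1[i] != s2[k] {
			outOfOrder++
		}
		k++
	}
	m := float64(matches)
	t := float64(outOfOrder) / 2
	return (m/float64(l1) + m/float64(l2) + (m-t)/m) / 3
}

// jaroWinkler boosts the Jaro score for a shared prefix of up to maxPrefix
// runes, each prefix rune adding prefixScale of the remaining distance.
func jaroWinkler(a, b string, prefixScale float64, maxPrefix int) float64 {
	j := jaro(a, b)
	s1, s2 := []rune(a), []rune(b)
	limit := min(maxPrefix, len(s1), len(s2))
	prefix := 0
	for prefix < limit && s1[prefix] == s2[prefix] {
		prefix++
	}
	return j + float64(prefix)*prefixScale*(1-j)
}

func main() {
	fmt.Printf("%.4f\n", jaroWinkler("MARTHA", "MARHTA", 0.1, 4))
}
```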

Address review comments on PR 18293:

- Switch Searcher interface to iterator-based SearchResultSet return type
  for streaming; add sliceSearchResultSet, EmptySearchResultSet, and
  ErrSearchResultSet helpers
- Rename search= to search[] and support multiple values with OR logic
  via orSearchesFilter and buildSearchFilter
- Change nil CompareFunc to mean no sorting (natural index order)
- Default fuzz_alg to jarowinkler, fuzz_threshold=0 disables fuzzy matching
- Remove cardinality and frequency enrichment from all three endpoints
- Default start/end to a 1-hour lookback window when not specified
- Add include_score parameter to return relevance scores per result
- Enforce sort_by=score requires search[] to be set
- Update OpenAPI schemas and golden files accordingly
- Replace local design doc with upstream proposal 74 content

Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>