
docs(router): metrics #7696

Merged
dotansimha merged 15 commits into main from kamil-router-metrics on Mar 12, 2026

Conversation

@kamilkisiela
Contributor

@kamilkisiela commented Feb 13, 2026

@gemini-code-assist
Contributor

Summary of Changes

Hello @kamilkisiela, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the documentation for OpenTelemetry metrics in the Hive Router by introducing dedicated UI components for presenting metric and label information clearly. It provides users with a comprehensive guide on how to configure, export, and interpret the various metrics available, improving observability and troubleshooting capabilities for router operations.

Highlights

  • New UI Components for Metrics Documentation: Introduced LabelCard, MetricCard, and MetricsSection React components to provide a structured and interactive way to display OpenTelemetry metrics and their associated labels within the documentation.
  • Comprehensive OpenTelemetry Metrics Documentation: Added a new documentation page (metrics.mdx) detailing the OpenTelemetry metrics exposed by the Hive Router, covering configuration for OTLP and Prometheus, instrumentation customization, and a detailed reference for various metric categories.
  • Detailed Metric and Label References: The new documentation includes extensive references for GraphQL, Supergraph, HTTP server, HTTP client, and Cache metrics, along with explanations of their labels, typical values, and monitoring recommendations for production environments.
Changelog
  • packages/web/docs/src/components/otel-metrics/label-card.tsx
    • Added a new React component to display individual metric labels with their meaning, typical values, and notes.
  • packages/web/docs/src/components/otel-metrics/metric-card.tsx
    • Added a new React component to display individual metrics, including their name, type, unit, description, and associated labels, with a copy-to-link feature.
  • packages/web/docs/src/components/otel-metrics/metrics-section.tsx
    • Added a new React component to group and render multiple MetricCard and LabelCard components, providing an expandable section for labels.
  • packages/web/docs/src/content/router/observability/metrics.mdx
    • Added a new documentation page detailing OpenTelemetry metrics for the Hive Router, covering configuration, instrumentation, and a comprehensive reference of GraphQL, Supergraph, HTTP server, HTTP client, and Cache metrics.

@github-actions
Contributor

github-actions Bot commented Feb 13, 2026

🚀 Snapshot Release (alpha)

The latest changes of this PR are available as alpha on npm (based on the declared changesets):

| Package | Version |
| ------- | ------- |
| `@graphql-hive/cli` | `0.58.1-alpha-20260216151046-a7dc6de7cd0da593a8b7038d6ad153b0e21220a6` |
| `hive` | `9.4.1-alpha-20260216151046-a7dc6de7cd0da593a8b7038d6ad153b0e21220a6` |

@github-actions
Contributor

github-actions Bot commented Feb 13, 2026

🐋 This PR was built and pushed to the following Docker images:

Targets: build

Platforms: linux/amd64

Image Tag: 8c1a0e2c40e935d62c4ea5055a205b2547843901

Contributor

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces comprehensive documentation for OpenTelemetry metrics, complete with new React components for a structured and interactive presentation. The implementation is solid, and the new documentation page is well-organized. I have a few suggestions to enhance semantic correctness, accessibility, and component reusability, primarily concerning the use of an <a> tag for a copy-link action and refactoring the MetricsSection component for better flexibility.

Review comment threads:
  • packages/web/docs/src/components/otel-metrics/metric-card.tsx
  • packages/web/docs/src/components/otel-metrics/metrics-section.tsx
  • packages/web/docs/src/components/otel-metrics/metrics-section.tsx
@github-actions
Contributor

github-actions Bot commented Feb 13, 2026

💻 Website Preview

The latest changes are available as preview in: https://pr-7696.hive-landing-page.pages.dev

@kamilkisiela kamilkisiela added the waiting-on:router-release Do not merge: waiting for Router release that includes this feature. label Feb 17, 2026
@kamilkisiela kamilkisiela marked this pull request as ready for review February 17, 2026 10:53
kamilkisiela added a commit to graphql-hive/router that referenced this pull request Mar 11, 2026
This PR introduces a full metrics pipeline based on OpenTelemetry. It
adds support for emitting histograms, counters and gauges for HTTP and
GraphQL operations, exports them through OTLP or Prometheus, and allows
fine-grained control over metrics.

- [Documentation PR](graphql-hive/console#7696)
- [Documentation
preview](https://pr-7696.hive-landing-page.pages.dev/docs/router/observability/metrics)

**New dependencies**
The router now depends on the `opentelemetry-prometheus` and
`prometheus` crates, and on `humantime` for duration parsing.

**Performance considerations**
Observed metrics are collected asynchronously and should not block
request processing.

**`Metrics` struct**
The `Metrics` struct aggregates metrics for five domains: the HTTP
server, the HTTP client, the GraphQL pipeline, the supergraph loader and
the internal caches. Each sub-component exposes counters and histograms
and can be disabled via configuration.
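The five-domain layout described above could be sketched like this (a stdlib-only illustration; all type and field names are hypothetical, not the router's actual definitions):

```rust
// Illustrative sketch of the five-domain aggregate; names are hypothetical.
#[derive(Default)]
struct HttpServerMetrics; // per-domain counters and histograms would live here
#[derive(Default)]
struct HttpClientMetrics;
#[derive(Default)]
struct GraphqlMetrics;
#[derive(Default)]
struct SupergraphMetrics;
#[derive(Default)]
struct CacheMetrics;

// Each domain is wrapped in `Option` so configuration can disable it.
#[derive(Default)]
struct Metrics {
    http_server: Option<HttpServerMetrics>,
    http_client: Option<HttpClientMetrics>,
    graphql: Option<GraphqlMetrics>,
    supergraph: Option<SupergraphMetrics>,
    cache: Option<CacheMetrics>,
}

fn main() {
    // Only the HTTP server domain enabled; everything else stays off.
    let metrics = Metrics {
        http_server: Some(HttpServerMetrics::default()),
        ..Default::default()
    };
    assert!(metrics.http_server.is_some());
    assert!(metrics.http_client.is_none());
    assert!(metrics.graphql.is_none());
    assert!(metrics.supergraph.is_none());
    assert!(metrics.cache.is_none());
}
```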

**Meter provider setup**
The `Telemetry` struct now holds separate `traces_provider`,
`metrics_provider` and an optional `PrometheusRuntime`. During
initialisation the router builds an OpenTelemetry meter provider based
on `telemetry.metrics.*` configuration and sets it as global via
`opentelemetry::global::set_meter_provider`. If a Prometheus exporter is
configured, a `PrometheusRuntime` is created, either attached to the
existing HTTP server (reusing the router port) or detached on a
dedicated port.

**Prometheus runtime**
The `PrometheusRuntime` enum encapsulates two modes:
- `Attached` - metrics are served from the same HTTP server on a
configurable path (default `/metrics`).
- `Detached` - if a distinct port is configured, the runtime spawns a
dedicated `ntex::HttpServer` with its own lifecycle. I limited the
server to a **single worker**.
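The mode selection might look roughly like this (a stdlib-only sketch; the real variants hold server handles, and `select_runtime` is a hypothetical helper, not the router's constructor):

```rust
// Illustrative sketch of the two serving modes described above.
#[derive(Debug, PartialEq)]
enum PrometheusRuntime {
    /// Served from the router's own HTTP server on a configurable path.
    Attached { path: String },
    /// A dedicated single-worker HTTP server on its own port.
    Detached { path: String, port: u16 },
}

// Hypothetical helper: an explicit port that differs from the router's
// port selects the detached mode; otherwise the router port is reused.
fn select_runtime(path: &str, port: Option<u16>, router_port: u16) -> PrometheusRuntime {
    match port {
        Some(p) if p != router_port => PrometheusRuntime::Detached {
            path: path.into(),
            port: p,
        },
        _ => PrometheusRuntime::Attached { path: path.into() },
    }
}

fn main() {
    // Reusing the router port keeps metrics attached.
    assert_eq!(
        select_runtime("/metrics", None, 4000),
        PrometheusRuntime::Attached { path: "/metrics".into() }
    );
    // A distinct port spawns the dedicated server.
    assert_eq!(
        select_runtime("/metrics", Some(6969), 4000),
        PrometheusRuntime::Detached { path: "/metrics".into(), port: 6969 }
    );
}
```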

**Endpoint conflict detection**
Because metrics may live under `/metrics` or another path, the router
now encapsulates paths into a `RouterPaths` struct. On start-up it
verifies that GraphQL, health, readiness and Prometheus endpoints do not
collide and returns a `RouterInitError::EndpointConflict` if they do.
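The start-up check can be illustrated with a stdlib-only sketch (the `RouterPaths` fields and the error type are simplified stand-ins for the router's actual types):

```rust
use std::collections::HashSet;

// Simplified stand-in for the router's path registry.
struct RouterPaths {
    graphql: String,
    health: String,
    readiness: String,
    prometheus: Option<String>,
}

#[derive(Debug, PartialEq)]
struct EndpointConflict(String);

// Reject the configuration if any two endpoints share a path.
fn check_conflicts(paths: &RouterPaths) -> Result<(), EndpointConflict> {
    let mut seen = HashSet::new();
    let mut all = vec![&paths.graphql, &paths.health, &paths.readiness];
    if let Some(p) = &paths.prometheus {
        all.push(p);
    }
    for path in all {
        if !seen.insert(path.as_str()) {
            return Err(EndpointConflict(path.clone()));
        }
    }
    Ok(())
}

fn main() {
    let mut paths = RouterPaths {
        graphql: "/graphql".into(),
        health: "/health".into(),
        readiness: "/readiness".into(),
        prometheus: Some("/metrics".into()),
    };
    assert!(check_conflicts(&paths).is_ok());

    // Pointing Prometheus at the GraphQL path is rejected at start-up.
    paths.prometheus = Some("/graphql".into());
    assert_eq!(check_conflicts(&paths), Err(EndpointConflict("/graphql".into())));
}
```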

**Request context storage**
A new `pipeline::request_extensions` module defines small structs stored
in `HttpRequest` extensions. These hold the raw body size, the GraphQL
operation name and type, and the final response status. It also exposes
helper functions that are used by different pipeline stages to read and
write data.
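Typed per-request extension storage of this kind can be sketched with `std::any` (the `Extensions` map and the payload structs below are illustrative, not the actual `ntex` or router types):

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

// Minimal type-keyed store, roughly what `HttpRequest` extensions provide.
#[derive(Default)]
struct Extensions(HashMap<TypeId, Box<dyn Any>>);

impl Extensions {
    fn insert<T: 'static>(&mut self, value: T) {
        self.0.insert(TypeId::of::<T>(), Box::new(value));
    }
    fn get<T: 'static>(&self) -> Option<&T> {
        self.0.get(&TypeId::of::<T>()).and_then(|b| b.downcast_ref())
    }
}

// Hypothetical payloads written by different pipeline stages.
struct RawBodySize(usize);
struct OperationInfo {
    name: Option<String>,
    kind: &'static str,
}

fn main() {
    let mut ext = Extensions::default();
    // Early stages record what they know...
    ext.insert(RawBodySize(512));
    ext.insert(OperationInfo { name: Some("GetUser".into()), kind: "query" });

    // ...and a later stage (e.g. the metrics recorder) reads it back.
    assert_eq!(ext.get::<RawBodySize>().unwrap().0, 512);
    assert_eq!(ext.get::<OperationInfo>().unwrap().kind, "query");
    assert_eq!(ext.get::<OperationInfo>().unwrap().name.as_deref(), Some("GetUser"));
}
```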


## Configuration

Metrics are disabled by default. They are enabled when at least one
exporter is present and marked as enabled.
Each exporter entry has a `kind` field (e.g. `otlp` or `prometheus`) and
optional settings such as `interval`, `temporality` and `protocol`. A
Prometheus exporter can specify a `path` and `port` ("empty" means reuse
the router port).
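The enabling rule reduces to an `any` over the exporter list; a minimal sketch with a hypothetical, simplified config type:

```rust
// Simplified stand-in for the real exporter configuration entry.
struct ExporterConfig {
    #[allow(dead_code)]
    kind: &'static str,
    enabled: bool,
}

// Metrics are on only if at least one exporter exists and is enabled.
fn metrics_enabled(exporters: &[ExporterConfig]) -> bool {
    exporters.iter().any(|e| e.enabled)
}

fn main() {
    assert!(!metrics_enabled(&[])); // no exporters configured: metrics stay off
    assert!(!metrics_enabled(&[ExporterConfig { kind: "otlp", enabled: false }]));
    assert!(metrics_enabled(&[
        ExporterConfig { kind: "otlp", enabled: false },
        ExporterConfig { kind: "prometheus", enabled: true },
    ]));
}
```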

The `telemetry.metrics.instrumentation` section allows tuning histogram
aggregation.

A common histogram configuration defines explicit buckets for `bytes`
(request/response body sizes) and `seconds` (durations). The default
buckets are covered in the docs. We keep two separate bucket sets, one
per unit, because their values and ranges are completely different.
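Why a single shared bucket set would not work: placing a sample assumes the bucket boundaries match the value's range. A small sketch using subsets of the default buckets (the `bucket_index` helper is illustrative, not the SDK's internals):

```rust
// Index of the first bucket whose upper bound is >= value;
// `buckets.len()` means the overflow (+Inf) bucket.
fn bucket_index(buckets: &[f64], value: f64) -> usize {
    buckets.partition_point(|&upper| upper < value)
}

fn main() {
    let seconds = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0];
    let bytes = [128.0, 512.0, 1024.0, 4096.0, 16384.0];

    // A 30 ms duration lands in the 0.05 s bucket.
    assert_eq!(bucket_index(&seconds, 0.030), 3);
    // A 2 KiB body lands in the 4096-byte bucket.
    assert_eq!(bucket_index(&bytes, 2048.0), 3);
    // Recording byte sizes against the seconds buckets would dump every
    // sample into the overflow bucket - hence one bucket set per unit.
    assert_eq!(bucket_index(&seconds, 2048.0), seconds.len());
}
```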

A per-instrument override lets you disable individual metrics or strip
certain attributes.


## Noteworthy changes

- Caches are now stored in `CacheState`. This centralisation makes it
easier to manage cache invalidation on supergraph reloads and to
register metrics observers.
- Cache reads and writes are now standardised. I used moka's
`.entry().or_try_insert_with()` API everywhere to guarantee that
concurrent calls for the same missing entry are coalesced into a single
evaluation. We used to do `get().await` followed by `insert().await` in
some cases; now we perform the minimum number of cache writes and get
maximum reuse of freshly inserted entries.
- I also created an `EntryResultHitMissExt` trait that extends moka's
Entry API to give us insight into cache hits, misses and errors. Useful
for metrics and spans that record cache hits/misses.
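The hit/miss accounting can be illustrated with a stdlib `HashMap` standing in for moka (the `Outcome` enum and the helper are hypothetical names, not the actual `EntryResultHitMissExt`):

```rust
use std::collections::HashMap;

// Hypothetical outcome classification for metrics/spans.
#[derive(Debug, PartialEq)]
enum Outcome {
    Hit,
    Miss,
}

// Get-or-insert that also reports whether the value was computed:
// the closure only runs when the entry is missing, so running it at
// all means the lookup was a miss.
fn get_or_insert_tracked<K: std::hash::Hash + Eq, V>(
    cache: &mut HashMap<K, V>,
    key: K,
    compute: impl FnOnce() -> V,
) -> (&V, Outcome) {
    let mut outcome = Outcome::Hit;
    let value = cache.entry(key).or_insert_with(|| {
        outcome = Outcome::Miss;
        compute()
    });
    (value, outcome)
}

fn main() {
    let mut cache: HashMap<&str, u32> = HashMap::new();

    // First lookup computes and stores the value.
    let (_, first) = get_or_insert_tracked(&mut cache, "plan", || 42);
    assert_eq!(first, Outcome::Miss);

    // Second lookup reuses the stored value; the closure never runs.
    let (v, second) = get_or_insert_tracked(&mut cache, "plan", || 0);
    assert_eq!((*v, second), (42, Outcome::Hit));
}
```

Moka's async entry API additionally coalesces concurrent computations for the same key, which a plain `HashMap` does not model; the sketch only shows the hit/miss classification layered on top.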

## Example config

```yaml
telemetry:
  metrics:
    exporters:
      - kind: prometheus       # enable the Prometheus exporter
        enabled: true
        path: /metrics         # override the default path
        port: 6969             # spins up a dedicated HTTP server on this port
      - kind: otlp
        endpoint: http://otel-collector:4318
        protocol: http    # or grpc
        interval: 30s     # push every 30s
        temporality: cumulative
    instrumentation:
      common:
        histogram:
          aggregation: explicit # or exponential
          bytes:
            buckets: [128, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536, 131072, 262144, 524288, 1048576, 2097152, 3145728, 4194304, 5242880]
            record_min_max: false
          seconds:
            buckets: [0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1, 2.5, 5, 7.5, 10]
            record_min_max: false
      instruments:
        http_server_request_duration_seconds:
          attributes:
            graphql.operation.name: false    # drop operation name label, as it has high cardinality
```
@dotansimha dotansimha merged commit a3acbd8 into main Mar 12, 2026
26 checks passed
@dotansimha dotansimha deleted the kamil-router-metrics branch March 12, 2026 09:32
n1ru4l pushed a commit that referenced this pull request Apr 10, 2026
dotansimha pushed a commit to graphql-hive/router that referenced this pull request Apr 13, 2026

Labels

waiting-on:router-release Do not merge: waiting for Router release that includes this feature.
