
Vector Database Retrieval API


Monitoring out of the box for API and Database

Middleware pipes FastAPI telemetry to Prometheus and Grafana. The Grafana dashboard can be imported from infra/grafana/dashboard.json to see API metrics with no additional work.

Adapted from https://grafana.com/grafana/dashboards/16110-fastapi-observability/
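
A minimal Prometheus scrape config for this kind of setup might look like the following; the job name and `retrieval_service:8080` target are assumptions based on the compose service name, not values taken from the repo:

```yaml
scrape_configs:
  - job_name: retrieval_service
    # FastAPI instrumentation middleware conventionally serves /metrics
    metrics_path: /metrics
    static_configs:
      - targets: ["retrieval_service:8080"]
```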

Running and Debugging

Generally, commands from /scripts or code from /tests can be used to isolate an issue. The DB can run in isolation, but the API depends on the DB. Visit http://localhost:8080/docs while the service is up to view the FastAPI docs.

Build containers with code snapshot

bash scripts/build_container.sh

Run the service

docker compose up

Run with hot reloads for API code (slow)

docker compose watch retrieval_service

Attach shell to the API service

docker compose exec retrieval_service /bin/bash 

Manually populate the test database (rerunning will duplicate data entries)

docker compose exec retrieval_service \
    /bin/bash -c "source scripts/api_debug_setup.sh"
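
The duplication comes from unconditional inserts. If re-runnability mattered, an upsert keyed on a unique column would make this step idempotent; a sketch using sqlite3 as a stand-in for the real database (table and column names are invented for illustration, not the project's schema):

```python
import sqlite3

# Stand-in for the real database: a table keyed on name
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT PRIMARY KEY, embedding TEXT)")

def idempotent_insert(name, embedding):
    # Upsert keyed on name: safe to re-run, unlike a plain INSERT,
    # which duplicates rows in a table without a unique key
    conn.execute(
        "INSERT INTO items VALUES (?, ?) "
        "ON CONFLICT(name) DO UPDATE SET embedding = excluded.embedding",
        (name, embedding),
    )

# Running the population step twice leaves a single row
for _ in range(2):
    idempotent_insert("kuma", "[1.0, 6.0, 3.0]")
count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(count)  # 1
```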

Reset database values

Data persists between sessions. To delete it:

bash scripts/wipe_database.sh

Warning: This will wipe real data too! Only use on test runs.

Tear down

docker compose down

Monitor or restart services

docker compose ps 
docker compose restart retrieval_service

Dev environment setup

Prereqs:

  • Docker
  • Conda

conda env create -f environment.yaml
conda activate poetry_env
poetry install

Updating Python Dependencies

Add the dependency change to pyproject.toml, then update the poetry.lock file:

conda activate poetry_env
poetry update

Finally, rebuild the Docker containers.
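
For example, a dependency line added under the Poetry section of pyproject.toml (the package name and version below are purely illustrative):

```toml
[tool.poetry.dependencies]
# ...existing entries...
httpx = "^0.27"
```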

Ping Service

Single embedding

curl -X 'POST' \
  'http://localhost:8080/similar' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "embedding": [
    0.5,
    0.2,
    -0.1
  ],
  "k": 50,
  "metric": "cosine_distance"
}'
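
The same request can be issued from Python with only the standard library; the payload mirrors the curl call above (the endpoint and field names come from that example, not a verified schema):

```python
import json
from urllib import request

def build_query(embedding, k=50, metric="cosine_distance"):
    """Assemble the /similar request body used in the curl example."""
    return {"embedding": embedding, "k": k, "metric": metric}

payload = json.dumps(build_query([0.5, 0.2, -0.1])).encode()
req = request.Request(
    "http://localhost:8080/similar",
    data=payload,
    headers={"Content-Type": "application/json", "accept": "application/json"},
)
# Requires the service to be up (docker compose up):
# with request.urlopen(req) as resp:
#     print(json.load(resp))
```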

Supported metrics are max_inner_product, cosine_distance, and L2_distance.

Bulk inference

curl -X 'POST' \
  'http://localhost:8080/bulk_similar' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "embedding_list": [
    [0.5,0.2,-0.1],
    [0.1,0.89,-0.21]
    ],
  "k": 2,
  "metric": "cosine_distance"
}' \
| jq .

output:

{
  "most_similar": [
    [
      {
        "name": "sherbert",
        "embedding": [
          11.0,
          2.0,
          3.0
        ],
        "distance": 0.11676757169186391
      },
      {
        "name": "kuma",
        "embedding": [
          1.0,
          6.0,
          3.0
        ],
        "distance": 0.6231326316557115
      }
    ],
    [
      {
        "name": "kuma",
        "embedding": [
          1.0,
          6.0,
          3.0
        ],
        "distance": 0.22904390253150264
      },
      {
        "name": "mike",
        "embedding": [
          1.0,
          2.0,
          3.0
        ],
        "distance": 0.6368304002307885
      }
    ]
  ]
}
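
A short sketch for consuming this response shape in Python; the field names are taken from the sample output above, and the neighbors per query appear to arrive sorted by distance:

```python
import json

# Abbreviated copy of the sample response shape shown above
sample = """{"most_similar": [
  [{"name": "sherbert", "distance": 0.1168}, {"name": "kuma", "distance": 0.6231}],
  [{"name": "kuma", "distance": 0.2290}, {"name": "mike", "distance": 0.6368}]
]}"""

results = json.loads(sample)["most_similar"]
# One list of neighbors per query embedding; index 0 is the closest match
best = [neighbors[0]["name"] for neighbors in results]
print(best)  # ['sherbert', 'kuma']
```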

Load testing

To send requests in bulk for load testing:

python scripts/spam_requests.py --spam_seconds=10
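
scripts/spam_requests.py is not shown here, but the general pattern is a timed loop of concurrent requests. A minimal stand-alone sketch with a stubbed send function (the real script's flags and internals may differ):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def spam(send, spam_seconds=1.0, workers=4):
    """Call `send` repeatedly from a thread pool until time runs out."""
    deadline = time.monotonic() + spam_seconds
    count = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while time.monotonic() < deadline:
            futures = [pool.submit(send) for _ in range(workers)]
            count += sum(1 for f in futures if f.result() is not None)
    return count

# Stub instead of a real HTTP call so the sketch runs stand-alone
sent = spam(lambda: "ok", spam_seconds=0.2)
print(sent > 0)  # True
```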

Monitoring

Metrics from the Retrieval API are scraped by Prometheus and can be visualized in Grafana.


Existing features

  • Vector Database via custom extension on Postgres
  • API for querying the nearest neighbors
  • Limited upload support
  • Metrics/Monitoring dash via Prometheus/Grafana
    • Dashboard for API and Database
  • Bulk inference

Future work