
Vector Database Retrieval API


Monitoring out of the box for API and Database

Middleware pipes FastAPI telemetry to Prometheus and Grafana. The Grafana dashboard can be imported from infra/grafana/dashboard.json to see API metrics with no additional work.

Adapted from https://grafana.com/grafana/dashboards/16110-fastapi-observability/
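
A minimal Prometheus scrape config for this kind of setup might look like the following; the job name and `retrieval_service:8080` target are assumptions based on the compose service name, not values taken from the repo:

```yaml
scrape_configs:
  - job_name: retrieval_service
    # FastAPI instrumentation middleware conventionally serves /metrics
    metrics_path: /metrics
    static_configs:
      - targets: ["retrieval_service:8080"]
```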

Running and Debugging

Generally, commands from /scripts or code from /tests can be used to isolate an issue. The DB can run in isolation, but the API depends on the DB. Visit http://localhost:8080/docs while the service is up to view the FastAPI docs.

Build containers with code snapshot

bash scripts/build_container.sh

Run the service

docker compose up

Run with hot reloads for API code (slow)

docker compose watch retrieval_service

Attach shell to the API service

docker compose exec retrieval_service /bin/bash 

Manually populate the test database (rerunning will duplicate data entries)

docker compose exec retrieval_service \
    /bin/bash -c "source scripts/api_debug_setup.sh"
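
The duplication comes from unconditional inserts. If re-runnability mattered, an upsert keyed on a unique column would make this step idempotent; a sketch using sqlite3 as a stand-in for the real database (table and column names are invented for illustration, not the project's schema):

```python
import sqlite3

# Stand-in for the real database: a table keyed on name
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT PRIMARY KEY, embedding TEXT)")

def idempotent_insert(name, embedding):
    # Upsert keyed on name: safe to re-run, unlike a plain INSERT,
    # which duplicates rows in a table without a unique key
    conn.execute(
        "INSERT INTO items VALUES (?, ?) "
        "ON CONFLICT(name) DO UPDATE SET embedding = excluded.embedding",
        (name, embedding),
    )

# Running the population step twice leaves a single row
for _ in range(2):
    idempotent_insert("kuma", "[1.0, 6.0, 3.0]")
count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(count)  # 1
```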

Reset database values

Data persists between sessions. To delete it:

bash scripts/wipe_database.sh

Warning: This will wipe real data too! Only use on test runs.

Tear down

docker compose down

Monitor or restart services

docker compose ps 
docker compose restart retrieval_service

Dev environment setup

Prereqs:

  • Docker
  • Conda

conda env create -f environment.yaml
conda activate poetry_env
poetry install

Updating Python Dependencies

Add the dependency change to pyproject.toml, then update the poetry.lock file:

conda activate poetry_env
poetry update

Finally, rebuild the Docker containers.
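
For example, a dependency line added under the Poetry section of pyproject.toml (the package name and version below are purely illustrative):

```toml
[tool.poetry.dependencies]
# ...existing entries...
httpx = "^0.27"
```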

Ping Service

Single embedding

curl -X 'POST' \
  'http://localhost:8080/similar' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "embedding": [
    0.5,
    0.2,
    -0.1
  ],
  "k": 50,
  "metric": "cosine_distance"
}'
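
The same request can be issued from Python with only the standard library; the payload mirrors the curl call above (the endpoint and field names come from that example, not a verified schema):

```python
import json
from urllib import request

def build_query(embedding, k=50, metric="cosine_distance"):
    """Assemble the /similar request body used in the curl example."""
    return {"embedding": embedding, "k": k, "metric": metric}

payload = json.dumps(build_query([0.5, 0.2, -0.1])).encode()
req = request.Request(
    "http://localhost:8080/similar",
    data=payload,
    headers={"Content-Type": "application/json", "accept": "application/json"},
)
# Requires the service to be up (docker compose up):
# with request.urlopen(req) as resp:
#     print(json.load(resp))
```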

Supported metrics are max_inner_product, cosine_distance, and L2_distance.

Bulk inference

curl -X 'POST' \
  'http://localhost:8080/bulk_similar' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "embedding_list": [
    [0.5,0.2,-0.1],
    [0.1,0.89,-0.21]
    ],
  "k": 2,
  "metric": "cosine_distance"
}' \
| jq .

output:

{
  "most_similar": [
    [
      {
        "name": "sherbert",
        "embedding": [
          11.0,
          2.0,
          3.0
        ],
        "distance": 0.11676757169186391
      },
      {
        "name": "kuma",
        "embedding": [
          1.0,
          6.0,
          3.0
        ],
        "distance": 0.6231326316557115
      }
    ],
    [
      {
        "name": "kuma",
        "embedding": [
          1.0,
          6.0,
          3.0
        ],
        "distance": 0.22904390253150264
      },
      {
        "name": "mike",
        "embedding": [
          1.0,
          2.0,
          3.0
        ],
        "distance": 0.6368304002307885
      }
    ]
  ]
}
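
A short sketch for consuming this response shape in Python; the field names are taken from the sample output above, and the neighbors per query appear to arrive sorted by distance:

```python
import json

# Abbreviated copy of the sample response shape shown above
sample = """{"most_similar": [
  [{"name": "sherbert", "distance": 0.1168}, {"name": "kuma", "distance": 0.6231}],
  [{"name": "kuma", "distance": 0.2290}, {"name": "mike", "distance": 0.6368}]
]}"""

results = json.loads(sample)["most_similar"]
# One list of neighbors per query embedding; index 0 is the closest match
best = [neighbors[0]["name"] for neighbors in results]
print(best)  # ['sherbert', 'kuma']
```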

Load testing

To send requests in bulk for load testing:

python scripts/spam_requests.py --spam_seconds=10
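
scripts/spam_requests.py is not shown here, but the general pattern is a timed loop of concurrent requests. A minimal stand-alone sketch with a stubbed send function (the real script's flags and internals may differ):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def spam(send, spam_seconds=1.0, workers=4):
    """Call `send` repeatedly from a thread pool until time runs out."""
    deadline = time.monotonic() + spam_seconds
    count = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while time.monotonic() < deadline:
            futures = [pool.submit(send) for _ in range(workers)]
            count += sum(1 for f in futures if f.result() is not None)
    return count

# Stub instead of a real HTTP call so the sketch runs stand-alone
sent = spam(lambda: "ok", spam_seconds=0.2)
print(sent > 0)  # True
```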

Monitoring

Metrics from the Retrieval API are scraped by Prometheus and can be visualized in Grafana.


Existing features

  • Vector Database via custom extension on Postgres
  • API for querying the nearest neighbors
  • Limited upload support
  • Metrics/Monitoring dash via Prometheus/Grafana
    • Dashboard for API and Database
  • Bulk inference

Future work