Skip to content

AwkJay/fastapi-pagination-api

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

fastapi-pagination-api

Here, I have implemented cursor-based pagination properly, not the watered-down OFFSET version you see in most tutorials, but actual keyset pagination that holds up at scale.

If you've ever had a query like SELECT * FROM items OFFSET 50000 LIMIT 20 crawl to a halt on a large table, this is the fix for that.


what's inside

  • cursor-based pagination using (created_at, id) as the keyset
  • filtering by status and tag via SQL WHERE clauses
  • composite indexes on the columns actually used for filtering
  • async SQLAlchemy 2.0 with asyncpg (no sync sessions anywhere)
  • 13 tests that run against SQLite — no postgres needed for the test suite
  • docker setup that actually works first try (mostly)

running it

the easy way — Docker:

git clone https://github.com/AwkJay/fastapi-pagination-api.git
cd fastapi-pagination-api
cp .env.example .env
docker-compose up --build

Once both containers are up, the API is at http://localhost:8000.
Interactive docs (Swagger UI) at http://localhost:8000/docs — easier than curl for manual testing.

if you already have postgres running locally on port 5432, you'll hit a port conflict. change the postgres ports in docker-compose.yml to "5433:5432" and run docker-compose down && docker-compose up --build.

without Docker:

pip install -r requirements.txt
# set DATABASE_URL in .env to point at your local postgres
uvicorn app.main:app --reload

tests

pytest tests/ -v

Uses SQLite + aiosqlite so there's nothing to set up. All 13 pass in ~1.3s.

The tricky one was test_second_page_no_overlap — SQLite resolves timestamps to the second, so items created in a fast loop all get the same created_at. Fixed by injecting explicit spaced timestamps in the test data rather than fighting the clock.


API

create an item

curl -X POST "http://localhost:8000/api/v1/items" \
  -H "Content-Type: application/json" \
  -d '{"name": "some item", "status": "active", "tag": "python"}'

first page

curl "http://localhost:8000/api/v1/items?limit=10"

next page — take next_cursor from the previous response and pass it back:

curl "http://localhost:8000/api/v1/items?limit=10&cursor=eyJ0cyI6..."

filter — note the quotes around the URL, & is special in bash without them:

curl "http://localhost:8000/api/v1/items?status=active&tag=python"

single item

curl "http://localhost:8000/api/v1/items/{id}"

health

curl "http://localhost:8000/api/v1/health"

API

create an item

curl -X POST "http://localhost:8000/api/v1/items" \
  -H "Content-Type: application/json" \
  -d '{"name": "some item", "status": "active", "tag": "python"}'

first page

curl "http://localhost:8000/api/v1/items?limit=10"

next page — take next_cursor from the previous response and pass it back:

curl "http://localhost:8000/api/v1/items?limit=10&cursor=eyJ0cyI6..."

filter — note the quotes around the URL, & is special in bash without them:

curl "http://localhost:8000/api/v1/items?status=active&tag=python"

single item

curl "http://localhost:8000/api/v1/items/{id}"

health

curl "http://localhost:8000/api/v1/health"

how the cursor works

OFFSET pagination tells the DB "skip the first N rows, give me the next 20." The problem is the DB still has to scan and count those N rows every time, so page 500 is 500x slower than page 1. On a table with a million rows this gets bad fast.

Cursor pagination instead encodes the position of the last item you saw — in this case the created_at timestamp and the id as a tiebreaker, packed into a base64 string. The next query becomes:

WHERE (created_at, id) < ('2026-01-15T10:30:00', '3f2504e0-...')
ORDER BY created_at DESC, id DESC
LIMIT 20

The database hits the ix_item_created_id composite index and jumps straight to that position. No scanning, no counting skipped rows. Performance stays flat whether you're on page 1 or page 10,000.

The tradeoff: you can't jump to "page 47" directly. You have to walk forward. For most real use cases (infinite scroll, APIs, data pipelines) that's fine.


response shape

{
  "items": [
    {
      "id": "3f2504e0-4f89-11d3-9a0c-0305e82c3301",
      "name": "example item",
      "status": "active",
      "tag": "python",
      "score": 4.2,
      "created_at": "2026-01-15T10:30:00+00:00"
    }
  ],
  "next_cursor": "eyJ0cyI6ICIyMDI0LTAxLTE1VDEwOjMwOjAwIiwgImlkIjogIjNmMjUwNGUwIn0",
  "has_more": true,
  "total_count": 1500,
  "limit": 10
}

total_count is always the full filtered count, not the page count. It runs as a separate COUNT(*) query with the same WHERE filters applied but without the cursor — so you always know the dataset size regardless of which page you're on.

has_more uses the limit+1 trick: the query fetches one extra row. If it comes back, there's more data and that extra row gets dropped from the response. Cheaper than a second COUNT query.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors