AnnoPage API

Martin Kišš edited this page Jan 8, 2026 · 11 revisions

This documentation page describes the AnnoPage API, which allows clients to batch-process pages using AnnoPage. The entire system consists of four main components:

  1. AnnoPage API – The central API server that receives client requests and manages processing jobs.
  2. Worker – A service that retrieves jobs from AnnoPage API and performs the actual page processing using AnnoPage.
  3. AnnoPage – A tool for page processing powered by trained and publicly available models.
  4. Client – A command-line tool that users can employ to communicate with AnnoPage API (create jobs, upload data, download results).

Communication between these components is illustrated in the following diagram:

[Diagram: AnnoPageAPI component communication]

API

AnnoPage uses an implementation of DocAPI, an API designed for document-processing workflows.
DocAPI provides a comprehensive interface for creating and managing document-processing jobs, as well as functionality for managing API keys for both users and workers.
Internally, DocAPI relies on a PostgreSQL database to store information about jobs and API keys.

A typical workflow for processing pages with AnnoPage API includes the following steps:

  1. The client creates a new processing job using AnnoPage API.
  2. The client uploads all necessary data (images, ALTO XML files, metadata, and a configuration file) to AnnoPage API.
  3. The worker periodically polls AnnoPage API for new processing jobs.
  4. Once it receives a job, the worker downloads all required data from AnnoPage API.
  5. The worker prepares the processing configuration and runs AnnoPage on the provided data.
  6. When processing is complete, the worker collects the results from AnnoPage and packages them into a ZIP archive.
  7. The worker uploads the final result back to AnnoPage API and marks the job as completed.
  8. The client downloads the results from AnnoPage API once the job is finished.
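
The steps above can be sketched as a small in-memory simulation. This is only an illustration of the job lifecycle: the `Job` class, the state names, and every `InMemoryAPI` method are hypothetical and are not the actual DocAPI endpoints or data model. It also assumes, as a simplification, that a job becomes available to workers as soon as data is uploaded.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class JobState(Enum):
    # Hypothetical state names; the real DocAPI states may differ.
    NEW = "new"
    QUEUED = "queued"
    PROCESSING = "processing"
    DONE = "done"


@dataclass
class Job:
    job_id: int
    state: JobState = JobState.NEW
    uploads: Dict[str, bytes] = field(default_factory=dict)  # file name -> content
    result: Optional[bytes] = None


class InMemoryAPI:
    """Stand-in for the API server; keeps jobs in a dict instead of PostgreSQL."""

    def __init__(self) -> None:
        self.jobs: Dict[int, Job] = {}

    def create_job(self) -> int:  # step 1
        job = Job(job_id=len(self.jobs) + 1)
        self.jobs[job.job_id] = job
        return job.job_id

    def upload(self, job_id: int, name: str, data: bytes) -> None:  # step 2
        job = self.jobs[job_id]
        job.uploads[name] = data
        job.state = JobState.QUEUED  # simplification: queued after any upload

    def next_job(self) -> Optional[Job]:  # steps 3-4
        for job in self.jobs.values():
            if job.state is JobState.QUEUED:
                job.state = JobState.PROCESSING
                return job
        return None

    def complete(self, job_id: int, result: bytes) -> None:  # step 7
        job = self.jobs[job_id]
        job.result = result
        job.state = JobState.DONE

    def download_result(self, job_id: int) -> bytes:  # step 8
        job = self.jobs[job_id]
        if job.state is not JobState.DONE or job.result is None:
            raise RuntimeError("job is not finished yet")
        return job.result


if __name__ == "__main__":
    api = InMemoryAPI()
    job_id = api.create_job()
    api.upload(job_id, "page1.jpg", b"image bytes")
    job = api.next_job()
    # Steps 5-6 (running AnnoPage and zipping the output) would happen here.
    api.complete(job.job_id, b"zip bytes")
    print(api.jobs[job_id].state)
```

Handing a job to exactly one worker is modelled here by flipping its state to `PROCESSING` inside `next_job`, so a second poll returns nothing until another job is queued.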

The AnnoPage API can be started using the annopage_api command (if installed via pip) or by directly running the api/server.py script. The API can be configured using environment variables defined in the DocAPI configuration file.

AnnoPage

For details, see the AnnoPage documentation.

Worker

The Worker service retrieves jobs from AnnoPage API and performs the actual page processing using AnnoPage.
It periodically attempts to retrieve a new job from the API. Once a job is received, the worker downloads all required data (images, ALTO XML files, metadata, and configuration), prepares the processing configuration, and executes AnnoPage.
After processing, the worker creates a ZIP archive with the results, uploads it to AnnoPage API and marks the job as completed.
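
The packaging step described above can be sketched with Python's standard `zipfile` module. This is a minimal sketch, not the worker's actual code: the function name `package_results` and the directory layout are assumptions made for illustration.

```python
import zipfile
from pathlib import Path


def package_results(result_dir, archive_path):
    """Write every file under result_dir into a ZIP archive.

    Paths inside the archive are stored relative to result_dir.
    The archive should be written outside result_dir so that it
    does not try to include itself.
    """
    result_dir = Path(result_dir)
    archive_path = Path(archive_path)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(result_dir.rglob("*")):
            if path.is_file():
                archive.write(path, arcname=path.relative_to(result_dir).as_posix())
    return archive_path
```

Storing relative paths keeps the archive portable: the client can unpack it anywhere without recreating the worker's temporary directory structure.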

The worker can be launched using the annopage_worker command (if installed via pip) or by directly running the api/worker.py script:

annopage_worker \
  --api-url=ANNOPAGE_API_URL \
  --api-key=ANNOPAGE_API_KEY

ANNOPAGE_API_URL is the API server URL and ANNOPAGE_API_KEY is the worker’s API key. The --wait argument, which is not shown in the example above, defines the delay (in seconds) between polling attempts for new jobs.
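
The effect of a polling delay can be illustrated with a short loop. This is a sketch under assumptions, not the worker's implementation: `fetch_job` is a hypothetical callable that returns a job or `None`, and `max_attempts` exists only so the example can terminate.

```python
import time


def poll_for_job(fetch_job, wait_seconds, max_attempts=None, sleep=time.sleep):
    """Call fetch_job() until it returns a job.

    Sleeps wait_seconds between unsuccessful attempts. Returns None if
    max_attempts is set and that many attempts found no job.
    """
    attempts = 0
    while True:
        job = fetch_job()
        if job is not None:
            return job
        attempts += 1
        if max_attempts is not None and attempts >= max_attempts:
            return None
        sleep(wait_seconds)


if __name__ == "__main__":
    # Simulated API responses: two empty polls, then a job.
    responses = iter([None, None, {"id": 7}])
    print(poll_for_job(lambda: next(responses), wait_seconds=0.1))
```

A longer delay reduces load on the API server at the cost of jobs waiting longer before a worker picks them up.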

Client

The Client is a command-line tool that allows users to communicate with AnnoPage API. The script creates a job, uploads the data, waits for the processing to complete, and then downloads the final results into the specified output file.

You can run the client using the annopage_client command (if installed via pip) or by directly running the api/client.py script:

annopage_client \
  --api-url=ANNOPAGE_API_URL \
  --api-key=ANNOPAGE_API_KEY \
  --images=/path/to/dir/with/images \
  --alto=/path/to/dir/with/alto_xmls \
  --config=/path/to/config.json \
  --metadata=/path/to/metadata.json \
  --output=/path/to/output.zip

ANNOPAGE_API_URL is the API server URL, ANNOPAGE_API_KEY is the user’s API key, and the remaining arguments are paths to the required directories or files.
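
When the client is driven from a script, the invocation above can be assembled programmatically. The following is a minimal sketch that only uses the command-line arguments shown above; the helper names `build_client_command` and `run_client`, and the example URL and paths, are hypothetical, and it assumes `annopage_client` is available on the PATH.

```python
import subprocess


def build_client_command(*, api_url, api_key, images, alto, config, metadata, output):
    """Return the annopage_client invocation as an argument list."""
    return [
        "annopage_client",
        f"--api-url={api_url}",
        f"--api-key={api_key}",
        f"--images={images}",
        f"--alto={alto}",
        f"--config={config}",
        f"--metadata={metadata}",
        f"--output={output}",
    ]


def run_client(**kwargs):
    """Run the client and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_client_command(**kwargs), check=True)


if __name__ == "__main__":
    # Placeholder values; replace them with a real server URL, key, and paths.
    command = build_client_command(
        api_url="http://localhost:8000",
        api_key="ANNOPAGE_API_KEY",
        images="/path/to/dir/with/images",
        alto="/path/to/dir/with/alto_xmls",
        config="/path/to/config.json",
        metadata="/path/to/metadata.json",
        output="/path/to/output.zip",
    )
    print(" ".join(command))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when paths contain spaces.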

You can also list all engines available in AnnoPage API using the following command:

annopage_client \
  --list-engines \
  --api-url=ANNOPAGE_API_URL \
  --api-key=ANNOPAGE_API_KEY
