Jade Norindr edited this page Jul 28, 2023 · 2 revisions

What is extractorAPI?

extractorAPI was developed as a module for the EiDA/VHS application, which detects objects in images of manuscripts and printed books. For more flexibility, the detection model is not integrated directly into the application but runs in an independent API.

Separating the application from the detection algorithm makes it possible to run the computer vision steps of the workflow on a GPU. The API was designed to be reusable by other projects that want to launch detection on a GPU from an application and automatically retrieve the annotations.

Workflow

[Figure: screenshot of the workflow]

  • A POST request containing the URL of a IIIF manifest is received from the application (or from a curl request);
  • Each image is downloaded to the GPU machine;
  • Detection is launched: the model processes each image;
  • A text file is created, with one line per image containing the coordinates of the detected objects;
  • The text file is sent back to the application through a POST request.
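
The first two steps depend on finding the image URLs inside the manifest. The sketch below is only an illustration, not the code in /iiif: it assumes a manifest that follows the IIIF Presentation API 2.x layout (sequences, canvases, images, resource), and the function name and the sample manifest are hypothetical.

```python
import json


def image_urls_from_manifest(manifest: dict) -> list[str]:
    """Collect the image URLs listed in a IIIF Presentation 2.x manifest."""
    urls = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                resource = image.get("resource", {})
                if "@id" in resource:
                    urls.append(resource["@id"])
    return urls


# Tiny hand-written manifest for illustration; example.org is a placeholder.
sample_manifest = json.loads("""
{
  "@id": "https://example.org/iiif/ms1/manifest",
  "sequences": [
    {
      "canvases": [
        {"images": [{"resource": {"@id": "https://example.org/iiif/ms1/f1/full/full/0/default.jpg"}}]},
        {"images": [{"resource": {"@id": "https://example.org/iiif/ms1/f2/full/full/0/default.jpg"}}]}
      ]
    }
  ]
}
""")

for url in image_urls_from_manifest(sample_manifest):
    print(url)
```

Each returned URL could then be downloaded and passed to the detection model. Manifests that follow version 3 of the Presentation API use a different layout (items instead of sequences) and would need separate handling.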

Structure of the repository

  • app.py: API endpoints and detection functions
  • /iiif: functions to download images from a IIIF manifest, adapted from iiif-downloader
  • /yolov5: detection algorithm adapted from YOLOv5
  • /utils: utility functions, classes and paths
  • .env: secret file for environment variables

Functioning of the API

The API is built with Flask. It uses Celery as a task queue and task scheduler, with Redis as its message broker and result backend.

The Celery instance is defined in celery_utils.py and runs detection as a background task: when the API receives a request, the corresponding task is added to the queue. The @celery.task decorator marks the functions that Celery runs in the background.

The detect function is called by the /run_detect and /detect_all routes, to which requests are sent.

The delete_images function is scheduled to run every day and checks the date of the images saved on the GPU machine: images are deleted after a week.
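
A cleanup of this kind can be sketched with the standard library alone. The version below is an assumption-laden illustration, not the project's delete_images: the function name, its parameters and the use of the file modification time as "the date of the image" are hypothetical, and the daily scheduling itself is left out.

```python
import time
from pathlib import Path

ONE_WEEK_SECONDS = 7 * 24 * 60 * 60


def delete_old_images(image_dir, max_age_seconds=ONE_WEEK_SECONDS, now=None):
    """Delete files in image_dir last modified more than max_age_seconds ago.

    Returns the sorted names of the deleted files.
    """
    now = time.time() if now is None else now
    deleted = []
    for path in Path(image_dir).iterdir():
        # Only regular files are considered; subdirectories are left alone.
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            deleted.append(path.name)
    return sorted(deleted)
```

Calling such a function once a day keeps images for between seven and eight days, depending on when each one was downloaded relative to the daily run.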
