missing out on the benefit of queue-based ingestion #3

@twrichards

`def iterate_images(process_image):`

This approach of using the image-loader HTTP endpoints misses out on the benefits of queue-based ingestion: it could fall over with giant images, and run sub-optimally for smaller ones. Instead, I would suggest...

  1. using the API to compile a list of media IDs (though see #1, "probably best to use `offset` rather than `since` for paging in migration"!), or rather pre-producing the metadata JSON payloads as files, with the `mediaId` as the filename
  2. granting `GetObject` permission on the source image bucket, then iterating over the files from Step 1, performing an S3 copy into the "destination" grid's ingestion queue bucket - blitz through those as fast as possible (we should add support for `uploadTime` S3 metadata on the queue bucket, such that it gets written to the file which ends up in the image bucket after ingestion)
  3. waiting for all the images to be ingested, then performing the metadata updates similarly to the current script, using the JSON files from Step 1
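The steps above could be sketched roughly as follows. This is a minimal sketch, not the actual migration script: the bucket names, the `mediaId`/`uploadTime` payload shape, and the injected S3 client (in practice a boto3 client supporting `copy_object`) are all assumptions.

```python
import json
from pathlib import Path

# Hypothetical bucket names - substitute the real source image bucket
# and the destination grid's ingestion queue bucket.
SOURCE_BUCKET = "source-image-bucket"
QUEUE_BUCKET = "destination-grid-ingest-queue"


def write_metadata_payloads(media_items, out_dir):
    """Step 1: pre-produce the metadata JSON payloads as files,
    one per media ID, with the mediaId as the filename."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for item in media_items:
        payload = out / f"{item['mediaId']}.json"
        payload.write_text(json.dumps(item["metadata"]))
    return sorted(p.name for p in out.iterdir())


def copy_to_ingest_queue(s3, payload_dir):
    """Step 2: iterate the payload files from Step 1 and S3-copy each
    image into the ingestion queue bucket, attaching the original
    uploadTime as S3 metadata (assumes the queue bucket has been
    taught to honour that metadata on ingestion)."""
    for payload in Path(payload_dir).glob("*.json"):
        media_id = payload.stem
        upload_time = json.loads(payload.read_text()).get("uploadTime", "")
        s3.copy_object(
            Bucket=QUEUE_BUCKET,
            Key=media_id,
            CopySource={"Bucket": SOURCE_BUCKET, "Key": media_id},
            Metadata={"uploadTime": upload_time},
            MetadataDirective="REPLACE",
        )
```

Because the copies are plain S3-to-S3 object copies rather than HTTP uploads, they can be parallelised and blitzed through as fast as possible, and Step 3's metadata updates can then re-read the same payload files once ingestion has completed.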
