[FEA]: nemo_retriever library: Support storing extracted images to disk #1675

@randerzander

Is this a new feature, an improvement, or a change to existing functionality?

New Feature

How would you describe the priority of this feature request

Currently preventing usage

Please provide a clear description of problem this feature solves

Like the old nv_ingest API, the nemo_retriever library needs a .store() task that supports saving the following to disk:

  1. full-page images
  2. sub-page images (cropped images of recognized tables, charts, infographics, etc.)

Describe the feature, and optionally a solution or implementation and any alternatives

It should use the fsspec package so that users can persist these images to cloud or other distributed storage if desired.
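
To illustrate the fsspec-based approach, here is a minimal sketch of writing image bytes to any fsspec-supported URL. The helper name `store_images`, its signature, and the file layout are hypothetical and only for illustration; this is not the existing nv_ingest or nemo_retriever API. It assumes the images are already encoded (e.g. as PNG bytes).

```python
from fsspec.core import url_to_fs


def store_images(images, base_url, **storage_options):
    """Write {relative_name: encoded_bytes} under base_url.

    Hypothetical helper: base_url may be a local path or any URL that
    fsspec understands (e.g. "s3://bucket/prefix" with s3fs installed).
    Returns the paths written, as seen by the resolved filesystem.
    """
    # Resolve the URL into a filesystem object and a path on it.
    fs, root = url_to_fs(base_url, **storage_options)
    written = []
    for name, data in images.items():
        path = f"{root.rstrip('/')}/{name}"
        # Create parent directories; on object stores this is
        # typically a no-op.
        fs.makedirs(path.rsplit("/", 1)[0], exist_ok=True)
        with fs.open(path, "wb") as f:
            f.write(data)
        written.append(path)
    return written


# Usage with fsspec's in-memory filesystem, so nothing touches disk.
# The byte strings stand in for real encoded images.
paths = store_images(
    {
        "page_1.png": b"full-page image bytes",
        "page_1/table_0.png": b"cropped table image bytes",
    },
    "memory://extracted_images",
)
```

Because the destination is just a URL, switching from local disk to cloud storage would only require changing `base_url` (and passing credentials through `storage_options`), which is the main reason to build the task on fsspec.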

Additional context

No response
