Try to aggregate lexile information for relevant kids books #94

@mekarpeles

Implementation Requirements

We may wish to build a utility which takes an Archive.org query or a list of ISBNs and attempts to fetch Lexile information for each book.

  • Add a command line argument for the query
  • Make the script robust so that if the same ISBN appears multiple times, we don't call the API for it again
    • We could keep a set of seen_isbns, perhaps loaded from the ISBNs already in the log file
  • Add error handling so that if, e.g., a JSON exception occurs, we log the actual error which occurred
  • We can generate several files:
    • A log (called log.txt) of all attempted ISBNs, recording whether each one succeeded or failed (include the failure message too, so we can look it up later), one line per ISBN:
      • isbn, error, {error_msg}
      • isbn, success
    • A file called results.jsonl containing the entries which succeeded
  • We should have a way to stop the script automatically and update the state file if an unexpected error occurs which we believe may indicate that we are being blocked or rate-limited (i.e. anything other than "ISBN not found")
  • If we lose our place, we should be able to restart where we left off, without re-calling the API for ISBNs we have already tried and without clobbering the data we have already collected. We can use the log file and the state file to figure out which ISBNs to skip and where to restart when we re-run the script.
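
To make the requirements above concrete, here is a rough sketch of how the resumable loop might look. It is only an illustration under several assumptions: `fetch` is a hypothetical callable that looks up one ISBN (the real Lexile lookup is not shown), `BlockedError` is a hypothetical exception that the fetcher would raise when a failure looks like blocking or rate limiting, and the `state.json` filename, its contents, and the `--isbns` flag are made up for the example.

```python
import argparse
import json
from pathlib import Path


class BlockedError(Exception):
    """Hypothetical: raised by the fetcher when we seem blocked or rate-limited."""


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Fetch Lexile info for ISBNs")
    # Turning the query into a list of ISBNs is not shown in this sketch.
    parser.add_argument("--query", help="Archive.org search query")
    parser.add_argument("--isbns", nargs="*", default=[], help="explicit ISBNs")
    return parser.parse_args(argv)


def load_seen_isbns(log_path):
    """Rebuild the set of already-attempted ISBNs from the log file."""
    seen = set()
    path = Path(log_path)
    if path.exists():
        for line in path.read_text().splitlines():
            isbn = line.split(",", 1)[0].strip()
            if isbn:
                seen.add(isbn)
    return seen


def run(isbns, fetch, log_path="log.txt", results_path="results.jsonl",
        state_path="state.json"):
    """Return True if every ISBN was attempted, False if we stopped early."""
    seen = load_seen_isbns(log_path)
    # Append mode, so data collected by earlier runs is never clobbered.
    with open(log_path, "a") as log, open(results_path, "a") as results:
        for isbn in isbns:
            if isbn in seen:
                continue  # duplicate, or already handled in a previous run
            try:
                record = fetch(isbn)
            except BlockedError as e:
                # Not logged, so this ISBN is retried on the next run.
                state = {"stopped_at": isbn, "reason": str(e)}
                Path(state_path).write_text(json.dumps(state))
                return False
            except Exception as e:
                # Ordinary failure (e.g. not found, bad JSON): log it, move on.
                msg = f"{type(e).__name__}: {e}".replace("\n", " ")
                log.write(f"{isbn}, error, {msg}\n")
            else:
                results.write(json.dumps({"isbn": isbn, "result": record}) + "\n")
                log.write(f"{isbn}, success\n")
            log.flush()
            results.flush()
            seen.add(isbn)
    return True
```

Because both output files are opened in append mode and `seen_isbns` is rebuilt from log.txt on startup, re-running the script with the same input should skip everything already logged and resume at the ISBN that triggered the stop.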

Reference Example of a utility like this:
https://github.com/Open-Book-Genome-Project/sequencer/blob/master/pipeline.py

Here's where we can put our code: https://github.com/Open-Book-Genome-Project/sequencer/blob/master/bgp/pipelines/lexile.py
