1 rewrite of scan batch dir #2

Open
bdgregg wants to merge 19 commits into master from 1-rewrite-of-scan-batch-dir
bdgregg (Contributor) commented Feb 26, 2026

This PR addresses issue #1.

This is a full rewrite of the original scan-batch-dir script. The new version is more modular, so future changes should be easier to implement. It also adds the ability to use PDF files as newspaper issues.

bdgregg self-assigned this Feb 26, 2026
bdgregg linked an issue Feb 26, 2026 that may be closed by this pull request
ctgraham (Member) left a comment

Initial readthrough with some questions and suggestions.

logger.info(f"File is model: {imodel}, TID: {field_model}")

# Process any .tif files.
if (file_ext.lower() == ".tif"):
Member

Special processing by type would be a good candidate to break out into separate functions for readability.

Member

The pattern of "Handle top level files" ... "Build row data" is also heavily repeated here.

Contributor Author

Agreed on separate functions for readability. That would probably be a next step; I was building these out as I went along.
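One way the per-type split discussed above could look is a small dispatch table: one handler function per file extension, with the repeated "build row data" step pulled into a shared helper. This is only a sketch under assumptions. `build_row`, `handle_tif`, `handle_pdf`, `HANDLERS`, `process_file`, and the model and resource-type values are all hypothetical and are not taken from the PR.

```python
from pathlib import Path
from typing import Callable, Dict, Optional


def build_row(path: Path, model: str, resource_type: str) -> dict:
    """Shared row builder (hypothetical); the script's real columns are not shown here."""
    return {
        "file": path.name,
        "model": model,
        "field_resource_type": resource_type,
    }


def handle_tif(path: Path) -> dict:
    """Handle a .tif file. The model and resource type are placeholder values."""
    return build_row(path, model="Page", resource_type="Still Image")


def handle_pdf(path: Path) -> dict:
    """Handle a .pdf file. The model and resource type are placeholder values."""
    return build_row(path, model="Digital Document", resource_type="Text")


# Map lower-cased extensions to their handlers; adding a type means adding one entry.
HANDLERS: Dict[str, Callable[[Path], dict]] = {
    ".tif": handle_tif,
    ".pdf": handle_pdf,
}


def process_file(path: Path) -> Optional[dict]:
    """Return the row for a supported file, or None when no handler is registered."""
    handler = HANDLERS.get(path.suffix.lower())
    return handler(path) if handler else None
```

With this shape, the long `if`/`elif` chain on `file_ext` reduces to a single lookup, and each type's special processing can be read and tested in isolation.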

'resouce_type': 'Text',
'child': 'File',
},
'Publication Issue 1': {
Member

Can "1" and "2" be given semantically meaningful names?

Contributor Author

Would love to. Any suggestions? Maybe "Publication Issue Paged" vs. "Publication Issue PDF"? Not sure if MAD would want different names.
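As a rough illustration of the naming idea floated above, the numbered keys could be replaced by descriptive ones and selected through a small helper. Everything here is an assumption: the `MODELS` table, its `child` values, the `issue_model_for` helper, and the rule that a `.pdf` file maps to the PDF variant are hypothetical, and the final names would still need to be agreed on.

```python
# Hypothetical model table keyed by descriptive names instead of "1" and "2".
# The 'child' values are placeholders, not the PR's actual settings.
MODELS = {
    "Publication Issue Paged": {"child": "Page"},
    "Publication Issue PDF": {"child": "File"},
}


def issue_model_for(file_ext: str) -> str:
    """Pick an issue model name from a file extension (assumed rule)."""
    if file_ext.lower() == ".pdf":
        return "Publication Issue PDF"
    return "Publication Issue Paged"
```

Descriptive keys make call sites self-explanatory, for example `MODELS[issue_model_for(".pdf")]`.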

"field_weight","field_model","model","field_resource_type","transcript"]

# Global file patterns to skip over.
globals()['skip'] = ["ignore",".jp2",".metadata","meta",".opex",".fits",
Member

Are these skip patterns documented outside of this code?

Contributor Author

Probably not yet. I was thinking of adding the list to the config file to allow customization.
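A minimal sketch of the config-driven approach mentioned above, which would also give the patterns a documented home. It assumes a JSON config file with a `skip` key and plain substring matching. The function names, the file format, and the matching rule are all hypothetical. The defaults are only the patterns visible in the diff excerpt, where the full list is truncated.

```python
import json
from pathlib import Path
from typing import Iterable, List

# Only the patterns visible in the excerpt; the original list continues past this point.
DEFAULT_SKIP = ["ignore", ".jp2", ".metadata", "meta", ".opex", ".fits"]


def load_skip_patterns(config_path: Path) -> List[str]:
    """Read skip patterns from a JSON config, falling back to the defaults."""
    if not config_path.is_file():
        return list(DEFAULT_SKIP)
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return list(config.get("skip", DEFAULT_SKIP))


def should_skip(name: str, patterns: Iterable[str]) -> bool:
    """Return True when any pattern occurs in the name (case-insensitive substring match)."""
    lowered = name.lower()
    return any(pattern.lower() in lowered for pattern in patterns)
```

Passing the loaded list into `should_skip` explicitly, rather than assigning through `globals()`, keeps the dependency visible and makes the function easy to test.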

Development

Successfully merging this pull request may close these issues.

Rewrite of scan-batch-dir

2 participants