Enhance print-metadata-fields to group by root JSON and modalities#473
Open
karl-koschutnig wants to merge 5 commits intoPennLINC:mainfrom
Open
Enhance print-metadata-fields to group by root JSON and modalities#473karl-koschutnig wants to merge 5 commits intoPennLINC:mainfrom
karl-koschutnig wants to merge 5 commits intoPennLINC:mainfrom
Conversation
…tional BIDS datasets
… unique ones The previous implementation filtered to show only fields unique to each category, which caused common fields like 'Manufacturer' to disappear from all modalities. This fixes test failures in test_remove_fields and test_print_metadata_fields_command_with_test_dataset. Now all fields that exist in each modality are shown, which is the expected behavior. Fields are still unique within each modality (no duplication), but fields that appear in multiple modalities are now correctly shown in each one.
The test was pointing to the wrong directory. The get_data() function copies cubids/tests/data/ which contains multiple BIDS datasets (BIDS_Dataset, complete, inconsistent, etc). The test needs to specify which dataset to use, like other tests do with data_root / 'complete'. This fixes the test failure where metadata_fields was empty because no subjects were found at the root level.
The code was incorrectly searching for '_events.json' in root_fields dictionary
keys (which are filenames like 'dataset_description.json'). While BIDS does allow
task-level files at root (e.g., task-rest_events.json), these are already correctly
captured by the bids_path.glob('*.json') operation and stored in root_fields with
their full filename as the key.
The removed code was:
- Searching in the wrong structure (dictionary keys instead of file contents)
- Using incorrect pattern matching ('_events.json' in filename)
- Completely redundant as root-level JSON files are already properly collected
Addresses feedback from cursor bot review.
| for mod, fields in modalities.items(): | ||
| result[mod] = sorted(fields) | ||
|
|
||
| return result |
There was a problem hiding this comment.
Bug: Test Fails to Detect Field Removal
The test_remove_fields no longer correctly verifies field removal. The get_all_metadata_fields() method now returns a dictionary, but the test's assertion set(new_fields) incorrectly creates a set of dictionary keys (filenames/modality names) instead of metadata field names. This causes the intersection check to always pass, making the test report success even if fields were not removed.
Additional Locations (1)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description:
This PR improves the output of the cubids print-metadata-fields command to provide a cleaner, more organized view of metadata fields in a BIDS dataset. Instead of listing all fields from every JSON file individually (which can be overwhelming ), the command now groups fields as follows:
Root-level JSON files (e.g., dataset_description.json, participants.json) are listed with all their metadata fields.
Modalities (e.g., func, anat, dwi) group all nested JSON files by type and display only the metadata fields unique to that modality (fields that don't appear in other categories).
This reduces redundancy—fields like EchoTime that appear across many files are shown only once under their respective modality, avoiding clutter in the output.
Changes Made:
Modified get_all_metadata_fields() in cubids.py to collect and categorize fields by root files and modalities, filtering for uniqueness.
Updated print_metadata_fields() in workflows.py to display the grouped output.
Example Output (Before):
AcquisitionTime
BandwidthPerPixelPhaseEncode
BodyPartExamined
CoilString
ConversionSoftware
ConversionSoftwareVersion
DerivedVendorReportedEchoSpacing
DeviceSerialNumber
EchoTime
EchoTrainLength
EffectiveEchoSpacing
Example Output (After):
dataset_description.json
Acknowledgements
Authors
...
func
EchoTime
RepetitionTime
TaskName
...
participants.json
age
sex
...
This enhancement makes it easier to get an overview of metadata structure without duplication, addressing user feedback for a more structured and unique field listing.
Best, Karl