
fix: filter concept uploads to training-relevant files only #1406

Open
BitcrushedHeart wants to merge 1 commit into Nerogar:master from BitcrushedHeart:fix/cloud-upload-filtering



@BitcrushedHeart
Contributor

Summary

  • Cloud concept uploads previously transferred every file and directory indiscriminately, including hidden directories (.thumbnails, .trash, .index, etc.) and non-training files (archives, databases, etc.)
  • With large dataset folders, the per-file SCP/SFTP overhead on hundreds of thousands of irrelevant files made uploads appear stuck at extremely low throughput
  • Concept uploads now skip hidden directories and only transfer files with supported image/video extensions and .txt caption files
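The filtering rule above can be sketched as a directory walk that prunes hidden directories in place and keeps only files whose extension is in an allow-list. This is a minimal sketch that assumes a local `os.walk` traversal. The helper name `iter_training_files` and the `EXAMPLE_EXTENSIONS` set are hypothetical. In the PR itself the filtering lives in `sync_up_dir()`, and the extension set comes from the project's own supported image/video formats plus `.txt`.

```python
from __future__ import annotations

import os
from collections.abc import Iterator

# Hypothetical example allow-list; the PR builds its own set from the
# supported image/video extensions plus .txt caption files.
EXAMPLE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".mp4", ".txt"}


def iter_training_files(
    root: str,
    skip_hidden: bool = True,
    allowed_extensions: set[str] | None = None,
) -> Iterator[str]:
    """Yield paths, relative to root, of files that would be uploaded."""
    for dirpath, dirnames, filenames in os.walk(root):
        if skip_hidden:
            # Mutating dirnames in place stops os.walk from descending
            # into hidden directories such as .thumbnails or .trash.
            dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            # Compare extensions case-insensitively (".JPG" matches ".jpg").
            ext = os.path.splitext(name)[1].lower()
            if allowed_extensions is not None and ext not in allowed_extensions:
                continue
            yield os.path.relpath(os.path.join(dirpath, name), root)
```

Pruning `dirnames` rather than filtering the yielded paths matters for the performance problem described here: a hidden directory with hundreds of thousands of thumbnails is never even listed, so its contents cost nothing.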

Files changed

  • modules/cloud/BaseSSHFileSync.py - sync_up_dir() gains optional skip_hidden and allowed_extensions parameters
  • modules/cloud/BaseFileSync.py - updated abstract signature to match
  • modules/cloud/BaseCloud.py - concept uploads pass skip_hidden=True and the set of training-relevant extensions
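The signature change across the three files can be sketched as an abstract method with two opt-in keyword parameters, plus a call site that opts in. Only `sync_up_dir`, `skip_hidden`, and `allowed_extensions` come from the PR. The `local`/`remote` parameter names, the `False`/`None` defaults, the extension sets, the `RecordingFileSync` test double, and the `upload_concept` helper are all hypothetical.

```python
from __future__ import annotations

from abc import ABC, abstractmethod

# Hypothetical extension sets; the PR uses the project's own lists of
# supported image/video formats, which are not reproduced here.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}
VIDEO_EXTENSIONS = {".mp4", ".webm"}
CAPTION_EXTENSIONS = {".txt"}


class BaseFileSync(ABC):
    @abstractmethod
    def sync_up_dir(
        self,
        local: str,
        remote: str,
        skip_hidden: bool = False,
        allowed_extensions: set[str] | None = None,
    ) -> None:
        """Upload a directory tree.

        Assumed defaults: no hidden-directory skipping and no extension
        filter, so callers that do not opt in keep the old behavior.
        """


class RecordingFileSync(BaseFileSync):
    """Hypothetical test double that records calls instead of uploading."""

    def __init__(self) -> None:
        self.calls: list[dict] = []

    def sync_up_dir(self, local, remote, skip_hidden=False, allowed_extensions=None):
        self.calls.append({
            "local": local,
            "remote": remote,
            "skip_hidden": skip_hidden,
            "allowed_extensions": allowed_extensions,
        })


def upload_concept(sync: BaseFileSync, local_dir: str, remote_dir: str) -> None:
    # Concept uploads opt in to both filters.
    sync.sync_up_dir(
        local_dir,
        remote_dir,
        skip_hidden=True,
        allowed_extensions=IMAGE_EXTENSIONS | VIDEO_EXTENSIONS | CAPTION_EXTENSIONS,
    )
```

Making the parameters optional keeps the change local. Only the concept-upload path in `BaseCloud.py` changes its arguments, and every other `sync_up_dir()` caller is unaffected.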

Cloud concept uploads previously transferred every file and directory
indiscriminately, including hidden directories like .thumbnails and
.trash, plus non-training files such as archives. This caused uploads
to appear stuck due to massive per-file SCP/SFTP overhead on hundreds
of thousands of irrelevant files.
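The slowdown follows from a simple cost model: if each file pays a fixed round-trip overhead on top of its byte transfer time, then file count rather than data volume dominates once there are hundreds of thousands of small files. The sketch below assumes 50 ms of overhead per file and 50 MB/s of bandwidth. These figures and the file counts are illustrative assumptions, not measurements from this PR.

```python
def estimate_upload_seconds(
    file_count: int,
    total_bytes: int,
    per_file_overhead_s: float,
    bandwidth_bytes_per_s: float,
) -> float:
    """Rough model: fixed cost per file plus raw transfer time."""
    return file_count * per_file_overhead_s + total_bytes / bandwidth_bytes_per_s


# Illustrative numbers only: an unfiltered folder with 300,000 files (3 GB)
# versus the 5,000 training files inside it (2.5 GB), at an assumed
# 50 ms per file and 50 MB/s.
unfiltered = estimate_upload_seconds(300_000, 3_000_000_000, 0.05, 50_000_000)
filtered = estimate_upload_seconds(5_000, 2_500_000_000, 0.05, 50_000_000)
# unfiltered is roughly 15,060 s (over 4 hours); filtered is roughly 300 s.
```

Under these assumptions almost all of the unfiltered time is per-file overhead (15,000 s of the roughly 15,060 s). Dropping irrelevant files therefore helps far more than a faster link would.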

Concept uploads now skip hidden directories and only transfer files
with supported image/video extensions and .txt caption files.
