Releases: Particle1904/DatasetHelpers

v2.9.10

22 Feb 03:07

Dataset Processor Tools is a comprehensive set of tools designed for processing image datasets for machine learning. With these tools, you can easily discard low-resolution images, resize images while preserving their aspect ratio, generate tags using a pre-trained model, mass edit .txt files with tags, and even manually edit .txt files with ease.

The download comes bundled with all the necessary dependencies, including OnnxRuntime .dll/.so files, to ensure a seamless experience. Note that some of these files are quite large, but the tools need them to function.

For detailed instructions and examples, be sure to visit the Wiki page. You'll find comprehensive guidance on how to make the most of Dataset Processor Tools.
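
Resizing while preserving the aspect ratio, as mentioned above, comes down to scaling both sides by the same factor. The following is a minimal sketch in Python, used for illustration only since the application itself runs on .NET. The function name `fit_within` and its `max_side` parameter are hypothetical and not taken from the tool; the sketch also assumes the longer side is always scaled to `max_side`, which may differ from the tool's actual behavior.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_side: int = 1024) -> Tuple[int, int]:
    """Scale (width, height) so the longer side equals max_side.

    Hypothetical helper: both sides are multiplied by the same factor,
    which is what keeps the aspect ratio unchanged.
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("dimensions must be positive")
    scale = max_side / max(width, height)
    # Round to the nearest pixel, never collapsing a side to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, `fit_within(4000, 3000)` yields a 1024x768 target, so a 4:3 image stays 4:3.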

Fixes:

  • Fixed the Cancel button in Resize Page not working.
  • Improved image resizing logic and DPID performance.
  • Fixed issues where the Editor sometimes failed to save tags/captions or saved them to the wrong image.
  • Other UI fixes and improvements.

Requirements:

This software requires the following runtimes (or newer versions), so be sure to install them:

.NET Desktop Runtime 9.0.0 or newer
For Windows users: Visual C++ Redistributable for Visual Studio 2019 for running the Model
Python 3.8.x-3.13.x (tested against 3.10) to use Gemini Captioning

For Mac users: please check this Issue on how to build it yourself, and don't forget to read my comment!

v2.9.9

23 Jan 07:40

Fixes:

  • Fixed a bug with Inpaint Canvas not handling resolutions properly.
  • Massively improved performance of Inpaint Canvas.
  • Fixed an issue where the Florence2 Pipeline, and any feature relying on it, froze the UI.
  • Failing to append execution providers should now properly fall back to the CPU.

Requirements:

This software requires the following runtimes (or newer versions), so be sure to install them:

.NET Desktop Runtime 9.0.0 or newer
For Windows users: Visual C++ Redistributable for Visual Studio 2019 for running the Model
Python 3.8.x-3.13.x (tested against 3.10) to use Gemini Captioning

For Mac users: please check this Issue on how to build it yourself, and don't forget to read my comment!

v2.9.8

20 Jan 09:36

New Features:

  • Image Resizing:
    • Added DPID resampler for downscaling very large images (now enabled by default).
    • Added Resampler GUI controls and enabled dynamic resizing concurrency.
  • Inpainting & Editing:
    • Improved Inpaint canvas and added brush hardness control.
    • All saved images now default to lossless .webp format to reduce overall size of datasets.
    • Developer Notes: I've been working on a dataset with 15k images and I noticed that .png was just wasting SSD/HDD storage; Text-Image models often discard the alpha channel anyway. If you have a case for a selectable output format, let me know!
  • Gemini & AI Updates:
    • Updated Gemini Captioning to use gemini-3-pro-preview. Gemini Captioning is currently unavailable because the generativeai Python library was deprecated on December 16, 2025; I haven't had time to migrate to the new library yet.
    • ModelRunner (Florence 2 Pipeline) now supports GPU usage.
    • Metadata Viewer now defaults to WD3Large model.
    • General improvements to Gemini and Python services.
  • Gallery & UI:
    • Improved Gallery loading feature to better handle high-resolution images.
    • Changed Notification sounds to use NetCoreAudio (removing LibVLC dependency for audio).

Fixes:

  • Stability & Performance:
    • Fixed multiple UI blocking calls to improve application responsiveness.
    • Attempted to improve CTD (Crash to Desktop) logging.
    • Implemented a more efficient way to read image sizes.
  • Tool-Specific Fixes:
    • Editor: Fixed an issue where the Editor page wouldn't save changes.
    • Text Remover: Fixed a bug where the tool would only work the first time it was run.
    • File Manager: Fixed OpenFailedFolder button opening the wrong directory.
  • Gallery: Fixed bugs in Gallery View and improved general performance.
  • Attempted to resolve an issue where input was captured when the app was unfocused.

Others:

  • Bumped OnnxRuntime version to 1.22.1.
  • Re-added OnnxRuntime.Extensions.
  • Updated NuGet packages.
  • General code cleanup and UI adjustments.

Requirements:

This software requires the following runtimes (or newer versions), so be sure to install them:

.NET Desktop Runtime 9.0.0 or newer
For Windows users: Visual C++ Redistributable for Visual Studio 2019 for running the Model
Python 3.8.x-3.13.x (tested against 3.10) to use Gemini Captioning

For Mac users: please check this Issue on how to build it yourself, and don't forget to read my comment!

v2.9.7

28 Jul 16:14

New Features:

  • Added overlap and blending for the Tiled Inpainting feature.
  • Added global exception handlers.
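
Overlap and blending for tiled processing, as in the Tiled Inpainting item above, usually mean two things: neighbouring tiles share a strip of pixels, and each new tile is cross-faded into what is already there across that strip. The sketch below shows the idea in one dimension, in Python for illustration only. It is not the tool's implementation; `tile_starts` and `blend_weights` are hypothetical names, and the linear ramp is an assumed blending curve.

```python
from typing import List


def tile_starts(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets of tiles of size `tile` that cover [0, length).

    Consecutive tiles share at least `overlap` pixels; the last tile is
    placed flush with the edge, so it may overlap its neighbour by more.
    """
    if tile <= 0 or not 0 <= overlap < tile:
        raise ValueError("need tile > 0 and 0 <= overlap < tile")
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)
    return starts


def blend_weights(tile: int, overlap: int) -> List[float]:
    """Weight of the new tile at each pixel: a linear ramp over the overlap, then 1.0.

    A blended pixel would be w * new + (1 - w) * existing.
    """
    return [min(1.0, (i + 1) / (overlap + 1)) for i in range(tile)]
```

For a 2D image the same offsets would be computed per axis, and the horizontal and vertical weights multiplied together.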

Fixes:

  • Numerous UI fixes and small adjustments.

Others:

  • Discontinued support for Z3DE621 model. #58
  • Downgraded OnnxRuntime version from 1.22 to 1.18. #60

Requirements:

This software requires the following runtimes (or newer versions), so be sure to install them:

.NET Desktop Runtime 9.0.0 or newer
For Windows users: Visual C++ Redistributable for Visual Studio 2019 for running the Model
Python 3.10.x to use Gemini Captioning

Linux-specific requirements:

Follow this installation guide from the LibVLCSharp GitHub documentation:
- sudo apt update
- sudo apt install vlc libvlc-dev
- sudo apt install vlc

For Mac users: please check this Issue on how to build it yourself, and don't forget to read my comment!

v2.9.6

14 Jun 07:31

Fixes:

  • Addressed a major crash and improved the stability of the automatic Text Removal pipeline by reducing the overall memory footprint.
  • Resolved an application crash triggered by using the middle mouse button.
  • Fixed a bug affecting the Extract Subset page.
  • Corrected an improper ImageSharp library reference.

Others:

  • Updated the descriptive note in the Text Remover UI.
  • Updated various internal dependencies to their latest versions.
  • General code refactoring and clean-ups for improved stability and maintenance.

Requirements:

This software requires the following runtimes (or newer versions), so be sure to install them:

.NET Desktop Runtime 9.0.0 or newer
For Windows users: Visual C++ Redistributable for Visual Studio 2019 for running the Model
Python 3.10.x to use Gemini Captioning

Linux-specific requirements:

Follow this installation guide from the LibVLCSharp GitHub documentation:
- sudo apt update
- sudo apt install vlc libvlc-dev
- sudo apt install vlc

For Mac users: please check this Issue on how to build it yourself, and don't forget to read my comment!

v2.9.5

27 May 00:40

New Features:

  • Segment Anything 2 (SAM2) Support:
    • Adds point/box inference for object segmentation.
  • Florence2 Integration:
    • Supports captioning, semantic object detection, and OCR with region targeting.
    • Enables automatic watermark/text/logo removal in conjunction with SAM2 and LaMa (inpainting).
    • CPU inference only for now; GPU support is planned for the future. I couldn't properly debug it using DirectML and an integrated GPU.
    • Developer Note: Florence2 captioning was implemented per this request, but Gemini remains superior for captioning. Florence2’s real value lies in its semantic and OCR capabilities.
  • Automatic Text/Watermark/Logo Removal (Experimental):
    • Uses Florence2 + SAM2 + LaMa pipeline.
    • Developer Note: This feature can produce false positives. Avoid running it on clean images or those with text tattoos. A refinement mechanism is planned for the next release.
  • New ModelManagerService:
    • Splits responsibilities from FileManagerService.
    • Manages model and dependency downloads (e.g., .txt, .csv files).

Fixes:

  • Resolved issue with user input being captured while the app was minimized or unfocused.
  • Numerous UI fixes and small adjustments.

Others:

  • Updated SharpHook to v6.1.0.
  • Note: Configuration options for Florence 2 Caption and Text Remover UI pages are not available in this release. You may still configure them via config.json.

Requirements:

This software requires the following runtimes (or newer versions), so be sure to install them:

.NET Desktop Runtime 9.0.0 or newer
For Windows users: Visual C++ Redistributable for Visual Studio 2019 for running the Model
Python 3.10.x to use Gemini Captioning

Linux-specific requirements:

Follow this installation guide from the LibVLCSharp GitHub documentation:
- sudo apt update
- sudo apt install vlc libvlc-dev
- sudo apt install vlc

For Mac users: please check this Issue on how to build it yourself, and don't forget to read my comment!

v2.9.4

10 May 09:17

Fixes:

  • Fixed the Sort page creating unnecessary backups and placing them in incorrect folders.
  • Resolved an issue where a UI element was playing notifications unexpectedly.
  • Image Processor now saves images in ".png" format.
  • Improved the Gemini Captioner to handle Python interop more gracefully; it no longer initializes a Python instance if one already exists.
  • Gemini Captioner responses are now formatted to remove unexpected newlines.
  • Various small UI improvements.
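
The newline cleanup for Gemini Captioner responses listed above can be approximated by collapsing every run of whitespace into a single space. This is a minimal sketch of that idea in Python, not the application's actual code; `flatten_caption` is a hypothetical name.

```python
import re


def flatten_caption(text: str) -> str:
    """Collapse newlines and other whitespace runs into single spaces."""
    return re.sub(r"\s+", " ", text).strip()
```

Keeping each caption on one line matters because caption .txt files are typically read as a single line of text per image.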

Others:

  • Updated dependencies.
  • Upgraded from .NET 8.0 to 9.0.

Requirements:

This software requires the following runtimes (or newer versions), so be sure to install them:

.NET Desktop Runtime 9.0.0 or newer
For Windows users: Visual C++ Redistributable for Visual Studio 2019 for running the Model
Python 3.10.x to use Gemini Captioning

Linux-specific requirements:

Follow this installation guide from the LibVLCSharp GitHub documentation:
- sudo apt update
- sudo apt install vlc libvlc-dev
- sudo apt install vlc

For Mac users: please check this Issue on how to build it yourself, and don't forget to read my comment!

v2.9.3

21 Jan 11:43

New Features:

  • Gemini Guided Captioning:
    • Added Gemini Captioning feature that uses the official Python package.
      - Developer notes: this feature requires Python 3.10.x to be installed (and available on the PATH environment variable) since it uses the Python library to generate the captions; the official REST API is limited when it comes to content filtering and blocking.
    • It will use existing tags in .txt files if available to help guide the Gemini Captions.
  • Extended user settings for the Process Tags page based on user feedback.
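
Guiding a caption with existing tags, as described for Gemini Guided Captioning above, can be pictured as reading the .txt file that sits next to an image and appending its tags to the prompt. The sketch below is an illustration in Python under stated assumptions, not the tool's implementation: `read_tags` and `build_prompt` are hypothetical names, the tags are assumed to be comma-separated, and the prompt wording is invented for the example.

```python
from pathlib import Path
from typing import List


def read_tags(image_path: Path) -> List[str]:
    """Return the tags from the .txt file next to the image, or [] if there is none.

    Assumes comma-separated tags, e.g. "cat, outdoors, grass".
    """
    tag_file = image_path.with_suffix(".txt")
    if not tag_file.is_file():
        return []
    raw = tag_file.read_text(encoding="utf-8")
    return [tag.strip() for tag in raw.split(",") if tag.strip()]


def build_prompt(base_prompt: str, tags: List[str]) -> str:
    """Append the tags as a hint when any are available (hypothetical wording)."""
    if not tags:
        return base_prompt
    return f"{base_prompt}\nExisting tags for guidance: {', '.join(tags)}"
```

When no .txt file exists, the prompt is returned unchanged, which matches the "if available" wording in the release note.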

Requirements:

This software requires the following runtimes (or newer versions), so be sure to install them:

.NET Desktop Runtime 8.0.2 or newer
For Windows users: Visual C++ Redistributable for Visual Studio 2019 for running the Model
Python 3.10.x to use Gemini Captioning

Linux-specific requirements:

Follow this installation guide from the LibVLCSharp GitHub documentation:
- sudo apt update
- sudo apt install vlc libvlc-dev
- sudo apt install vlc

For Mac users: please check this Issue on how to build it yourself, and don't forget to read my comment!

v2.9.2

04 Jan 17:58

Fixes:

  • The Logs panel no longer interrupts the user by showing itself automatically all the time. It now opens automatically only when an Error log occurs; Warning and Info logs only play a notification sound.
  • Changed the default resolution settings in all pages to 1024x1024.

Requirements:

This software requires the following runtimes (or newer versions), so be sure to install them:

.NET Desktop Runtime 8.0.2 or newer
For Windows users: Visual C++ Redistributable for Visual Studio 2019 for running the Model

For Mac users: please check this Issue on how to build it yourself, and don't forget to read my comment!

v2.9.1

03 Jan 04:42

Fixes:

  • The tools now automatically download the LaMa.onnx file when you try to use the Inpaint feature.
  • Increased the image size in the Inpaint page.
  • Small fixes in the interface layout in the Inpaint page; changing the current selected image now focuses on the Image control to avoid unnecessary mouse scrolling.
  • Changed the default resolution in the Resize page from 512x512 to 1024x1024.
  • The application can no longer be closed while a model file is downloading.
  • Various changes to the Logs panel:
    • The Logs panel is now a hideable flyout.
    • File downloads now report their progress in the Logs panel.
    • The Logs panel will now play a notification sound and automatically show itself when a new Log event happens.
      Developer notes: notification sounds require the LibVLCSharp runtime; Linux users need to follow this installation guide from the LibVLCSharp GitHub documentation! As always, create a new issue if you run into problems on Linux.
      • For Linux users:
        • sudo apt update
        • sudo apt install vlc libvlc-dev

Requirements:

This software requires the following runtimes (or newer versions), so be sure to install them:

.NET Desktop Runtime 8.0.2 or newer
For Windows users: Visual C++ Redistributable for Visual Studio 2019 for running the Model

For Mac users: please check this Issue on how to build it yourself, and don't forget to read my comment!