Releases · Aatricks/LightDiffusion-Next
V2.1.4beta1
What's Changed
- New minimalistic UI and added testing by @Aatricks in #20
- Feat/optimizations stability by @Aatricks in #21
Full Changelog: V2.1.3...V2.1.4beta1
V2.1.3
What's Changed
- NVFP4 (4-bit) Weight-Only Quantization Support by @Aatricks
- Implementation of 4-bit quantization for a ~75% reduction in weight memory usage (see the sketch after this list).
- Integrated support for Flux2 (Transformer + Klein Text Encoder), SDXL, and SD1.5 architectures.
- Optimized runtime dequantization to FP16/BF16 during the forward pass via comfy_cast_weights.
- Automated layer selection targeting weights >4096 elements to balance compression and quality.
- Added `weight_quantization` configuration to API, Context, and Pipeline for granular memory control.
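The mechanism described above can be pictured with a short PyTorch sketch. This is not the project's implementation (its runtime casting goes through `comfy_cast_weights`, and NVFP4 proper is a block-scaled floating-point 4-bit format, not plain int4); `quantize_4bit`, `QuantLinear`, and the threshold constant below are illustrative names built from the bullet points.

```python
import torch

# Per the release notes, only weights with more than 4096 elements are quantized.
ELEMENT_THRESHOLD = 4096

def quantize_4bit(w: torch.Tensor):
    """Symmetric 4-bit weight-only quantization: int4 codes plus one scale.
    Plain symmetric int4 is used here only to illustrate the mechanism."""
    scale = w.abs().amax().clamp(min=1e-8) / 7.0   # int4 symmetric range: [-8, 7]
    q = torch.round(w / scale).clamp(-8, 7).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor, dtype: torch.dtype) -> torch.Tensor:
    """Runtime dequantization back to FP16/BF16 during the forward pass."""
    return q.to(dtype) * scale

class QuantLinear(torch.nn.Module):
    """Holds 4-bit codes and dequantizes on the fly (codes are kept unpacked
    in int8 here; packing two codes per byte gives the real ~75% saving)."""
    def __init__(self, linear: torch.nn.Linear):
        super().__init__()
        q, scale = quantize_4bit(linear.weight.data)
        self.register_buffer("q_weight", q)
        self.register_buffer("scale", scale)
        self.bias = linear.bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = dequantize(self.q_weight, self.scale, x.dtype)
        return torch.nn.functional.linear(x, w, self.bias)

def quantize_large_layers(module: torch.nn.Module) -> None:
    """Automated layer selection: swap in QuantLinear only where it pays off."""
    for name, child in module.named_children():
        if isinstance(child, torch.nn.Linear) and child.weight.numel() > ELEMENT_THRESHOLD:
            setattr(module, name, QuantLinear(child))
        else:
            quantize_large_layers(child)
```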
Full Changelog: V2.1.2...V2.1.3
V2.1.2
What's Changed
- Fix: reload base model before HiresFix + stability & UX improvements (Flux2, VAE, ADetailer, settings history) by @Aatricks in #19
- Pipeline Stability & HiresFix Improvements by @Aatricks
- Fixed critical race conditions in VAE encoding during SDXL+Refiner workflows by enforcing blocking transfers (see the sketch after this list).
- Implemented logic to reload the base model before HiresFix when a refiner is used, preventing latent corruption.
- Ensured ADetailer explicitly uses the base model instead of the refiner for text-guided crop enhancements.
- Reverted non-essential SDXL changes to resolve regressions in Attention and conditioning modules.
- Settings Persistence & History Management by @Aatricks
- Implemented backend storage for settings history and last-used seeds.
- Added a collapsible "Settings History" section to the UI for quick restoration of previous configurations.
- Integrated image import functionality directly into the GenerationSettings component.
- Flux.2 & Core Model Optimization by @Aatricks
- Enhanced `torch.compile` integration with support for callables and improved logging.
- Fixed RoPE feature dimension alignment and added padding adjustments for Flux2.
- Improved FP8 quantization fallback logic for models lacking diffusion submodules.
- Added validation for model file existence (safetensors/pt) in the downloader.
- Image Processing & Batch Limits by @Aatricks
- Introduced `LD_MAX_IMAGES_PER_GROUP` to control processing limits and implemented chunking for large pipeline requests (sketched after this list).
- Updated AutoHDR to properly handle RGBA images (preserving alpha) and added fallbacks for missing LCMS or failed ICC transforms.
- Added telemetry for batch limit configuration.
- Testing & Infrastructure by @Aatricks
- Restructured the test suite into clear `e2e`, `integration`, and `unit` categories.
- Fixed frontend runtime crashes by importing missing Button and ImageMetadata types.
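"Enforcing blocking transfers" in the VAE race-condition fix can be read as the following PyTorch pattern; the function names are illustrative, not LightDiffusion-Next's API.

```python
import torch

def to_device_blocking(t: torch.Tensor, device: torch.device) -> torch.Tensor:
    # non_blocking=False makes the copy synchronous, so a VAE encode that
    # runs next can never observe a half-transferred latent tensor.
    return t.to(device, non_blocking=False)

def to_device_synced(t: torch.Tensor, device: torch.device) -> torch.Tensor:
    # Alternative: copy asynchronously, then synchronize the device before
    # any dependent work (useful when other streams consume the tensor).
    out = t.to(device, non_blocking=True)
    if device.type == "cuda":
        torch.cuda.synchronize(device)
    return out
```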
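The batch-limit item reduces to an environment-driven chunking loop. `LD_MAX_IMAGES_PER_GROUP` is the variable named in the notes; the helpers and the default value around it are hypothetical.

```python
import os
from typing import Callable, Iterator, List

# Cap from the release notes; the default value here is a guess.
MAX_IMAGES_PER_GROUP = int(os.environ.get("LD_MAX_IMAGES_PER_GROUP", "4"))

def chunked(items: List, size: int) -> Iterator[List]:
    """Split a list into consecutive groups of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def process_images(images: List, run_pipeline: Callable[[List], List]) -> List:
    """Run the pipeline group by group instead of on one oversized batch."""
    results: List = []
    for group in chunked(images, MAX_IMAGES_PER_GROUP):
        results.extend(run_pipeline(group))
    return results
```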
Full Changelog: V2.1.1...V2.1.2
V2.1.1
What's Changed
- Enhanced Preview and Message Management by @Aatricks
- Added generation ID handling to improve preview message tracking and management.
- Implemented configurable preview fidelity settings (format and quality) in AppInstance.
- Model Compilation and Image Processing Optimizations by @Aatricks
- Updated compile_model to default to 'max-autotune-no-cudagraphs' for better performance and stability.
- Introduced in-memory image byte storage in ImageSaver to reduce disk I/O during API responses.
- Added color utility functions for linear to sRGB conversion and Reinhard tonemapping (see the sketch after this list).
- Improved HiresFix and SDXL Support by @Aatricks
- Enhanced HiresFix with support for size conditioning specific to SDXL models.
- Refined handling of denoise and CFG parameters during the upscaling process.
- Comprehensive Img2Img Enhancements by @Aatricks
- Expanded API to support image uploads via local file paths, data URLs, and raw Base64 strings (see the parsing sketch after this list).
- Implemented robust image saving with automatic format conversion and size limit enforcement.
- Added a request filename prefix feature for improved output file organization.
- Pipeline Robustness and Bug Fixes by @Aatricks
- Enhanced tensor handling in sampling utilities and the multiscale manager for better stability.
- Fixed refiner prompt usage for per-sample HiresFix/ADetailer when using SDXL or Flux models.
- Added missing flux2 tokenizer merges configuration.
- Improved error handling for non-tensor outputs in VAE and Pipeline modules.
- Tooling and Distribution Updates by @Aatricks
- Added a dedicated downloader for Flux models to streamline setup.
- Included the frontend dist folder in the repository.
- Expanded Integration and Unit Testing by @Aatricks
- Added tests for FP8 quantization and torch.compile compatibility.
- Introduced comprehensive integration tests for batched processing and high-payload img2img requests.
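The color utilities mentioned above correspond to standard formulas; this sketch shows the usual definitions (the project's actual signatures may differ).

```python
import torch

def linear_to_srgb(x: torch.Tensor) -> torch.Tensor:
    """Standard sRGB transfer function for linear-light input in [0, 1]."""
    x = x.clamp(0.0, 1.0)
    return torch.where(x <= 0.0031308,
                       12.92 * x,
                       1.055 * x.pow(1.0 / 2.4) - 0.055)

def reinhard_tonemap(x: torch.Tensor) -> torch.Tensor:
    """Classic Reinhard operator: maps HDR values in [0, inf) into [0, 1)."""
    return x / (1.0 + x)
```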
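Similarly, the three img2img upload forms (local path, data URL, raw Base64) come down to a dispatch like the following; `load_image_bytes` is an illustrative name, not the project's API.

```python
import base64
import os

def load_image_bytes(source: str) -> bytes:
    """Accept a local file path, a data URL, or a raw Base64 string."""
    if source.startswith("data:"):          # data URL: strip the header
        _, _, payload = source.partition(",")
        return base64.b64decode(payload)
    if os.path.isfile(source):              # local file path
        with open(source, "rb") as f:
            return f.read()
    return base64.b64decode(source)         # otherwise assume raw Base64
```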
Full Changelog: 2.1.0...V2.1.1
2.1.0
V2.0.0
What's Changed
- Full Flux.2 Klein 4B Distilled Support by @Aatricks in #17
- Implementation of the Flux2 transformer and Qwen-based text encoder.
- Optimized text conditioning with attention masks, text normalization, and vector input support.
- Resolution-dependent timestep scheduling and model sampling shift configurations.
- Aggressive VRAM management and partial model loading for consumer hardware.
- Fixed positional embeddings and VAE decoding logic specific to the Klein architecture.
- SDPA Backend Priority Management (SpargeAttn > SageAttention > Xformers) by @Aatricks (sketched after this list)
- Enhanced SDXL Condition Processing for improved prompt adherence by @Aatricks
- Optimized Model Reuse Logic to prevent redundant device transfers and reloads by @Aatricks
- Streamlined CI Workflow with improved error reporting and expanded unit/integration tests by @Aatricks
- Revamped Streamlit UI for synchronized resolution management and Flux2 presets by @Aatricks
- Fixed random seed generation to comply with PyTorch limits across all modules by @Aatricks
- Improved Img2Img upscale logic and dimension handling for DiT models by @Aatricks
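The backend priority can be sketched as a sequence of import probes. The module names below are the usual packages for these backends, but whether LightDiffusion-Next probes them this way is an assumption.

```python
# Probe optional attention backends in the priority order from the notes:
# SpargeAttn > SageAttention > Xformers, falling back to PyTorch SDPA.
def pick_attention_backend() -> str:
    try:
        import spas_sage_attn  # SpargeAttn (package name is an assumption)
        return "sparge"
    except ImportError:
        pass
    try:
        import sageattention   # SageAttention
        return "sage"
    except ImportError:
        pass
    try:
        import xformers.ops    # Xformers memory-efficient attention
        return "xformers"
    except ImportError:
        pass
    return "torch-sdpa"        # built into PyTorch >= 2.0, always available
```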
Full Changelog: V1.9.1...V2.0.0
V1.9.1
What's Changed
- Smart cfg by @Aatricks in #11
- ROCm and mps by @Aatricks in #13
- Avoid unnecessary GGUF model unloading after patches by @google-labs-jules[bot] in #16
- Optimize mmap release logic in Quantizer by @google-labs-jules[bot] in #15
- Dynamic Width Scaling in Condition Encoding by @google-labs-jules[bot] in #14
- Various optimizations to reduce device transfers by @Aatricks
- Implemented calculations caching and batching for flux and attention by @Aatricks
- Vectorized tensor indexing for schedulers by @Aatricks
- Implemented dynamic VAE tiling based on available VRAM by @Aatricks (sketched after this list)
- Together, these changes yield roughly a 30% inference speed improvement in SD1.5 scenarios
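Dynamic VAE tiling typically means picking the largest tile that fits in free VRAM. A minimal sketch follows; the candidate sizes and the per-pixel memory budget are rough placeholders, not figures from the project.

```python
import torch

def pick_vae_tile_size(candidates=(1024, 768, 512, 256)) -> int:
    """Choose the largest tile size whose decode plausibly fits in free VRAM."""
    if not torch.cuda.is_available():
        return candidates[-1]
    free_bytes, _total = torch.cuda.mem_get_info()
    # Rough activation budget per pixel (channels * fp32 * overhead) -- a guess.
    budget_per_pixel = 3 * 4 * 64
    for tile in candidates:
        if tile * tile * budget_per_pixel <= free_bytes * 0.8:  # keep 20% headroom
            return tile
    return candidates[-1]
```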
New Contributors
- @google-labs-jules[bot] made their first contribution in #16
Full Changelog: V1.9.0...V1.9.1