Skip to content

⚡ Improve Metadata Handling for WSI Readers #1001

Merged
shaneahmed merged 18 commits into develop from bug-fix-openslide-read on Mar 10, 2026


@shaneahmed (Member) commented Feb 20, 2026

Summary

This PR standardises and improves metadata inference across all WSI readers by introducing a unified mechanism for estimating missing objective power and MPP. It updates all major reader implementations (TIFF, DICOM, OpenSlide, JP2, NGFF, fsspec), fixes reader‑selection ordering, and adds extensive tests to validate inference behaviour and warnings. New sample data is included to support expanded DICOM metadata coverage.

🔑 Key Changes

1. Centralised Metadata Inference

  • Introduces WSIReader._estimate_mpp_objective_power() as the shared method for inferring missing objective power and MPP.
  • Removes duplicated inference logic and ensures consistent fallback behaviour across all readers.
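
As a rough illustration of what a shared fallback like this might do, the sketch below fills in whichever of MPP or objective power is missing and logs a warning when it has to infer. It is not the code of `WSIReader._estimate_mpp_objective_power()`: the function name `infer_missing_metadata`, the constant `MPP_POWER_PRODUCT` and the list of common magnifications are all assumptions made for the example, based on the rule of thumb that 40x corresponds to roughly 0.25 mpp.

```python
import logging
from typing import Optional, Sequence, Tuple, Union

logger = logging.getLogger(__name__)

# Assumption for illustration: mpp * objective_power is roughly 10
# (for example 40x at about 0.25 mpp). Real scanners deviate from this.
MPP_POWER_PRODUCT = 10.0
COMMON_POWERS = (1.25, 2.5, 5.0, 10.0, 20.0, 40.0, 60.0, 100.0)

Number = Union[int, float]


def infer_missing_metadata(
    mpp: Optional[Union[Number, Sequence[Number]]] = None,
    objective_power: Optional[Number] = None,
) -> Tuple[Optional[Tuple[float, float]], Optional[float]]:
    """Return (mpp_xy, objective_power), inferring whichever one is missing."""
    if mpp is not None:
        # Normalise a scalar or an (x, y) pair to a 2-tuple of floats.
        values = (mpp, mpp) if isinstance(mpp, (int, float)) else tuple(mpp)
        mpp = (float(values[0]), float(values[1]))

    if mpp is None and objective_power is None:
        logger.warning("Neither MPP nor objective power is available.")
        return None, None

    if objective_power is None:
        logger.warning("Objective power missing; inferring it from MPP.")
        raw = MPP_POWER_PRODUCT / ((mpp[0] + mpp[1]) / 2.0)
        # Snap to the nearest commonly used magnification.
        objective_power = min(COMMON_POWERS, key=lambda p: abs(p - raw))
    elif mpp is None:
        logger.warning("MPP missing; inferring it from objective power.")
        value = MPP_POWER_PRODUCT / float(objective_power)
        mpp = (value, value)

    return mpp, float(objective_power)
```

Keeping the fallback in one place means each reader only has to pass along whatever metadata it managed to parse, and the warning text stays the same regardless of file format.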

2. Unified Metadata Handling Across Readers

All major WSI readers now use the central inference method:

  • TIFFWSIReader
  • DICOMWSIReader
  • OpenSlideWSIReader
  • JP2WSIReader
  • NGFFWSIReader
  • FsspecJsonWSIReader

This ensures consistent behaviour when metadata is missing or partially defined.

3. Improved Reader Selection Logic

  • Adds try_openslide() and updates selection priority so TIFF files are first attempted via OpenSlide.
  • Fixes misclassification issues where TIFF inputs were incorrectly routed to other readers.
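
The idea of a selection priority can be sketched as an ordered list of "try" functions, where the first one that returns a reader wins. This is only a minimal sketch: `select_reader`, the stub openers and their accept/reject rules are invented for illustration and do not reflect the actual signature or behaviour of `try_openslide()`.

```python
from typing import Callable, Iterable, Optional, Tuple

# Each candidate returns a reader object, or None if it cannot open the file.
TryOpen = Callable[[str], Optional[object]]


def select_reader(path: str, candidates: Iterable[Tuple[str, TryOpen]]):
    """Return (name, reader) from the first candidate that opens `path`."""
    for name, try_open in candidates:
        reader = try_open(path)
        if reader is not None:
            return name, reader
    raise ValueError(f"No reader could open {path!r}")


# Stand-in openers, hypothetical and for illustration only.
def try_openslide_stub(path: str) -> Optional[object]:
    # Pretend this backend declines OME-TIFF but accepts other TIFF files.
    lowered = path.lower()
    if lowered.endswith((".ome.tif", ".ome.tiff")):
        return None
    return "openslide-reader" if lowered.endswith((".tif", ".tiff")) else None


def try_tiff_stub(path: str) -> Optional[object]:
    return "tiff-reader" if path.lower().endswith((".tif", ".tiff")) else None


# Order encodes priority: the OpenSlide-like stub is attempted first.
CANDIDATES = [("openslide", try_openslide_stub), ("tiff", try_tiff_stub)]
```

With this layout, changing the priority is a matter of reordering `CANDIDATES`, which keeps the ordering decision visible in one place.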

4. Expanded and Strengthened Test Coverage

New and updated tests now cover:

  • Missing or partial OME‑TIFF metadata
  • Missing MPP (X/Y)
  • Missing instrument references
  • Warning behaviour when inference is required
  • DICOM metadata with and without optical path information
  • New dicom-2 sample with known objective/MPP values

Assertions have been updated to reflect the new inference logic.
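
A warning check of this kind could look roughly like the following. The sketch uses the standard library's `unittest` and a made-up `infer_objective_power` helper with a hypothetical logger name, rather than the project's own fixtures or readers, so treat it as an outline of the pattern only.

```python
import logging
import unittest

logger = logging.getLogger("wsi_metadata_sketch")  # hypothetical logger name


def infer_objective_power(mpp: float) -> int:
    """Hypothetical helper: warn, then estimate power as roughly 10 / mpp."""
    logger.warning("Objective power missing; inferring from MPP.")
    return round(10.0 / mpp)


class TestInferenceWarning(unittest.TestCase):
    def test_warns_when_inferring(self) -> None:
        # assertLogs fails the test if no WARNING is emitted on this logger.
        with self.assertLogs("wsi_metadata_sketch", level="WARNING") as captured:
            power = infer_objective_power(0.25)
        self.assertEqual(power, 40)
        self.assertIn("inferring", captured.output[0])
```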

5. Updated Remote Sample Data

  • Replaces CMU-1.dicom.zip with CMU-1-Small-Region.dicom.zip.
  • Adds new dicom-2 sample (JP2K-33003-1.zip) to support metadata‑specific tests.

6. Cleanup and Minor Fixes

  • Corrects import path for TransformedWSIReader.
  • Improves type hints in objective_power2mpp.
  • Normalises ndarray conversion for inferred MPP values.
  • Cleans up mypy issues related to dimension and metadata handling.

This PR resolves Jupyter Notebook 10 – WSI Reading (#998) and KongNet Notebook for MONKEY dataset (#987).

- Try OpenSlide reader for TIFF files first
- Fall back to calculating objective power from MPP
@shaneahmed shaneahmed self-assigned this Feb 20, 2026
@shaneahmed shaneahmed added this to the Release v2.0.0 milestone Feb 20, 2026
@shaneahmed shaneahmed added the bug (Something isn't working) and dev tools (Changes/Updates in Development tools) labels Feb 20, 2026
@shaneahmed shaneahmed requested a review from measty February 20, 2026 12:36

codecov Bot commented Feb 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.53%. Comparing base (e401b71) to head (2521f5f).
⚠️ Report is 1 commits behind head on develop.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #1001   +/-   ##
========================================
  Coverage    99.53%   99.53%           
========================================
  Files           83       83           
  Lines        11353    11397   +44     
  Branches      1493     1499    +6     
========================================
+ Hits         11300    11344   +44     
  Misses          28       28           
  Partials        25       25           


@shaneahmed shaneahmed changed the title 🐛 Try OpenSlideWSIReader for tiff Files First ⚡ Improve Metadata Handling for WSI Readers Mar 5, 2026
Copilot AI (Contributor) left a comment

Pull request overview

This PR standardizes metadata inference (objective power and MPP) across WSI readers by introducing a shared inference helper and updating multiple reader implementations and tests. It also adjusts reader-selection priority so TIFF inputs are attempted via OpenSlide first.

Changes:

  • Add a centralized WSIReader._estimate_mpp_objective_power() and use it across multiple readers.
  • Update reader selection to try OpenSlide first for .tif/.tiff.
  • Update remote samples and expand tests for DICOM/TIFF metadata inference and warning behavior.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| tiatoolbox/wsicore/wsireader.py | Adds OpenSlide-first selection for TIFF and introduces centralized MPP/objective-power inference used across readers. |
| tiatoolbox/utils/misc.py | Broadens typing for objective_power2mpp to accept np.ndarray. |
| tiatoolbox/data/remote_samples.yaml | Updates DICOM sample filename and adds a second DICOM sample entry for new metadata tests. |
| tests/test_wsireader.py | Adds/updates DICOM metadata assertions and adds a new test for objective-power presence/inference. |
| tests/test_tiffreader.py | Updates OME-TIFF metadata tests and adds a warning-logging test for missing metadata. |


Comment thread tiatoolbox/wsicore/wsireader.py Outdated
Comment thread tiatoolbox/wsicore/wsireader.py Outdated
Comment thread tiatoolbox/wsicore/wsireader.py Outdated
Comment thread tiatoolbox/wsicore/wsireader.py Outdated
Comment thread tiatoolbox/data/remote_samples.yaml Outdated
Comment thread tests/test_tiffreader.py Outdated
Comment thread tiatoolbox/wsicore/wsireader.py
@measty (Collaborator) left a comment

This introduces a regression on that ome.tiff you sent me a while ago to check multi-channel reader on (20240625_144804_1_08TcnB_Kidney_panel_June_RP_52top51bottom.ome.tiff).

In develop, pyramid is seen:

Image

Whereas opening the same slide in this PR:

Image

No pyramid is seen so it is slow and uses loads of memory, and the slide doesn't display right (seems to be black & white?)

Comment thread tiatoolbox/wsicore/wsireader.py Outdated
@shaneahmed (Member, Author)

> This introduces a regression on that ome.tiff you sent me a while ago to check multi-channel reader on (20240625_144804_1_08TcnB_Kidney_panel_June_RP_52top51bottom.ome.tiff).
>
> In develop, pyramid is seen:
>
> Image
>
> Whereas opening the same slide in this PR:
>
> Image
>
> No pyramid is seen so it is slow and uses loads of memory, and the slide doesn't display right (seems to be black & white?)

Is this because of the OpenSlide reader? If so, we can probably stop trying the OpenSlide reader first, now that metadata is handled better.

@shaneahmed (Member, Author)

> This introduces a regression on that ome.tiff you sent me a while ago to check multi-channel reader on (20240625_144804_1_08TcnB_Kidney_panel_June_RP_52top51bottom.ome.tiff).
>
> In develop, pyramid is seen:
>
> Image
>
> Whereas opening the same slide in this PR:
>
> Image
>
> No pyramid is seen so it is slow and uses loads of memory, and the slide doesn't display right (seems to be black & white?)

@measty Commit 4c0ba9a resolves this issue. I have tested the WSI Registration notebook and the mIF images; both work fine now. However, it fails on the MONKEY challenge image.

@shaneahmed shaneahmed merged commit 22e26d8 into develop Mar 10, 2026
19 checks passed
@shaneahmed shaneahmed deleted the bug-fix-openslide-read branch March 10, 2026 11:27
@shaneahmed shaneahmed mentioned this pull request Mar 11, 2026
shaneahmed added a commit that referenced this pull request Mar 11, 2026
## TIAToolbox v2.0.0 (2026-03-11)

### ✨ Major Updates and Feature Improvements

#### ⚙️ Engine Redesign (PR #578)
TIAToolbox 2.0.0 introduces a completely re-engineered inference engine designed for significant performance, scalability, and memory-efficiency improvements.

#### Key Enhancements
- A modern processing stack built on **Dask** (parallel/distributed execution) and **Zarr** (chunked, out-of-core storage)
- **Standardised output formats** across all engines:
  - Python `dict`
  - **Zarr**
  - **AnnotationStore** (SQLite-backed)
  - **QuPath JSON**
- Cleaner runtime behaviour with reduced warning noise and a unified progress bar
- More predictable memory usage through chunked streaming
- Broader test coverage across engine components

### 🗺️ Improved QuPath Support
Enhancements include:

- Better handling of **GeoJSON**
- Support for **multipoint geometries** (#841)
- Improved semantic output helpers:
  - `dict_to_store_semantic_segmentor` (#926)
  - OME-TIFF probability overlays (#929)

### 🔬 New Nucleus Detection Engine
A dedicated nucleus detection pipeline has been added, built on the redesigned engine for improved accuracy and efficient large-scale processing.

#### 🧠 KongNet Model Family
TIAToolbox 2.0.0 introduces **KongNet**, a high-performance architecture that achieved top results across multiple international challenges:

- 🥇 **1st place: MONKEY Challenge (overall detection)**
- 🥇 **1st place: MIDOG (mitosis detection)**
- ⭐ Top-tier performance on **PUMA**

Multiple pretrained variants are available (CoNIC, PanNuke, MONKEY, PUMA, MIDOG), each with standardised IO configurations.

### 🧬 Expanded Foundation Model Support
Additional foundation models are now supported (#906), broadening the range of high-capacity architectures available for feature extraction and downstream tasks.

### 🖼️ SAM Segmentation in TIAViz
TIAViz now integrates Meta’s Segment Anything Model (SAM), enabling:

- Interactive segmentation
- Rapid region extraction
- Exploratory annotation workflows

Simplified SAM usage (#968) streamlines its integration into analysis pipelines.

### 🧩 Enhanced WSIReader & Metadata Handling
Major improvements include:

- More robust cross-vendor **metadata extraction** (#1001)
- **Multichannel image support** (PR #825) for immunofluorescence and non-RGB modalities
- Simplified Windows installation using `openslide-bin` (no manual DLL steps)
- macOS TileServer fix (#976)
- Improved DICOM reading (#934)

### ☁️ New Cloud-Native Reader: FsspecJsonWSIReader (PR #897)
A new reader supporting **fsspec-compatible filesystems**, enabling seamless access to WSIs stored on:

- S3
- GCS
- Azure
- HPC clusters
- Any fsspec-supported backend

This enables cloud-native and distributed data workflows.
Contributed by @aacic

### 🤗 Pretrained Models Migrated to Hugging Face
All pretrained models and sample assets have been migrated (#945, #983), improving:

- Download reliability
- Versioning and reproducibility
- Caching and CI integration
- Licensing clarity per model family

### 🛡️ Security, Compatibility & Tooling

#### 🔐 Security & Dependency Updates
- Dependency upgrades
- Internal security improvements
- Explicit workflow permissions added (#1021, #1023)

#### 🐍 Python Version Support
- **Dropped:** Python **3.9**
- **Added:** Python **3.13**
- **Supported:** Python 3.10–3.13
- Updated CUDA wheel source to **cu126**

#### 🛠️ Developer Tooling & CI/CD
- Expanded **mypy** type-checking coverage (#912, #931, #935, #951)
- Updated pre-commit hooks and general formatting
- CI uses **CPU-only PyTorch** for faster, more reliable builds (#974, #979)
- Updated pip install workflow (#1013)
- Added new **Python 3.13 Docker images** (#1014, #1019)

### 🧹 Bug Fixes & Stability Improvements
- Fixed multi-GPU behaviour with `torch.compile` (#923)
- Fixed DICOM reading issue (#934)
- Fixed annotation contour handling with holes (#956)
- Fixed consecutive annotation load bug (#927)
- Fixed SCCNN model issues (#970)
- Fixed MapDe `dist_filter` shape issue (#914)
- Improved notebook reliability on Colab (#1026, #1030)
- macOS TileServer issues resolved (#976)

### 🧭 Migration Guide for Users

#### 🔄 Updating from 1.x to 2.0.0

#### Update calls: replace `.predict()` with `.run()`
```python
# Old
results = segmentor.predict(imgs=[...], ioconfig=config)

# New
results = segmentor.run(images=[...], ioconfig=config)
```

#### Use `patch_mode`: replace `mode="patch"` with `patch_mode=True`, and `mode="tile"` or `mode="wsi"` with `patch_mode=False`
```python
# Old
results = segmentor.predict(imgs=[...], mode="patch", ioconfig=config)

# New
results = segmentor.run(images=[...], patch_mode=True, ioconfig=config)
```

```python
# Old
results = segmentor.predict(imgs=[...], mode="wsi", ioconfig=config)

# New
results = segmentor.run(images=[...], patch_mode=False, ioconfig=config)
```
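
For codebases with many call sites, the mapping above could be wrapped in a small helper while migrating. The function name `mode_to_patch_mode` is hypothetical and not part of TIAToolbox; it only encodes the rule stated in this section.

```python
def mode_to_patch_mode(mode: str) -> bool:
    """Translate a 1.x `mode` string into the 2.0 `patch_mode` flag."""
    if mode == "patch":
        return True
    if mode in ("tile", "wsi"):
        return False
    # Anything else was not covered by the migration rule above.
    raise ValueError(f"Unsupported mode: {mode!r}")
```

A call such as `segmentor.run(images=[...], patch_mode=mode_to_patch_mode(old_mode), ioconfig=config)` would then keep legacy configuration values working during the transition.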

#### Use the new I/O configs
```python
from tiatoolbox.models.engine.io_config import IOSegmentorConfig

config = IOSegmentorConfig(
    patch_input_shape=(256, 256),
    stride_shape=(240, 240),
    input_resolutions=[{"resolution": 0.25, "units": "mpp"}],
    save_resolution={"units": "baseline", "resolution": 1.0}
)
```

#### Specify the output format
```python
results = segmentor.run(
    images=[...],
    ioconfig=config,  # the IOSegmentorConfig built in the previous step
    output_type="zarr",  # or "dict", "annotationstore", "qupath"
    save_dir="outputs/",
)
```

#### Update imports
- `tiatoolbox.typing` → `tiatoolbox.type_hints`
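
Code that has to run against both 1.x and 2.0 during a transition could import whichever module is present. The helper below is a generic sketch, not something TIAToolbox ships; `import_first` is a hypothetical name, and only the two module paths come from the rename listed above.

```python
import importlib
from types import ModuleType


def import_first(*module_names: str) -> ModuleType:
    """Import and return the first module in `module_names` that exists."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError(f"None of these modules could be imported: {module_names}")


# Example use (requires TIAToolbox to be installed):
# type_hints = import_first("tiatoolbox.type_hints", "tiatoolbox.typing")
```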

#### Install requirements
- Python **3.10+** required
- On Windows: install OpenSlide via `pip install openslide-bin`

**Full Changelog:** v1.6.0...v2.0.0

---------

Signed-off-by: Shan E Ahmed Raza <13048456+shaneahmed@users.noreply.github.com>
Co-authored-by: measty <20169086+measty@users.noreply.github.com>
Co-authored-by: Jiaqi-Lv <60471431+Jiaqi-Lv@users.noreply.github.com>
Co-authored-by: adamshephard <39619155+adamshephard@users.noreply.github.com>
Co-authored-by: Mostafa Jahanifar <74412979+mostafajahanifar@users.noreply.github.com>
Co-authored-by: John Pocock <John-P@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Yijie Zhu <120978607+YijieZhu15@users.noreply.github.com>
Co-authored-by: Aleksandar Acic <32873451+aacic@users.noreply.github.com>
Co-authored-by: Abdol A <u2271662@live.warwick.ac.uk>
Co-authored-by: Abishek <abishekraj6797@gmail.com>
Co-authored-by: behnazelhaminia <30952176+behnazelhaminia@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Adam Shephard <adam.shephard@warwick.ac.uk>
Co-authored-by: gozdeg <gozdegunesli@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: mbasheer04 <78800844+mbasheer04@users.noreply.github.com>
Co-authored-by: vqdang <24943262+vqdang@users.noreply.github.com>
@shaneahmed shaneahmed mentioned this pull request Mar 11, 2026
shaneahmed added a commit that referenced this pull request Mar 12, 2026
🔖 Release 2.0.0 (#1031)
