Conversation
harryhaller001
commented
Feb 22, 2026
- Use matplotlib-sankey to add sankey plotting option to clustree function
- Add example to docs
- Python 3.11 as minimal supported version, move to uv
- Fix older pre-commit hooks
- Version updates to github actions
maltekuehl
left a comment
There was a problem hiding this comment.
Thanks for contributing this! Some comments raised, but the diff on the main file is quite large, could you perhaps summarize the changes and the motivation?
pyproject.toml
Outdated
| dependencies = [ | ||
| "anndata", | ||
| "matplotlib", | ||
| "matplotlib-sankey>=0.3.2", |
There was a problem hiding this comment.
How flexible is matplotlib-sankey with major versions of the standard scientific Python packages? Just to see whether it should be an optional or a required dependency
There was a problem hiding this comment.
I added matplotlib sankey as extra dependency
| [tool.flit.sdist] | ||
| exclude = [ "docs/*", "test/*" ] | ||
| [tool.flit] | ||
| sdist.exclude = [ "docs/*", "test/*" ] |
There was a problem hiding this comment.
Perspectively, would be great to eliminate flit dependency and just move to uv as in other repos. Unrelated to this PR, just if you have it set up and would like to copy & past this from some other place. But can merge without it, too.
There was a problem hiding this comment.
Good idea. I would start a separate PR to migrate to uv build backend
There was a problem hiding this comment.
python-igraph is deprecated, should now use igraph
There was a problem hiding this comment.
igraph is a dev dependency. I resolve the deprecation by adding "scanpy[leiden]".
| show_fraction: bool = False, | ||
| show_cluster_keys: bool = True, | ||
| graph_plot_kwargs: dict | None = None, | ||
| transition_plot: Literal["network", "sankey"] = "network", |
There was a problem hiding this comment.
Could we condense here to sankey_kwargs and point to its docs? Quite a number of keywords for optional behavior
There was a problem hiding this comment.
Sure. Thats a good idea. I added sankey parameters as kwargs
maltekuehl
left a comment
There was a problem hiding this comment.
See comment above and question about the TODO left in the code.
| # TODO: fix in matplotlib sankey | ||
| # if isinstance(node_colormap, str): | ||
| # node_colormap = plt.get_cmap(node_colormap) |
There was a problem hiding this comment.
Pull request overview
This PR adds a new Sankey-based transition visualization option to pyclustree.clustree (via matplotlib-sankey), updates project metadata for Python 3.11+, and refreshes tooling/CI configuration.
Changes:
- Add
transition_plot="sankey"mode toclustree, withsankey_kwargscustomization and new sankey-focused tests. - Update packaging metadata (version bump, Python requirement, new optional
sankeyextra) and refresh pre-commit hook versions. - Update GitHub Actions to test/build on newer Python versions and adjust editor settings.
Reviewed changes
Copilot reviewed 8 out of 11 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
pyclustree/_clustree.py |
Implements the new Sankey rendering path and related API surface (transition_plot, sankey_kwargs). |
tests/test_pyclustree.py |
Adds integration tests for the new Sankey plotting option and parameters. |
pyproject.toml |
Bumps version, raises minimum Python, adds sankey extra, and updates tool configuration layout. |
.python-version |
Updates the pinned Python version for local tooling. |
.pre-commit-config.yaml |
Updates hook revisions. |
.github/workflows/test.yaml |
Updates the Python test matrix versions. |
.github/workflows/build.yaml |
Updates the Python version used for build checks. |
.vscode/settings.json |
Expands file watcher exclusions. |
pyclustree/py.typed |
Included for typing distribution (no diff shown). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ) | ||
|
|
||
| node_expr[key][cluster_key_name] = expr_mean | ||
| gene_values[key + cluster_key_name] = expr_mean |
There was a problem hiding this comment.
gene_values[key + cluster_key_name] will throw TypeError if cluster_key_name is not a string (e.g., integer cluster labels). Use a safe string key (e.g., f"{key}_{cluster_key_name}") when populating gene_values.
| gene_values[key + cluster_key_name] = expr_mean | |
| gene_values[f"{key}_{cluster_key_name}"] = expr_mean |
| "**/.venv/**": true, | ||
| "**/__pycache__/**": true, | ||
| "**/.pytest_cache/**": true, | ||
| "**/.ruffle_cache/**": true, |
There was a problem hiding this comment.
files.watcherExclude contains "**/.ruffle_cache/**", but Ruff’s cache directory is .ruff_cache. This typo means the cache won’t be excluded from the file watcher.
| "**/.ruffle_cache/**": true, | |
| "**/.ruff_cache/**": true, |
| def test_sankey_basic(): | ||
| """Test basic sankey plot functionality.""" | ||
| adata = sc.datasets.pbmc3k_processed() | ||
|
|
||
| # Run leiden clustering for different resolutions | ||
| for resolution in [0.2, 0.4, 0.6, 0.8, 1.0]: | ||
| sc.tl.leiden( | ||
| adata, | ||
| resolution=resolution, | ||
| flavor="igraph", | ||
| n_iterations=2, | ||
| key_added=f"leiden_{str(resolution).replace('.', '_')}", | ||
| ) |
There was a problem hiding this comment.
The new sankey tests repeatedly call sc.datasets.pbmc3k_processed() and rerun Leiden clustering in each test. This is expensive and can make CI slow/flaky (dataset download, repeated clustering). Consider introducing a session-scoped fixture that loads the dataset once and precomputes the needed leiden_* columns, then reuse it across sankey tests.
| adata, | ||
| cluster_keys, | ||
| transition_plot="sankey", | ||
| sankey_kwargs={"show_colorbar": True}, |
There was a problem hiding this comment.
This test passes show_colorbar inside sankey_kwargs, but clustree’s colorbar is controlled by the top-level show_colorbar parameter (and show_colorbar isn’t consumed from sankey_kwargs). As written, the test doesn’t exercise colorbar behavior; pass show_colorbar=True to clustree instead (and keep sankey_kwargs for sankey-specific options).
| sankey_kwargs={"show_colorbar": True}, | |
| show_colorbar=True, | |
| sankey_kwargs={}, |
| @@ -54,9 +55,8 @@ optional-dependencies.dev = [ | |||
| "pandoc", | |||
| "pre-commit", | |||
| "pytest", | |||
| "python-igraph", | |||
| "scanpy", | |||
| # Until pybtex >0.23.0 releases: https://bitbucket.org/pybtex-devs/pybtex/issues/169/ | |||
| "scanpy[leiden]>=1.11.5", | |||
| "setuptools", | |||
| "sphinx>=4", | |||
| "sphinx-autoapi", | |||
| @@ -71,22 +71,22 @@ optional-dependencies.dev = [ | |||
| "twine>=4.0.2", | |||
| "types-networkx", | |||
| ] | |||
| optional-dependencies.sankey = [ | |||
| "matplotlib-sankey>=0.3.2", | |||
| ] | |||
There was a problem hiding this comment.
The test suite now depends on matplotlib-sankey (sankey tests), but CI installs .[dev,sklearn]. To avoid ImportError in tests, either add matplotlib-sankey to optional-dependencies.dev or ensure CI installs the sankey extra as well.
| if node_color_gene is not None: | ||
| node_expr: dict[str, dict[str, float]] = {} | ||
|
|
||
| for key in cluster_keys: | ||
| node_expr[key] = {} | ||
| for cluster_key_name in adata.obs[key].unique(): | ||
| expr_mean = np.mean( | ||
| np.array( | ||
| adata[(adata.obs[key] == cluster_key_name), :].raw[:, [node_color_gene]].X.todense() | ||
| ).ravel(), | ||
| axis=0, | ||
| ) | ||
|
|
There was a problem hiding this comment.
Sankey gene coloring currently assumes adata.raw exists and always reads expression from .raw, ignoring node_color_gene_use_raw and the node_color_gene in adata.obs.columns case handled in the network path. This can raise at runtime (e.g., adata.raw is None) or compute the wrong values; consider reusing the same gene-value extraction/aggregation logic as the network branch and avoid per-cluster adata[...] slicing.
| [tool.flit.sdist] | ||
| exclude = [ "docs/*", "test/*" ] | ||
| [tool.flit] | ||
| sdist.exclude = [ "docs/*", "test/*" ] |
There was a problem hiding this comment.
sdist.exclude uses "test/*", but the repository’s directory is tests/ (plural). As written, the sdist will still include the test suite; consider changing the pattern to "tests/*" (or removing tests intentionally if that’s the goal).
| sdist.exclude = [ "docs/*", "test/*" ] | |
| sdist.exclude = [ "docs/*", "tests/*" ] |
| "except", | ||
| "warning", | ||
| [tool.coverage] | ||
| run.omit = [ "*/test/*" ] |
There was a problem hiding this comment.
Coverage omit pattern is "*/test/*", but tests live under tests/ (plural), so test files won’t be omitted. Update the pattern (e.g., "*/tests/*") so coverage reporting isn’t skewed by test code.
| run.omit = [ "*/test/*" ] | |
| run.omit = [ "*/tests/*" ] |
| class SankeyKwargs(TypedDict): | ||
| spacing: float | ||
| curve_type: Literal["curve3", "curve4", "line"] | ||
| ribbon_alpha: float | ||
| ribbon_color: str | ||
| show_legend: bool | ||
| rel_column_width: float | ||
| annotate_columns: Literal["index", "weight", "weight_percent"] | None | ||
| annotate_columns_font_kwargs: dict | None | ||
|
|
There was a problem hiding this comment.
SankeyKwargs is used as a bag of optional overrides (via .get(...) defaults), but the TypedDict is currently total (all keys required). Mark it as TypedDict(total=False) (or use NotRequired[...]) so type checkers match runtime usage, and so sankey_kwargs = {} is valid.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
@harryhaller001 we also had some changes to clustree in the ck dev repo irrc, not sure whether we merged all of them back. Some things might have been addressed already? At a conference this week, so cannot check, but want to retake the ck effort soon. We should definitely try to also get maplotlib sankey in there and try to align these efforts. |
