Adding sankey as plotting option by harryhaller001 · Pull Request #4 · complextissue/pyclustree

harryhaller001 · 2026-02-22T21:13:22Z

Use matplotlib-sankey to add sankey plotting option to clustree function
Add example to docs
Python 3.11 as minimal supported version, move to uv
Fix older pre-commit hooks
Version updates to github actions

maltekuehl

Thanks for contributing this! Some comments raised, but the diff on the main file is quite large, could you perhaps summarize the changes and the motivation?

maltekuehl · 2026-02-23T20:09:11Z

pyproject.toml

 dependencies = [
  "anndata",
  "matplotlib",
+  "matplotlib-sankey>=0.3.2",


How flexible is matplotlib-sankey with major versions of the standard scientific Python packages? Just to see whether it should be an optional or a required dependency

I added matplotlib sankey as extra dependency

maltekuehl · 2026-02-23T20:10:03Z

pyproject.toml

-[tool.flit.sdist]
-exclude = [ "docs/*", "test/*" ]
+[tool.flit]
+sdist.exclude = [ "docs/*", "test/*" ]


Perspectively, would be great to eliminate flit dependency and just move to uv as in other repos. Unrelated to this PR, just if you have it set up and would like to copy & past this from some other place. But can merge without it, too.

Good idea. I would start a separate PR to migrate to uv build backend

maltekuehl · 2026-02-23T20:11:25Z

docs/source/example.ipynb

python-igraph is deprecated, should now use igraph

igraph is a dev dependency. I resolve the deprecation by adding "scanpy[leiden]".

maltekuehl · 2026-02-23T20:12:32Z

pyclustree/_clustree.py

    show_fraction: bool = False,
    show_cluster_keys: bool = True,
    graph_plot_kwargs: dict | None = None,
+    transition_plot: Literal["network", "sankey"] = "network",


Could we condense here to sankey_kwargs and point to its docs? Quite a number of keywords for optional behavior

Sure. Thats a good idea. I added sankey parameters as kwargs

… vignette

maltekuehl · 2026-03-11T10:07:25Z

Something seems to be broken with the order of layers in the example.ipynb:

maltekuehl

See comment above and question about the TODO left in the code.

maltekuehl · 2026-03-11T10:04:22Z

pyclustree/_clustree.py

+    # TODO: fix in matplotlib sankey
+    # if isinstance(node_colormap, str):
+    #     node_colormap = plt.get_cmap(node_colormap)


@harryhaller001 has this been resolved?

Copilot

Pull request overview

This PR adds a new Sankey-based transition visualization option to pyclustree.clustree (via matplotlib-sankey), updates project metadata for Python 3.11+, and refreshes tooling/CI configuration.

Changes:

Add transition_plot="sankey" mode to clustree, with sankey_kwargs customization and new sankey-focused tests.
Update packaging metadata (version bump, Python requirement, new optional sankey extra) and refresh pre-commit hook versions.
Update GitHub Actions to test/build on newer Python versions and adjust editor settings.

Reviewed changes

Copilot reviewed 8 out of 11 changed files in this pull request and generated 12 comments.

Show a summary per file

File	Description
`pyclustree/_clustree.py`	Implements the new Sankey rendering path and related API surface (`transition_plot`, `sankey_kwargs`).
`tests/test_pyclustree.py`	Adds integration tests for the new Sankey plotting option and parameters.
`pyproject.toml`	Bumps version, raises minimum Python, adds `sankey` extra, and updates tool configuration layout.
`.python-version`	Updates the pinned Python version for local tooling.
`.pre-commit-config.yaml`	Updates hook revisions.
`.github/workflows/test.yaml`	Updates the Python test matrix versions.
`.github/workflows/build.yaml`	Updates the Python version used for build checks.
`.vscode/settings.json`	Expands file watcher exclusions.
`pyclustree/py.typed`	Included for typing distribution (no diff shown).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-11T10:14:37Z

pyclustree/_clustree.py

+                    )
+
+                    node_expr[key][cluster_key_name] = expr_mean
+                    gene_values[key + cluster_key_name] = expr_mean


gene_values[key + cluster_key_name] will throw TypeError if cluster_key_name is not a string (e.g., integer cluster labels). Use a safe string key (e.g., f"{key}_{cluster_key_name}") when populating gene_values.

Suggested change

gene_values[key + cluster_key_name] = expr_mean

gene_values[f"{key}_{cluster_key_name}"] = expr_mean

Copilot · 2026-03-11T10:14:38Z

.vscode/settings.json

+        "**/.venv/**": true,
+        "**/__pycache__/**": true,
+        "**/.pytest_cache/**": true,
+        "**/.ruffle_cache/**": true,


files.watcherExclude contains "**/.ruffle_cache/**", but Ruff’s cache directory is .ruff_cache. This typo means the cache won’t be excluded from the file watcher.

Suggested change

"**/.ruffle_cache/**": true,

"**/.ruff_cache/**": true,

Copilot · 2026-03-11T10:14:38Z

tests/test_pyclustree.py

+def test_sankey_basic():
+    """Test basic sankey plot functionality."""
+    adata = sc.datasets.pbmc3k_processed()
+
+    # Run leiden clustering for different resolutions
+    for resolution in [0.2, 0.4, 0.6, 0.8, 1.0]:
+        sc.tl.leiden(
+            adata,
+            resolution=resolution,
+            flavor="igraph",
+            n_iterations=2,
+            key_added=f"leiden_{str(resolution).replace('.', '_')}",
+        )


The new sankey tests repeatedly call sc.datasets.pbmc3k_processed() and rerun Leiden clustering in each test. This is expensive and can make CI slow/flaky (dataset download, repeated clustering). Consider introducing a session-scoped fixture that loads the dataset once and precomputes the needed leiden_* columns, then reuse it across sankey tests.

Copilot · 2026-03-11T10:14:38Z

tests/test_pyclustree.py

+        adata,
+        cluster_keys,
+        transition_plot="sankey",
+        sankey_kwargs={"show_colorbar": True},


This test passes show_colorbar inside sankey_kwargs, but clustree’s colorbar is controlled by the top-level show_colorbar parameter (and show_colorbar isn’t consumed from sankey_kwargs). As written, the test doesn’t exercise colorbar behavior; pass show_colorbar=True to clustree instead (and keep sankey_kwargs for sankey-specific options).

Suggested change

sankey_kwargs={"show_colorbar": True},

show_colorbar=True,

sankey_kwargs={},

Copilot · 2026-03-11T10:14:39Z

pyproject.toml

@@ -54,9 +55,8 @@ optional-dependencies.dev = [
  "pandoc",
  "pre-commit",
  "pytest",
-  "python-igraph",
-  "scanpy",
  # Until pybtex >0.23.0 releases: https://bitbucket.org/pybtex-devs/pybtex/issues/169/
+  "scanpy[leiden]>=1.11.5",
  "setuptools",
  "sphinx>=4",
  "sphinx-autoapi",
@@ -71,22 +71,22 @@ optional-dependencies.dev = [
  "twine>=4.0.2",
  "types-networkx",
 ]
+optional-dependencies.sankey = [
+  "matplotlib-sankey>=0.3.2",
+]


The test suite now depends on matplotlib-sankey (sankey tests), but CI installs .[dev,sklearn]. To avoid ImportError in tests, either add matplotlib-sankey to optional-dependencies.dev or ensure CI installs the sankey extra as well.

Copilot · 2026-03-11T10:14:39Z

pyclustree/_clustree.py

+        if node_color_gene is not None:
+            node_expr: dict[str, dict[str, float]] = {}
+
+            for key in cluster_keys:
+                node_expr[key] = {}
+                for cluster_key_name in adata.obs[key].unique():
+                    expr_mean = np.mean(
+                        np.array(
+                            adata[(adata.obs[key] == cluster_key_name), :].raw[:, [node_color_gene]].X.todense()
+                        ).ravel(),
+                        axis=0,
+                    )
+


Sankey gene coloring currently assumes adata.raw exists and always reads expression from .raw, ignoring node_color_gene_use_raw and the node_color_gene in adata.obs.columns case handled in the network path. This can raise at runtime (e.g., adata.raw is None) or compute the wrong values; consider reusing the same gene-value extraction/aggregation logic as the network branch and avoid per-cluster adata[...] slicing.

Copilot · 2026-03-11T10:14:39Z

pyproject.toml

-[tool.flit.sdist]
-exclude = [ "docs/*", "test/*" ]
+[tool.flit]
+sdist.exclude = [ "docs/*", "test/*" ]


sdist.exclude uses "test/*", but the repository’s directory is tests/ (plural). As written, the sdist will still include the test suite; consider changing the pattern to "tests/*" (or removing tests intentionally if that’s the goal).

Suggested change

sdist.exclude = [ "docs/*", "test/*" ]

sdist.exclude = [ "docs/*", "tests/*" ]

Copilot · 2026-03-11T10:14:40Z

pyproject.toml

-  "except",
-  "warning",
+[tool.coverage]
+run.omit = [ "*/test/*" ]


Coverage omit pattern is "*/test/*", but tests live under tests/ (plural), so test files won’t be omitted. Update the pattern (e.g., "*/tests/*") so coverage reporting isn’t skewed by test code.

Suggested change

run.omit = [ "*/test/*" ]

run.omit = [ "*/tests/*" ]

.python-version

Copilot · 2026-03-11T10:14:40Z

pyclustree/_clustree.py

+class SankeyKwargs(TypedDict):
+    spacing: float
+    curve_type: Literal["curve3", "curve4", "line"]
+    ribbon_alpha: float
+    ribbon_color: str
+    show_legend: bool
+    rel_column_width: float
+    annotate_columns: Literal["index", "weight", "weight_percent"] | None
+    annotate_columns_font_kwargs: dict | None
+


SankeyKwargs is used as a bag of optional overrides (via .get(...) defaults), but the TypedDict is currently total (all keys required). Mark it as TypedDict(total=False) (or use NotRequired[...]) so type checkers match runtime usage, and so sankey_kwargs = {} is valid.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

maltekuehl · 2026-03-11T19:34:09Z

@harryhaller001 we also had some changes to clustree in the ck dev repo irrc, not sure whether we merged all of them back. Some things might have been addressed already? At a conference this week, so cannot check, but want to retake the ck effort soon. We should definitely try to also get maplotlib sankey in there and try to align these efforts.

harryhaller001 and others added 8 commits April 28, 2025 14:45

start working on sankey plot for pyclustree

c569ac5

working on sankey pyclustree implentation

da002f4

adding typed marker file, working on gene expression overlay

53e5323

integrating sankey pyclustree in package code, updating docs

a736181

working on sankey plots for pyclustree, increase python version

13bc57f

update testing

3c7d531

update pre commit config, remove testing noteboon, format code

827e5ae

Merge branch 'main' into sankey

a12adf2

harryhaller001 requested a review from maltekuehl February 23, 2026 15:17

maltekuehl requested changes Feb 23, 2026

View reviewed changes

harryhaller001 added 2 commits March 8, 2026 11:28

updating scanpy and igraph deps, adding sankey args as kwargs, update…

cdf2d44

… vignette

adding matplotlib-sankey as optional dep

be0da21

harryhaller001 requested a review from maltekuehl March 10, 2026 16:29

maltekuehl requested a review from Copilot March 11, 2026 10:05

Copilot started reviewing on behalf of maltekuehl March 11, 2026 10:05 View session

maltekuehl reviewed Mar 11, 2026

View reviewed changes

Copilot AI reviewed Mar 11, 2026

View reviewed changes

maltekuehl and others added 2 commits March 11, 2026 20:30

Correct stacklevel

0134ccd

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update .python-version

5792640

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

	gene_values[key + cluster_key_name] = expr_mean
	gene_values[f"{key}_{cluster_key_name}"] = expr_mean

	sankey_kwargs={"show_colorbar": True},
	show_colorbar=True,
	sankey_kwargs={},

	sdist.exclude = [ "docs/", "test/" ]
	sdist.exclude = [ "docs/", "tests/" ]

Conversation

harryhaller001 commented Feb 22, 2026

Uh oh!

maltekuehl left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

maltekuehl commented Mar 11, 2026

Uh oh!

maltekuehl left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

maltekuehl commented Mar 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants