Skip to content

Adding sankey as plotting option#4

Open
harryhaller001 wants to merge 12 commits intomainfrom
sankey
Open

Adding sankey as plotting option#4
harryhaller001 wants to merge 12 commits intomainfrom
sankey

Conversation

@harryhaller001
Copy link
Collaborator

  • Use matplotlib-sankey to add sankey plotting option to clustree function
  • Add example to docs
  • Python 3.11 as minimal supported version, move to uv
  • Fix older pre-commit hooks
  • Version updates to github actions

Copy link
Contributor

@maltekuehl maltekuehl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for contributing this! Some comments raised, but the diff on the main file is quite large, could you perhaps summarize the changes and the motivation?

pyproject.toml Outdated
dependencies = [
"anndata",
"matplotlib",
"matplotlib-sankey>=0.3.2",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How flexible is matplotlib-sankey with major versions of the standard scientific Python packages? Just to see whether it should be an optional or a required dependency

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added matplotlib sankey as extra dependency

[tool.flit.sdist]
exclude = [ "docs/*", "test/*" ]
[tool.flit]
sdist.exclude = [ "docs/*", "test/*" ]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perspectively, would be great to eliminate flit dependency and just move to uv as in other repos. Unrelated to this PR, just if you have it set up and would like to copy & past this from some other place. But can merge without it, too.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. I would start a separate PR to migrate to uv build backend

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

python-igraph is deprecated, should now use igraph

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

igraph is a dev dependency. I resolve the deprecation by adding "scanpy[leiden]".

show_fraction: bool = False,
show_cluster_keys: bool = True,
graph_plot_kwargs: dict | None = None,
transition_plot: Literal["network", "sankey"] = "network",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we condense here to sankey_kwargs and point to its docs? Quite a number of keywords for optional behavior

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Thats a good idea. I added sankey parameters as kwargs

@maltekuehl
Copy link
Contributor

Something seems to be broken with the order of layers in the example.ipynb:

grafik

Copy link
Contributor

@maltekuehl maltekuehl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above and question about the TODO left in the code.

Comment on lines +155 to +157
# TODO: fix in matplotlib sankey
# if isinstance(node_colormap, str):
# node_colormap = plt.get_cmap(node_colormap)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@harryhaller001 has this been resolved?

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a new Sankey-based transition visualization option to pyclustree.clustree (via matplotlib-sankey), updates project metadata for Python 3.11+, and refreshes tooling/CI configuration.

Changes:

  • Add transition_plot="sankey" mode to clustree, with sankey_kwargs customization and new sankey-focused tests.
  • Update packaging metadata (version bump, Python requirement, new optional sankey extra) and refresh pre-commit hook versions.
  • Update GitHub Actions to test/build on newer Python versions and adjust editor settings.

Reviewed changes

Copilot reviewed 8 out of 11 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
pyclustree/_clustree.py Implements the new Sankey rendering path and related API surface (transition_plot, sankey_kwargs).
tests/test_pyclustree.py Adds integration tests for the new Sankey plotting option and parameters.
pyproject.toml Bumps version, raises minimum Python, adds sankey extra, and updates tool configuration layout.
.python-version Updates the pinned Python version for local tooling.
.pre-commit-config.yaml Updates hook revisions.
.github/workflows/test.yaml Updates the Python test matrix versions.
.github/workflows/build.yaml Updates the Python version used for build checks.
.vscode/settings.json Expands file watcher exclusions.
pyclustree/py.typed Included for typing distribution (no diff shown).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

)

node_expr[key][cluster_key_name] = expr_mean
gene_values[key + cluster_key_name] = expr_mean
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gene_values[key + cluster_key_name] will throw TypeError if cluster_key_name is not a string (e.g., integer cluster labels). Use a safe string key (e.g., f"{key}_{cluster_key_name}") when populating gene_values.

Suggested change
gene_values[key + cluster_key_name] = expr_mean
gene_values[f"{key}_{cluster_key_name}"] = expr_mean

Copilot uses AI. Check for mistakes.
"**/.venv/**": true,
"**/__pycache__/**": true,
"**/.pytest_cache/**": true,
"**/.ruffle_cache/**": true,
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

files.watcherExclude contains "**/.ruffle_cache/**", but Ruff’s cache directory is .ruff_cache. This typo means the cache won’t be excluded from the file watcher.

Suggested change
"**/.ruffle_cache/**": true,
"**/.ruff_cache/**": true,

Copilot uses AI. Check for mistakes.
Comment on lines +100 to +112
def test_sankey_basic():
"""Test basic sankey plot functionality."""
adata = sc.datasets.pbmc3k_processed()

# Run leiden clustering for different resolutions
for resolution in [0.2, 0.4, 0.6, 0.8, 1.0]:
sc.tl.leiden(
adata,
resolution=resolution,
flavor="igraph",
n_iterations=2,
key_added=f"leiden_{str(resolution).replace('.', '_')}",
)
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new sankey tests repeatedly call sc.datasets.pbmc3k_processed() and rerun Leiden clustering in each test. This is expensive and can make CI slow/flaky (dataset download, repeated clustering). Consider introducing a session-scoped fixture that loads the dataset once and precomputes the needed leiden_* columns, then reuse it across sankey tests.

Copilot uses AI. Check for mistakes.
adata,
cluster_keys,
transition_plot="sankey",
sankey_kwargs={"show_colorbar": True},
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test passes show_colorbar inside sankey_kwargs, but clustree’s colorbar is controlled by the top-level show_colorbar parameter (and show_colorbar isn’t consumed from sankey_kwargs). As written, the test doesn’t exercise colorbar behavior; pass show_colorbar=True to clustree instead (and keep sankey_kwargs for sankey-specific options).

Suggested change
sankey_kwargs={"show_colorbar": True},
show_colorbar=True,
sankey_kwargs={},

Copilot uses AI. Check for mistakes.
Comment on lines 42 to +76
@@ -54,9 +55,8 @@ optional-dependencies.dev = [
"pandoc",
"pre-commit",
"pytest",
"python-igraph",
"scanpy",
# Until pybtex >0.23.0 releases: https://bitbucket.org/pybtex-devs/pybtex/issues/169/
"scanpy[leiden]>=1.11.5",
"setuptools",
"sphinx>=4",
"sphinx-autoapi",
@@ -71,22 +71,22 @@ optional-dependencies.dev = [
"twine>=4.0.2",
"types-networkx",
]
optional-dependencies.sankey = [
"matplotlib-sankey>=0.3.2",
]
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test suite now depends on matplotlib-sankey (sankey tests), but CI installs .[dev,sklearn]. To avoid ImportError in tests, either add matplotlib-sankey to optional-dependencies.dev or ensure CI installs the sankey extra as well.

Copilot uses AI. Check for mistakes.
Comment on lines +470 to +482
if node_color_gene is not None:
node_expr: dict[str, dict[str, float]] = {}

for key in cluster_keys:
node_expr[key] = {}
for cluster_key_name in adata.obs[key].unique():
expr_mean = np.mean(
np.array(
adata[(adata.obs[key] == cluster_key_name), :].raw[:, [node_color_gene]].X.todense()
).ravel(),
axis=0,
)

Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sankey gene coloring currently assumes adata.raw exists and always reads expression from .raw, ignoring node_color_gene_use_raw and the node_color_gene in adata.obs.columns case handled in the network path. This can raise at runtime (e.g., adata.raw is None) or compute the wrong values; consider reusing the same gene-value extraction/aggregation logic as the network branch and avoid per-cluster adata[...] slicing.

Copilot uses AI. Check for mistakes.
[tool.flit.sdist]
exclude = [ "docs/*", "test/*" ]
[tool.flit]
sdist.exclude = [ "docs/*", "test/*" ]
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sdist.exclude uses "test/*", but the repository’s directory is tests/ (plural). As written, the sdist will still include the test suite; consider changing the pattern to "tests/*" (or removing tests intentionally if that’s the goal).

Suggested change
sdist.exclude = [ "docs/*", "test/*" ]
sdist.exclude = [ "docs/*", "tests/*" ]

Copilot uses AI. Check for mistakes.
"except",
"warning",
[tool.coverage]
run.omit = [ "*/test/*" ]
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coverage omit pattern is "*/test/*", but tests live under tests/ (plural), so test files won’t be omitted. Update the pattern (e.g., "*/tests/*") so coverage reporting isn’t skewed by test code.

Suggested change
run.omit = [ "*/test/*" ]
run.omit = [ "*/tests/*" ]

Copilot uses AI. Check for mistakes.
Comment on lines +19 to +28
class SankeyKwargs(TypedDict):
spacing: float
curve_type: Literal["curve3", "curve4", "line"]
ribbon_alpha: float
ribbon_color: str
show_legend: bool
rel_column_width: float
annotate_columns: Literal["index", "weight", "weight_percent"] | None
annotate_columns_font_kwargs: dict | None

Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SankeyKwargs is used as a bag of optional overrides (via .get(...) defaults), but the TypedDict is currently total (all keys required). Mark it as TypedDict(total=False) (or use NotRequired[...]) so type checkers match runtime usage, and so sankey_kwargs = {} is valid.

Copilot uses AI. Check for mistakes.
maltekuehl and others added 2 commits March 11, 2026 20:30
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@maltekuehl
Copy link
Contributor

@harryhaller001 we also had some changes to clustree in the ck dev repo irrc, not sure whether we merged all of them back. Some things might have been addressed already? At a conference this week, so cannot check, but want to retake the ck effort soon. We should definitely try to also get maplotlib sankey in there and try to align these efforts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants