
Fix magnitude scaling module to allow array types for magnitude to area functions#56

Open
lispandfound wants to merge 14 commits into main from
fix_magnitude_scaling_types

Conversation

Contributor

@lispandfound lispandfound commented Feb 3, 2026

Small PR that just adjusts the type signatures of the magnitude scaling functions to allow arrays as well as scalars, plus minor CI fixes because of Python dependencies (hence the really large diff: it is just the uv.lock file!).

Also:

  • Minor type checking fixes because the type checker hadn't been run on the codebase for a while.
  • Made the magnitude scaling tests faster by using the vectorised interface this PR adds.

I've made a bunch of comments in the PR to explain the rationale behind the changes to make it easier to review (i.e. what you can skip in your review).

@gemini-code-assist
Contributor

Summary of Changes

Hello @lispandfound, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on improving the flexibility and robustness of the magnitude scaling module by allowing array types for magnitude to area functions. It also updates the development dependencies to include ruff and deptry for better code quality and dependency management.

Highlights

  • Type Signature Adjustment: The type signatures of magnitude scaling functions have been adjusted to accommodate array types, enhancing flexibility.
  • Dependency Management: The pyproject.toml file has been updated to adjust development dependencies, adding ruff and deptry.
  • Type Hinting: Introduced Array and TArray type hints covering float, npt.NDArray, and pd.Series types.
Changelog
  • pyproject.toml
    • Updated development dependencies to include 'ruff' and 'deptry'.
  • source_modelling/magnitude_scaling.py
    • Adjusted type signatures to allow array types for magnitude to area functions.
    • Introduced Array and TArray type hints for improved type handling.
Activity
  • The author, lispandfound, has made changes to allow array types in magnitude scaling functions.
  • The pull request includes modifications to pyproject.toml and source_modelling/magnitude_scaling.py.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to allow array-like inputs for magnitude scaling functions by updating their type signatures. While this is a good improvement, the implementations of these functions have not been updated to handle array inputs. Most of the modified functions use conditional logic (e.g., if magnitude < 6:) that will raise a ValueError at runtime when passed a numpy array or pandas Series. I've left specific comments on the functions that need to be refactored using vectorized operations like np.where or boolean masking. Additionally, I found a minor error in a docstring. Please address these critical issues to ensure the functions work as intended with array inputs. It would also be beneficial to add tests that use array-like inputs (e.g., numpy arrays) to verify the vectorized implementations and prevent similar issues in the future.
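The `np.where` refactor Gemini suggests might look roughly like this (a minimal sketch with made-up coefficients and a hypothetical function name, not the repo's actual scaling relation):

```python
import numpy as np

def piecewise_area(magnitude):
    """Hypothetical piecewise magnitude-to-area relation (made-up
    coefficients) written with np.where instead of `if magnitude < 6:`,
    so it accepts scalars and arrays alike."""
    magnitude = np.asarray(magnitude, dtype=float)
    # np.where selects element-wise, so there is no scalar-only branching
    area = np.where(
        magnitude < 6.0,
        10.0 ** (magnitude - 4.0),
        10.0 ** (1.5 * magnitude - 7.0),
    )
    # Hand a plain float back to scalar callers
    return area.item() if area.ndim == 0 else area
```

The same call then works for a scalar, a NumPy array, or (via `np.asarray`) a pandas Series of magnitudes.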


Copilot AI left a comment


Pull request overview

This PR generalizes the magnitude scaling functions to accept array-like inputs (NumPy arrays and pandas Series) in addition to scalars, and updates tests and CI tooling accordingly.

Changes:

  • Introduced shared array/series typing helpers in magnitude_scaling.py and refactored individual scaling relations to work with scalar and array-like magnitudes/areas, including vectorized random sampling and warnings.
  • Extended the test suite to verify type preservation (float/ndarray/Series) across all relevant scaling functions and adjusted stochastic tests to use batched inputs.
  • Updated project metadata and CI workflows (Python version constraint, deptry configuration, and system dependencies) to support the new setup.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
source_modelling/magnitude_scaling.py Introduces Array / TArray typing, helper utilities for sampling and type coercion, and updates all scaling functions and high-level wrappers to support scalar and array-like inputs and outputs.
tests/test_magnitude_scaling.py Adjusts stochastic tests to call scaling functions with array inputs and adds a new parametrized test to ensure return types match the input container type (float, NumPy array, pandas Series).
pyproject.toml Broadens the supported Python versions to >=3.11, reorganizes dev/type dependencies, and configures deptry’s dev dependency groups.
.github/workflows/pytest.yml Ensures CI installs additional system packages (libgdal-dev, gdal-bin) required by the dependency stack.
.github/workflows/deptry.yml Simplifies the deptry invocation to run against the whole project with configuration now provided via pyproject.toml.
Comments suppressed due to low confidence (1)

source_modelling/magnitude_scaling.py:158

  • The docstring for leonard_area_to_magnitude still documents the return type as float, but the function is now typed as returning TArray and is exercised with NumPy arrays and pd.Series in the tests, so the documentation is misleading. Please update the Returns section to describe the widened return type (and, if relevant, mention that a scalar input yields a scalar output and array-like input yields the corresponding array-like output).
def leonard_area_to_magnitude(
    area: TArray, rake: float, random: bool = False
) -> TArray:
    """Convert area to magnitude using the leonard scaling relationship [0]_.

    Parameters
    ----------
    area : TArray
        Area of the fault (km^2).
    rake : float
        Rake of the fault (degrees).
    random : bool, optional
        If True, sample parameters according to uncertainties in the
        paper, otherwise use the *best-fit* values. Default is False.

    Returns
    -------
    float
        Moment magnitude of the fault.


Made this change, but then decided it should live in a separate PR
@lispandfound
Contributor Author

It has been a while since the type checker was last run on this repo, so there were a few changes I had to make to get it to comply. Mostly inconsistencies and bugs that the type checker found (e.g. an incorrect return type for the geojson property: it said it returns a dict but it really returns a str). It also had me remove some ignore statements that are no longer needed.

Contributor Author

gdal is required for fiona now.

Contributor Author

Just moved this config to pyproject.toml.

Contributor Author

uv sync is not necessary; uv run does that by default.

Contributor Author

The type ignore statement used to be there when the type stubs for DisjointSet didn't work. I made a PR to scipy that fixed it a few weeks ago.

Comment on lines +75 to +86
@overload
def sampled_spanning_tree(
graph: nx.Graph, n_samples: int = 1
) -> nx.Graph: ... # numpydoc ignore=GL08


@overload
def sampled_spanning_tree(
graph: nx.Graph, n_samples: int = ...
) -> list[nx.Graph]: ... # numpydoc ignore=GL08


Contributor Author

The usual overload statements so that Python knows how to infer whether it is a single graph or many sampled graphs.

spanning_trees = []

for spanning_tree in mst.SpanningTreeIterator(weighted_graph, minimum=False):
for spanning_tree in mst.SpanningTreeIterator(weighted_graph, minimum=False): # type: ignore[invalid-argument-type]
Contributor Author

This seems to be a bug in the networkx type stubs. It should accept either a graph or an undirected graph because the code works for both, but it currently only accepts a DiGraph. Will make a PR to networkx at some point to fix this.

Comment on lines -201 to 203
return self.geometry.distance(
shapely.Point(coordinates.wgs_depth_to_nztm(point))
return shapely.distance(
self.geometry, shapely.Point(coordinates.wgs_depth_to_nztm(point))
)
Contributor Author

Type checker correctly points out that a shapely.Geometry object doesn't necessarily have a distance member. To maximise the likelihood this works, I have changed this to be a shapely.distance invocation.

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 13 out of 14 changed files in this pull request and generated 13 comments.

Comments suppressed due to low confidence (1)

source_modelling/magnitude_scaling.py:935

  • area_to_magnitude accepts rake: float | None and forwards rake directly into leonard_area_to_magnitude for the LEONARD2014 case without validation, so calling area_to_magnitude(ScalingRelation.LEONARD2014, area, rake=None) will fail later with a TypeError in rake_type instead of a clear ValueError like magnitude_to_area raises. It would be better for API consistency and error reporting to mirror magnitude_to_area and explicitly raise if scaling_relation is LEONARD2014 and rake is None.
def area_to_magnitude(
    scaling_relation: ScalingRelation,
    area: TArray,
    rake: float | None = None,
    random: bool = False,
) -> TArray:
    """Convert area to magnitude using a scaling relationship.

    Parameters
    ----------
    scaling_relation : ScalingRelation
        Scaling relation to use.
    area : TArray
        Area of the fault (km^2).
    rake : float, optional
        Rake of the fault (degrees). Required for Leonard scaling.
    random : bool, optional
        If True, sample parameters according to uncertainties in the
        paper, otherwise use the mean values. Default is False.

    Returns
    -------
    TArray
        Moment magnitude of the fault estimated by the scaling relation.
    """
    scaling_relations_map = {
        ScalingRelation.LEONARD2014: functools.partial(
            leonard_area_to_magnitude, rake=rake, random=random
        ),
        ScalingRelation.CONTRERAS_INTERFACE2017: functools.partial(
            contreras_interface_area_to_magnitude, random=random
        ),
        ScalingRelation.CONTRERAS_SLAB2020: functools.partial(
            strasser_slab_area_to_magnitude, random=random
        ),
    }
    return scaling_relations_map[scaling_relation](area)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

else 0
)

return a + sigma_a + (b + sigma_b) * np.log10(area)
Copilot AI commented Feb 3, 2026

Several of the scaling functions (e.g. contreras_interface_area_to_magnitude, contreras_interface_magnitude_to_area, strasser_slab_area_to_magnitude, strasser_slab_magnitude_to_area) are included in RETURN_TYPE_FUNCS and are expected by test_scaling_output_types to preserve the input scalar type (float), but they return NumPy scalar types (e.g. from np.log, np.log10, np.exp, np.pow) rather than plain Python floats, so isinstance(result, float) will be false. Either these functions should use _coerce_array_types (as you do for the Leonard and some Contreras functions) to coerce results back to the input type, or the tests should be relaxed to accept NumPy scalar subclasses as valid for the scalar case.

Suggested change
return a + sigma_a + (b + sigma_b) * np.log10(area)
magnitude = a + sigma_a + (b + sigma_b) * np.log10(area)
return _coerce_array_types(area, magnitude)

Contributor Author

Nah. The tests suggest that floats, arrays, and series return floats, arrays, and series respectively. So I didn't add the _coerce_array_types function here because it doesn't need it.
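One detail that supports this: for default float64 results, NumPy's scalar type is a subclass of Python's `float`, so the scalar case can satisfy an `isinstance(..., float)` check without any coercion (a quick sanity check, not code from the repo):

```python
import numpy as np

# np.log10 on a Python float returns a np.float64, and np.float64 is a
# subclass of Python's float, so isinstance(result, float) still holds.
result = np.log10(100.0)
```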

x: magnitude_scaling.Array,
random: bool = False,
rake: float | None = None,
) -> float: ...
Copilot AI commented Feb 3, 2026

The RandomFunction protocol now takes x: magnitude_scaling.Array but still declares a return type of float, even though the scaling functions under test can now return arrays or pandas Series when given non-scalar inputs. To keep the type hints accurate and avoid confusion in type checking, the return type of __call__ should be widened (e.g. to magnitude_scaling.Array | tuple[magnitude_scaling.Array, magnitude_scaling.Array]) to reflect the actual behavior exercised in these tests.

Suggested change
) -> float: ...
) -> magnitude_scaling.Array | tuple[magnitude_scaling.Array, magnitude_scaling.Array]: ...

Comment on lines +88 to +94
@overload
def __getitem__(self, index: int) -> pd.DataFrame: ... # numpydoc ignore=GL08
@overload
def __getitem__(
self, index: slice
) -> Sequence[pd.DataFrame]: ... # numpydoc ignore=GL08

Copilot AI commented Feb 3, 2026

The overloads for Segments.__getitem__ advertise support for both int and slice indices, but the implementation still only accepts int and explicitly raises a TypeError for non-int indices. Either the slice overload should be removed/adjusted until slicing is implemented, or the implementation should be updated to handle slices so that the runtime behavior matches the type hints.

Suggested change
@overload
def __getitem__(self, index: int) -> pd.DataFrame: ... # numpydoc ignore=GL08
@overload
def __getitem__(
self, index: slice
) -> Sequence[pd.DataFrame]: ... # numpydoc ignore=GL08

Contributor Author

Will do at a different time.

@property
def geojson(self) -> dict: # numpydoc ignore=RT01
def geojson(self) -> str: # numpydoc ignore=RT01
"""dict: A GeoJSON representation of the fault."""
Copilot AI commented Feb 3, 2026

The geojson property now has a return type annotation of str and indeed returns the string produced by shapely.to_geojson, but the docstring still states "dict: A GeoJSON representation of the fault.". To avoid confusing users about the API, the docstring should be updated to describe the actual return type (e.g. a JSON-formatted string) or the implementation changed to return a parsed dictionary.

Suggested change
"""dict: A GeoJSON representation of the fault."""
"""str: A GeoJSON-formatted string representation of the point source."""

@property
def geojson(self) -> dict: # numpydoc ignore=RT01
def geojson(self) -> str: # numpydoc ignore=RT01
"""dict: A GeoJSON representation of the fault."""
Copilot AI commented Feb 3, 2026

The geojson property here is annotated to return str and forwards the string output of shapely.to_geojson, but the docstring still claims dict: A GeoJSON representation of the fault.. For clarity and consistency, the docstring should be updated to describe the string return value (or, if a dict is intended, the implementation should JSON-decode the string before returning it).

Suggested change
"""dict: A GeoJSON representation of the fault."""
"""str: A GeoJSON string representation of the fault."""

Copilot uses AI. Check for mistakes.
Comment on lines +78 to +86
if isinstance(like, int):
coerced = int(value)
if isinstance(like, float):
coerced = float(value)
elif isinstance(like, np.ndarray):
coerced = np.asarray(value)
else:
assert isinstance(like, pd.Series)
coerced = pd.Series(value, index=like.index)
Copilot AI commented Feb 3, 2026

_coerce_array_types uses two independent if statements starting at isinstance(like, int) and isinstance(like, float), with the second having an elif/else chain. For like values of type int, the first branch sets coerced, but the second if is false and the elif/else fall through to the else, triggering the assert isinstance(like, pd.Series) and raising instead of returning an int result. To fix this, the second if should be changed to elif, so that exactly one type branch is taken and integer inputs are handled correctly.
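The fix Copilot describes, sketched with a hypothetical `coerce_like` helper (not the repo's `_coerce_array_types` itself):

```python
import numpy as np
import pandas as pd

def coerce_like(like, value):
    """Hypothetical corrected helper: a single if/elif chain so exactly
    one branch runs, fixing the int fall-through described above."""
    if isinstance(like, int):
        return int(value)
    elif isinstance(like, float):
        return float(value)
    elif isinstance(like, np.ndarray):
        return np.asarray(value)
    else:
        assert isinstance(like, pd.Series)
        return pd.Series(value, index=like.index)
```

With two independent `if` statements, an `int` exemplar would fall into the final `else` and trip the `pd.Series` assert; the `elif` chain makes the branches mutually exclusive.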

Comment on lines 28 to +26
- name: Run type checking with ty
run: uv run ty check --exclude source_modelling/ccldpy.py --exclude setup.py
run: uv run --all-extras ty check --exclude source_modelling/ccldpy.py --exclude setup.py
Copilot AI commented Feb 3, 2026

In this workflow, uv run --all-extras ty check ... assumes that ty is available, but in pyproject.toml it is only listed in the dev dependency group and not in the main dependencies or types extra. Depending on uv's defaults, this may mean ty is not installed when the job runs; to make the step robust, consider either selecting the dev group explicitly (e.g. with -g dev) or adding a preceding uv sync that includes the dev group.

Comment on lines 95 to 97
def __getitem__(self, index: int) -> pd.DataFrame:
"""Get the nth segment in the SRF.

Copilot AI commented Feb 3, 2026

The Segments.__getitem__ docstring currently indicates in its Returns section that the method returns an int, but the implementation actually returns a pd.DataFrame representing the selected segment. To avoid confusing users, the documented return type should be updated to pd.DataFrame (and extended appropriately if/when slice support is added).

"""
if isinstance(like, int):
coerced = int(value)
if isinstance(like, float):
Copilot AI commented Feb 3, 2026

This assignment to 'coerced' is unnecessary as it is redefined before this value is used.

Suggested change
if isinstance(like, float):
elif isinstance(like, float):

(trace_points_nztm, np.array([dbottom, dbottom]))
)
else:
assert dip_dir_nztm is not None
Contributor Author

Type checker points out that technically this value could be None. I don't think anyone is likely to do that, but I threw in an assert to catch this out.

Contributor

Doesn't the first three if statements of this function ensure that dip_dir_nztm can't be None? So I don't think that's needed?

Comment on lines +996 to +998
points_into_graph = nx.from_dict_of_lists(
points_into_relation, # type: ignore[invalid-argument-type]
create_using=nx.DiGraph,
Contributor Author

Similarly, here I think this is a bug with networkx.

@@ -234,7 +242,7 @@ def from_file(cls, srf_ffp: Path | str) -> Self:
point_count = int(points_count_match.group(1))
Contributor Author

I did add logic here to handle slicing and negative indices, but then removed it and will make a separate PR at some point.

Comment on lines +607 to +617
def write_srf(srf_ffp: Path | str, srf: SrfFile) -> None:
"""Write an SRF object to a filepath.

Parameters
----------
srf_ffp : Path
srf_ffp : Path | str
The filepath to write the srf object to.
srf : SrfFile
The SRF object.
"""
srf.write_srf(srf_ffp)
srf.write_srf(Path(srf_ffp))
Contributor Author

Got an error in the tests for passing in a str; didn't see any reason this couldn't be a str, so I just coerced it. Will remove this function in a later PR anyway.

shapely.Point(
coordinates.wgs_depth_to_nztm(point[["lat", "lon", "dep"]].values)
coordinates.wgs_depth_to_nztm(
np.asarray(point[["lat", "lon", "dep"]].values)
Contributor Author

Technically .values returns a wacky pandas extension array. I don't really think it breaks anything, so I just immediately launder it into a numpy array.

def test_normal_error_contreras_interface(area_to_mag: RandomFunction, area: float):
"""Generate samples with random = True set on area_to_mag and check that it approximates the value with random = False."""
samples = [area_to_mag(area, random=True) for _ in range(300)]
samples = area_to_mag(np.full(300, area), random=True)
@lispandfound (Contributor, Author) commented Feb 3, 2026

An example of the vectorisation changes in this PR for the tests. The next few changes are just repeating this.
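The loop-vs-vectorised pattern looks roughly like this (using a plain NumPy RNG stand-in rather than the repo's scaling functions):

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Before: 300 separate scalar draws in a Python loop
loop_samples = [rng.normal(5.0, 0.2) for _ in range(300)]

# After: one vectorised draw, mirroring the change to
# `area_to_mag(np.full(300, area), random=True)` in the tests
vec_samples = rng.normal(5.0, 0.2, size=300)
```

The vectorised call produces the whole batch in one NumPy operation instead of 300 Python-level calls, which is where the test speedup comes from.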

Array = int | float | npt.NDArray[np.floating] | pd.Series
TArray = TypeVar("TArray", bound=Array)


Contributor Author

These helper functions are used to coerce types and sample values in the distributions with a matching size of the input.



[tool.ty.rules]
# Downgraded to warning because operators -,+,*,/ are too annoying to type properly
Contributor Author

A bunch of the subtractions in magnitude scaling module are technically invalid only because of the cursed way numpy operates. It can't really handle the case where you do something like float - array because it uses the float __sub__ which technically only accepts other floats. This returns a not implemented error, and then it uses the array __rsub__ method, which does work. But the type checker doesn't understand this and stops at __sub__ doesn't work.
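A quick runtime demonstration of why the code is fine even though the checker objects:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
# float.__sub__ returns NotImplemented for an ndarray operand, so Python
# falls back to ndarray.__rsub__, which broadcasts the subtraction.
result = 10.0 - a
```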

@joelridden (Contributor) left a comment

Nice, just an issue with the pytest and h5py installation by the looks of it

return typing.cast(TArray, coerced)


# Purpose of this function is to limit the number of ways that values
Contributor

Seems like this should go in the docstring of the function?

