Fix magnitude scaling module to allow array types for magnitude to area functions #56
lispandfound wants to merge 14 commits into main from
Conversation
…rea functions" This reverts commit c2784ad.
Summary of Changes

Hello @lispandfound, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request focuses on improving the flexibility and robustness of the magnitude scaling module by allowing array types for magnitude to area functions. It also updates the development dependencies.

Highlights
Code Review
This pull request aims to allow array-like inputs for magnitude scaling functions by updating their type signatures. While this is a good improvement, the implementations of these functions have not been updated to handle array inputs. Most of the modified functions use conditional logic (e.g., if magnitude < 6:) that will raise a ValueError at runtime when passed a numpy array or pandas Series. I've left specific comments on the functions that need to be refactored using vectorized operations like np.where or boolean masking. Additionally, I found a minor error in a docstring. Please address these critical issues to ensure the functions work as intended with array inputs. It would also be beneficial to add tests that use array-like inputs (e.g., numpy arrays) to verify the vectorized implementations and prevent similar issues in the future.
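As the review notes, scalar branching like `if magnitude < 6:` raises a `ValueError` once these functions receive arrays. A minimal sketch of the suggested `np.where` refactor — the function name and coefficients here are illustrative, not taken from the PR:

```python
import numpy as np

def piecewise_magnitude_to_area(magnitude):
    """Vectorized piecewise relation: works for scalars and arrays alike."""
    magnitude = np.asarray(magnitude, dtype=float)
    # np.where evaluates both branches element-wise instead of branching once,
    # so no ambiguous truth-value error is raised for array inputs.
    return np.where(magnitude < 6, 10.0 ** (magnitude - 4.0), 10.0 ** (magnitude - 3.9))
```

The same pattern (or boolean masking for more than two branches) applies to each function the review flags.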
Turns out you need to get more aggressive to ensure the return types stay correct!
Pull request overview
This PR generalizes the magnitude scaling functions to accept array-like inputs (NumPy arrays and pandas Series) in addition to scalars, and updates tests and CI tooling accordingly.
Changes:
- Introduced shared array/series typing helpers in `magnitude_scaling.py` and refactored individual scaling relations to work with scalar and array-like magnitudes/areas, including vectorized random sampling and warnings.
- Extended the test suite to verify type preservation (float/ndarray/Series) across all relevant scaling functions and adjusted stochastic tests to use batched inputs.
- Updated project metadata and CI workflows (Python version constraint, deptry configuration, and system dependencies) to support the new setup.
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `source_modelling/magnitude_scaling.py` | Introduces `Array`/`TArray` typing, helper utilities for sampling and type coercion, and updates all scaling functions and high-level wrappers to support scalar and array-like inputs and outputs. |
| `tests/test_magnitude_scaling.py` | Adjusts stochastic tests to call scaling functions with array inputs and adds a new parametrized test to ensure return types match the input container type (float, NumPy array, pandas Series). |
| `pyproject.toml` | Broadens the supported Python versions to >=3.11, reorganizes dev/type dependencies, and configures deptry's dev dependency groups. |
| `.github/workflows/pytest.yml` | Ensures CI installs additional system packages (libgdal-dev, gdal-bin) required by the dependency stack. |
| `.github/workflows/deptry.yml` | Simplifies the deptry invocation to run against the whole project with configuration now provided via pyproject.toml. |
Comments suppressed due to low confidence (1)
source_modelling/magnitude_scaling.py:158
- The docstring for `leonard_area_to_magnitude` still documents the return type as `float`, but the function is now typed as returning `TArray` and is exercised with NumPy arrays and `pd.Series` in the tests, so the documentation is misleading. Please update the `Returns` section to describe the widened return type (and, if relevant, mention that a scalar input yields a scalar output and array-like input yields the corresponding array-like output).
```python
def leonard_area_to_magnitude(
    area: TArray, rake: float, random: bool = False
) -> TArray:
    """Convert area to magnitude using the leonard scaling relationship [0]_.

    Parameters
    ----------
    area : TArray
        Area of the fault (km^2).
    rake : float
        Rake of the fault (degrees).
    random : bool, optional
        If True, sample parameters according to uncertainties in the
        paper, otherwise use the *best-fit* values. Default is False.

    Returns
    -------
    float
        Moment magnitude of the fault.
    """
```
Made this change, but then decided it should live in a separate PR
It has been a while since type checking was last run on this repo, so there were a few changes I had to make to get it to comply. Mostly inconsistencies and bugs that the type checker found (e.g. an incorrect return type for the
gdal is required for fiona now.
Just moved this config to pyproject.toml.
`uv sync` is not necessary; `uv run` does that by default.
The type ignore statement used to be there when the type stubs for DisjointSet didn't work. I made a PR to scipy that fixed it a few weeks ago.
```python
@overload
def sampled_spanning_tree(
    graph: nx.Graph, n_samples: int = 1
) -> nx.Graph: ...  # numpydoc ignore=GL08


@overload
def sampled_spanning_tree(
    graph: nx.Graph, n_samples: int = ...
) -> list[nx.Graph]: ...  # numpydoc ignore=GL08
```
The usual overload statements so that Python knows how to infer whether it is a single graph or many sampled graphs.
```diff
     spanning_trees = []

-    for spanning_tree in mst.SpanningTreeIterator(weighted_graph, minimum=False):
+    for spanning_tree in mst.SpanningTreeIterator(weighted_graph, minimum=False):  # type: ignore[invalid-argument-type]
```
This seems to be a bug in the networkx type stubs. It should accept either a directed or an undirected graph because the code works for both, but it currently only accepts a DiGraph. Will make a PR to networkx at some point to fix this.
```diff
-        return self.geometry.distance(
-            shapely.Point(coordinates.wgs_depth_to_nztm(point))
+        return shapely.distance(
+            self.geometry, shapely.Point(coordinates.wgs_depth_to_nztm(point))
         )
```
Type checker correctly points out that a shapely.Geometry object doesn't necessarily have a distance member. To maximise the likelihood this works, I have changed this to be a shapely.distance invocation.
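For reference, the module-level `shapely.distance` function (available in Shapely ≥ 2.0) works uniformly across geometry types, which is why it sidesteps the "no `distance` member" complaint:

```python
import shapely

# Module-level distance accepts any pair of geometries,
# unlike relying on a .distance method on an arbitrary Geometry object.
point = shapely.Point(0.0, 0.0)
line = shapely.LineString([(0.0, 1.0), (1.0, 1.0)])
print(shapely.distance(line, point))
```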
Pull request overview
Copilot reviewed 13 out of 14 changed files in this pull request and generated 13 comments.
Comments suppressed due to low confidence (1)
source_modelling/magnitude_scaling.py:935
- `area_to_magnitude` accepts `rake: float | None` and forwards `rake` directly into `leonard_area_to_magnitude` for the `LEONARD2014` case without validation, so calling `area_to_magnitude(ScalingRelation.LEONARD2014, area, rake=None)` will fail later with a `TypeError` in `rake_type` instead of a clear `ValueError` like `magnitude_to_area` raises. It would be better for API consistency and error reporting to mirror `magnitude_to_area` and explicitly raise if `scaling_relation` is `LEONARD2014` and `rake` is `None`.
```python
def area_to_magnitude(
    scaling_relation: ScalingRelation,
    area: TArray,
    rake: float | None = None,
    random: bool = False,
) -> TArray:
    """Convert area to magnitude using a scaling relationship.

    Parameters
    ----------
    scaling_relation : ScalingRelation
        Scaling relation to use.
    area : TArray
        Area of the fault (km^2).
    rake : float, optional
        Rake of the fault (degrees). Required for Leonard scaling.
    random : bool, optional
        If True, sample parameters according to uncertainties in the
        paper, otherwise use the mean values. Default is False.

    Returns
    -------
    TArray
        Moment magnitude of the fault estimated by the scaling relation.
    """
    scaling_relations_map = {
        ScalingRelation.LEONARD2014: functools.partial(
            leonard_area_to_magnitude, rake=rake, random=random
        ),
        ScalingRelation.CONTRERAS_INTERFACE2017: functools.partial(
            contreras_interface_area_to_magnitude, random=random
        ),
        ScalingRelation.CONTRERAS_SLAB2020: functools.partial(
            strasser_slab_area_to_magnitude, random=random
        ),
    }
    return scaling_relations_map[scaling_relation](area)
```
```python
        else 0
    )

    return a + sigma_a + (b + sigma_b) * np.log10(area)
```
Several of the scaling functions (e.g. contreras_interface_area_to_magnitude, contreras_interface_magnitude_to_area, strasser_slab_area_to_magnitude, strasser_slab_magnitude_to_area) are included in RETURN_TYPE_FUNCS and are expected by test_scaling_output_types to preserve the input scalar type (float), but they return NumPy scalar types (e.g. from np.log, np.log10, np.exp, np.pow) rather than plain Python floats, so isinstance(result, float) will be false. Either these functions should use _coerce_array_types (as you do for the Leonard and some Contreras functions) to coerce results back to the input type, or the tests should be relaxed to accept NumPy scalar subclasses as valid for the scalar case.
```diff
-    return a + sigma_a + (b + sigma_b) * np.log10(area)
+    magnitude = a + sigma_a + (b + sigma_b) * np.log10(area)
+    return _coerce_array_types(area, magnitude)
```
Nah. The tests suggest that floats, arrays, and series return floats, arrays, and series respectively. So I didn't add the _coerce_array_types function here because it doesn't need it.
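The pushback is plausible for the scalar case because `np.float64` is itself a subclass of Python's `float`, so results from `np.log10` and friends on scalar inputs still satisfy an `isinstance(result, float)` check without any coercion:

```python
import numpy as np

result = np.log10(100.0)  # a np.float64 scalar, not a plain Python float
# np.float64 inherits from Python's float, so an isinstance-based
# type-preservation test passes even without _coerce_array_types.
print(type(result))
print(isinstance(result, float))
```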
tests/test_magnitude_scaling.py (Outdated)
```python
        x: magnitude_scaling.Array,
        random: bool = False,
        rake: float | None = None,
    ) -> float: ...
```
The RandomFunction protocol now takes x: magnitude_scaling.Array but still declares a return type of float, even though the scaling functions under test can now return arrays or pandas Series when given non-scalar inputs. To keep the type hints accurate and avoid confusion in type checking, the return type of __call__ should be widened (e.g. to magnitude_scaling.Array | tuple[magnitude_scaling.Array, magnitude_scaling.Array]) to reflect the actual behavior exercised in these tests.
```diff
-    ) -> float: ...
+    ) -> magnitude_scaling.Array | tuple[magnitude_scaling.Array, magnitude_scaling.Array]: ...
```
```python
    @overload
    def __getitem__(self, index: int) -> pd.DataFrame: ...  # numpydoc ignore=GL08
    @overload
    def __getitem__(
        self, index: slice
    ) -> Sequence[pd.DataFrame]: ...  # numpydoc ignore=GL08
```
The overloads for Segments.__getitem__ advertise support for both int and slice indices, but the implementation still only accepts int and explicitly raises a TypeError for non-int indices. Either the slice overload should be removed/adjusted until slicing is implemented, or the implementation should be updated to handle slices so that the runtime behavior matches the type hints.
```diff
-    @overload
-    def __getitem__(self, index: int) -> pd.DataFrame: ...  # numpydoc ignore=GL08
-    @overload
-    def __getitem__(
-        self, index: slice
-    ) -> Sequence[pd.DataFrame]: ...  # numpydoc ignore=GL08
```
Will do at a different time.
```diff
     @property
-    def geojson(self) -> dict:  # numpydoc ignore=RT01
+    def geojson(self) -> str:  # numpydoc ignore=RT01
         """dict: A GeoJSON representation of the fault."""
```
The geojson property now has a return type annotation of str and indeed returns the string produced by shapely.to_geojson, but the docstring still states "dict: A GeoJSON representation of the fault.". To avoid confusing users about the API, the docstring should be updated to describe the actual return type (e.g. a JSON-formatted string) or the implementation changed to return a parsed dictionary.
| """dict: A GeoJSON representation of the fault.""" | |
| """str: A GeoJSON-formatted string representation of the point source.""" |
```diff
     @property
-    def geojson(self) -> dict:  # numpydoc ignore=RT01
+    def geojson(self) -> str:  # numpydoc ignore=RT01
         """dict: A GeoJSON representation of the fault."""
```
The geojson property here is annotated to return str and forwards the string output of shapely.to_geojson, but the docstring still claims dict: A GeoJSON representation of the fault.. For clarity and consistency, the docstring should be updated to describe the string return value (or, if a dict is intended, the implementation should JSON-decode the string before returning it).
| """dict: A GeoJSON representation of the fault.""" | |
| """str: A GeoJSON string representation of the fault.""" |
```python
    if isinstance(like, int):
        coerced = int(value)
    if isinstance(like, float):
        coerced = float(value)
    elif isinstance(like, np.ndarray):
        coerced = np.asarray(value)
    else:
        assert isinstance(like, pd.Series)
        coerced = pd.Series(value, index=like.index)
```
_coerce_array_types uses two independent if statements starting at isinstance(like, int) and isinstance(like, float), with the second having an elif/else chain. For like values of type int, the first branch sets coerced, but the second if is false and the elif/else fall through to the else, triggering the assert isinstance(like, pd.Series) and raising instead of returning an int result. To fix this, the second if should be changed to elif, so that exactly one type branch is taken and integer inputs are handled correctly.
```diff
       - name: Run type checking with ty
-        run: uv run ty check --exclude source_modelling/ccldpy.py --exclude setup.py
+        run: uv run --all-extras ty check --exclude source_modelling/ccldpy.py --exclude setup.py
```
In this workflow, uv run --all-extras ty check ... assumes that ty is available, but in pyproject.toml it is only listed in the dev dependency group and not in the main dependencies or types extra. Depending on uv's defaults, this may mean ty is not installed when the job runs; to make the step robust, consider either selecting the dev group explicitly (e.g. with -g dev) or adding a preceding uv sync that includes the dev group.
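One way to make the dev-group dependency explicit, as the comment suggests — a sketch assuming `ty` lives in a `dev` dependency group (the `--group` flag is uv's group-selection mechanism):

```yaml
- name: Run type checking with ty
  run: uv run --group dev ty check --exclude source_modelling/ccldpy.py --exclude setup.py
```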
```python
    def __getitem__(self, index: int) -> pd.DataFrame:
        """Get the nth segment in the SRF.
```
The Segments.__getitem__ docstring currently indicates in its Returns section that the method returns an int, but the implementation actually returns a pd.DataFrame representing the selected segment. To avoid confusing users, the documented return type should be updated to pd.DataFrame (and extended appropriately if/when slice support is added).
| """ | ||
| if isinstance(like, int): | ||
| coerced = int(value) | ||
| if isinstance(like, float): |
This assignment to 'coerced' is unnecessary as it is redefined before this value is used.
```diff
-    if isinstance(like, float):
+    elif isinstance(like, float):
```
```python
            (trace_points_nztm, np.array([dbottom, dbottom]))
        )
    else:
        assert dip_dir_nztm is not None
```
Type checker points out that technically this value could be None. I don't think anyone is likely to do that, but I threw in an assert to catch this case.
Doesn't the first three if statements of this function ensure that dip_dir_nztm can't be None? So I don't think that's needed.
```python
    points_into_graph = nx.from_dict_of_lists(
        points_into_relation,  # type: ignore[invalid-argument-type]
        create_using=nx.DiGraph,
```
Similarly, here I think this is a bug with the networkx type stubs.
```diff
@@ -234,7 +242,7 @@ def from_file(cls, srf_ffp: Path | str) -> Self:
     point_count = int(points_count_match.group(1))
```
I did add logic here to handle slicing and negative indices, but then removed it and will make a separate PR at some point.
```diff
 def write_srf(srf_ffp: Path | str, srf: SrfFile) -> None:
     """Write an SRF object to a filepath.

     Parameters
     ----------
-    srf_ffp : Path
+    srf_ffp : Path | str
         The filepath to write the srf object to.
     srf : SrfFile
         The SRF object.
     """
-    srf.write_srf(srf_ffp)
+    srf.write_srf(Path(srf_ffp))
```
Got an error in the tests for passing in a str; didn't see any reason this couldn't be a str, so I just coerced it. Will remove this function in a later PR anyway.
```diff
             shapely.Point(
-                coordinates.wgs_depth_to_nztm(point[["lat", "lon", "dep"]].values)
+                coordinates.wgs_depth_to_nztm(
+                    np.asarray(point[["lat", "lon", "dep"]].values)
```
Technically `.values` returns a wacky pandas extension array. I don't really think it breaks anything, so I just immediately launder it into a numpy array.
```diff
 def test_normal_error_contreras_interface(area_to_mag: RandomFunction, area: float):
     """Generate samples with random = True set on area_to_mag and check that it approximates the value with random = False."""
-    samples = [area_to_mag(area, random=True) for _ in range(300)]
+    samples = area_to_mag(np.full(300, area), random=True)
```
An example of the vectorisation changes in this PR for the tests. The next few changes are just repeating this.
```python
Array = int | float | npt.NDArray[np.floating] | pd.Series
TArray = TypeVar("TArray", bound=Array)
```
These helper functions are used to coerce types and sample values in the distributions with a matching size of the input.
```toml
[tool.ty.rules]
# Downgraded to warning because operators -,+,*,/ are too annoying to type properly
```
A bunch of the subtractions in the magnitude scaling module are technically invalid only because of the cursed way numpy operates. The type checker can't really handle the case where you do something like `float - array`, because it uses the float's `__sub__`, which technically only accepts other floats. At runtime that returns `NotImplemented`, and Python then falls back to the array's `__rsub__` method, which does work. But the type checker doesn't understand this and stops at "`__sub__` doesn't work".
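The runtime fallback described above can be observed directly: `float.__sub__` declines the operation and `ndarray.__rsub__` picks it up.

```python
import numpy as np

arr = np.array([1.0, 2.0])

# float.__sub__ only knows about numeric scalars, so it declines the array...
print(float.__sub__(5.0, arr))  # NotImplemented

# ...and Python then dispatches to ndarray.__rsub__, which broadcasts fine.
print(5.0 - arr)
```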
joelridden left a comment
Nice, just an issue with the pytest and h5py installation by the looks of it
```python
    return typing.cast(TArray, coerced)


# Purpose of this function is to limit the number of ways that values
```
Seems like this should go in the docstring of the function?
Small PR that just adjusts the type signatures of the magnitude scaling functions to allow arrays as well as scalars. Plus minor CI fixes because of Python dependencies (hence the really large diff — it is just the `uv.lock` file!). Also:
I've made a bunch of comments in the PR to explain the rationale behind the changes to make it easier to review (i.e. what you can skip in your review).