[Testing] Enhanced Pandas Test Suite #104

Draft
stewjb wants to merge 6 commits into spotfiresoftware:main from stewjb:test/temporal-edge-cases
stewjb commented Apr 4, 2026

These are additional tests that I used while developing performance enhancements for the pandas path. They should pass against the current implementation, so I am pushing this PR first to demonstrate that before proposing the pandas updates. This PR does not touch any production code.

This PR should be stacked on #103. Leaving it as a draft until that one is merged.

…optional test imports

- Bump actions/checkout v4→v5, setup-python v5→v6, upload-artifact v4→v7,
  download-artifact v4→v8 across build.yaml, pylint.yaml, sbom.yaml.
- Add AddressSanitizer job to build.yaml (pinned to Python 3.13, LD_PRELOAD
  injection, limited to html-testRunner/polars/pillow to avoid pybind11 crashes).
- Add concurrency group to cancel superseded runs on push.
- Make geopandas/matplotlib/seaborn imports optional in test_sbdf.py so the
  module loads in environments where those packages are absent; add @skipIf
  guards to test_read_write_geodata, test_image_matplot, test_image_seaborn.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
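
The optional-import change described in this commit can be illustrated with a minimal sketch. The helper name `optional_import` and the placeholder test body are hypothetical; the real test_sbdf.py may structure its guards differently.

```python
import importlib
import unittest


def optional_import(name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Module-level guard: loading this file no longer fails when geopandas is absent.
geopandas = optional_import("geopandas")


class GeoDataTest(unittest.TestCase):
    @unittest.skipIf(geopandas is None, "geopandas is not installed")
    def test_read_write_geodata(self):
        # Placeholder body for illustration only; the real test round-trips geodata.
        self.assertTrue(hasattr(geopandas, "GeoDataFrame"))
```

With this pattern the test is reported as skipped rather than as an import error, so the rest of the module still runs in a minimal environment.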
stewjb commented Apr 4, 2026

Here is a preview of the performance enhancements I arrived at on the pandas pathway. Note that these currently live on my fork. They are built on the polars branch, but I thought adding the CI and testing changes to main first makes more sense from a testing and PR review perspective.

Benchmark Results (100k rows, Pandas path)

| Profile | Metric | main (ms) | branch (ms) | speedup |
|---|---|---|---|---|
| Temporal, no nulls | Export | 1527.6 | 94.5 | 16.2× |
| Temporal, no nulls | Import | 142.6 | 66.0 | 2.2× |
| Temporal, ~10% nulls | Export | 1121.2 | 89.7 | 12.5× |
| Temporal, ~10% nulls | Import | 149.9 | 82.2 | 1.8× |
| Numeric, no nulls | Export | 119.1 | 13.5 | 8.8× |
| Numeric, no nulls | Import | 18.8 | 11.0 | 1.7× |
| Numeric, ~10% nulls | Export | 21.2 | 21.5 | ~same |
| Numeric, ~10% nulls | Import | 25.0 | 11.3 | 2.2× |
| String, no nulls | Export | 92.0 | 81.6 | ~same |
| String, no nulls | Import | 47.7 | 38.3 | 1.2× |
| String, ~10% nulls | Export | 75.8 | 71.9 | ~same |
| String, ~10% nulls | Import | 37.9 | 52.9 | ~same |
| Binary, no nulls | Export | 90.0 | 94.0 | ~same |
| Binary, no nulls | Import | 52.9 | 62.2 | ~same |
| Binary, ~10% nulls | Export | 77.6 | 82.4 | ~same |
| Binary, ~10% nulls | Import | 88.4 | 63.9 | 1.4× |
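
For readers who want to reproduce numbers of this kind, here is a rough sketch of a timing helper. The benchmark harness actually used is not shown in this thread, so the function names, the repeat count, and the choice of the median are all assumptions.

```python
import statistics
import time


def time_ms(func, repeats=5):
    """Return the median wall-clock time of func() in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def speedup(baseline_ms, candidate_ms):
    """Ratio of baseline time to candidate time; values above 1 mean faster."""
    return baseline_ms / candidate_ms
```

For example, `speedup(1527.6, 94.5)` rounds to the 16.2× shown for the temporal export row. In practice `func` would wrap an export or import call on a 100k-row DataFrame.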

stewjb and others added 5 commits April 4, 2026 17:57
…cking

- Extend ASan job to also run UBSan (-fsanitize=address,undefined). UBSan shares
  the libasan.so LD_PRELOAD runtime so no extra preload is needed. Catches signed
  integer overflow, null pointer dereference, misaligned access, and other C UB
  in the Cython extension. UBSAN_OPTIONS=print_stacktrace=1:halt_on_error=1 makes
  CI fail clearly on the first finding with a full stack trace.
- Rename job 'asan' → 'sanitizers' and artifact 'test-results-asan' →
  'test-results-sanitizers' to reflect the combined coverage.
- Add .github/dependabot.yml to auto-PR GitHub Actions version bumps weekly,
  preventing future Node.js deprecation warnings from going unnoticed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
All actions have been bumped to versions that natively use Node.js 24.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
New tests for the export/import paths in sbdf.pyx:
- test_temporal_nulls_roundtrip: mixed-null datetime/date/time/timespan columns
- test_negative_timespans: negative timedelta values including sub-ms precision
- test_pre_epoch_dates: dates across full year-1..9999 range (regression for
  the pd.to_datetime → np.asarray fix that handles pre-Timestamp dates)
- test_pre_epoch_datetimes: datetimes before Unix epoch
- test_time_edge_cases: midnight, end-of-day, microsecond truncation
- test_all_null_temporal_columns: all-NaT datetime64/timedelta64 columns
- test_numpy_datetime_with_nulls: NaT at specific positions in datetime64[ms]
- test_numpy_timedelta_with_nulls: NaT at specific positions in timedelta64[ms]
- test_empty_dataframe: 0-row export for bool/int/float/datetime/timedelta/string
- test_multichunk_export: 100,001-row export forces a second SBDF row slice

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
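
As background for test_pre_epoch_dates, the sketch below shows why dates near year 1 or year 9999 need special handling: an int64 count of nanoseconds since the Unix epoch (the default resolution of pandas Timestamp) cannot represent them, while a millisecond count can. The helper names are hypothetical, and this only illustrates the range limit, not the fix in sbdf.pyx.

```python
from datetime import date

INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1
EPOCH = date(1970, 1, 1)

NS_PER_DAY = 86_400 * 10 ** 9
MS_PER_DAY = 86_400 * 10 ** 3


def epoch_offset(d, units_per_day):
    """Offset of date d from the Unix epoch, expressed in the given unit."""
    return (d - EPOCH).days * units_per_day


def fits_int64(value):
    """True if value is representable as a signed 64-bit integer."""
    return INT64_MIN <= value <= INT64_MAX


for d in (date(1, 1, 1), date(1969, 12, 31), date(9999, 12, 31)):
    fits_ns = fits_int64(epoch_offset(d, NS_PER_DAY))
    fits_ms = fits_int64(epoch_offset(d, MS_PER_DAY))
    print(d.isoformat(), "ns:", fits_ns, "ms:", fits_ms)
```

Only the 1969 date fits at nanosecond resolution, whereas all three fit at millisecond resolution, which is why a conversion path that goes through nanosecond timestamps cannot cover the full year 1 to 9999 range.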
stewjb force-pushed the test/temporal-edge-cases branch from f6d9544 to 5dd8573 on April 4, 2026 23:14