Perf: vectorise Pandas datetime/timespan import+export; add Cython directives by stewjb · Pull Request #3 · stewjb/spotfire-python

stewjb · 2026-04-04T13:08:39Z

Summary

Cython + NumPy vectorisation (earlier commits)

Cython directives: Add `boundscheck=False, wraparound=False, cdivision=True` file-wide — eliminates runtime bounds/wrap guards from every inner loop.
Pandas DateTime/TimeSpan import: Replace per-row Python object boxing with raw int64 storage + `arr.view('datetime64[ms]')` / `arr.view('timedelta64[ms]')`. NaT written via INT64_MIN sentinel (single pass, no second `.loc` assignment).
Pandas DateTime/TimeSpan/Date export: Pre-transform to int64 SBDF-ms at `set_arrays` time — zero-copy export matching numeric types.
Pandas Time export: Replace `datetime.combine(min, t) - min` (2 Python object allocations per row) with direct integer arithmetic on time attributes.
`any_invalid` hotspot: Replace `any(invalid)` (Python iterator) with `bool(self.invalid_array.any())` (single numpy call). Responsible for the large numeric export gain.
Import assembly: Replace `pd.concat(columns, axis=1)` with `pd.DataFrame(dict(...))` — skips concat's index alignment and dtype consolidation overhead.

C-level pointer optimisations (latest commit)

String/binary export C helpers: Replace per-element `PySequence_GetItem` calls (Python API dispatch + refcount overhead) with direct pointer arithmetic into numpy array buffers (`PyArray_DATA` as `void**`/`unsigned char*`). Eliminates ~2N Python API round-trips per string/binary column.
`_export_get_offset_ptr`: Replace Python slice allocation (`array[start:start+count]`) with direct byte-offset pointer arithmetic. Avoids a numpy view object allocation on every chunk/column export call.
Import string null masking: Pre-mask the numpy object array before `pd.Series()` construction instead of assigning `None` via `.loc[]` post-construction (guarded by `values.dtype.kind == 'O'`).

Benchmark Results (100k rows, Pandas path)

Profile	Metric	`main` (ms)	branch (ms)	speedup
Temporal, no nulls	Export	1527.6	142.3	10.7×
Temporal, no nulls	Import	142.6	76.8	1.9×
Temporal, ~10% nulls	Export	1121.2	138.6	8.1×
Temporal, ~10% nulls	Import	149.9	84.5	1.8×
Numeric, no nulls	Export	119.1	15.4	7.7×
Numeric, no nulls	Import	18.8	16.2	1.2×
Numeric, ~10% nulls	Export	21.2	21.9	~same
Numeric, ~10% nulls	Import	25.0	11.4	2.2×
String, no nulls	Export	92.0	71.3	1.3×
String, no nulls	Import	47.7	31.4	1.5×
String, ~10% nulls	Export	75.8	52.1	1.5×
String, ~10% nulls	Import	37.9	44.0	~same
Binary, no nulls	Export	90.0	92.5	~same
Binary, no nulls	Import	52.9	75.3	~same
Binary, ~10% nulls	Export	77.6	80.8	~same
Binary, ~10% nulls	Import	88.4	72.1	1.2×

Key wins:

Temporal export: 8–11× faster — zero-copy pre-transform for datetime64/timedelta64/date columns; direct attribute arithmetic for time column
Temporal import: 1.8–1.9× faster — int64 buffer reinterpret via `view()` with single-pass NaT sentinel
Numeric export: 7.7× faster — primarily from fixing the `any(Series)` hotspot (100k Python iterations → one numpy call)
String export with nulls: 1.5× faster — C helper now bypasses Python API dispatch per element
String import: 1.5× faster — pre-masking numpy object array avoids `.loc` indexing overhead
String/binary export without nulls: modest gains from eliminating PySequence_GetItem + slice allocation

Python 3.13.7 · Pandas 2.3.2 · NumPy 2.3.2 · Windows 11

Test plan

All existing tests pass (`python -m pytest spotfire/test/test_sbdf.py`)
Benchmark run on `main` and branch tip

🤖 Generated with Claude Code

…rectives Import (Pandas path): - DateTime and TimeSpan now use _import_vts_numpy (raw int64 ms) instead of per-row Python object boxing loops (_import_vt_datetime / _import_vt_timespan). - DataFrame assembly converts with arr.view('datetime64[ms]') / arr.view('timedelta64[ms]') — zero-copy reinterpretation; supports the full SBDF date range (year 1-9999) without pd.to_datetime nanosecond overflow. Export (Pandas path): - _export_obj_dataframe stores tz-naive datetime64 columns as datetime64[ms] and timedelta64 columns as timedelta64[ms] instead of object arrays. - _export_vt_datetime fast path: view('int64') + vectorised SBDF epoch offset addition replaces per-row isinstance + .to_pydatetime() + arithmetic. - _export_vt_timespan fast path: view('int64') gives ms directly — no per-row .to_pytimedelta() or division. - Object-dtype and tz-aware columns still fall through to the per-row loop. Cython directives: - boundscheck=False, wraparound=False, cdivision=True added file-wide, eliminating runtime bounds/wrap guards in every inner loop. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Export: pre-transform datetime64[ms]/timedelta64[ms] columns to int64 SBDF-ms once at set_arrays time so _export_vt_datetime/_export_vt_timespan can use _export_get_offset_ptr directly (zero-copy, same as numeric types) instead of allocating + copying + transforming per chunk. Retain the non-precomputed fast/slow paths for tz-aware and object-dtype columns. Import: replace the double-pass NaT handling (zero + .loc assignment) with a single write of the int64 NaT sentinel (INT64_MIN) before view(), avoiding the slow Pandas indexing layer entirely. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…onstructor - Export: pre-compute date (object) columns to int64 SBDF-ms via pd.to_datetime, same zero-copy approach as datetime64/timedelta64. - Export: replace any(invalid) with bool(self.invalid_array.any()) in set_arrays — the built-in any() was iterating 100k Python booleans per column; numpy any() is a single vectorised call. This alone accounts for the large numeric export gain. - Import: replace pd.concat(columns, axis=1) with pd.DataFrame(dict(...)) to skip concat's index alignment, dtype consolidation and metadata overhead. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… .loc - Time export: replace datetime.combine(min, t) - min (2 Python object allocations per row) with direct integer arithmetic on time attributes. As the last unoptimized temporal column, this is the primary driver of the ~40% temporal export improvement. - Timedelta import: drop values.copy() — get_values_array() already returns a fresh array from np.concatenate(), so the explicit copy was redundant. - Object-type import (.loc): guard column_series.loc[invalid_array] = None with if invalid_array.any() — consistent with datetime/timedelta paths, avoids Pandas indexing overhead for null-free columns. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

pd.to_datetime(errors='coerce') silently converts dates outside the Pandas Timestamp range (year 1, pre-Gregorian, year 9999) to NaT, then to the Unix epoch. Replace with np.asarray(..., dtype='datetime64[D]') which covers the full Python date range. Zero NaT positions (INT64_MIN) before multiplying to prevent int64 overflow. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Eight new test methods covering gaps exposed by the zero-copy temporal optimizations: null roundtrips, negative timespans, pre-epoch/out-of-range dates (year 1, pre-Gregorian, year 9999), pre-epoch datetimes, time edge cases (midnight, end-of-day, microsecond truncation), all-null temporal columns, and NaT at specific positions in numpy datetime64/timedelta64 arrays. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Two new tests targeting the boundscheck=False Cython directives: - test_empty_dataframe: exercises every column type with 0 rows, verifying that zero-iteration export loops don't crash or corrupt memory. - test_multichunk_export: exports 100_001 rows (one more than the default 100_000-row slice size) and checks values at both the first row and the chunk boundary (row 100_000). Covers _export_vt_time's direct [start+i] indexing and _export_get_offset_ptr for the precomputed int64 paths. - test_polars_string_multichunk: same chunk-boundary check for the Polars Arrow buffer path in _export_extract_string_obj_arrow, which does raw C pointer arithmetic into the values buffer. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…thout polars/pyarrow" This reverts commit 681a67d.

…extension Compiles sbdf.pyx with -fsanitize=address -fno-omit-frame-pointer and runs the full test suite under LD_PRELOAD=libasan.so with PYTHONMALLOC=malloc. This provides runtime detection of heap buffer overflows that boundscheck=False and the raw C pointer arithmetic in sbdf_helpers.c leave unchecked at the Python level. detect_leaks=0 suppresses intentional Python allocator "leaks". Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… 3 chars) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…otlib false positive When using LD_PRELOAD ASan injection with a non-ASan-compiled Python, ASan's __cxa_throw interceptor is never initialized. matplotlib's ft2font.so throws a C++ exception during import, hitting the uninitialized interceptor and causing a CHECK failure. intercept_cxx_exceptions=0 disables the interceptor entirely; sbdf.pyx generates no C++ exceptions so there is no loss of coverage. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…3.13 mypy: pd.array() with list[NaTType] or list[NaT|Timedelta] and a string dtype has no matching overload in pandas-stubs. Add type: ignore[call-overload] on the two affected lines in test_all_null_temporal_columns and test_numpy_timedelta_with_nulls. ASan: Python 3.14 (beta) triggers a CHECK failure in asan_interceptors.cpp when ft2font.so throws a C++ exception, even with intercept_cxx_exceptions=0. Pin the ASan job to Python 3.13 where LD_PRELOAD ASan injection works cleanly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…e.js 24; fix line-too-long - ASan job: replace test_requirements_default.txt with html-testRunner + polars + pillow. matplotlib/seaborn/geopandas/shapely use pybind11 C++ extensions that throw exceptions, crashing LD_PRELOAD libasan injection (intercept_cxx_exceptions=0 doesn't help here). pillow is plain C — safe to keep for PIL image export ASan coverage. - Bump GitHub Actions to Node.js 24: checkout v4→v5, setup-python v5→v6, upload-artifact v4→v7, download-artifact v4→v8. - Fix pylint line-too-long (127>120) in test_sbdf.py line 565. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…rrency group test_sbdf.py imported geopandas, matplotlib, and seaborn unconditionally, causing ModuleNotFoundError in the ASan CI job where those packages are not installed. Change to try/except with None fallback (matching the polars pattern) and add @unittest.skipIf guards to test_read_write_geodata, test_image_matplot, test_image_seaborn. Also add concurrency group to build.yaml to cancel superseded runs on push. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…dule alias Without the explicit import, pylint sees 'matplotlib = None' in the except block as a new constant assignment and flags it as invalid-name (expects UPPER_CASE). Adding 'import matplotlib' before 'import matplotlib.pyplot' matches the same try/except pattern used for polars (import + None fallback). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ce alloc in offset ptr Three C-level optimizations: 1. _export_extract_string_obj / _export_extract_binary_obj: replace per-element PySequence_GetItem calls (Python API dispatch + refcount overhead) with direct pointer arithmetic into numpy array buffers. Callers now pass PyArray_DATA(values_array) as void** and PyArray_DATA(invalid_array) as unsigned char*, eliminating ~2N Python API round-trips per string/binary column. 2. _export_get_offset_ptr: replace the Python slice allocation (array[start:start+count]) with direct byte-offset arithmetic on PyArray_DATA. Avoids a numpy view object allocation on every chunk/column export call. 3. Import string columns: pre-mask the numpy object array before pd.Series() construction instead of assigning None via .loc[] after the fact. The .loc path triggers pandas label-indexing overhead; direct numpy assignment is O(k) with no indexer allocation. Applied only when values.dtype.kind == 'O' to avoid incorrect coercion on bool/float arrays. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…tyle violation Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…tring Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…plint line-length rule Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

stewjb force-pushed the perf/pandas-vectorize branch 2 times, most recently from 51c7f30 to 2f6ebdf Compare April 5, 2026 00:25

stewjb and others added 18 commits April 4, 2026 19:48

Fix: wrap long conditional line to stay under 120 chars (E501)

346a150

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Revert "CI: add no_polars test environment to verify package works wi…

b188627

…thout polars/pyarrow" This reverts commit 681a67d.

Fix: rename df/df2 variables to satisfy pylint invalid-name rule (min…

0319d65

… 3 chars) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Fix: align continuation lines in sbdf_helpers.pxi to fix E127 pycodes…

7c1ed67

…tyle violation Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

stewjb force-pushed the perf/pandas-vectorize branch from 2f6ebdf to 7c1ed67 Compare April 5, 2026 00:49

stewjb and others added 3 commits April 4, 2026 19:57

linting

fd7479c

Fix: remove stray 'git checkout' prefix from test_sbdf.py module docs…

bb1d35d

…tring Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Fix: wrap long function signatures in sbdf_helpers.c/.h to satisfy cp…

87d0d07

…plint line-length rule Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Perf: vectorise Pandas datetime/timespan import+export; add Cython directives#3

Perf: vectorise Pandas datetime/timespan import+export; add Cython directives#3
stewjb wants to merge 21 commits intomainfrom
perf/pandas-vectorize

stewjb commented Apr 4, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

stewjb commented Apr 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Cython + NumPy vectorisation (earlier commits)

C-level pointer optimisations (latest commit)

Benchmark Results (100k rows, Pandas path)

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

stewjb commented Apr 4, 2026 •

edited

Loading