Perf: vectorise Pandas datetime/timespan import+export; add Cython directives #3
Open
…rectives
Import (Pandas path):
- DateTime and TimeSpan now use _import_vts_numpy (raw int64 ms) instead of
per-row Python object boxing loops (_import_vt_datetime / _import_vt_timespan).
- DataFrame assembly converts with arr.view('datetime64[ms]') /
arr.view('timedelta64[ms]') — zero-copy reinterpretation; supports the full
SBDF date range (year 1-9999) without pd.to_datetime nanosecond overflow.
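
The zero-copy reinterpretation described above can be sketched as follows (illustrative values; the real import reads the raw int64 milliseconds from the SBDF buffer):

```python
import numpy as np

# Raw column payload: milliseconds as int64, as produced by _import_vts_numpy.
ms = np.array([0, 86_400_000, 172_800_000], dtype=np.int64)

# view() reinterprets the existing buffer; no element-wise conversion,
# no new allocation, and no nanosecond-range limit.
dt = ms.view('datetime64[ms]')
td = ms.view('timedelta64[ms]')
```

Because `view()` shares the underlying buffer, this is O(1) regardless of column length.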
Export (Pandas path):
- _export_obj_dataframe stores tz-naive datetime64 columns as datetime64[ms]
and timedelta64 columns as timedelta64[ms] instead of object arrays.
- _export_vt_datetime fast path: view('int64') + vectorised SBDF epoch offset
addition replaces per-row isinstance + .to_pydatetime() + arithmetic.
- _export_vt_timespan fast path: view('int64') gives ms directly — no per-row
.to_pytimedelta() or division.
- Object-dtype and tz-aware columns still fall through to the per-row loop.
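
A minimal sketch of the vectorised export fast path, assuming the SBDF datetime epoch is 0001-01-01 (719_162 days before the Unix epoch); the constant name here is hypothetical, not the module's actual identifier:

```python
import numpy as np
import pandas as pd

# Hypothetical constant: ms from the assumed SBDF epoch (0001-01-01)
# to the Unix epoch (1970-01-01).
SBDF_EPOCH_OFFSET_MS = 719_162 * 86_400_000

col = pd.Series(pd.to_datetime(['1970-01-02', '1970-01-03']))

# view('int64') on the ms-resolution array yields Unix ms without boxing;
# one vectorised addition shifts the whole column to the SBDF epoch.
unix_ms = col.to_numpy().astype('datetime64[ms]').view('int64')
sbdf_ms = unix_ms + SBDF_EPOCH_OFFSET_MS
```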
Cython directives:
- boundscheck=False, wraparound=False, cdivision=True added file-wide,
eliminating runtime bounds/wrap guards in every inner loop.
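
As a sketch, file-wide directives are declared as comment lines at the top of the `.pyx` file (the directive names are standard Cython; this is the conventional placement):

```cython
# cython: boundscheck=False
# cython: wraparound=False
# cython: cdivision=True
```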
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Export: pre-transform datetime64[ms]/timedelta64[ms] columns to int64 SBDF-ms once at set_arrays time so _export_vt_datetime/_export_vt_timespan can use _export_get_offset_ptr directly (zero-copy, same as numeric types) instead of allocating + copying + transforming per chunk. Retain the non-precomputed fast/slow paths for tz-aware and object-dtype columns.
- Import: replace the double-pass NaT handling (zero + .loc assignment) with a single write of the int64 NaT sentinel (INT64_MIN) before view(), avoiding the slow Pandas indexing layer entirely.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
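
The single-pass NaT-sentinel write can be sketched like this (illustrative values):

```python
import numpy as np

ms = np.array([0, 123, 456_000], dtype=np.int64)
invalid = np.array([False, True, False])

# One direct numpy write of the int64 NaT sentinel (INT64_MIN) before the
# view(); no zeroing pass and no pandas .loc indexing afterwards.
ms[invalid] = np.iinfo(np.int64).min
dt = ms.view('datetime64[ms]')
```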
…onstructor
- Export: pre-compute date (object) columns to int64 SBDF-ms via pd.to_datetime, the same zero-copy approach as datetime64/timedelta64.
- Export: replace any(invalid) with bool(self.invalid_array.any()) in set_arrays — the built-in any() was iterating 100k Python booleans per column; numpy any() is a single vectorised call. This alone accounts for the large numeric export gain.
- Import: replace pd.concat(columns, axis=1) with pd.DataFrame(dict(...)) to skip concat's index alignment, dtype consolidation and metadata overhead.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
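
Both changes are simple in isolation; a minimal sketch with made-up column data:

```python
import numpy as np
import pandas as pd

columns = {
    'a': pd.Series(np.arange(3)),
    'b': pd.Series(np.ones(3)),
}
# pd.DataFrame(dict) builds the frame directly, skipping concat's index
# alignment, dtype consolidation and metadata overhead.
df = pd.DataFrame(columns)

invalid = np.zeros(100_000, dtype=bool)
# numpy .any() is one vectorised call; the builtin any() would iterate
# 100k Python booleans.
has_nulls = bool(invalid.any())
```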
… .loc
- Time export: replace datetime.combine(min, t) - min (2 Python object allocations per row) with direct integer arithmetic on time attributes. As the last unoptimized temporal column, this is the primary driver of the ~40% temporal export improvement.
- Timedelta import: drop values.copy() — get_values_array() already returns a fresh array from np.concatenate(), so the explicit copy was redundant.
- Object-type import (.loc): guard column_series.loc[invalid_array] = None with if invalid_array.any() — consistent with the datetime/timedelta paths; avoids Pandas indexing overhead for null-free columns.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
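
The time-to-milliseconds arithmetic is a few integer operations per row (function name here is illustrative, not the module's):

```python
import datetime

def time_to_ms(t: datetime.time) -> int:
    # Direct integer arithmetic on the time's attributes; replaces
    # datetime.combine(date.min, t) - datetime.min, which allocated two
    # Python objects per row.
    return ((t.hour * 60 + t.minute) * 60 + t.second) * 1000 + t.microsecond // 1000
```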
pd.to_datetime(errors='coerce') silently converts dates outside the Pandas Timestamp range (year 1, pre-Gregorian, year 9999) to NaT, then to the Unix epoch. Replace with np.asarray(..., dtype='datetime64[D]') which covers the full Python date range. Zero NaT positions (INT64_MIN) before multiplying to prevent int64 overflow. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
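
A sketch of the fix, showing the full-range conversion and the pre-multiplication NaT zeroing:

```python
import datetime
import numpy as np

dates = [datetime.date(1, 1, 1), datetime.date(9999, 12, 31), None]

# datetime64[D] covers the full Python date range; the nanosecond-based
# pandas Timestamp (roughly 1677-2262) does not.
days = np.asarray(dates, dtype='datetime64[D]')

raw = days.view('int64').copy()
# Zero NaT positions (INT64_MIN) before multiplying to avoid int64 overflow.
raw[np.isnat(days)] = 0
ms = raw * 86_400_000
```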
Eight new test methods covering gaps exposed by the zero-copy temporal optimizations: null roundtrips, negative timespans, pre-epoch/out-of-range dates (year 1, pre-Gregorian, year 9999), pre-epoch datetimes, time edge cases (midnight, end-of-day, microsecond truncation), all-null temporal columns, and NaT at specific positions in numpy datetime64/timedelta64 arrays. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three new tests targeting the boundscheck=False Cython directives:
- test_empty_dataframe: exercises every column type with 0 rows, verifying that zero-iteration export loops don't crash or corrupt memory.
- test_multichunk_export: exports 100_001 rows (one more than the default 100_000-row slice size) and checks values at both the first row and the chunk boundary (row 100_000). Covers _export_vt_time's direct [start+i] indexing and _export_get_offset_ptr for the precomputed int64 paths.
- test_polars_string_multichunk: same chunk-boundary check for the Polars Arrow buffer path in _export_extract_string_obj_arrow, which does raw C pointer arithmetic into the values buffer.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…thout polars/pyarrow" This reverts commit 681a67d.
…extension Compiles sbdf.pyx with -fsanitize=address -fno-omit-frame-pointer and runs the full test suite under LD_PRELOAD=libasan.so with PYTHONMALLOC=malloc. This provides runtime detection of heap buffer overflows that boundscheck=False and the raw C pointer arithmetic in sbdf_helpers.c leave unchecked at the Python level. detect_leaks=0 suppresses intentional Python allocator "leaks". Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
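
A sketch of the job's build-and-run steps; the build command and library path resolution are assumptions, adapt to the project's actual setup:

```shell
# Build the extension with ASan instrumentation.
export CFLAGS="-fsanitize=address -fno-omit-frame-pointer"
python setup.py build_ext --inplace

# Run the suite with ASan injected into the non-ASan Python binary.
LD_PRELOAD="$(gcc -print-file-name=libasan.so)" \
PYTHONMALLOC=malloc \
ASAN_OPTIONS=detect_leaks=0 \
python -m unittest discover
```

PYTHONMALLOC=malloc routes Python's allocations through the system malloc so ASan can track them; detect_leaks=0 suppresses the interpreter's intentional allocator "leaks".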
… 3 chars) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…otlib false positive When using LD_PRELOAD ASan injection with a non-ASan-compiled Python, ASan's __cxa_throw interceptor is never initialized. matplotlib's ft2font.so throws a C++ exception during import, hitting the uninitialized interceptor and causing a CHECK failure. intercept_cxx_exceptions=0 disables the interceptor entirely; sbdf.pyx generates no C++ exceptions so there is no loss of coverage. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…3.13 mypy: pd.array() with list[NaTType] or list[NaT|Timedelta] and a string dtype has no matching overload in pandas-stubs. Add type: ignore[call-overload] on the two affected lines in test_all_null_temporal_columns and test_numpy_timedelta_with_nulls. ASan: Python 3.14 (beta) triggers a CHECK failure in asan_interceptors.cpp when ft2font.so throws a C++ exception, even with intercept_cxx_exceptions=0. Pin the ASan job to Python 3.13 where LD_PRELOAD ASan injection works cleanly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…e.js 24; fix line-too-long
- ASan job: replace test_requirements_default.txt with html-testRunner + polars + pillow. matplotlib/seaborn/geopandas/shapely use pybind11 C++ extensions that throw exceptions, crashing LD_PRELOAD libasan injection (intercept_cxx_exceptions=0 doesn't help here). pillow is plain C — safe to keep for PIL image export ASan coverage.
- Bump GitHub Actions to Node.js 24: checkout v4→v5, setup-python v5→v6, upload-artifact v4→v7, download-artifact v4→v8.
- Fix pylint line-too-long (127>120) in test_sbdf.py line 565.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rrency group test_sbdf.py imported geopandas, matplotlib, and seaborn unconditionally, causing ModuleNotFoundError in the ASan CI job where those packages are not installed. Change to try/except with None fallback (matching the polars pattern) and add @unittest.skipIf guards to test_read_write_geodata, test_image_matplot, test_image_seaborn. Also add concurrency group to build.yaml to cancel superseded runs on push. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
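
The optional-import pattern described above, sketched with a hypothetical dependency name (`some_optional_dep` stands in for geopandas/matplotlib/seaborn):

```python
import unittest

# Optional dependency: fall back to None when absent, matching the
# existing polars pattern in test_sbdf.py.
try:
    import some_optional_dep  # hypothetical module name
except ImportError:
    some_optional_dep = None

class GeodataTests(unittest.TestCase):
    @unittest.skipIf(some_optional_dep is None, "optional dep not installed")
    def test_read_write_geodata(self):
        # Skipped cleanly (not errored) when the dependency is missing.
        self.fail("never runs without the dependency")
```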
…dule alias Without the explicit import, pylint sees 'matplotlib = None' in the except block as a new constant assignment and flags it as invalid-name (expects UPPER_CASE). Adding 'import matplotlib' before 'import matplotlib.pyplot' matches the same try/except pattern used for polars (import + None fallback). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ce alloc in offset ptr
Three C-level optimizations:
1. _export_extract_string_obj / _export_extract_binary_obj: replace per-element PySequence_GetItem calls (Python API dispatch + refcount overhead) with direct pointer arithmetic into numpy array buffers. Callers now pass PyArray_DATA(values_array) as void** and PyArray_DATA(invalid_array) as unsigned char*, eliminating ~2N Python API round-trips per string/binary column.
2. _export_get_offset_ptr: replace the Python slice allocation (array[start:start+count]) with direct byte-offset arithmetic on PyArray_DATA. Avoids a numpy view object allocation on every chunk/column export call.
3. Import string columns: pre-mask the numpy object array before pd.Series() construction instead of assigning None via .loc[] after the fact. The .loc path triggers pandas label-indexing overhead; direct numpy assignment is O(k) with no indexer allocation. Applied only when values.dtype.kind == 'O' to avoid incorrect coercion on bool/float arrays.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
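
Point 3 is expressible in pure Python; a minimal sketch with made-up values:

```python
import numpy as np
import pandas as pd

values = np.array(['x', 'y', 'z'], dtype=object)
invalid = np.array([False, True, False])

# Pre-mask the object array before Series construction: direct numpy
# assignment is O(k) with no pandas label-indexing overhead. Guarded on
# dtype.kind == 'O' so bool/float arrays aren't coerced incorrectly.
if values.dtype.kind == 'O' and invalid.any():
    values[invalid] = None
column = pd.Series(values)
```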
…tyle violation Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tring Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…plint line-length rule Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Cython + NumPy vectorisation (earlier commits)
C-level pointer optimisations (latest commit)
Benchmark Results (100k rows, Pandas path)
Key wins:
Python 3.13.7 · Pandas 2.3.2 · NumPy 2.3.2 · Windows 11
Test plan
🤖 Generated with Claude Code