Context
The resample refactor (#428, commit d7e2446) removed the _ResampledTSDF class and instead stored resample metadata (resample_freq, resample_func) as mutable attributes on the regular TSDF class. This creates a corruption vector: __withTransformedDF() copies that metadata onto every TSDF derived from a resampled one, so operations like .filter(), .union(), or .withColumn() can silently invalidate the resample context while the stale metadata is still carried forward to .interpolate().
A proposal for a proper ResampledTSDF intermediate object was written (commit 437e3b1) but never implemented. This issue tracks restoring that pattern.
The Problem
Current state on v0.2-integration:
# TSDF.__init__() accepts resample metadata (tsdf.py:127-128)
resample_freq: Optional[str] = None,
resample_func: Optional[Union[Callable, str]] = None,
# resample() returns a regular TSDF with metadata attached (resample.py:452-457)
return TSDF(
enriched_df,
...
resample_freq=freq,
resample_func=func,
)
# __withTransformedDF() blindly propagates metadata to all derived TSDFs (tsdf.py:163-164)
resample_freq=self.resample_freq,
resample_func=self.resample_func,
This means the following produces silently wrong results:
# Metadata propagates through filter — interpolate trusts stale metadata
tsdf.resample(freq="min", func="mean").filter(...).interpolate(method="linear")
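To make the failure mode concrete, here is a minimal pure-Python model of the current behavior. It is a toy, not tempo's code: rows are plain dicts instead of a Spark DataFrame, resample() skips the actual bucketing, and only the metadata handling shown in the snippets above is reproduced.

```python
from typing import Callable, List, Optional, Union

Row = dict  # toy stand-in for a Spark Row


class TSDF:
    """Toy model of the current TSDF; `df` is a list of dicts, not a Spark DataFrame."""

    def __init__(
        self,
        df: List[Row],
        resample_freq: Optional[str] = None,
        resample_func: Optional[Union[Callable, str]] = None,
    ):
        self.df = df
        self.resample_freq = resample_freq
        self.resample_func = resample_func

    def __withTransformedDF(self, new_df: List[Row]) -> "TSDF":
        # Mirrors the current behavior: metadata is copied to every derived TSDF.
        return TSDF(
            new_df,
            resample_freq=self.resample_freq,
            resample_func=self.resample_func,
        )

    def filter(self, predicate: Callable[[Row], bool]) -> "TSDF":
        return self.__withTransformedDF([r for r in self.df if predicate(r)])

    def resample(self, freq: str, func: Union[Callable, str]) -> "TSDF":
        # Bucketing/aggregation omitted; only the metadata attachment matters here.
        return TSDF(list(self.df), resample_freq=freq, resample_func=func)


raw = TSDF([{"minute": 0, "v": 1.0}, {"minute": 1, "v": 2.0}, {"minute": 2, "v": 3.0}])
filtered = raw.resample(freq="min", func="mean").filter(lambda r: r["minute"] != 1)

# One bucket is gone, but the object still claims to be a "min"/"mean" resample.
print(len(filtered.df), filtered.resample_freq, filtered.resample_func)  # → 2 min mean
```

Nothing on the filtered object signals that its rows no longer match what resample() produced, so an interpolate() that reads resample_freq/resample_func from self has no way to notice.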
Proposed Solution
Introduce a ResampledTSDF class that acts as a restricted intermediate object, following the same pattern as Apache Spark's GroupedData:
| Spark Pattern | Tempo Pattern |
|---|---|
| df.groupBy("key") → GroupedData | tsdf.resample(freq, func) → ResampledTSDF |
| GroupedData.agg(...) → DataFrame | ResampledTSDF.interpolate(...) → TSDF |
| GroupedData.filter(...) → AttributeError | ResampledTSDF.filter(...) → AttributeError |
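The AttributeError in the last row falls out of composition rather than any special machinery: ResampledTSDF wraps a TSDF instead of subclassing it, just as PySpark's GroupedData is not a DataFrame subclass. The toy comparison below (plain Python, with a deliberately minimal hypothetical TSDF) shows why a subclass would not give the same guarantee.

```python
from typing import Callable, List

Row = dict  # toy stand-in for a Spark Row


class TSDF:
    """Deliberately minimal stand-in for tempo's TSDF."""

    def __init__(self, rows: List[Row]):
        self.rows = rows

    def filter(self, predicate: Callable[[Row], bool]) -> "TSDF":
        return TSDF([r for r in self.rows if predicate(r)])


class SubclassedResampled(TSDF):
    """Anti-pattern: inherits every TSDF method, so filter() is still reachable."""


class ResampledTSDF:
    """Composition: holds a TSDF but exposes only the methods it defines itself."""

    def __init__(self, tsdf: TSDF):
        self._tsdf = tsdf

    def as_tsdf(self) -> TSDF:
        return self._tsdf


leaky = SubclassedResampled([{"v": 1.0}])
restricted = ResampledTSDF(TSDF([{"v": 1.0}]))

print(hasattr(leaky, "filter"))       # True
print(hasattr(restricted, "filter"))  # False
```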
Key Changes
- Create ResampledTSDF class: restricted wrapper exposing only valid post-resample operations (interpolate(), as_tsdf(), show())
- Update TSDF.resample(): return ResampledTSDF instead of TSDF
- Remove resample_freq/resample_func from TSDF.__init__(): metadata lives only on ResampledTSDF, never on TSDF
- Remove metadata propagation from __withTransformedDF(): no more stale state
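Putting the four changes together, one possible shape for the classes is sketched below. It is a pure-Python toy under stated assumptions: rows are plain dicts, bucketing and interpolation are stubbed out, and the exact signature of the internal TSDF.interpolate(method, freq, func) call is illustrative rather than taken from tempo.

```python
from typing import Callable, List, Union

Row = dict  # toy stand-in for a Spark Row
AggFunc = Union[Callable, str]


class TSDF:
    """Toy TSDF: no resample_freq/resample_func attributes at all."""

    def __init__(self, df: List[Row]):
        self.df = df

    def __withTransformedDF(self, new_df: List[Row]) -> "TSDF":
        # No metadata left to propagate: derived TSDFs carry only data.
        return TSDF(new_df)

    def filter(self, predicate: Callable[[Row], bool]) -> "TSDF":
        return self.__withTransformedDF([r for r in self.df if predicate(r)])

    def resample(self, freq: str, func: AggFunc) -> "ResampledTSDF":
        # Bucketing/aggregation omitted; the point is the return type.
        return ResampledTSDF(self.__withTransformedDF(list(self.df)), freq, func)

    def interpolate(self, method: str, freq: str, func: AggFunc) -> "TSDF":
        # freq/func are explicit arguments instead of state read from self.
        # A real implementation would build the time grid from freq here.
        print(f"interpolate(method={method!r}, freq={freq!r}, func={func!r})")
        return self.__withTransformedDF(list(self.df))


class ResampledTSDF:
    """Holds the resample metadata; deliberately not a TSDF subclass."""

    def __init__(self, tsdf: TSDF, freq: str, func: AggFunc):
        self._tsdf = tsdf
        self.freq = freq
        self.func = func

    def interpolate(self, method: str) -> TSDF:
        # The only place where freq/func flow into interpolation.
        return self._tsdf.interpolate(method=method, freq=self.freq, func=self.func)

    def as_tsdf(self) -> TSDF:
        # Escape hatch: a plain TSDF with no resample metadata attached.
        return self._tsdf

    def show(self) -> None:
        for row in self._tsdf.df:
            print(row)


raw = TSDF([{"minute": 0, "v": 1.0}, {"minute": 2, "v": 3.0}])
result = raw.resample(freq="min", func="mean").interpolate(method="linear")
# prints: interpolate(method='linear', freq='min', func='mean')
```

Because TSDF no longer has anywhere to store freq/func, there is nothing for __withTransformedDF() to carry forward, and the stale-metadata chain described under The Problem cannot be written.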
Valid Usage
# Chain resample → interpolate (primary use case)
result = tsdf.resample(freq="min", func="mean").interpolate(method="linear")
# Get resampled data without interpolation
resampled = tsdf.resample(freq="min", func="mean").as_tsdf()
# Inspect before interpolating
resampled = tsdf.resample(freq="min", func="mean")
resampled.show()
result = resampled.interpolate(method="linear")
Invalid Usage (Now Prevented)
# AttributeError — operations not available on ResampledTSDF
tsdf.resample(freq="min", func="mean").filter(...)
tsdf.resample(freq="min", func="mean").withColumn(...)
# If you need those operations, finalize first (explicit opt-out of safety)
tsdf.resample(freq="min", func="mean").as_tsdf().filter(...)
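Python's default message for the invalid chains above is just "'ResampledTSDF' object has no attribute 'filter'". One optional refinement, sketched here as an assumption rather than part of the proposal, is a __getattr__ fallback that still raises AttributeError (so hasattr() keeps returning False) but points users at as_tsdf(). The TSDF below is a minimal hypothetical stand-in.

```python
from typing import Callable, List

Row = dict  # toy stand-in for a Spark Row


class TSDF:
    """Minimal hypothetical stand-in for tempo's TSDF."""

    def __init__(self, rows: List[Row]):
        self.rows = rows

    def filter(self, predicate: Callable[[Row], bool]) -> "TSDF":
        return TSDF([r for r in self.rows if predicate(r)])


class ResampledTSDF:
    def __init__(self, tsdf: TSDF, freq: str, func: str):
        self._tsdf = tsdf
        self.freq = freq
        self.func = func

    def interpolate(self, method: str) -> TSDF:
        return self._tsdf  # interpolation stubbed out in this toy

    def as_tsdf(self) -> TSDF:
        return self._tsdf

    def show(self) -> None:
        for row in self._tsdf.rows:
            print(row)

    def __getattr__(self, name: str):
        # Python only calls __getattr__ after normal lookup fails, so defined
        # methods and instance attributes are unaffected. Only class-level
        # names are inspected here, which avoids recursing into __getattr__.
        public = sorted(n for n in dir(type(self)) if not n.startswith("_"))
        raise AttributeError(
            f"'ResampledTSDF' object has no attribute {name!r}; "
            f"available: {', '.join(public)}. "
            f"Call .as_tsdf() first to use {name!r} on the resampled data."
        )


resampled = ResampledTSDF(TSDF([{"v": 1.0}, {"v": 5.0}]), freq="min", func="mean")
try:
    resampled.filter(lambda r: r["v"] > 2)
except AttributeError as err:
    print(err)

kept = resampled.as_tsdf().filter(lambda r: r["v"] > 2)  # explicit opt-out
```

Raising AttributeError rather than a custom exception keeps hasattr() and getattr(obj, name, default) behaving normally on ResampledTSDF.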
Why This Matters
- Prevents silent data corruption: invalid operation chains fail loudly instead of producing wrong results
- Type safety: IDE autocompletion only shows valid operations after resample()
- Self-documenting: the class name and restricted API indicate the expected workflow
- Precedent: this is exactly how Spark handles GroupedData, for the same reasons
Git History Reference
| Commit | Description |
|---|---|
| 437e3b1 | Proposal document for ResampledTSDF intermediate object pattern |
| d7e2446 | Resample refactor (#428): removed _ResampledTSDF, added metadata attrs to TSDF |
| ec4fe38 | Original refactor that removed _ResampledTSDF class |
Implementation Checklist
- Create ResampledTSDF class (in tempo/resampled.py or tempo/resample.py)
- Update TSDF.resample() return type to ResampledTSDF
- Remove resample_freq/resample_func from TSDF.__init__() and __withTransformedDF()
- Update TSDF.interpolate() to require explicit freq/func args (called internally by ResampledTSDF.interpolate())
- Add tests for ResampledTSDF (valid chains, invalid chains, as_tsdf() escape hatch)
Related
- Proposal doc: commit 437e3b1