Conversation
Five targeted optimisations identified via cProfile on the audio-file benchmark from issue #850: 1. **Etelemetry sentinel** (`job.py`, `submitter.py`): Replace the `None` initial value of `Job._etelemetry_version_data` with a distinct `_ETELEMETRY_UNCHECKED` sentinel. Previously a failed network check returned `None`, keeping the `is None` guard True forever and re-issuing the HTTP request on every task (~93 ms each). 2. **Function bytes cache** (`hash.py`): Add a module-level `_function_bytes_cache` keyed by `(module, qualname, mtime_ns)`. `inspect.getsource()` + `ast.parse()` now runs at most once per function per session; subsequent calls cost only a single `os.stat()`. 3. **Skip hash-change check for non-FileSet tasks** (`job.py`): Call `TypeParser.contains_type(FileSet, ...)` on each input field; if none match, skip the expensive full hash recomputation in `_check_for_hash_changes()`. Scalar/pure-Python values cannot mutate under Pydra. 4. **In-memory result cache** (`job.py`): Store the completed `Result` on `self._cached_result` at the end of `run()` so same-process callers (e.g. `Submitter.__call__` with DebugWorker) do not need to deserialise it back from disk. The field is excluded from `__getstate__` so subprocess/Slurm workers still use the disk path. 5. **Once-per-location PersistentCache.clean_up()** (`hash.py`): Track which cache locations have already been scanned in a class-level set (`_session_cleanups_done`). The O(n) `iterdir()` + `stat()` loop no longer runs after every task. `path.unlink(missing_ok=True)` makes concurrent cleanup by multiple Slurm nodes on shared NFS safe. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #858 +/- ##
==========================================
+ Coverage 88.26% 88.32% +0.06%
==========================================
Files 88 88
Lines 18198 18234 +36
Branches 3565 3572 +7
==========================================
+ Hits 16062 16105 +43
+ Misses 1751 1743 -8
- Partials 385 386 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
The early-return optimization in _check_for_hash_changes() skipped the hash recomputation for tasks with no FileSet inputs, assuming non-FileSet values are always stable. This broke detection of unstable hashes for custom types implementing __bytes_repr__ (HasBytesRepr protocol). Fix by also checking whether any input value implements HasBytesRepr; if so, fall through to the full hash-change check. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
effigies
left a comment
There was a problem hiding this comment.
Small comment. Haven't read this all the way through.
| from pydra.utils.etelemetry import check_latest_version | ||
|
|
||
| if Job._etelemetry_version_data is None: | ||
| if Job._etelemetry_version_data is Job._ETELEMETRY_UNCHECKED: |
There was a problem hiding this comment.
All these underscores make me think there should be a method on Job that we call, or these fields should be public. I get it's our code, but we're leaking implementation details.
There was a problem hiding this comment.
Claude finished @satra's task in 1m 29s —— View job
Analysis
Confirmed: the only place where Job private attributes are accessed from outside job.py is submitter.py:118-119, which directly reads/writes Job._etelemetry_version_data and Job._ETELEMETRY_UNCHECKED.
- Gather context: examine
submitter.py:118and the underscore variables accessed - Search codebase for all accesses to private Job attributes from outside
job.py - Determine if a method on Job or public fields would be appropriate
- Implement adjustments
Change Made
Added Job.check_etelemetry() classmethod to job.py that encapsulates the sentinel logic:
@classmethod
def check_etelemetry(cls) -> None:
"""Run the etelemetry version check at most once per session."""
from pydra.utils.etelemetry import check_latest_version
if cls._etelemetry_version_data is cls._ETELEMETRY_UNCHECKED:
cls._etelemetry_version_data = check_latest_version()And simplified submitter.py to just:
Job.check_etelemetry()This keeps _etelemetry_version_data and _ETELEMETRY_UNCHECKED as private implementation details of Job, with Submitter calling a clean public API. Pushed as commit f95951e.
…try() Move the private _etelemetry_version_data / _ETELEMETRY_UNCHECKED access from submitter.py into a new Job.check_etelemetry() classmethod, so the sentinel pattern stays an implementation detail of Job and Submitter only calls a clean public API. Co-authored-by: Satrajit Ghosh <satra@users.noreply.github.com> Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
@claude - check on the failed claude-review pull request |
|
Claude finished @satra's task in 38s —— View job Failed Claude Code Review — Root Cause Found
Root CauseThe The previous Claude review comment triggered a new FixThe - uses: anthropics/claude-code-action@...
with:
allowed_bots: "claude" # or '*' to allow all bots
...
|
| @classmethod | ||
| def check_etelemetry(cls) -> None: | ||
| """Run the etelemetry version check at most once per session.""" | ||
| from pydra.utils.etelemetry import check_latest_version |
There was a problem hiding this comment.
I would just import this at the top of the module. There's no circular dependency here to justify a deferred import.
There was a problem hiding this comment.
| # Sentinel meaning "check not yet performed". Distinct from None so that a | ||
| # failed/empty etelemetry response (which returns None) does not cause the | ||
| # network check to be repeated on every subsequent task invocation. | ||
| _ETELEMETRY_UNCHECKED = object() | ||
| _etelemetry_version_data = _ETELEMETRY_UNCHECKED # class variable |
There was a problem hiding this comment.
Looking at this again, a sentinel and an optional data object seem like overkill. We never look at the contents of _etelemetry_version_data. Is there a plan to expose it, or should we just use a boolean flag?
class Job:
_etelemetry_checked: ty.ClassVar[bool] = False
@classmethod
def check_etelemetry(cls) -> None:
if not cls._etelemetry_checked:
cls._etelemetry_checked = True
check_latest_version()There was a problem hiding this comment.
usually never expect the etelemetry client to use version info, since it's own version is being used. that check happens inside etelemetry.
@claude - can you simplify this to a boolean check as noted.
There was a problem hiding this comment.
Claude finished @satra's task in 57s —— View job
- Gather context
- Read job.py to find the sentinel/etelemetry code
- Simplify to boolean flag
- Commit and push
Done. Replaced the sentinel object pattern with a simple bool class variable as suggested:
class Job:
_etelemetry_checked: ty.ClassVar[bool] = False
@classmethod
def check_etelemetry(cls) -> None:
"""Run the etelemetry version check at most once per session."""
if not cls._etelemetry_checked:
cls._etelemetry_checked = True
check_latest_version()The _ETELEMETRY_UNCHECKED sentinel and _etelemetry_version_data storage are both removed since the return value from check_latest_version() is never used by Pydra. Pushed as commit 4859b70.
Co-authored-by: Satrajit Ghosh <satra@users.noreply.github.com> Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the sentinel object pattern with a plain boolean class variable, as the version data returned by check_latest_version() is never used by Pydra itself. Co-authored-by: Satrajit Ghosh <satra@users.noreply.github.com>
effigies
left a comment
There was a problem hiding this comment.
Two nits, neither impacting the logic here.
This all makes sense to me.
Co-authored-by: Chris Markiewicz <markiewicz@stanford.edu>
for more information, see https://pre-commit.ci
|
i'll wait till i have had a chance to actually test this on the old senselab branch that used pydra where we first noticed this happening. i still have a feeling there is something else to it, but who knows. |
closes #850
Five targeted optimisations identified via cProfile on the audio-file benchmark from issue #850:
Etelemetry sentinel (
job.py,submitter.py): Replace theNoneinitial value ofJob._etelemetry_version_datawith a distinct_ETELEMETRY_UNCHECKEDsentinel. Previously a failed network check returnedNone, keeping theis Noneguard True forever and re-issuing the HTTP request on every task (~93 ms each).Function bytes cache (
hash.py): Add a module-level_function_bytes_cachekeyed by(module, qualname, mtime_ns).inspect.getsource()+ast.parse()now runs at most once per function per session; subsequent calls cost only a singleos.stat().Skip hash-change check for non-FileSet tasks (
job.py): CallTypeParser.contains_type(FileSet, ...)on each input field; if none match, skip the expensive full hash recomputation in_check_for_hash_changes(). Scalar/pure-Python values cannot mutate under Pydra.In-memory result cache (
job.py): Store the completedResultonself._cached_resultat the end ofrun()so same-process callers (e.g.Submitter.__call__with DebugWorker) do not need to deserialise it back from disk. The field is excluded from__getstate__so subprocess/Slurm workers still use the disk path.Once-per-location PersistentCache.clean_up() (
hash.py): Track which cache locations have already been scanned in a class-level set (_session_cleanups_done). The O(n)iterdir()+stat()loop no longer runs after every task.path.unlink(missing_ok=True)makes concurrent cleanup by multiple Slurm nodes on shared NFS safe.Types of changes
Summary
Checklist