Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdded a TrajectoryEntry dataclass and per-iteration in-memory + CSV trajectory cache to CloudAIGymEnv; step now checks the cache for exact-action matches and write_trajectory accepts a TrajectoryEntry. Related imports, exports, SLURM integration, and tests updated to the new API. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
📝 Coding Plan
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 120-134: The cache-hit branch in cloudai_gym.py uses
cached_result["source_step"] when calling write_trajectory, but for executed
entries the original executing step is stored as cached_result["step"], so
update the call in the cache path to pass cached_result["step"] as source_step;
locate the logic around get_cached_trajectory_result(...) and the
write_trajectory(...) invocation in the same block and replace the source_step
argument accordingly so cached entries correctly record the original executed
step.
In `@tests/test_cloudaigym.py`:
- Around line 304-323: The parametrized test data for the "trajectory" parameter
is missing the required TypedDict fields "step" and "source_step" from
TrajectoryEntry; update each trajectory dict in the pytest.mark.parametrize
block (the entries paired with variables "trajectory", "action", "expected") to
include valid "step" and "source_step" keys (use appropriate example values
consistent with other fields, e.g., integer step indices and a matching
source_step) so the test data matches the TrajectoryEntry TypedDict and avoids
type inconsistency.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 9fc1dc33-ba1a-4ca7-a34d-d78581dda9e9
📒 Files selected for processing (8)
src/cloudai/configurator/cloudai_gym.pysrc/cloudai/reporter.pysrc/cloudai/systems/slurm/single_sbatch_runner.pytests/conftest.pytests/test_cloudaigym.pytests/test_handlers.pytests/test_reporter.pytests/test_single_sbatch_runner.py
💤 Files with no reviewable changes (1)
- tests/test_reporter.py
Greptile SummaryThis PR implements an in-memory trajectory cache for Key changes:
The cache is in-memory only and is not restored from the existing trajectory CSV on disk, so a restarted process will re-execute all steps. No issues were found with the cache lookup ordering or step-counter advancement logic. Confidence Score: 4/5
Important Files Changed
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 263-267: The CSV writing is fragile because the header uses
TrajectoryEntry.__annotations__.keys() while the row uses entry.values(),
relying on dict ordering; change the write logic to compute the header_keys =
list(TrajectoryEntry.__annotations__.keys()) and always write rows by iterating
those keys (e.g., writer.writerow([entry.get(k, "") for k in header_keys]))
instead of writer.writerow(list(entry.values())), ensuring consistent column
order when writing in the method that opens self.trajectory_file_path.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: dbd29a05-b26c-486e-9078-e2e76c5a1963
📒 Files selected for processing (2)
src/cloudai/configurator/cloudai_gym.pytests/test_handlers.py
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Line 60: The in-memory cache self.trajectory is global across the object while
trajectory_file_path is per-iteration (keyed by
self.test_run.current_iteration), allowing cross-iteration reuse and mixing of
CSVs; update get_cached_trajectory_result and related logic to scope cached
entries by iteration (e.g., map from current_iteration -> list[TrajectoryEntry])
or clear/replace self.trajectory at iteration boundary in reset(), and ensure
append logic that writes to trajectory_file_path uses the iteration-scoped
cache; also apply the same scoping/clearing changes to the code handling lines
around the other occurrence (the 233-242 block) so no stale entries are reused
across iterations.
- Around line 120-126: The cached TrajectoryEntry exposes mutable fields
(action, observation) that callers may mutate; update
get_cached_trajectory_result usage in the code path that returns cached_result
to return a deep copy of cached_result.observation (use copy.deepcopy) and
similarly modify write_trajectory to deepcopy the entry.action and
entry.observation before storing into self.trajectory so stored objects are
immutable from callers; reference the TrajectoryEntry type, the
get_cached_trajectory_result return site, and the write_trajectory storage logic
and import/use copy.deepcopy accordingly.
In `@src/cloudai/systems/slurm/single_sbatch_runner.py`:
- Around line 215-222: handle_dse() is writing trajectory rows for every
combination in tr.all_combinations even if they were filtered out by
unroll_dse(); update the loop that calls
gym.write_trajectory(TrajectoryEntry(...)) to skip combinations that fail the
same constraint_check() used in unroll_dse() (or check membership against the
filtered list produced by unroll_dse()), so only combinations that actually ran
produce observation/reward rows; reference gym.write_trajectory,
TrajectoryEntry, unroll_dse(), handle_dse(), tr.all_combinations, and
constraint_check() when locating and applying the guard.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: a3de5944-b2fd-465c-9fdd-94ff858e26ab
📒 Files selected for processing (5)
src/cloudai/configurator/__init__.pysrc/cloudai/configurator/cloudai_gym.pysrc/cloudai/systems/slurm/single_sbatch_runner.pytests/test_cloudaigym.pytests/test_handlers.py
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/cloudai/configurator/cloudai_gym.py`:
- Around line 248-272: The method _values_match_exact has unnecessary elifs
after earlier returns; simplify control flow by replacing those elif branches
with plain if statements (keep the same type checks and logic for dict handling,
list/tuple handling—including the zip(..., strict=True) loop—and the final
equality fallback) so behavior does not change but the function flow is cleaner
and satisfies the RET505 hint.
- Around line 30-38: The TrajectoryEntry dataclass's observation field is
annotated as a bare list; change it to a parametrized type (e.g., list[Any] or
Sequence[Any]) to improve typing and IDE support: update the annotation for
observation in TrajectoryEntry and add/import the appropriate typing symbol (Any
or Sequence) from typing at the top of the module so the code compiles and
type-checks correctly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0b13d3b8-abbd-4615-af08-c4c510ee71e3
📒 Files selected for processing (2)
src/cloudai/configurator/cloudai_gym.pytests/test_cloudaigym.py
Summary
Implement agents caching to not repeat the same sweeps
Test Plan
conf/common/test_scenario/dse_nccl_all_gather.tomlonEOSAdditional Notes
Closes CLOUDAI-25