AORTA-23 : Phase 1 : JSON loader for all configurations. #98
prosenjitdhole wants to merge 12 commits into prosenj_cli_hq_eval_report from
Conversation
Pull request overview
Adds an HW Queue Eval JSON loader and related dataclasses to the report-processing layer, enabling parsing of single-run, sweep, and directory-based multi-workload result outputs.
Changes:
- Introduces `hwqueue_loader.py` with dataclasses for HW Queue Eval results and a loader API for single files, sweeps, and directories.
- Exposes the new loader and dataclasses via `aorta.report.processing` package exports; a minimal usage sketch follows below.
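For orientation, here is a minimal usage sketch of the exported API. The entry-point names (`load_auto`, `load_directory`) are taken from the review comments below, but their exact signatures and return shapes are assumptions, not confirmed by this diff.

```python
# Hypothetical usage sketch; import path and call signatures are assumptions
# based on the package exports and the loader names referenced in this PR.
from pathlib import Path

from aorta.report.processing import load_auto, load_directory

# Single file: load_auto() is expected to dispatch between the
# single-run and sweep JSON formats.
sweep = load_auto(Path("results/hetero_kernels_results.json"))

# Directory of per-workload *_results.json files.
by_workload = load_directory(Path("results/"))
```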
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| src/aorta/report/processing/hwqueue_loader.py | New dataclasses + JSON loading utilities for HW Queue Eval results (single/sweep/directory + comparison helpers). |
| src/aorta/report/processing/__init__.py | Re-exports HW Queue Eval loader/types from the processing package. |
@prosenjitdhole I've opened a new pull request, #120, to work on those changes. Once the pull request is ready, I'll request review from you.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
… JSON (#120)

* Initial plan
* Add isinstance(data, dict) guard to _is_single_run_format and _is_sweep_format
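A rough sketch of what that guard could look like. The isinstance check is confirmed by the commit above; the "results"-list heuristic for recognising a sweep document is inferred from `SweepData.from_dict()` further down and is an assumption, not the confirmed detection logic.

```python
from typing import Any

def _is_sweep_format(data: Any) -> bool:
    # Guard from #120: non-dict payloads (lists, strings, null) can never
    # be a valid sweep document, so bail out before probing keys.
    if not isinstance(data, dict):
        return False
    # Assumed heuristic: a sweep document carries a list of per-run results.
    return isinstance(data.get("results"), list)
```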
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
@prosenjitdhole I've opened a new pull request, #123, to work on those changes. Once the pull request is ready, I'll request review from you.

@prosenjitdhole I've opened a new pull request, #124, to work on those changes. Once the pull request is ready, I'll request review from you.
* Initial plan
* Add TODO for optimizing load_comparison_data to avoid unnecessary I/O
@prosenjitdhole I've opened a new pull request, #125, to work on those changes. Once the pull request is ready, I'll request review from you.
…json` in `load_directory()` (#125)

* Initial plan
* Load environment_info.json in load_directory() for single-run wrappers
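A sketch of where that could land inside `load_directory()`; the helpers `_load_json` and `EnvironmentData.from_dict` appear elsewhere in this PR, but this exact wiring is an assumption.

```python
# Assumed fragment of the #125 change, not the verbatim implementation:
# when a directory holds bare single-run result files, pick up a sibling
# environment_info.json so the single-run wrapper still carries env data.
env_file = dir_path / "environment_info.json"
env_dict = _load_json(env_file) if env_file.exists() else None
environment = EnvironmentData.from_dict(env_dict)
```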
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
```python
def _load_json(path: Path) -> Dict[str, Any]:
    """Load JSON file with error handling."""
    if not path.exists():
        raise HWQueueLoaderError(f"File not found: {path}")
    if not path.is_file():
        raise HWQueueLoaderError(f"Not a file: {path}")

    try:
        with open(path, "r") as f:
            return json.load(f)
    except (json.JSONDecodeError, OSError, UnicodeDecodeError) as e:
        raise HWQueueLoaderError(f"Failed to load JSON from {path}: {e}") from e
```
_load_json() is annotated to return Dict[str, Any], but json.load() is typed as returning Any. With the repo's mypy settings (warn_return_any = true), this will be flagged as returning Any from a function declared to return a Dict. Consider loading into a local var, validating it's a dict (or raising HWQueueLoaderError if not), and then returning a cast(Dict[str, Any], data) (or loosen the return type to Any).
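One possible shape of that fix, as a sketch rather than final code; the isinstance check, error message, and cast placement are suggestions, and `HWQueueLoaderError` is the module's existing exception type.

```python
import json
from pathlib import Path
from typing import Any, Dict, cast

def _load_json(path: Path) -> Dict[str, Any]:
    """Load a JSON object from disk, raising HWQueueLoaderError on any failure."""
    if not path.exists():
        raise HWQueueLoaderError(f"File not found: {path}")
    if not path.is_file():
        raise HWQueueLoaderError(f"Not a file: {path}")
    try:
        with open(path, "r") as f:
            data = json.load(f)
    except (json.JSONDecodeError, OSError, UnicodeDecodeError) as e:
        raise HWQueueLoaderError(f"Failed to load JSON from {path}: {e}") from e
    # Validate the top-level shape so the declared Dict return type is honest
    # and mypy's warn_return_any does not flag the bare json.load() result.
    if not isinstance(data, dict):
        raise HWQueueLoaderError(
            f"Expected a JSON object in {path}, got {type(data).__name__}"
        )
    return cast(Dict[str, Any], data)
```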
```python
results = [SingleRunData.from_dict(r) for r in data.get("results", [])]
return cls(
    workload_name=data.get("workload", "unknown"),
    results=results,
    environment=EnvironmentData.from_dict(data.get("environment")),
    analysis=ScalingAnalysisData.from_dict(data.get("analysis")),
```
SweepData.from_dict() assumes every entry in data["results"] is a mapping; if an entry is not a dict, SingleRunData.from_dict(r) will raise (e.g., AttributeError on .get) and bypass HWQueueLoaderError handling in load_sweep() / load_auto(). Consider validating each element is a dict (and possibly required keys exist) and raising HWQueueLoaderError with a clear schema error message.
Suggested change (replacing the lines above):

```python
raw_results = data.get("results", [])
if not isinstance(raw_results, list):
    raise HWQueueLoaderError(
        "Invalid sweep JSON: 'results' must be a list of objects."
    )
results: List[SingleRunData] = []
for idx, r in enumerate(raw_results):
    if not isinstance(r, dict):
        raise HWQueueLoaderError(
            "Invalid sweep JSON: each entry in 'results' must be an object; "
            f"found {type(r).__name__} at index {idx}."
        )
    results.append(SingleRunData.from_dict(r))
env_data = data.get("environment")
if env_data is not None and not isinstance(env_data, dict):
    raise HWQueueLoaderError(
        "Invalid sweep JSON: 'environment' must be an object when present."
    )
analysis_data = data.get("analysis")
if analysis_data is not None and not isinstance(analysis_data, dict):
    raise HWQueueLoaderError(
        "Invalid sweep JSON: 'analysis' must be an object when present."
    )
return cls(
    workload_name=data.get("workload", "unknown"),
    results=results,
    environment=EnvironmentData.from_dict(env_data),
    analysis=ScalingAnalysisData.from_dict(analysis_data),
```
```python
gpu_count: int = 0
gpus: List[Dict[str, Any]] = field(default_factory=list)
```
EnvironmentData.gpus is typed as List[str], but the producer (aorta.utils.device.get_system_info() / log_environment_info()) records GPUs as a list of dicts (e.g., {index, name, memory_gb}), and save_sweep_results() persists that structure. This mismatch makes the loader's type hints misleading; consider changing gpus to List[Dict[str, Any]] (or introducing a dedicated GPU dataclass) and updating the factories accordingly.
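If the dedicated-dataclass route is taken, a minimal sketch; the field names follow the {index, name, memory_gb} structure mentioned above, while the defaults and coercions are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

@dataclass
class GPUInfo:
    """One GPU entry as recorded in the environment info (assumed field set)."""
    index: int = 0
    name: str = "unknown"
    memory_gb: float = 0.0

    @classmethod
    def from_dict(cls, data: Optional[Dict[str, Any]]) -> "GPUInfo":
        data = data or {}
        return cls(
            index=int(data.get("index", 0)),
            name=str(data.get("name", "unknown")),
            memory_gb=float(data.get("memory_gb", 0.0)),
        )
```

`EnvironmentData.gpus` would then become `List[GPUInfo]`, with its factory mapping each raw dict through `GPUInfo.from_dict`.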
```python
for json_file in json_files:
    try:
        # Extract workload name from filename (e.g., hetero_kernels_results.json -> hetero_kernels)
        workload_name = json_file.stem.replace("_results", "")
```
Workload name extraction uses json_file.stem.replace("_results", ""), which removes all occurrences of _results in the stem. For filenames like my_results_test_results.json this would produce my_test instead of my_results_test. Consider using a suffix-based removal (e.g., removesuffix("_results") with an endswith guard) to only strip the trailing marker.
Suggested change:

```python
workload_name = json_file.stem
if workload_name.endswith("_results"):
    workload_name = workload_name[: -len("_results")]
```
```python
    raise HWQueueLoaderError(f"Invalid directory: {dir_path}")
workloads = set()
for f in dir_path.glob("*_results.json"):
    workload_name = f.stem.replace("_results", "")
```
Same issue as in load_directory(): f.stem.replace("_results", "") will remove all _results occurrences, not just a trailing suffix, which can miscompute workload names for stems that legitimately contain _results earlier. Prefer a suffix-only removal (e.g., removesuffix("_results")).
Suggested change:

```python
workload_name = f.stem.removesuffix("_results")
```
Implement the JSON loader for all configurations, i.e. single-run, sweep-run, and multi-workload (directory) runs.