Add hardware-to-simulator metric comparison tools and refactor code #161
Open
athebolt wants to merge 16 commits into
… rows in the CSV 2. double_buffered_scratchpad_mem.py — np.max(a, b, c) → max(a, b, c) (numpy API change)
…able run_comparison(ncu_path, scalesim_dir, report_path) function
…n Nano GPU running an inference with the ResNet18 model
…ions of the ResNet 18 model
… SCALE-Sim outputs and writes a comparison report which I also attached
…tion for Orin Nano
… tensor cores by individual sm for accuracy
Pull request overview
This PR adds a workflow to compare NVIDIA Nsight Compute (NCU) profiling metrics against SCALE-Sim outputs, and extends topology/config handling to support per-layer “compute target” (GPU vs NPU) plus a GPU tensor-core parallelism parameter.
Changes:
- Extend CONV topology parsing to accept an optional per-layer Target (GPU/NPU) and expose it via get_layer_target().
- Add NCU CSV ingestion and report-overriding logic in the simulator, plus a CLI flag (-m) to generate a post-run comparison report.
- Add a Jetson Orin Nano example config and small fixes in memory-cycle max computations and .gitignore.
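To illustrate the optional GPU tensor-core parameter mentioned in the overview, here is a minimal sketch of reading a `[gpu] TensorCores` entry with a fallback when the section is absent. The function name `get_tensor_cores`, the default of 1, and the sample values are assumptions made for illustration; the actual accessor in scalesim/scale_config.py is not shown in this conversation.

```python
import configparser

# Illustrative config text; the values are placeholders, not taken from
# configs/orin_nano.cfg.
SAMPLE_CFG = """
[general]
run_name = orin_nano_example

[gpu]
TensorCores = 32
"""


def get_tensor_cores(cfg_text, default=1):
    """Return the [gpu] TensorCores value, or `default` if it is missing.

    Hypothetical helper: the name and the default are assumptions.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    # fallback covers both a missing [gpu] section and a missing option
    return parser.getint("gpu", "TensorCores", fallback=default)


print(get_tensor_cores(SAMPLE_CFG))
print(get_tensor_cores("[general]\nrun_name = no_gpu_section\n"))
```

Using a fallback keeps existing configs without a `[gpu]` section loadable, which matches the "optional" wording in the overview.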
Reviewed changes
Copilot reviewed 10 out of 12 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| topologies/conv_nets/Resnet18.csv | Adds a per-layer target column (GPU/NPU) for ResNet-18. |
| scalesim/topology_utils.py | Makes CONV topology parsing more flexible; appends target and adds get_layer_target(). |
| scalesim/simulator.py | Adds NCU CSV parsing and optionally overrides per-layer report metrics for GPU-target layers. |
| scalesim/scale.py | Adds -m CLI flag and runs the hardware-vs-sim comparison report after simulation. |
| scalesim/scale_sim.py | Plumbs ncu_metrics through to the simulator. |
| scalesim/scale_config.py | Adds optional [gpu] TensorCores config and accessor. |
| scalesim/memory/read_buffer.py | Uses np.max for correctness on numpy arrays. |
| scalesim/memory/double_buffered_scratchpad_mem.py | Uses np.max for correctness on numpy arrays. |
| scalesim/compare_metrics.py | New module to parse NCU + SCALE-Sim reports and write a comparison report. |
| configs/orin_nano.cfg | New example config including [gpu] TensorCores. |
| .gitignore | Ignores .venv and /outputs/; normalizes results/. |
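The commit note above mentions replacing `np.max(a, b, c)` with `max(a, b, c)`. As a sketch of why the three-argument NumPy call is a problem: `np.max` takes a single array-like, and its second and third positional parameters are `axis` and `out`, so extra operands are not compared. The operand types in the PR are not shown, so the scalar and array cases below are both illustrative assumptions.

```python
import numpy as np

# np.max(3, 5, 7) is parsed as a=3, axis=5, out=7, which is not a
# three-way maximum and fails on plain scalars.
try:
    np.max(3, 5, 7)
    misuse_raised = False
except Exception:
    misuse_raised = True

# Scalars: the built-in max compares all of its arguments.
scalar_max = max(3, 5, 7)

# Scalars collected into one sequence: np.max reduces over that sequence.
stacked_max = int(np.max([3, 5, 7]))

# Arrays: an element-wise maximum across several arrays.
elementwise = np.maximum.reduce([
    np.array([1, 8]),
    np.array([4, 2]),
    np.array([3, 3]),
])

print(misuse_raised, scalar_max, stacked_max, elementwise)
```

Which form is right depends on whether the memory-cycle values are scalars or arrays at that point in the code.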
Comment on lines +614 to 616

    return self.topo_arrays[layer_id][11]
Comment on lines +85 to +89

    pivot = df.pivot_table(
        index=["ID", "Kernel Name"],
        columns="Metric Name",
        values="Metric Value",
Comment on lines +109 to +118

    for _, row in pivot.iterrows():
        if row["is_conv"]:
            if current_layer is not None:
                layer_buckets.append(current_layer)
            current_layer = {
                'cycles': row['sm__cycles_active.avg'],
                'sram_reads': (row.get('l1tex__t_sectors.sum', 0) + row.get('lts__t_sectors.sum', 0)) * 32,
                'dram_reads': row.get('dram__bytes_read.sum', 0),
                'dram_writes': row.get('dram__bytes_write.sum', 0)
            }
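The two fragments above pivot a long-format NCU export into one row per kernel and then start a new layer bucket at each convolution kernel. A minimal, self-contained sketch of that pattern on toy data follows. The kernel names, the `"conv"` substring test, and the choice to fold non-convolution kernels into the preceding layer are assumptions; the PR's handling of those cases is not visible here.

```python
import pandas as pd

# Toy long-format data: one row per (kernel, metric), as in an NCU CSV export.
rows = [
    (0, "conv_kernel_a", "sm__cycles_active.avg", 100.0),
    (0, "conv_kernel_a", "dram__bytes_read.sum", 4096.0),
    (1, "relu_kernel", "sm__cycles_active.avg", 10.0),
    (1, "relu_kernel", "dram__bytes_read.sum", 512.0),
    (2, "conv_kernel_b", "sm__cycles_active.avg", 200.0),
    (2, "conv_kernel_b", "dram__bytes_read.sum", 8192.0),
]
df = pd.DataFrame(rows, columns=["ID", "Kernel Name", "Metric Name", "Metric Value"])

# One row per kernel, one column per metric.
pivot = df.pivot_table(
    index=["ID", "Kernel Name"],
    columns="Metric Name",
    values="Metric Value",
    aggfunc="first",
).reset_index()

# Assumption: convolution kernels are recognisable by name.
pivot["is_conv"] = pivot["Kernel Name"].str.contains("conv")

layer_buckets = []
current_layer = None
for _, row in pivot.iterrows():
    if row["is_conv"]:
        # A convolution kernel closes the previous layer and opens a new one.
        if current_layer is not None:
            layer_buckets.append(current_layer)
        current_layer = {
            "cycles": row["sm__cycles_active.avg"],
            "dram_reads": row.get("dram__bytes_read.sum", 0),
        }
    elif current_layer is not None:
        # Assumption: trailing non-conv kernels are charged to the open layer.
        current_layer["cycles"] += row["sm__cycles_active.avg"]
        current_layer["dram_reads"] += row.get("dram__bytes_read.sum", 0)

# Flush the last open layer.
if current_layer is not None:
    layer_buckets.append(current_layer)

print(layer_buckets)
```

Note the final flush after the loop: without it, the last layer would never be appended.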
    # Override Compute items: Total, Compute, Stall
    compute_report_items_this_layer[0] = cycles
    compute_report_items_this_layer[1] = cycles
    compute_report_items_this_layer[2] = 0
Comment on lines +109 to +118

    # Orin Nano has 8 SMs. hmma.sum is aggregate, cycles.avg is per-SM.
    # We divide by 8 to get the per-SM tensor contribution for a fair latency comparison.
    conv_ten = sum(d["tensor_cycles"] for d in grouped_data) / 8.0

    return {
        "total_kernels": len(pivot),
        "conv_kernels": len(grouped_data),
        "total_cycles": pivot["sm__cycles_active.avg"].sum() if "sm__cycles_active.avg" in pivot.columns else 0,
        "tensor_cycles": (pivot["sm__pipe_tensor_op_hmma_cycles_active.sum"].sum() / 8.0) if "sm__pipe_tensor_op_hmma_cycles_active.sum" in pivot.columns else 0,
        "l1_sectors": pivot["l1tex__t_sectors.sum"].sum() if "l1tex__t_sectors.sum" in pivot.columns else 0,
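The fragment above divides an aggregate `.sum` metric by the SM count so it can be compared with a per-SM `.avg` metric, and it repeats an "is the column present" check for each entry. One possible way to express both ideas with small helpers is sketched below. The helper names `column_sum` and `per_sm` are hypothetical, and the SM count of 8 is taken from the quoted code comment rather than verified independently.

```python
import pandas as pd

NUM_SMS = 8  # value stated in the quoted code comment for Orin Nano


def column_sum(frame, name):
    """Sum a metric column, returning 0 when the column was not collected."""
    return frame[name].sum() if name in frame.columns else 0


def per_sm(aggregate, num_sms=NUM_SMS):
    """Convert a metric summed over all SMs into a per-SM average."""
    return aggregate / float(num_sms)


# Toy pivoted data: two kernels, tensor-pipe cycles summed across SMs.
pivot = pd.DataFrame({
    "sm__cycles_active.avg": [100.0, 200.0],
    "sm__pipe_tensor_op_hmma_cycles_active.sum": [400.0, 1200.0],
})

total_cycles = column_sum(pivot, "sm__cycles_active.avg")
tensor_cycles = per_sm(column_sum(pivot, "sm__pipe_tensor_op_hmma_cycles_active.sum"))
l1_sectors = column_sum(pivot, "l1tex__t_sectors.sum")  # column absent in toy data

print(total_cycles, tensor_cycles, l1_sectors)
```

Keeping the SM count in one named constant (or reading it from the config) avoids the literal `8.0` appearing in several places.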
    # sim_dram = sim.get("dram_reads", 0) + sim.get("dram_writes", 0)
    # rows.append(("Off-chip (DRAM) Access (Bytes)", hw_dram, sim_dram))

    rows.append(("Tensor Compute Cycles",
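The fragment above collects `(label, hardware value, simulator value)` tuples for the comparison report. As a sketch of how such rows could be turned into report lines with a relative difference, under the assumption that the report is plain text: the functions `percent_diff` and `format_report`, the column layout, and the sample numbers are all hypothetical and are not taken from scalesim/compare_metrics.py.

```python
def percent_diff(hw, sim):
    """Signed difference of the simulator value relative to hardware, in percent.

    Returns None when the hardware value is zero, since the ratio is undefined.
    """
    if hw == 0:
        return None
    return (sim - hw) / hw * 100.0


def format_report(rows):
    """Render (label, hw, sim) tuples as fixed-width text lines."""
    lines = [f"{'Metric':<32}{'Hardware':>14}{'Simulator':>14}{'Diff %':>10}"]
    for label, hw, sim in rows:
        diff = percent_diff(hw, sim)
        diff_text = "n/a" if diff is None else f"{diff:+.1f}"
        lines.append(f"{label:<32}{hw:>14.1f}{sim:>14.1f}{diff_text:>10}")
    return "\n".join(lines)


# Placeholder numbers, for illustration only.
rows = [
    ("Tensor Compute Cycles", 200.0, 150.0),
    ("Off-chip (DRAM) Access (Bytes)", 0.0, 4096.0),
]
report = format_report(rows)
print(report)
```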
    Conv5_2a,7,7,3,3,512,512,1,
    Conv5_2b,7,7,3,3,512,512,1,
    FC,1,1,1,1,512,1000,1,
    No newline at end of file
    Layer name, IFMAP Height, IFMAP Width, Filter Height, Filter Width, Channels, Num Filter, Strides,, Target
Comment on lines +161 to +168

    elems = elems[0:9]
    if ':' in sparsity_field and sparsity_field != '':
        sparsity_ratio = sparsity_field.split(':')
    else:
        sparsity_ratio = ["1", "1"]
    elems.append(sparsity_ratio[0])
    elems.append(sparsity_ratio[1])
    elems.append(target)
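Taken together with the `topo_arrays[layer_id][11]` accessor quoted earlier, the fragment above suggests a row layout of nine leading fields, two sparsity-ratio entries, and the target at index 11. A self-contained sketch of that parsing is given below. The function `parse_conv_row`, the padding step, the `"NPU"` default, and the upper-casing of the target are assumptions for illustration; only the `N:M` split, the `["1", "1"]` default, and the append order come from the quoted code.

```python
DEFAULT_TARGET = "NPU"  # assumption: the default used by the PR is not shown


def parse_conv_row(line):
    """Parse one CONV topology row into a flat list with the target at index 11.

    Assumed layout: 8 base fields, an optional sparsity field ("N:M"),
    and an optional target ("GPU" or "NPU").
    """
    fields = [f.strip() for f in line.split(",")]
    # Pad short rows so the sparsity and target positions always exist.
    fields += [""] * (10 - len(fields))

    sparsity_field = fields[8]
    target = (fields[9] or DEFAULT_TARGET).upper()

    elems = fields[0:9]
    if ":" in sparsity_field:
        sparsity_ratio = sparsity_field.split(":")
    else:
        sparsity_ratio = ["1", "1"]  # dense layer
    elems.append(sparsity_ratio[0])
    elems.append(sparsity_ratio[1])
    elems.append(target)
    return elems


print(parse_conv_row("Conv5_2a,7,7,3,3,512,512,1,,GPU"))
print(parse_conv_row("FC,1,1,1,1,512,1000,1,"))
```

Since `':' in sparsity_field` is already false for an empty string, the extra `sparsity_field != ''` test in the quoted code is redundant and the sketch omits it.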
No description provided.