Add hardware-to-simulator metric comparison tools and refactor code by athebolt · Pull Request #161 · scalesim-project/SCALE-Sim

athebolt · 2026-04-29T20:03:06Z

No description provided.


Copilot

Pull request overview

This PR adds a workflow to compare NVIDIA Nsight Compute (NCU) profiling metrics against SCALE-Sim outputs, and extends topology/config handling to support per-layer “compute target” (GPU vs NPU) plus a GPU tensor-core parallelism parameter.

Changes:

Extend CONV topology parsing to accept optional per-layer Target (GPU/NPU) and expose it via get_layer_target().
Add NCU CSV ingestion and report overriding logic in the simulator, plus a CLI flag (-m) to generate a post-run comparison report.
Add a Jetson Orin Nano example config and small fixes in memory-cycle max computations and .gitignore.
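As a rough illustration of the CLI change, the following sketch wires a `-m` option with `argparse` and calls the comparison only when the flag is given. Only the flag name `-m` and the `run_comparison(ncu_path, scalesim_dir, report_path)` signature come from this PR; the `dest` names, the `-p` option, the report filename, and the stub body are assumptions made for the example.

```python
import argparse
import os


def run_comparison(ncu_path, scalesim_dir, report_path):
    """Stand-in stub; the real function is described as living in scalesim/compare_metrics.py."""
    return {"ncu": ncu_path, "sim_dir": scalesim_dir, "report": report_path}


def build_parser():
    parser = argparse.ArgumentParser(description="hypothetical scale.py front end")
    # Hypothetical dest name and help text; only the flag '-m' is taken from the PR.
    parser.add_argument("-m", dest="ncu_metrics", default=None,
                        help="path to an NCU metrics CSV; enables the comparison report")
    # Hypothetical output-directory option used only by this sketch.
    parser.add_argument("-p", dest="out_dir", default="./outputs")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # ... the simulation itself would run here ...
    if args.ncu_metrics is None:
        return None  # no comparison requested
    report_path = os.path.join(args.out_dir, "comparison_report.txt")  # assumed filename
    return run_comparison(args.ncu_metrics, args.out_dir, report_path)
```

Keeping the comparison behind an optional flag means existing invocations without `-m` behave exactly as before.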

Reviewed changes

Copilot reviewed 10 out of 12 changed files in this pull request and generated 8 comments.

Show a summary per file

| File | Description |
| --- | --- |
| topologies/conv_nets/Resnet18.csv | Adds a per-layer target column (GPU/NPU) for ResNet-18. |
| scalesim/topology_utils.py | Makes CONV topology parsing more flexible; appends target and adds `get_layer_target()`. |
| scalesim/simulator.py | Adds NCU CSV parsing and optionally overrides per-layer report metrics for GPU-target layers. |
| scalesim/scale.py | Adds `-m` CLI flag and runs the hardware-vs-sim comparison report after simulation. |
| scalesim/scale_sim.py | Plumbs `ncu_metrics` through to the simulator. |
| scalesim/scale_config.py | Adds optional `[gpu] TensorCores` config and accessor. |
| scalesim/memory/read_buffer.py | Uses `np.max` for correctness on numpy arrays. |
| scalesim/memory/double_buffered_scratchpad_mem.py | Uses `np.max` for correctness on numpy arrays. |
| scalesim/compare_metrics.py | New module to parse NCU + SCALE-Sim reports and write a comparison report. |
| configs/orin_nano.cfg | New example config including `[gpu] TensorCores`. |
| .gitignore | Ignores `.venv` and `/outputs/`; normalizes `results/`. |
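The optional `[gpu] TensorCores` setting could be read along the following lines. This is a minimal sketch using the standard `configparser` module, not the accessor added in `scalesim/scale_config.py`; the function name, the default of `1`, the `[general]` section contents, and the value `32` are placeholders chosen for illustration.

```python
import configparser

# Illustrative config text; the value 32 is a placeholder, not taken from the PR.
EXAMPLE_CFG = """
[general]
run_name = orin_nano_example

[gpu]
TensorCores = 32
"""


def read_tensor_cores(cfg_text, default=1):
    """Return [gpu] TensorCores as an int, or `default` when the section or key is absent."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    if not parser.has_section("gpu"):
        return default  # older configs without a [gpu] section keep working
    # Option names are case-insensitive in ConfigParser by default.
    return parser.getint("gpu", "TensorCores", fallback=default)
```

Falling back to a default keeps configs that predate the `[gpu]` section valid.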


+        return self.topo_arrays[layer_id][11]
+



+
+        pivot = df.pivot_table(
+            index=["ID", "Kernel Name"],
+            columns="Metric Name",
+            values="Metric Value",
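The pivot above turns a long-format export (one row per kernel per metric) into one row per kernel with one column per metric. The sketch below shows the same reshaping with only the standard library, to make the data shape explicit; it is not the PR's pandas code. The column names come from the diff, while the sample rows and the function name are made up.

```python
import csv
import io

# Made-up sample in the long format implied by the pivot's index/columns/values arguments.
SAMPLE_NCU_CSV = """ID,Kernel Name,Metric Name,Metric Value
0,conv_kernel,sm__cycles_active.avg,1200
0,conv_kernel,dram__bytes_read.sum,4096
1,relu_kernel,sm__cycles_active.avg,300
"""


def pivot_long_to_wide(csv_text):
    """Collapse (ID, Kernel Name, Metric Name, Metric Value) rows into one dict per kernel."""
    wide = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (int(row["ID"]), row["Kernel Name"])
        # Exports may format numbers with thousands separators; strip them before parsing.
        value = float(row["Metric Value"].replace(",", ""))
        wide.setdefault(key, {})[row["Metric Name"]] = value
    return wide
```

One behavioral difference worth noting: pandas `pivot_table` aggregates duplicate (index, column) pairs with the mean by default, whereas this sketch keeps the last value seen.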


+        for _, row in pivot.iterrows():
+            if row["is_conv"]:
+                if current_layer is not None:
+                    layer_buckets.append(current_layer)
+                current_layer = {
+                    'cycles': row['sm__cycles_active.avg'],
+                    'sram_reads': (row.get('l1tex__t_sectors.sum', 0) + row.get('lts__t_sectors.sum', 0)) * 32,
+                    'dram_reads': row.get('dram__bytes_read.sum', 0),
+                    'dram_writes': row.get('dram__bytes_write.sum', 0)
+                }
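The loop above opens a new layer bucket whenever it meets a convolution kernel. A self-contained sketch of that grouping is shown below. The hunk only shows the branch that starts a bucket, so folding non-conv kernels into the preceding layer, dropping kernels that precede the first convolution, and flushing the last bucket are assumptions of this example rather than confirmed behavior.

```python
SECTOR_BYTES = 32  # multiplier taken from the `* 32` in the diff


def bucket_kernels_by_layer(kernels):
    """Group a kernel sequence into layers, opening a new layer at each conv kernel."""
    layer_buckets = []
    current_layer = None
    for k in kernels:
        sectors = k.get("l1tex__t_sectors.sum", 0) + k.get("lts__t_sectors.sum", 0)
        if k["is_conv"]:
            if current_layer is not None:
                layer_buckets.append(current_layer)
            current_layer = {"cycles": 0, "sram_reads": 0}
        if current_layer is None:
            continue  # assumption: kernels before the first conv are dropped
        # Assumption: non-conv kernels are folded into the preceding conv layer.
        current_layer["cycles"] += k.get("sm__cycles_active.avg", 0)
        current_layer["sram_reads"] += sectors * SECTOR_BYTES
    if current_layer is not None:
        layer_buckets.append(current_layer)  # flush the last open layer
    return layer_buckets
```

Whatever policy is chosen for non-conv kernels, the final bucket has to be appended after the loop, otherwise the last layer is silently lost.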


+                # Override Compute items: Total, Compute, Stall
+                compute_report_items_this_layer[0] = cycles
+                compute_report_items_this_layer[1] = cycles
+                compute_report_items_this_layer[2] = 0


+    # Orin Nano has 8 SMs. hmma.sum is aggregate, cycles.avg is per-SM.
+    # We divide by 8 to get the per-SM tensor contribution for a fair latency comparison.
+    conv_ten = sum(d["tensor_cycles"] for d in grouped_data) / 8.0
+
+    return {
+        "total_kernels":     len(pivot),
+        "conv_kernels":      len(grouped_data),
+        "total_cycles":      pivot["sm__cycles_active.avg"].sum() if "sm__cycles_active.avg" in pivot.columns else 0,
+        "tensor_cycles":     (pivot["sm__pipe_tensor_op_hmma_cycles_active.sum"].sum() / 8.0) if "sm__pipe_tensor_op_hmma_cycles_active.sum" in pivot.columns else 0,
+        "l1_sectors":        pivot["l1tex__t_sectors.sum"].sum() if "l1tex__t_sectors.sum" in pivot.columns else 0,
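The division by 8 reconciles two counters with different aggregation: a `.sum` metric is totalled over all SMs, while a `.avg` metric is already per SM. A small sketch of that normalization follows, with the SM count made a parameter instead of a literal. The function names and the utilization helper are hypothetical additions for illustration; only the SM count of 8 and the metric semantics come from the diff comment.

```python
NUM_SMS = 8  # SM count stated in the diff comment for Orin Nano


def per_sm_tensor_cycles(hmma_cycles_sum, num_sms=NUM_SMS):
    """Convert an aggregate (.sum) counter into a per-SM average comparable to a .avg counter."""
    if num_sms <= 0:
        raise ValueError("num_sms must be positive")
    return hmma_cycles_sum / num_sms


def tensor_utilization(hmma_cycles_sum, cycles_active_avg, num_sms=NUM_SMS):
    """Fraction of active SM cycles spent in the tensor pipe (0.0 when no cycles were recorded)."""
    if cycles_active_avg == 0:
        return 0.0
    return per_sm_tensor_cycles(hmma_cycles_sum, num_sms) / cycles_active_avg
```

Passing the SM count as a parameter would let the same code serve devices other than the Orin Nano, for example by reading it from the config file.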


+        # sim_dram = sim.get("dram_reads", 0) + sim.get("dram_writes", 0)
+        # rows.append(("Off-chip (DRAM) Access (Bytes)", hw_dram, sim_dram))
+
+        rows.append(("Tensor Compute Cycles",
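The report appears to be assembled from `(label, hardware value, simulator value)` tuples. One way such rows could be rendered is sketched below. The column layout and the relative-error column are assumptions of this example and are not taken from the PR's report format.

```python
def format_comparison(rows):
    """Render (label, hardware, simulator) tuples as an aligned text table with relative error."""
    lines = [f"{'Metric':<32}{'Hardware':>14}{'Simulator':>14}{'Rel. err':>10}"]
    for label, hw, sim in rows:
        # Relative error against hardware; undefined when the hardware value is zero.
        err = f"{(sim - hw) / hw:+.1%}" if hw else "n/a"
        lines.append(f"{label:<32}{hw:>14,.0f}{sim:>14,.0f}{err:>10}")
    return "\n".join(lines)
```

Guarding the zero-hardware case avoids a `ZeroDivisionError` when a metric is missing from the NCU export and defaults to 0.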


-Conv5_2a,7,7,3,3,512,512,1,
-Conv5_2b,7,7,3,3,512,512,1,
-FC,1,1,1,1,512,1000,1,
+Layer name, IFMAP Height, IFMAP Width, Filter Height, Filter Width, Channels, Num Filter, Strides,, Target


+                elems = elems[0:9]
+                if ':' in sparsity_field and sparsity_field != '':
+                    sparsity_ratio = sparsity_field.split(':')
+                else:
+                    sparsity_ratio = ["1", "1"]
                elems.append(sparsity_ratio[0])
                elems.append(sparsity_ratio[1])
+                elems.append(target)


athebolt and others added 16 commits April 8, 2026 22:43

initial commit

ac18ee8

1. topology_utils.py — parser was crashing on extra columns and empty rows in the CSV 2. double_buffered_scratchpad_mem.py — np.max(a, b, c) → max(a, b, c) (numpy API change)

8912218

add outputs/ to gitignore

394ff9c

Create scalesim/compare_metrics.py — a refactored version with a callable run_comparison(ncu_path, scalesim_dir, report_path) function

1510601

Add -m arg to scale.py and call comparison after run_scale()

142e4ef

ncu hardware metrics .csv is the performance of the actual Jetson Orin Nano GPU running an inference with the ResNet18 model

db19b39

network layout and network topology are csv files with the specifications of the ResNet 18 model

5610d34

network topology are csv files with the specifications of the ResNet 18 model

49c555d

the compare metrics python script parses the hardware metrics and the SCALE-Sim outputs and writes a comparison report which I also attached

b53bd2a

Refactor code structure for improved readability and maintainability

b16f4a4

feat: add hardware-to-simulator metric comparison tools and configuration for Orin Nano

094292d

attempted to simulate hardware compute cycles

5bb237a

fixed compare metrics to return more comparable data

6757cf7

accurate total compute cycles

cb163ac

compare-metrics stuff

ccf56be

fixed up comparison reports, readjusted gpu (ncu) vs npu metrics, div tensor cores by individual sm for accuracy

ef8992e

ritikraj7 requested a review from Copilot May 8, 2026 20:27

Copilot started reviewing on behalf of ritikraj7 May 8, 2026 20:27

Copilot AI reviewed May 8, 2026

athebolt wants to merge 16 commits into scalesim-project:main from athebolt:main

4 participants
