
Add hardware-to-simulator metric comparison tools and refactor code #161

Open
athebolt wants to merge 16 commits into scalesim-project:main from athebolt:main

Conversation

@athebolt

No description provided.

athebolt and others added 16 commits April 8, 2026 22:43
… rows in the CSV
  2. double_buffered_scratchpad_mem.py: np.max(a, b, c) → max(a, b, c) (numpy API change)
…able

  run_comparison(ncu_path, scalesim_dir, report_path) function
…n Nano GPU running an inference with the ResNet18 model
… SCALE-Sim outputs and writes a comparison report, which I also attached

Copilot AI left a comment

Pull request overview

This PR adds a workflow to compare NVIDIA Nsight Compute (NCU) profiling metrics against SCALE-Sim outputs, and extends topology/config handling to support per-layer “compute target” (GPU vs NPU) plus a GPU tensor-core parallelism parameter.

Changes:

  • Extend CONV topology parsing to accept optional per-layer Target (GPU/NPU) and expose it via get_layer_target().
  • Add NCU CSV ingestion and report overriding logic in the simulator, plus a CLI flag (-m) to generate a post-run comparison report.
  • Add a Jetson Orin Nano example config and small fixes in memory-cycle max computations and .gitignore.

Reviewed changes

Copilot reviewed 10 out of 12 changed files in this pull request and generated 8 comments.

File | Description
--- | ---
topologies/conv_nets/Resnet18.csv | Adds a per-layer target column (GPU/NPU) for ResNet-18.
scalesim/topology_utils.py | Makes CONV topology parsing more flexible; appends target and adds get_layer_target().
scalesim/simulator.py | Adds NCU CSV parsing and optionally overrides per-layer report metrics for GPU-target layers.
scalesim/scale.py | Adds -m CLI flag and runs the hardware-vs-sim comparison report after simulation.
scalesim/scale_sim.py | Plumbs ncu_metrics through to the simulator.
scalesim/scale_config.py | Adds optional [gpu] TensorCores config and accessor.
scalesim/memory/read_buffer.py | Uses np.max for correctness on numpy arrays.
scalesim/memory/double_buffered_scratchpad_mem.py | Uses np.max for correctness on numpy arrays.
scalesim/compare_metrics.py | New module to parse NCU + SCALE-Sim reports and write a comparison report.
configs/orin_nano.cfg | New example config including [gpu] TensorCores.
.gitignore | Ignores .venv and /outputs/; normalizes results/.
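The optional [gpu] TensorCores setting in scale_config.py and configs/orin_nano.cfg can be read so that older configs without a [gpu] section keep working. The sketch below is a minimal illustration using the standard-library configparser; the accessor name get_tensor_cores, the [general] section contents, and the value 32 are placeholders, not copied from the PR.

```python
import configparser
from typing import Optional

# Illustrative config text; section contents and the TensorCores value
# are placeholders, not taken from configs/orin_nano.cfg.
EXAMPLE_CFG = """
[general]
run_name = orin_nano_example

[gpu]
TensorCores = 32
"""


def get_tensor_cores(cfg: configparser.ConfigParser) -> Optional[int]:
    """Hypothetical accessor: return [gpu] TensorCores, or None if absent."""
    # fallback=None covers both a missing [gpu] section and a missing option,
    # so NPU-only configs still load without changes.
    return cfg.getint("gpu", "TensorCores", fallback=None)


cfg = configparser.ConfigParser()
cfg.read_string(EXAMPLE_CFG)
print(get_tensor_cores(cfg))  # 32
```

Returning None rather than a numeric default makes it explicit to callers that no GPU parameters were configured.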


Comment on lines +614 to 616
return self.topo_arrays[layer_id][11]


Comment thread scalesim/simulator.py
Comment on lines +85 to +89

pivot = df.pivot_table(
    index=["ID", "Kernel Name"],
    columns="Metric Name",
    values="Metric Value",
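The pivot above turns the NCU CSV, which stores one metric per row, into one row per kernel with one column per metric. The stdlib-only sketch below shows the same reshaping so it can be tested without pandas. It assumes only the four column names used in the snippet; the kernel names in SAMPLE are made up, and stripping commas from values is a defensive assumption in case the export uses thousands separators.

```python
import csv
import io


def pivot_ncu_rows(csv_text):
    """Pivot long-format NCU rows into {(ID, Kernel Name): {metric: value}}."""
    table = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (int(row["ID"]), row["Kernel Name"])
        # Tolerate thousands separators, if the export contains them.
        raw = row["Metric Value"].replace(",", "")
        try:
            value = float(raw)
        except ValueError:
            continue  # skip non-numeric metrics such as device names
        # Unlike pandas pivot_table (default aggfunc="mean"), a repeated
        # (kernel, metric) pair simply overwrites the earlier value here.
        table.setdefault(key, {})[row["Metric Name"]] = value
    return table


SAMPLE = '''ID,Kernel Name,Metric Name,Metric Value
0,conv_kernel_a,sm__cycles_active.avg,"1,000"
0,conv_kernel_a,dram__bytes_read.sum,4096
1,relu_kernel,sm__cycles_active.avg,200
'''

print(pivot_ncu_rows(SAMPLE))
```

Because Python dicts preserve insertion order, kernels come out in the order they appear in the CSV, which matters for the layer bucketing that follows.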
Comment thread scalesim/simulator.py
Comment on lines +109 to +118
for _, row in pivot.iterrows():
    if row["is_conv"]:
        if current_layer is not None:
            layer_buckets.append(current_layer)
        current_layer = {
            'cycles': row['sm__cycles_active.avg'],
            'sram_reads': (row.get('l1tex__t_sectors.sum', 0) + row.get('lts__t_sectors.sum', 0)) * 32,
            'dram_reads': row.get('dram__bytes_read.sum', 0),
            'dram_writes': row.get('dram__bytes_write.sum', 0)
        }
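The loop above starts a new layer bucket at each convolution kernel and converts L1TEX plus L2 sector counts to bytes by multiplying by 32, the sector size on NVIDIA GPUs. The snippet stops before the non-conv branch, so the sketch below assumes that non-conv kernels are folded into the preceding conv layer and that kernels before the first conv are dropped. The is_conv_kernel name-matching heuristic is hypothetical; the PR's is_conv column may be computed differently.

```python
SECTOR_BYTES = 32  # L1TEX/L2 sectors are 32 bytes on NVIDIA GPUs


def is_conv_kernel(name):
    # Hypothetical heuristic; the PR's is_conv column may differ.
    return "conv" in name.lower()


def kernel_metrics(metrics):
    """Map raw NCU metrics for one kernel to the per-layer quantities."""
    return {
        "cycles": metrics.get("sm__cycles_active.avg", 0.0),
        "sram_reads": (metrics.get("l1tex__t_sectors.sum", 0.0)
                       + metrics.get("lts__t_sectors.sum", 0.0)) * SECTOR_BYTES,
        "dram_reads": metrics.get("dram__bytes_read.sum", 0.0),
        "dram_writes": metrics.get("dram__bytes_write.sum", 0.0),
    }


def bucket_layers(kernels):
    """kernels: iterable of (kernel_name, metrics_dict) in launch order."""
    layers, current = [], None
    for name, metrics in kernels:
        stats = kernel_metrics(metrics)
        if is_conv_kernel(name):
            if current is not None:
                layers.append(current)
            current = stats
        elif current is not None:
            # Assumption: fold non-conv kernels into the preceding conv layer.
            for key, value in stats.items():
                current[key] += value
    if current is not None:
        layers.append(current)  # flush the final layer
    return layers
```

Flushing `current` after the loop matters: without it, the last conv layer would be silently missing from the comparison.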
Comment thread scalesim/simulator.py
# Override Compute items: Total, Compute, Stall
compute_report_items_this_layer[0] = cycles
compute_report_items_this_layer[1] = cycles
compute_report_items_this_layer[2] = 0
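The override above replaces the simulated Total, Compute, and Stall entries with the measured cycle count for GPU-target layers. A minimal sketch of that decision as a pure function is shown below. The function name and the hw_layer dict shape are hypothetical, and it assumes that only the first three entries of the per-layer compute report are Total, Compute, and Stall.

```python
def override_compute_items(items, target, hw_layer):
    """Replace simulated Total/Compute/Stall cycles with measured GPU cycles.

    items: per-layer compute report list, assumed to start with
    [total_cycles, compute_cycles, stall_cycles, ...].
    """
    out = list(items)  # never mutate the caller's list
    if target != "GPU" or hw_layer is None:
        return out  # NPU or unmatched layers keep their simulated values
    cycles = hw_layer["cycles"]
    out[0] = cycles  # Total
    out[1] = cycles  # Compute
    out[2] = 0       # Stall: the measured count is treated as all compute
    return out
```

Returning a copy keeps the simulated values available if the report later needs to show both.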
Comment on lines +109 to +118
# Orin Nano has 8 SMs. hmma.sum is aggregate, cycles.avg is per-SM.
# We divide by 8 to get the per-SM tensor contribution for a fair latency comparison.
conv_ten = sum(d["tensor_cycles"] for d in grouped_data) / 8.0

return {
    "total_kernels": len(pivot),
    "conv_kernels": len(grouped_data),
    "total_cycles": pivot["sm__cycles_active.avg"].sum() if "sm__cycles_active.avg" in pivot.columns else 0,
    "tensor_cycles": (pivot["sm__pipe_tensor_op_hmma_cycles_active.sum"].sum() / 8.0) if "sm__pipe_tensor_op_hmma_cycles_active.sum" in pivot.columns else 0,
    "l1_sectors": pivot["l1tex__t_sectors.sum"].sum() if "l1tex__t_sectors.sum" in pivot.columns else 0,
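The division by 8.0 appears twice above because the .sum HMMA counter aggregates over all SMs, while sm__cycles_active.avg is a per-SM average. One option, sketched below under the assumption that the SM count could come from the config instead of being hardcoded, is to pass it as a parameter. The 8-SM default restates the code comment above; the function name is hypothetical.

```python
NUM_SMS_ORIN_NANO = 8  # SM count stated in the PR's code comment


def conv_tensor_cycles(grouped_data, num_sms=NUM_SMS_ORIN_NANO):
    """Sum aggregate HMMA cycles over conv layers, normalized per SM.

    Dividing the all-SM .sum counter by num_sms puts it on the same
    per-SM scale as sm__cycles_active.avg.
    """
    if num_sms <= 0:
        raise ValueError("num_sms must be positive")
    return sum(d["tensor_cycles"] for d in grouped_data) / num_sms
```

For example, two conv layers with 800 and 1600 aggregate tensor cycles give (800 + 1600) / 8 = 300 per-SM cycles.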
# sim_dram = sim.get("dram_reads", 0) + sim.get("dram_writes", 0)
# rows.append(("Off-chip (DRAM) Access (Bytes)", hw_dram, sim_dram))

rows.append(("Tensor Compute Cycles",
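The report is built from (metric name, hardware value, simulator value) rows like the one above. The PR's exact output format is not shown here, so the sketch below is only one hypothetical layout: a fixed-width text table with a signed percent difference of SCALE-Sim relative to hardware, reporting "n/a" when the hardware value is zero.

```python
def percent_diff(hw, sim):
    """Signed difference of sim relative to hw, in percent (None if hw is 0)."""
    if hw == 0:
        return None
    return (sim - hw) / hw * 100.0


def format_report(rows):
    """rows: list of (metric_name, hw_value, sim_value) tuples."""
    lines = [f"{'Metric':<32}{'Hardware':>16}{'SCALE-Sim':>16}{'Diff %':>10}"]
    for name, hw, sim in rows:
        diff = percent_diff(hw, sim)
        diff_str = "n/a" if diff is None else f"{diff:+.1f}"
        lines.append(f"{name:<32}{hw:>16,.0f}{sim:>16,.0f}{diff_str:>10}")
    return "\n".join(lines)


print(format_report([("Tensor Compute Cycles", 1000, 1100)]))
```

Normalizing by the hardware value treats the measurement as ground truth, which matches the direction of this comparison.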
Conv5_2a,7,7,3,3,512,512,1,
Conv5_2b,7,7,3,3,512,512,1,
FC,1,1,1,1,512,1000,1,
\ No newline at end of file
Layer name, IFMAP Height, IFMAP Width, Filter Height, Filter Width, Channels, Num Filter, Strides,, Target
Comment on lines +161 to +168
elems = elems[0:9]
if ':' in sparsity_field and sparsity_field != '':
    sparsity_ratio = sparsity_field.split(':')
else:
    sparsity_ratio = ["1", "1"]
elems.append(sparsity_ratio[0])
elems.append(sparsity_ratio[1])
elems.append(target)
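Taken together with the CSV header above ("Strides,, Target", where the empty column is the sparsity field), this snippet produces a flat row whose last entry, index 11, is what get_layer_target() returns. The sketch below reconstructs that layout for a single row, assuming a default target of "NPU" when the column is empty and rejecting unknown targets. Both behaviors are assumptions rather than the PR's confirmed logic, and parse_conv_row is a hypothetical name.

```python
def parse_conv_row(line, default_target="NPU"):
    """Parse one CONV topology CSV row into a flat list.

    Layout produced (assumed to mirror the PR's snippet):
      [0]     layer name
      [1..7]  IFMAP H, IFMAP W, filter H, filter W, channels, num filters, strides
      [8]     raw sparsity field (e.g. "2:4", or "")
      [9, 10] sparsity ratio N, M (default "1", "1")
      [11]    compute target ("GPU" or "NPU")
    """
    elems = [e.strip() for e in line.split(",")]
    while len(elems) < 10:
        elems.append("")  # pad missing optional columns
    sparsity_field = elems[8]
    target = elems[9].upper() or default_target  # default is an assumption
    if target not in ("GPU", "NPU"):
        raise ValueError(f"unknown target {target!r} in row {elems[0]!r}")
    elems = elems[0:9]
    # ':' in s already implies s is non-empty, so one check suffices.
    sparsity_ratio = sparsity_field.split(":") if ":" in sparsity_field else ["1", "1"]
    elems.extend([sparsity_ratio[0], sparsity_ratio[1], target])
    return elems


print(parse_conv_row("Conv5_2a,7,7,3,3,512,512,1,")[11])  # NPU
```

Padding to ten fields lets the older rows shown above, which end in a trailing comma and have no Target column, parse without special-casing.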

Labels

None yet

Projects

None yet


4 participants