dispel4py Monitoring Guide

This document explains the three monitoring-enabled mappings:

timed_multi
timed_simple
timed_mpi

These mappings run your workflow and automatically produce timing and graph artifacts under a monitoring output directory (default: timings/).

1. What These Mappings Are For

`timed_simple`

Use this for sequential/local debugging and baseline measurements.

Runs workflow in a single Python process (simple mapping behavior).
Best when you want easy reproducibility and low setup complexity.
Good for checking logic and quick latency checks before parallel runs.

`timed_multi`

Use this for local multiprocessing performance analysis.

Runs workflow with Python multiprocessing.
Uses mapping/process allocation logic from multi.
Best for measuring how PE instance distribution and local parallelism affect timing.

`timed_mpi`

Use this for MPI/distributed performance analysis.

Runs workflow with MPI ranks.
Best for cluster/HPC-style runs or true distributed execution.
Launch with mpiexec/mpirun.

2. Common Output Goal

All three timed mappings produce:

Per-instance totals and averages.
Per-iteration timings for each PE instance.
Aggregate summaries with latency statistics (min, p50, p95, max).
Abstract graph shape (workflow as defined by user).
Concrete graph shape (runtime PE instances and edges).
Optional PNG figures for abstract and concrete graphs.

3. Command Examples

`timed_multi` (local parallel)

Basic example:

dispel4py timed_multi dispel4py.examples.graph_testing.word_count -i 10 -n 10

If you do not pass --timing-dir, the output directory is created automatically and traces are stored in ./timings.

Recommended explicit version:

dispel4py timed_multi dispel4py.examples.graph_testing.word_count -i 10 -n 10 --print-shape

Custom output location:

dispel4py timed_multi dispel4py.examples.graph_testing.word_count -i 10 -n 10 \
  --timing-dir timings_wc --timing-prefix wc_monitor

`timed_simple` (sequential)

dispel4py timed_simple dispel4py.examples.graph_testing.word_count -i 10 --print-shape

`timed_mpi` (MPI/distributed)

mpiexec -n 10 dispel4py timed_mpi dispel4py.examples.graph_testing.word_count -i 10 --print-shape

Optional explicit MPI process count:

mpiexec -n 10 dispel4py timed_mpi dispel4py.examples.graph_testing.word_count -i 10 --num_processes 10

4. Mapping-Specific Flags

`timed_multi`

-n, --num: number of local worker processes.
-s, --simple: force partitioned/simple-style fallback behavior from multi mapping.

`timed_mpi`

-n, --num_processes: number of MPI processes (optional if inferred from MPI world size).
-s, --simple: force partitioned/simple-style fallback behavior.

`timed_simple`

No mapping-specific count flag.
Uses sequential/simple processing.

5. Shared Monitoring Flags (All Three)

--timing-dir: output directory (default: timings)
--timing-prefix: filename prefix (default: monitor)
--run-id: custom run identifier; if omitted, generated automatically
--summary-file: output path for per-PE summary CSV
--instance-summary-file: output path for per-instance summary CSV
--iteration-summary-file: output path for merged per-iteration CSV
--iteration-latency-summary-file: output path for per-instance latency stats derived from iteration timings
--shape-file: output path for abstract graph JSON
--concrete-shape-file: output path for concrete graph JSON
--abstract-figure-file: output path for abstract graph PNG
--concrete-figure-file: output path for concrete graph PNG
--no-graph-figures: skip PNG generation
--print-shape: print abstract and concrete shape details to stdout
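When runs are scripted, it can help to assemble the command line in one place. The following is a minimal sketch, not part of dispel4py: `build_timed_command` is a hypothetical helper, and it uses only flags listed in this guide (`-i`, `-n`, `--timing-dir`, `--timing-prefix`, `--print-shape`) plus the `mpiexec -n` launcher form shown in the command examples.

```python
import shlex


def build_timed_command(mapping, workflow, iterations, num=None,
                        timing_dir=None, timing_prefix=None,
                        print_shape=False, mpi_ranks=None):
    """Assemble an argv list for one of the timed_* mappings (hypothetical helper)."""
    if mapping not in ("timed_simple", "timed_multi", "timed_mpi"):
        raise ValueError(f"unknown mapping: {mapping}")
    argv = []
    if mapping == "timed_mpi":
        # timed_mpi is launched through mpiexec, as in the command examples.
        if mpi_ranks is None:
            raise ValueError("timed_mpi needs mpi_ranks")
        argv += ["mpiexec", "-n", str(mpi_ranks)]
    argv += ["dispel4py", mapping, workflow, "-i", str(iterations)]
    if mapping == "timed_multi" and num is not None:
        argv += ["-n", str(num)]
    if timing_dir:
        argv += ["--timing-dir", timing_dir]
    if timing_prefix:
        argv += ["--timing-prefix", timing_prefix]
    if print_shape:
        argv.append("--print-shape")
    return argv


cmd = build_timed_command(
    "timed_multi", "dispel4py.examples.graph_testing.word_count",
    iterations=10, num=10, timing_dir="timings_wc", timing_prefix="wc_monitor")
print(shlex.join(cmd))
```

To launch the run, the list can be passed to `subprocess.run(cmd, check=True)`.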

6. Understanding Each Generated File

After a run, list the output directory:

ls -lht timings

You may see files such as:

monitor_shape_run<id>.json
monitor_concrete_shape_run<id>.json
monitor_summary_run<id>.csv
monitor_instances_run<id>.csv
monitor_iteration_timings_run<id>.csv
monitor_iteration_timings_summary_run<id>.csv
monitor_<PE>_rank<R>_run<id>.csv
monitor_iterations_<PE>_rank<R>_run<id>.csv
monitor_abstract_graph_run<id>.png
monitor_concrete_graph_run<id>.png
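Because every file name ends in `_run<id>`, the outputs of several runs in one directory can be grouped by run id. The sketch below assumes only that naming pattern and the default `monitor` prefix. `group_outputs_by_run` is a hypothetical helper, and the demo file names are made up.

```python
import tempfile
from pathlib import Path


def group_outputs_by_run(timing_dir="timings", prefix="monitor"):
    """Map each run id to the sorted list of file names that carry it."""
    runs = {}
    for path in Path(timing_dir).glob(f"{prefix}_*_run*"):
        # The run id is whatever follows the last "_run" in the file stem.
        run_id = path.stem.rpartition("_run")[2]
        runs.setdefault(run_id, []).append(path.name)
    return {run_id: sorted(names) for run_id, names in runs.items()}


# Demo against a throwaway directory holding empty, made-up files.
with tempfile.TemporaryDirectory() as tmp:
    for name in (
        "monitor_shape_run001.json",
        "monitor_summary_run001.csv",
        "monitor_summary_run002.csv",
    ):
        (Path(tmp) / name).touch()
    runs = group_outputs_by_run(tmp)

for run_id, names in sorted(runs.items()):
    print(run_id, names)
```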

`monitor_shape_run<id>.json`

Abstract workflow graph.

Represents the user-defined workflow topology.
Nodes are logical PEs.
Edges are logical workflow connections.
Includes topological order when possible.

`monitor_concrete_shape_run<id>.json`

Concrete runtime instance graph.

Represents instantiated PE ranks used in execution.
Nodes are PE instances such as WordCounter1@2.
Edges represent concrete communication paths between ranks.
Includes process allocation table and topological order.

`monitor_<PE>_rank<R>_run<id>.csv`

Per-instance total timing summary.

One file per PE instance/rank.
Contains count, total_secs, avg_secs.
Useful for fast instance-level totals.

`monitor_iterations_<PE>_rank<R>_run<id>.csv`

Per-instance per-iteration trace.

One row per iteration processed by that PE instance.
Core file when you need iteration-level latencies.

`monitor_summary_run<id>.csv`

Per-PE aggregate summary across all ranks.

Combines all instances of each PE.
Includes:
- total_count, total_secs, avg_secs
- min_secs, p50_secs, p95_secs, max_secs
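As an illustration of how per-instance totals could roll up into one per-PE row, the sketch below sums `count` and `total_secs` and derives `avg_secs` from those sums. This is an assumption about the aggregation, not a statement of what the mapping does internally, and the sample numbers are invented.

```python
def combine_instances(rows):
    """Combine per-instance rows (count, total_secs) into one per-PE row."""
    total_count = sum(r["count"] for r in rows)
    total_secs = sum(r["total_secs"] for r in rows)
    # Weighted mean: total time divided by total iterations, not a mean of means.
    avg_secs = total_secs / total_count if total_count else 0.0
    return {"total_count": total_count, "total_secs": total_secs, "avg_secs": avg_secs}


# Two made-up instances of the same PE.
instances = [
    {"count": 4, "total_secs": 2.0},  # avg 0.5 s per iteration
    {"count": 6, "total_secs": 6.0},  # avg 1.0 s per iteration
]
combined = combine_instances(instances)
print(combined)
```

The combined average here is 0.8 s, whereas averaging the two per-instance averages would give 0.75 s. The difference matters whenever instances process unequal numbers of iterations.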

`monitor_instances_run<id>.csv`

Per-PE-instance aggregate summary.

One row per instance (PE@rank).
Includes:
- total_count, total_secs, avg_secs
- min_secs, p50_secs, p95_secs, max_secs

`monitor_iteration_timings_run<id>.csv`

Merged iteration-level table across all instances.

Each row is an iteration timing event.
Includes iteration latency plus instance-level context columns:
- instance_p50_secs, instance_p95_secs, instance_max_secs

`monitor_iteration_timings_summary_run<id>.csv`

Latency summary derived from per-iteration data.

One row per instance.
Computed directly from iteration traces.
Includes:
- iteration_count, total_secs, avg_secs
- min_secs, p50_secs, p95_secs, max_secs

`monitor_abstract_graph_run<id>.png`

Figure of abstract graph.

Visual of user-defined workflow topology.

`monitor_concrete_graph_run<id>.png`

Figure of concrete graph.

Visual of instantiated runtime graph (ranks/instances).

7. Abstract vs Concrete (Key Difference)

Abstract graph:

What you define in workflow code.
Independent of runtime process assignment.

Concrete graph:

What is actually executed after mapping allocates ranks/instances.
Depends on mapping (simple, multi, mpi) and process assignment rules.

8. Practical Interpretation Tips

Use monitor_iterations_*.csv for detailed latency analysis and jitter/outlier detection.
Use monitor_instances_*.csv to compare load balance between ranks.
Use monitor_summary_*.csv to compare PE-level hotspots.
Use concrete shape JSON/PNG to explain why some ranks do more work.
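One way to quantify load balance from per-instance rows is the ratio of the busiest instance's `total_secs` to the mean across instances of the same PE. The sketch below parses an inline sample. The `pe` and `rank` column names are assumptions (check the header of your own CSV), and the numbers are invented.

```python
import csv
import io
from collections import defaultdict

# Made-up sample; only total_secs is a column name taken from this guide.
SAMPLE = """pe,rank,total_secs
WordCounter1,1,1.0
WordCounter1,2,1.0
WordCounter1,3,4.0
"""


def imbalance_by_pe(csv_text):
    """Return max(total_secs) / mean(total_secs) per PE; 1.0 means perfectly even."""
    totals = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["pe"]].append(float(row["total_secs"]))
    result = {}
    for pe, secs in totals.items():
        mean = sum(secs) / len(secs)
        result[pe] = max(secs) / mean if mean else 0.0
    return result


imbalance = imbalance_by_pe(SAMPLE)
print(imbalance)
```

A ratio well above 1.0 suggests that one rank is doing a disproportionate share of the work, which the concrete shape can then help explain.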

9. Latency Metrics Explained

These columns are computed from per-iteration timings (in seconds):

p50_secs: 50th percentile latency (median). About half of iterations are faster, half are slower.
p95_secs: 95th percentile latency. 95% of iterations are at or below this value; highlights tail/slower behavior.
max_secs: maximum observed latency (slowest iteration).

Related columns:

min_secs: minimum observed latency (fastest iteration).
avg_secs: arithmetic mean latency (can be influenced by outliers).

Quick intuition:

If p95_secs is much larger than p50_secs, latency is bursty/has outliers.
If max_secs is far above p95_secs, there may be rare extreme slow iterations.
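To make these definitions concrete, the sketch below computes the same columns from a list of iteration latencies. It uses the nearest-rank percentile definition, which is an assumption: this guide does not state which percentile method the mappings use, so values from real runs may differ slightly. The sample latencies are invented.

```python
import math


def percentile(sorted_vals, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not sorted_vals:
        raise ValueError("no samples")
    rank = max(1, math.ceil(pct * len(sorted_vals) / 100))
    return sorted_vals[rank - 1]


def latency_summary(latencies):
    """Summarise per-iteration latencies (seconds) into the columns described above."""
    vals = sorted(latencies)
    return {
        "iteration_count": len(vals),
        "total_secs": sum(vals),
        "avg_secs": sum(vals) / len(vals),
        "min_secs": vals[0],
        "p50_secs": percentile(vals, 50),
        "p95_secs": percentile(vals, 95),
        "max_secs": vals[-1],
    }


# 18 fast iterations, one slow one, and one extreme outlier (made-up values).
samples = [0.010] * 18 + [0.050, 0.500]
summary = latency_summary(samples)
for key, value in summary.items():
    print(key, value)
```

In this sample `p50_secs` stays at the typical 0.010 s, `p95_secs` picks up the slow iteration, and `max_secs` exposes the single extreme one, which is the pattern the quick intuition describes.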

10. Notes

If matplotlib is unavailable or incompatible, PNGs may be skipped.
JSON and CSV outputs are still generated even when PNGs are skipped.
run_id is timestamp-based and includes microseconds to avoid collisions between rapid consecutive runs.
If you are benchmarking performance, prefer timed_* without provenance enabled.
Provenance instrumentation adds extra work and may also change multiprocessing behavior/platform scheduling, so wall-clock comparisons can become misleading.
If you need both traceability and timings, run timed_* with provenance enabled, but treat those timings as provenance-aware operational traces (not clean baseline performance numbers).
