We need benchmarks to measure proposed improvements against. We probably need several different test cases, each focusing on a different type of workload.
Most workloads will probably be a mix of these three types of tasks, but it makes sense to benchmark them separately in a standard way. When a given benchmark is applicable to rayon as well (e.g., one with no async jobs), we should make it possible to run it with either library and compare the two.
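One possible way to let the same benchmark run on either library is to hide the runtime behind a small trait. The sketch below is only an illustration under stated assumptions: `Backend`, `Sequential`, `ScopedThreads`, and `bench` are hypothetical names, and the two implementations are std-only stand-ins rather than rayon or this project's executor. A real harness would supply one `Backend` implementation per library.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical abstraction over a parallel runtime. Each library being
/// compared would get its own implementation of this trait.
trait Backend {
    fn name(&self) -> &'static str;
    /// Sum `f(x)` over all items, using whatever parallelism the backend offers.
    fn map_sum(&self, items: &[u64], f: fn(u64) -> u64) -> u64;
}

/// Stand-in backend: plain sequential iteration.
struct Sequential;

impl Backend for Sequential {
    fn name(&self) -> &'static str {
        "sequential"
    }

    fn map_sum(&self, items: &[u64], f: fn(u64) -> u64) -> u64 {
        items.iter().map(|&x| f(x)).sum()
    }
}

/// Stand-in backend: scoped std threads, one per fixed-size chunk.
struct ScopedThreads {
    threads: usize,
}

impl Backend for ScopedThreads {
    fn name(&self) -> &'static str {
        "scoped-threads"
    }

    fn map_sum(&self, items: &[u64], f: fn(u64) -> u64) -> u64 {
        if items.is_empty() {
            return 0;
        }
        // Round up so that at most `threads` chunks are produced.
        let n = self.threads.max(1);
        let chunk = (items.len() + n - 1) / n;
        thread::scope(|s| {
            let handles: Vec<_> = items
                .chunks(chunk)
                .map(|c| s.spawn(move || c.iter().map(|&x| f(x)).sum::<u64>()))
                .collect();
            handles.into_iter().map(|h| h.join().unwrap()).sum::<u64>()
        })
    }
}

/// Run one workload on one backend and return (result, wall-clock time).
fn bench(backend: &dyn Backend, items: &[u64], f: fn(u64) -> u64) -> (u64, Duration) {
    let start = Instant::now();
    let result = backend.map_sum(items, f);
    (result, start.elapsed())
}

/// Trivial CPU-bound workload used only for illustration.
fn square(x: u64) -> u64 {
    x * x
}

fn main() {
    let items: Vec<u64> = (1..=1_000).collect();
    let seq = Sequential;
    let par = ScopedThreads { threads: 4 };
    let backends: [&dyn Backend; 2] = [&seq, &par];
    for b in backends {
        let (result, elapsed) = bench(b, &items, square);
        println!("{:>15}: result={} elapsed={:?}", b.name(), result, elapsed);
    }
}
```

Keeping the workload (`square` here) separate from the backend means each workload type can be written once and timed against every backend that supports it. A single timed run like this is noisy, so a real comparison would repeat each measurement or use a dedicated benchmarking harness.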