For the beginning we need more or less only a single optimization: kernel fusion. The idea is that whenever we have a composition of nodes where the first node's output is never used again, we can "fuse" the computation and avoid an extra pass over the data. This requires a special FusionOp which contains a graph in itself.
Example: f = tanh(a + b). Normally this becomes n0 = a + b followed by n0 = tanh(n0) (assuming the memory optimizer reuses the buffer). However, on a GPU these are still two kernel launches, and on a CPU two loops. Fusing them means we move from:
// first pass: n0 = a + b
Zip::from(&mut n0).and(&a).and(&b).for_each(|ni, &ai, &bi| {
    *ni = ai + bi;
});
// second pass: n0 = tanh(n0)
Zip::from(&mut n0).for_each(|ni| {
    *ni = ni.tanh();
});
to
// single fused pass: no second traversal of n0
Zip::from(&mut n0).and(&a).and(&b).for_each(|ni, &ai, &bi| {
    *ni = (ai + bi).tanh();
});