| 1 | +--- |
| 2 | +title: "Transforms and Data Manipulation" |
| 3 | +toc: true |
| 4 | +--- |
| 5 | + |
| 6 | +# Transforms and Data Manipulation |
| 7 | + |
| 8 | +--- |
| 9 | + |
| 10 | +## Transforms |
| 11 | + |
| 12 | +Transforms were introduced in the [previous lesson](./5_intro_to_observable_plot.md). This lesson takes a deeper look through three examples, organized by whether the transform changes the cardinality (row count) of the data. |
| 13 | + |
| 14 | +At their core, transforms reshape data from one form into another that can then be plotted. A transform therefore takes two sets of options: (1) the transform options, which describe how to do the reshaping, and (2) the mark options, which are passed along to the mark once the transform has been applied. |
| 15 | + |
| 16 | +As the [transforms documentation](https://observablehq.com/plot/features/transforms) puts it: |
| 17 | +> _Plot’s transforms typically take two options objects as arguments: the first object contains the transform options (above, {y: "count"}), while the second object contains mark options to be “passed through” to the mark ({x: "weight", fill: "sex"}). The transform returns a new options object representing the transformed mark options to be passed to a mark._ |
| 18 | +
|
| 19 | +Transforms come in several forms. The examples below highlight a few of them, organized by how each one changes (or preserves) the number of rows that reach the mark. |
| 20 | + |
| 21 | +### Many-To-Fewer (`groupX`, `groupY`, `groupR`) |
| 22 | + |
| 23 | +The simplest illustration is the `group` transform with a `count` reducer. The previous lesson covered this with the penguins dataset; here we use something even smaller: |
| 24 | + |
| 25 | +```js echo |
| 26 | +const groupData = [ |
| 27 | + { type: "A", value: 2.0 }, |
| 28 | + { type: "B", value: 1.3 }, |
| 29 | + { type: "B", value: 3.2 }, |
| 30 | + { type: "B", value: 3.7 }, |
| 31 | + { type: "C", value: 6.4 }, |
| 32 | + { type: "C", value: 9.1 } |
| 33 | +] |
| 34 | +``` |
| 35 | + |
| 36 | +The `group` transform can answer a question like "How many type A measurements exist in the dataset?" A count per category maps naturally to a bar chart, so we will plot it that way. This time, let's swap x and y: the transform groups along y, and the resulting counts are drawn along x. |
| 37 | + |
| 38 | +The second argument names the **channel to group by**: `{ y: "type" }`. The first argument requests a **count** on x, so the transform counts rows per type. |
| 39 | + |
| 40 | +```js echo |
| 41 | +Plot.plot({ |
| 42 | + marks: [ |
| 43 | + Plot.barX( |
| 44 | + groupData, |
| 45 | + Plot.groupY( |
| 46 | + { x: "count" }, |
| 47 | + { y: "type" } |
| 48 | + ) |
| 49 | + ) |
| 50 | + ] |
| 51 | +}) |
| 52 | +``` |
| 53 | + |
| 54 | +What does the transform actually return? Let's take a look at it closely: |
| 55 | +`groupOutput`: |
| 56 | +```js |
| 57 | +const groupOutput = Plot.groupY( |
| 58 | + { x: "count" }, |
| 59 | + { y: "type" } |
| 60 | +) |
| 61 | +display(groupOutput) |
| 62 | +``` |
| 63 | + |
| 64 | + |
| 65 | +It’s an options object (with `x`, `y`, and an internal `transform` function) that the mark uses to pre-calculate values. You can pass through other channels in the second argument just like on a normal mark—e.g. `fill` so each bar gets a color: |
| 66 | + |
| 67 | +```js echo |
| 68 | +Plot.plot({ |
| 69 | + marks: [ |
| 70 | + Plot.barX( |
| 71 | + groupData, |
| 72 | + Plot.groupY( |
| 73 | + { x: "count" }, |
| 74 | + { y: "type", fill: "type" } |
| 75 | + ) |
| 76 | + ) |
| 77 | + ], |
| 78 | + color: { legend: true } |
| 79 | +}) |
| 80 | +``` |
| 81 | + |
| 82 | +There are other reducers within the same `group` transform, including helpful ones like `min`, `max`, `mean`, `sum`, and [more](https://observablehq.com/plot/transforms/group#group-options). Let's try the same plot with other reducer options -- you can select which reducer you'd like to try out: |
| 83 | + |
| 84 | +```js |
| 85 | +const reducer = view(Inputs.select(["min", "max", "mean", "sum"], { value: "min", label: "Reducer" })) |
| 86 | +``` |
| 87 | + |
| 88 | +```js echo |
| 89 | +Plot.plot({ |
| 90 | + marks: [ |
| 91 | + Plot.barX( |
| 92 | + groupData, |
| 93 | + Plot.groupY( |
| 94 | + { x: reducer }, // ← selected reducer above |
| 95 | + { y: "type", x: "value" } |
| 96 | + ) |
| 97 | + ) |
| 98 | + ] |
| 99 | +}) |
| 100 | +``` |
| 101 | + |
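| 102 | +This time, we need to pass both the `x` and `y` channels as mark options: `y` is still the channel to group by, and the reducer is applied to the values in the `x` channel (here `value`). A `count` only tallies rows, so it did not need an input channel. |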
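
To make the reducer step concrete, here is a minimal plain-JavaScript sketch of roughly what a grouped reducer computes. It is not Plot's implementation. The names `sampleRows`, `reducerFns`, and `reduceByType` are hypothetical and exist only for this illustration; the rows mirror the shape of `groupData`.

```js
// Hypothetical sample rows, mirroring the shape of groupData.
const sampleRows = [
  { type: "A", value: 2.0 },
  { type: "B", value: 1.3 },
  { type: "B", value: 3.2 },
  { type: "B", value: 3.7 },
  { type: "C", value: 6.4 },
  { type: "C", value: 9.1 }
]

// Hypothetical reducer table; the keys match the options in the select input.
const reducerFns = {
  min: (values) => Math.min(...values),
  max: (values) => Math.max(...values),
  sum: (values) => values.reduce((a, b) => a + b, 0),
  mean: (values) => values.reduce((a, b) => a + b, 0) / values.length
}

// Collect the values for each type, then apply the named reducer per group.
function reduceByType(rows, reducerName) {
  const buckets = new Map()
  for (const { type, value } of rows) {
    if (!buckets.has(type)) buckets.set(type, [])
    buckets.get(type).push(value)
  }
  const reduceFn = reducerFns[reducerName]
  return Array.from(buckets, ([type, values]) => ({ type, value: reduceFn(values) }))
}

// One output row per type: many rows in, fewer rows out.
const maxByType = reduceByType(sampleRows, "max")
```

Swapping `"max"` for `"min"`, `"sum"`, or `"mean"` changes only the summary value per group; the number of output rows stays at one per type.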
| 103 | + |
| 104 | +--- |
| 105 | + |
| 106 | +### Continuous Many-To-Fewer (`binX`, `binY`, `hexbin`, etc.) |
| 107 | + |
| 108 | +When the data is continuous but we want to show it as bars (a histogram), we first have to bin it. Binning effectively answers "Where are the concentrations of continuous values?" |
| 109 | + |
| 110 | +Use data with a **continuous** field (e.g. numbers) so the transform can split the range into intervals: |
| 111 | + |
| 112 | +```js echo |
| 113 | +const binData = [ |
| 114 | + { value: 2.1 }, { value: 2.4 }, |
| 115 | + { value: 3.0 }, { value: 3.2 }, { value: 3.5 }, |
| 116 | + { value: 4.0 }, { value: 4.2 }, { value: 4.5 }, { value: 4.7 }, { value: 4.9 }, |
| 117 | + { value: 5.0 }, { value: 5.1 }, { value: 5.2 }, { value: 5.3 }, { value: 5.4 }, { value: 5.6 }, { value: 5.8 }, { value: 5.9 }, |
| 118 | + { value: 6.0 }, { value: 6.2 }, { value: 6.4 }, { value: 6.5 }, { value: 6.7 }, |
| 119 | + { value: 7.0 }, { value: 7.2 }, { value: 7.5 }, { value: 7.8 }, |
| 120 | + { value: 8.0 }, { value: 8.3 } |
| 121 | +] |
| 122 | +``` |
| 123 | + |
| 124 | +```js echo |
| 125 | +Plot.plot({ |
| 126 | + height: 150, |
| 127 | + marks: [ |
| 128 | + Plot.rectX( |
| 129 | + binData, |
| 130 | + Plot.binY( |
| 131 | + { x: "count" }, |
| 132 | + { y: "value" } |
| 133 | + ) |
| 134 | + ) |
| 135 | + ] |
| 136 | +}) |
| 137 | +``` |
| 138 | + |
| 139 | +This bins the continuous `value` field along y and counts how many points fall in each bin, drawing the counts along x (a histogram turned on its side). |
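
As a rough sketch of the idea behind binning, the following plain JavaScript counts values in fixed-width intervals. This is an assumption-laden simplification: Plot chooses its bin thresholds automatically, whereas this sketch hard-codes a width of 1. The names `sampleValues` and `countByBin` are hypothetical.

```js
// Hypothetical sample of continuous values.
const sampleValues = [2.1, 2.4, 3.0, 3.2, 3.5, 4.0, 4.2]

// Assign each value to the interval [x1, x1 + width) and count per interval.
// Assumption: fixed-width bins; intervals with no values are simply omitted.
function countByBin(values, width = 1) {
  const counts = new Map()
  for (const v of values) {
    const x1 = Math.floor(v / width) * width // lower edge of the bin
    counts.set(x1, (counts.get(x1) ?? 0) + 1)
  }
  return Array.from(counts, ([x1, count]) => ({ x1, x2: x1 + width, count }))
    .sort((a, b) => a.x1 - b.x1)
}

const binCounts = countByBin(sampleValues)
```

Each output row carries the bin edges (`x1`, `x2`) and a count, which is the kind of shape a rect mark needs to draw one bar per interval.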
| 140 | + |
| 141 | +--- |
| 142 | + |
| 143 | +### One-To-One (`windowX`, `windowY`) |
| 144 | + |
| 145 | +Window transforms keep **the same number of rows**: each row gets a new value computed from a sliding window of its neighbors (e.g. rolling mean). So one row in, one row out. |
| 146 | + |
| 147 | +The window transform is a specialized map transform that computes a moving window and then derives summary statistics from the current window, say to compute rolling averages, rolling minimums, or rolling maximums. It is less common than grouping or binning, but it helps when you want to smooth a noisy series with a rolling average. |
| 148 | + |
| 149 | +Use `windowY` to apply the window to the values in the y channel (the usual case when x is time or another ordering, as in the example below); use `windowX` to apply it to the values in the x channel. Options include `k` (window size) and `reduce` (`"mean"`, `"min"`, `"max"`, etc.). |
| 150 | + |
| 151 | +```js echo |
| 152 | +const windowData = [ |
| 153 | + { x: 1, y: 3 }, { x: 2, y: 5 }, { x: 3, y: 4 }, { x: 4, y: 7 }, { x: 5, y: 6 }, |
| 154 | + { x: 6, y: 8 }, { x: 7, y: 5 }, { x: 8, y: 6 }, { x: 9, y: 9 }, { x: 10, y: 7 } |
| 155 | +] |
| 156 | +``` |
| 157 | + |
| 158 | +```js echo |
| 159 | +Plot.plot({ |
| 160 | + height: 180, |
| 161 | + marks: [ |
| 162 | + Plot.lineY(windowData, { x: "x", y: "y", stroke: "#999", strokeDasharray: "2 2" }), |
| 163 | + Plot.lineY(windowData, Plot.windowY({ k: 3 }, { x: "x", y: "y", stroke: "#1a5fb4" })) |
| 164 | + ] |
| 165 | +}) |
| 166 | +``` |
| 167 | + |
| 168 | +Here the gray dashed line is the raw data; the blue solid line is the same points with a 3-point rolling mean—one value per row, so cardinality stays one-to-one. |
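
For intuition, here is a minimal plain-JavaScript sketch of a centered rolling mean. It assumes an odd window size `k` and shortens the window at the edges of the array; that edge handling is an assumption of this sketch and may not match Plot's behavior under every option. The names `sampleY` and `rollingMean` are hypothetical.

```js
// Hypothetical y values, in plotting order.
const sampleY = [3, 5, 4, 7, 6]

// Centered rolling mean: each output is the average of the value and its
// neighbors within the window. Assumes k is odd; the window is truncated
// at the start and end of the array rather than producing gaps.
function rollingMean(values, k = 3) {
  const half = Math.floor(k / 2)
  return values.map((_, i) => {
    const start = Math.max(0, i - half)
    const end = Math.min(values.length, i + half + 1)
    const windowValues = values.slice(start, end)
    return windowValues.reduce((a, b) => a + b, 0) / windowValues.length
  })
}

// Same length as the input: one smoothed value per original row.
const smoothedY = rollingMean(sampleY)
```

Because `map` produces exactly one output per input, the result has the same number of rows as the source, which is what distinguishes this from the group and bin cases.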
| 169 | + |
| 170 | +--- |
| 171 | + |
| 172 | +## Data Manipulation |
| 173 | + |
| 174 | +Transforms are incredibly valuable, but they aren't the only way to reshape the data into the structure you need. |
| 175 | + |
| 176 | + |
| 177 | +## JavaScript Data Manipulation Methods |
| 178 | + |
| 179 | +This is a brief introduction to some helpful JavaScript data manipulation methods. JavaScript arrays give you direct ways to reshape data before it reaches Plot. A few you'll use often: |
| 180 | + |
| 181 | +- **`filter`**: Keep only rows that pass a test (e.g. `data.filter(d => d.year >= 2020)`). Same number of columns, fewer rows. |
| 182 | +- **`map`**: Turn each row into a new value or object (e.g. `data.map(d => ({ ...d, total: d.a + d.b }))`). Same number of rows, new or changed columns. |
| 183 | +- **`reduce`**: Collapse the array into a single value or structure (e.g. a sum, or an object of counts per category). You pass a callback `(accumulator, currentItem) => ...` and an initial value; the callback runs once per item and returns the updated accumulator, and `reduce` returns the final accumulator. Often used to build aggregated datasets. See [MDN: Array.prototype.reduce](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/reduce) for the full API. |
| 184 | + |
| 185 | +You can chain these: filter, then map, then pass the result to Plot. Or use `reduce` to build a summarized array and plot that. |
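
A short sketch of such a chain, using made-up rows. The array `yearlyRows` and its fields `year`, `a`, and `b` are hypothetical and only echo the inline examples in the list above.

```js
// Hypothetical rows; the field names are illustrative only.
const yearlyRows = [
  { year: 2019, a: 1, b: 2 },
  { year: 2020, a: 3, b: 4 },
  { year: 2021, a: 5, b: 6 }
]

// filter drops rows, map adds a derived column to each remaining row.
const recentWithTotals = yearlyRows
  .filter((d) => d.year >= 2020)
  .map((d) => ({ ...d, total: d.a + d.b }))

// reduce collapses the remaining rows into a single number.
const grandTotal = recentWithTotals.reduce((acc, d) => acc + d.total, 0)
```

`recentWithTotals` is still an array of row objects, so it can be handed to a mark directly; `grandTotal` is a single value better suited to a label or summary.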
| 186 | + |
| 187 | +## Group with JavaScript |
| 188 | + |
| 189 | +You don't have to use Plot's group transform to get the same chart. You can compute the grouped dataset in JavaScript (e.g. with a `for` loop or `reduce`), then pass that **pre-aggregated** data into a mark and bind channels directly—no transform. |
| 190 | + |
| 191 | +Here we reproduce the "count by type" bar chart using the same `groupData`. First, with a **for loop**: |
| 192 | + |
| 193 | +```js echo |
| 194 | +const countsByType = {} |
| 195 | +for (const d of groupData) { |
| 196 | + countsByType[d.type] = (countsByType[d.type] ?? 0) + 1 |
| 197 | +} |
| 198 | +const groupedArrayLoop = Object.entries(countsByType).map(([type, count]) => ({ type, count })) |
| 199 | +``` |
| 200 | +
|
| 201 | +```js echo |
| 202 | +Plot.plot({ |
| 203 | + marks: [ |
| 204 | + Plot.barX(groupedArrayLoop, { x: "count", y: "type" }) |
| 205 | + ] |
| 206 | +}) |
| 207 | +``` |
| 208 | +
|
| 209 | +Same result using **reduce**: |
| 210 | +
|
| 211 | +```js echo |
| 212 | +const groupedByType = groupData.reduce((acc, d) => { |
| 213 | + acc[d.type] = (acc[d.type] ?? 0) + 1 |
| 214 | + return acc |
| 215 | +}, {}) |
| 216 | +const groupedArray = Object.entries(groupedByType).map(([type, count]) => ({ type, count })) |
| 217 | +``` |
| 218 | +
|
| 219 | +```js echo |
| 220 | +Plot.plot({ |
| 221 | + marks: [ |
| 222 | + Plot.barX(groupedArray, { x: "count", y: "type" }) |
| 223 | + ] |
| 224 | +}) |
| 225 | +``` |
| 226 | +
|
| 227 | +Same result as `Plot.barX(groupData, Plot.groupY({ x: "count" }, { y: "type" }))`—just two ways to get there. Transforms are convenient when you want to keep raw data in one place and let Plot do the reshaping; JS manipulation is useful when you need the aggregated data elsewhere (e.g. for tables or multiple charts) or prefer explicit control. |