Commit e8d3d53

add lesson
1 parent 97e90a8 commit e8d3d53

1 file changed

Lines changed: 227 additions & 0 deletions

@@ -0,0 +1,227 @@
1+
---
2+
title: "Transforms and Data Manipulation"
3+
toc: true
4+
---
5+
6+
# Transforms and Data Manipulation
7+
8+
---
9+
10+
## Transforms
11+
12+
Transforms were introduced in the [previous lesson](./5_intro_to_observable_plot.md). This lesson takes a deeper look through three examples, organized by whether the transform changes the cardinality of the data (the number of rows) or keeps it the same.
13+
14+
At their core, transforms reshape data from one form into another that can then be plotted. Because the reshaped data still has to be drawn by a mark, a transform typically takes two options objects: (1) the transform options, which describe how to do the transform, and (2) the mark options, which are passed along to the mark once the transform is complete.
15+
16+
As said in the [transforms documentation](https://observablehq.com/plot/features/transforms):
17+
> _Plot’s transforms typically take two options objects as arguments: the first object contains the transform options (above, {y: "count"}), while the second object contains mark options to be “passed through” to the mark ({x: "weight", fill: "sex"}). The transform returns a new options object representing the transformed mark options to be passed to a mark._
18+
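To make the two-argument shape concrete, here is a toy sketch in plain JavaScript. It is not Plot's implementation: `toyGroupY`, `toyTransform`, and `toyRows` are hypothetical names invented for illustration, and the sketch only supports counting. It mimics the idea that a transform takes transform options plus mark options and returns a new options object.

```js
// Toy illustration only; this is NOT how Plot implements groupY.
// It takes (transformOptions, markOptions) and returns a new options object.
function toyGroupY(transformOptions, markOptions) {
  const groupField = markOptions.y // field to group by, e.g. "type"
  return {
    ...markOptions, // mark options are passed through unchanged (e.g. fill)
    x: transformOptions.x, // the requested output, e.g. "count"
    // Hypothetical hook: turns raw rows into one row per group, with a count.
    toyTransform(rows) {
      const counts = new Map()
      for (const row of rows) {
        const key = row[groupField]
        counts.set(key, (counts.get(key) ?? 0) + 1)
      }
      return Array.from(counts, ([y, x]) => ({ y, x }))
    }
  }
}

const toyRows = [{ type: "A" }, { type: "B" }, { type: "B" }]
const toyOptions = toyGroupY({ x: "count" }, { y: "type", fill: "type" })
console.log(toyOptions.fill) // "type" (passed through)
console.log(toyOptions.toyTransform(toyRows)) // one row per type: A has 1, B has 2
```

The point of the sketch is the shape: the caller never sees the grouped rows directly, only an options object that knows how to produce them.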
19+
This pattern takes a few forms worth highlighting. The examples below are not exhaustive; each shows a different cardinality shift or configuration change and the visual it produces.
20+
21+
### Many-To-Fewer (`groupX`, `groupY`, `groupZ`)
22+
23+
The simplest example of how transforms can support your data is the `group` transform with a `count` reducer. The last lesson covered this with the penguins dataset; here we use something even smaller:
24+
25+
```js echo
26+
const groupData = [
27+
{ type: "A", value: 2.0 },
28+
{ type: "B", value: 1.3 },
29+
{ type: "B", value: 3.2 },
30+
{ type: "B", value: 3.7 },
31+
{ type: "C", value: 6.4 },
32+
{ type: "C", value: 9.1 }
33+
]
34+
```
35+
36+
The `group` transform can help us answer a question like "How many type A measurements exist in the dataset?" A count is a quantity, which lends itself to a bar chart, so that is what we will draw. This time, let's swap x and y to see how the transform can group along y and output the count on x.
37+
38+
In the second argument we pass the **channel to group by**: `{ y: "type" }`. The first argument says we want **count** on x. So the transform counts rows per type.
39+
40+
```js echo
41+
Plot.plot({
42+
marks: [
43+
Plot.barX(
44+
groupData,
45+
Plot.groupY(
46+
{ x: "count" },
47+
{ y: "type" }
48+
)
49+
)
50+
]
51+
})
52+
```
53+
54+
What does the transform actually return? Let's take a look at it closely:
55+
`groupOutput`:
56+
```js
57+
const groupOutput = Plot.groupY(
58+
{ x: "count" },
59+
{ y: "type" }
60+
)
61+
display(groupOutput)
62+
```
63+
64+
65+
It’s an options object (with `x`, `y`, and an internal `transform` function) that the mark uses to pre-calculate values. You can pass through other channels in the second argument just like on a normal mark—e.g. `fill` so each bar gets a color:
66+
67+
```js echo
68+
Plot.plot({
69+
marks: [
70+
Plot.barX(
71+
groupData,
72+
Plot.groupY(
73+
{ x: "count" },
74+
{ y: "type", fill: "type" }
75+
)
76+
)
77+
],
78+
color: { legend: true }
79+
})
80+
```
81+
82+
There are other reducers within the same `group` transform, including helpful ones like `min`, `max`, `mean`, `sum`, and [more](https://observablehq.com/plot/transforms/group#group-options). Let's try the same plot with other reducer options -- you can select which reducer you'd like to try out:
83+
84+
```js
85+
const reducer = view(Inputs.select(["min", "max", "mean", "sum"], { value: "min", label: "Reducer" }))
86+
```
87+
88+
```js echo
89+
Plot.plot({
90+
marks: [
91+
Plot.barX(
92+
groupData,
93+
Plot.groupY(
94+
{ x: reducer }, // ← selected reducer above
95+
{ y: "type", x: "value" }
96+
)
97+
)
98+
]
99+
})
100+
```
101+
102+
This time, the mark options need both channels: `y` names the field to group by, and `x` names the field that the reducer is applied to within each group.
103+
104+
---
105+
106+
### Continuous Many-To-Fewer (`binX`, `binY`, `hexbin`, etc.)
107+
108+
When the data is continuous but we want to show it as bars (a histogram), we have to bin it first, effectively answering "Where are the values concentrated?"
109+
110+
Use data with a **continuous** field (e.g. numbers) so the transform can split the range into intervals:
111+
112+
```js echo
113+
const binData = [
114+
{ value: 2.1 }, { value: 2.4 },
115+
{ value: 3.0 }, { value: 3.2 }, { value: 3.5 },
116+
{ value: 4.0 }, { value: 4.2 }, { value: 4.5 }, { value: 4.7 }, { value: 4.9 },
117+
{ value: 5.0 }, { value: 5.1 }, { value: 5.2 }, { value: 5.3 }, { value: 5.4 }, { value: 5.6 }, { value: 5.8 }, { value: 5.9 },
118+
{ value: 6.0 }, { value: 6.2 }, { value: 6.4 }, { value: 6.5 }, { value: 6.7 },
119+
{ value: 7.0 }, { value: 7.2 }, { value: 7.5 }, { value: 7.8 },
120+
{ value: 8.0 }, { value: 8.3 }
121+
]
122+
```
123+
124+
```js echo
125+
Plot.plot({
126+
height: 150,
127+
marks: [
128+
Plot.rectX(
129+
binData,
130+
Plot.binY(
131+
{ x: "count" },
132+
{ y: "value" }
133+
)
134+
)
135+
]
136+
})
137+
```
138+
139+
This bins the continuous `value` field along y and counts how many points fall in each bin, drawn as a horizontal histogram with the count on x.
140+
141+
---
142+
143+
### One-To-One (`windowX`, `windowY`)
144+
145+
Window transforms keep **the same number of rows**: each row gets a new value computed from a sliding window of its neighbors (e.g. rolling mean). So one row in, one row out.
146+
147+
The window transform is a specialized map transform: it computes a moving window and derives a summary statistic from each window, such as a rolling average, minimum, or maximum. It is less common than grouping or binning, but helpful when you want a rolling average to smooth out the visual.
148+
149+
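Use `windowY` to smooth the `y` channel, which is the usual case when x is time and y is the measurement; use `windowX` to smooth the `x` channel instead. In both cases the window slides over the rows in order. Options include `k` (window size) and `reduce` (`"mean"`, `"min"`, `"max"`, etc.).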
150+
151+
```js echo
152+
const windowData = [
153+
{ x: 1, y: 3 }, { x: 2, y: 5 }, { x: 3, y: 4 }, { x: 4, y: 7 }, { x: 5, y: 6 },
154+
{ x: 6, y: 8 }, { x: 7, y: 5 }, { x: 8, y: 6 }, { x: 9, y: 9 }, { x: 10, y: 7 }
155+
]
156+
```
157+
158+
```js echo
159+
Plot.plot({
160+
height: 180,
161+
marks: [
162+
Plot.lineY(windowData, { x: "x", y: "y", stroke: "#999", strokeDasharray: "2 2" }),
163+
Plot.lineY(windowData, Plot.windowY({ k: 3 }, { x: "x", y: "y", stroke: "#1a5fb4" }))
164+
]
165+
})
166+
```
167+
168+
Here the gray dashed line is the raw data; the blue solid line is the same points with a 3-point rolling mean—one value per row, so cardinality stays one-to-one.
169+
170+
---
171+
172+
## Data Manipulation
173+
174+
Transforms are incredibly valuable, but they aren't the only way to reshape the data into the structure you need.
175+
176+
177+
## JavaScript Data Manipulation Methods
178+
179+
This is a quick introduction to some helpful JavaScript data manipulation methods. JavaScript arrays give you direct ways to reshape data before it reaches Plot. A few you'll use often:
180+
181+
- **`filter`** — Keep only rows that pass a test (e.g. `data.filter(d => d.year >= 2020)`). Same number of columns, fewer rows.
182+
- **`map`** — Turn each row into a new value or object (e.g. `data.map(d => ({ ...d, total: d.a + d.b }))`). Same number of rows, new or changed columns.
183+
- **`reduce`** — Collapse the array into a single value or structure (e.g. sum, or an object of counts per category). You pass a callback `(accumulator, currentItem) => ...` and an initial value; the callback runs once per item, updating the accumulator each time, and you return the final accumulator. Often used to build aggregated datasets. See [MDN: Array.prototype.reduce](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/reduce) for the full API.
184+
185+
You can chain these: filter, then map, then pass the result to Plot. Or use `reduce` to build a summarized array and plot that.
186+
187+
## Group with JavaScript
188+
189+
You don't have to use Plot's group transform to get the same chart. You can compute the grouped dataset in JavaScript (e.g. with a `for` loop or `reduce`), then pass that **pre-aggregated** data into a mark and bind channels directly—no transform.
190+
191+
Here we reproduce the "count by type" bar chart using the same `groupData`. First, with a **for loop**:
192+
193+
```js echo
194+
const countsByType = {}
195+
for (const d of groupData) {
196+
countsByType[d.type] = (countsByType[d.type] ?? 0) + 1
197+
}
198+
const groupedArrayLoop = Object.entries(countsByType).map(([type, count]) => ({ type, count }))
199+
```
200+
201+
```js echo
202+
Plot.plot({
203+
marks: [
204+
Plot.barX(groupedArrayLoop, { x: "count", y: "type" })
205+
]
206+
})
207+
```
208+
209+
Same result using **reduce**:
210+
211+
```js echo
212+
const groupedByType = groupData.reduce((acc, d) => {
213+
acc[d.type] = (acc[d.type] ?? 0) + 1
214+
return acc
215+
}, {})
216+
const groupedArray = Object.entries(groupedByType).map(([type, count]) => ({ type, count }))
217+
```
218+
219+
```js echo
220+
Plot.plot({
221+
marks: [
222+
Plot.barX(groupedArray, { x: "count", y: "type" })
223+
]
224+
})
225+
```
226+
227+
Same result as `Plot.barX(groupData, Plot.groupY({ x: "count" }, { y: "type" }))`—just two ways to get there. Transforms are convenient when you want to keep raw data in one place and let Plot do the reshaping; JS manipulation is useful when you need the aggregated data elsewhere (e.g. for tables or multiple charts) or prefer explicit control.
