Just writing up a bunch of impressions while they’re fresh in my mind. I’ll look through them in the next few days and see which ones are actionable and e.g. deserve their own issues. Comments welcome!
A few learners were using `line_profiler.py` or `pandas.py` as file names for those examples ➔ that leads to errors, because the code itself runs e.g. `import pandas`, and Python prioritises the local file over the installed module
There’s a specialised error message for this since Python 3.13 (another reason to always upgrade to the latest version, if you can 😉)
In my slides, I used the FFEA case study as an example when transitioning from function-level to line-level profiling. (Roughly: “Sometimes you can tell from the function-level results alone that there’s an issue, like in this example, where the Rod initialiser was called millions of times more often than expected because we were throwing away old rods and creating new ones instead of updating them. Other times you need to dig deeper, and that’s what line-level profiling is for.”) Then, towards the end of the optimisation section in the heat grid example, I could easily refer back (“The first implementation creates a new grid and throws the old one away on every step, kind of like the rods we mentioned earlier. If we stick with two grids and update them instead …”), which I think was helpful didactically: it helps learners see this as a general pattern instead of a one-off example.
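The “reuse, don’t recreate” pattern can be sketched in a few lines (a hypothetical simplification, not the lesson’s actual heat-grid code): keep two buffers and swap them each step, instead of allocating a fresh grid (or fresh Rod objects) on every iteration.

```python
# Two preallocated buffers; each step writes into the scratch buffer,
# then the references are swapped. No per-step allocation.

def step(src, dst):
    """Write one diffusion-like update of src into the preallocated dst."""
    n = len(src)
    for i in range(n):
        left = src[i - 1] if i > 0 else src[i]
        right = src[i + 1] if i < n - 1 else src[i]
        dst[i] = (left + src[i] + right) / 3.0

grid = [0.0] * 10
grid[5] = 1.0                       # initial heat spike
scratch = [0.0] * 10                # allocated once, reused every step

for _ in range(100):
    step(grid, scratch)
    grid, scratch = scratch, grid   # swap references; nothing is thrown away
```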
We briefly discussed how submitting batch jobs to the cluster is already a form of parallelisation, one that is external to Python.
Someone also asked about parallelisation within the linear algebra libraries underlying NumPy. We already have a callout box on that in the NumPy lesson; it might be worth cross-referencing if/when we add a parallelisation episode.
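One way to make that BLAS-level parallelism visible to learners is to cap the thread count via environment variables and compare timings; a hedged sketch follows (which variable actually applies depends on the BLAS build NumPy links against):

```shell
# Cap BLAS/LAPACK thread counts before starting Python.
# Only the variable matching the installed BLAS backend takes effect.
export OMP_NUM_THREADS=1        # OpenMP-based builds
export OPENBLAS_NUM_THREADS=1   # OpenBLAS
export MKL_NUM_THREADS=1        # Intel MKL
```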
Someone else asked whether it’s safe to parallelise code that writes to the same dataframe. The answer is probably just “in general, no; check the documentation for details on how to handle that”, but it’s something we should cover in a parallelisation episode.
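The usual safe pattern is worth a sketch: rather than several workers writing into one shared table, each worker returns its partial result and the main thread combines them afterwards. (This is a hedged illustration; plain dicts stand in for dataframe chunks to keep it dependency-free.)

```python
# Workers only *read* their own chunk and *return* a result;
# the merge happens single-threaded, so there are no write conflicts.
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    """Compute a derived value for one chunk; no shared state is touched."""
    return {key: value * 2 for key, value in chunk.items()}

chunks = [{"a": 1, "b": 2}, {"c": 3, "d": 4}]

with ThreadPoolExecutor() as pool:
    partials = list(pool.map(process_chunk, chunks))

combined = {}
for part in partials:          # single-threaded merge step
    combined.update(part)
print(combined)                # → {'a': 2, 'b': 4, 'c': 6, 'd': 8}
```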
In the pandas/vectorisation exercise:
We could emphasise more strongly how the solution basically just takes the existing `pythagoras` function and replaces `df['f_vertical']` with `vertical`. Maybe switch from `np.sqrt()` to `**0.5` to make this even more explicit?
Talking about “the vertical column” and “the horizontal column” is slightly awkward. 😅 Time to bikeshed column names? (My first thought was latitude/longitude, but that implies a spherical geometry, so Pythagoras is no longer valid; maybe length/width would work?)
Maybe add a small visualisation here, just to illustrate the shape of that dataframe; and that “columns are numpy arrays; all these functions are iterating over rows”? I noticed a few learners struggle with visualising what’s going on in that code example; and I remember it took me quite a while to thoroughly understand that when I first looked through the materials, too.
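The “just swap the arguments” point above can be sketched like this (values are illustrative, and a bare NumPy array stands in for a dataframe column): with `**0.5` instead of `np.sqrt()`, the very same function body works unchanged on scalars and on whole columns.

```python
import numpy as np

def pythagoras(vertical, horizontal):
    # Pure arithmetic, so this works element-wise on arrays too.
    return (vertical ** 2 + horizontal ** 2) ** 0.5

print(pythagoras(3, 4))                  # scalar call → 5.0

vertical = np.array([3.0, 5.0])          # stand-ins for df columns
horizontal = np.array([4.0, 12.0])
print(pythagoras(vertical, horizontal))  # whole-column call → [ 5. 13.]
```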
Doing a live demo of the parallelised downloads example is scary; I had a brief network glitch the first time I tried it and got a bunch of error messages. It worked a few seconds later on the second attempt, but oof … That probably deserves an instructor note.
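For that instructor note, a small retry wrapper might be worth showing. Below is a hedged sketch: `flaky_download` is a stand-in that fails once and then succeeds, simulating a transient glitch (the lesson’s real example fetches actual URLs).

```python
# Retry each download a few times so a brief network glitch
# doesn't derail the live demo.
from concurrent.futures import ThreadPoolExecutor

failures = {"file_a": 1, "file_b": 0}   # file_a fails once, then succeeds

def flaky_download(name):
    """Stand-in for a real download; raises once to simulate a glitch."""
    if failures[name] > 0:
        failures[name] -= 1
        raise ConnectionError(f"transient glitch while fetching {name}")
    return f"contents of {name}"

def download_with_retry(name, attempts=3):
    for attempt in range(attempts):
        try:
            return flaky_download(name)
        except ConnectionError:
            if attempt == attempts - 1:
                raise               # give up after the last attempt

with ThreadPoolExecutor() as pool:
    results = list(pool.map(download_with_retry, ["file_a", "file_b"]))
print(results)                      # → ['contents of file_a', 'contents of file_b']
```

(A real version would probably also sleep briefly between attempts.)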