Notes from 2026-02-25 run at KCL #40

@JostMigenda

Description

Just writing up a bunch of impressions while they’re fresh in my mind. I’ll look through them in the next few days and see which ones are actionable and e.g. deserve their own issues. Comments welcome!

  • A few learners were using line_profiler.py or pandas.py as file names for those examples ➔ that leads to errors, because the code itself runs e.g. import pandas and Python prioritizes the local file over the installed module
    • There’s a specialised error message for this since Python 3.13—another reason to always upgrade to the latest version, if you can 😉
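A hedged illustration of that shadowing problem (using the stdlib json module instead of pandas, so it runs without third-party packages; the mechanism is the same):

```python
# Minimal sketch of how a local file shadows an installed/stdlib module.
# We use "json" instead of pandas so this runs anywhere; the mechanism
# (sys.path[0] is the script's own directory) is identical.
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "json.py"
    # The file imports the very module it shadows, so "import json"
    # finds the half-initialised local file instead of the stdlib one.
    script.write_text("import json\nprint(json.dumps({'a': 1}))\n")
    result = subprocess.run([sys.executable, str(script)],
                            capture_output=True, text=True)

# The subprocess fails with an AttributeError; on Python 3.13+ the
# message additionally points out that json.py shadows the stdlib module.
print(result.returncode != 0, "AttributeError" in result.stderr)
```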
  • In my slides, I used the FFEA case study as an example when transitioning from function-level to line-level profiling (Roughly: “Sometimes you can tell by looking at the function-level results that there’s an issue—like in this example, where the Rod initialiser function was called millions of times more often than expected, because we were throwing away old rods and creating new ones, instead of updating them. Other times you need to dig deeper and that’s what line-level profiling is for.”) Then, towards the end of the optimisation section in the heat grid example, I could refer back easily (“The first implementation is creating a new grid and throwing the old one away for every step; kind of like the rods we mentioned earlier. If we stick with two grids and update them instead …”), which I think was helpful didactically, because it helps learners see this as a general pattern instead of a one-off example.
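The "update in place instead of reallocating" pattern from that transition could be sketched like this (a hedged illustration; the stencil and variable names are mine, not the lesson's exact heat grid code):

```python
# Keep two buffers and swap them each step, instead of allocating a
# fresh grid and throwing the old one away (the "new rods" anti-pattern).
import numpy as np

def step(src, dst):
    """One diffusion step: write the updated interior of src into dst."""
    dst[1:-1, 1:-1] = 0.25 * (src[:-2, 1:-1] + src[2:, 1:-1]
                              + src[1:-1, :-2] + src[1:-1, 2:])

grid = np.zeros((64, 64))
grid[0, :] = 100.0        # hot top edge (boundary stays fixed)
buffer = grid.copy()

for _ in range(100):
    step(grid, buffer)
    # Swap references instead of allocating a new array each iteration.
    grid, buffer = buffer, grid
```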
  • For parallelisation: ➔ moved these to Add lesson on parallelisation #8
    • We did discuss how submitting batch jobs to the cluster is already a type of parallelisation, one that is external to Python.
    • Someone also asked about parallelisation within the linear algebra libraries underlying NumPy. We already have a callout box on that in the NumPy lesson; it might be worth cross-referencing if/when we add a parallelisation episode.
    • Someone else asked whether it’s safe to parallelise code that’s writing to the same dataframe. The answer is probably just “in general, no; but check the documentation for details on how to handle that”, but that’s something we should include in a parallelisation episode.
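A hedged sketch of the safe pattern a parallelisation episode could show for that question: rather than several workers writing to one shared dataframe, each worker transforms its own chunk and the results are combined afterwards (function and column names are illustrative; a CPU-bound task would use processes rather than threads):

```python
# Instead of mutating one shared DataFrame from several workers (unsafe
# in general), each worker returns a private result and the main thread
# concatenates them once at the end.
from concurrent.futures import ThreadPoolExecutor

import pandas as pd

def process_chunk(chunk):
    chunk = chunk.copy()                    # work on a private copy
    chunk["doubled"] = chunk["value"] * 2   # illustrative column names
    return chunk

df = pd.DataFrame({"value": range(8)})
chunks = [df.iloc[i:i + 4] for i in range(0, len(df), 4)]

with ThreadPoolExecutor() as pool:
    results = list(pool.map(process_chunk, chunks))

# Combine once, in a single thread, after all workers are done.
combined = pd.concat(results, ignore_index=True)
```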
  • In the pandas/vectorisation exercise:
    • We could emphasize more strongly how the solution is basically just taking the existing pythagoras function and replacing df['f_vertical'] with vertical. Maybe switch from np.sqrt() to **0.5 to make this even more explicit?
    • Talking about “the vertical column” and “the horizontal column” is slightly awkward. 😅 Time to bikeshed column names? (My first thought was latitude/longitude, but that implies a spherical geometry, so Pythagoras is no longer valid; maybe length/width would work?)
    • Maybe add a small visualisation here, just to illustrate the shape of that dataframe; and that “columns are numpy arrays; all these functions are iterating over rows”? I noticed a few learners struggle with visualising what’s going on in that code example; and I remember it took me quite a while to thoroughly understand that when I first looked through the materials, too.
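To make the "it's the same function, just with columns instead of scalars" point concrete, a hedged sketch (column names and the apply-based slow version are illustrative, not the exercise's exact code):

```python
import pandas as pd

def pythagoras(vertical, horizontal):
    # Written with ** rather than np.sqrt, this works unchanged for
    # scalars and for whole columns (NumPy arrays / pandas Series).
    return (vertical ** 2 + horizontal ** 2) ** 0.5

df = pd.DataFrame({"vertical": [3.0, 5.0], "horizontal": [4.0, 12.0]})

# Row-by-row: apply calls the function once per row, in Python.
slow = df.apply(lambda row: pythagoras(row["vertical"],
                                       row["horizontal"]), axis=1)

# Vectorised: pass the columns themselves; the loop happens in C.
fast = pythagoras(df["vertical"], df["horizontal"])
```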
  • Doing a live demo of the parallelised downloads example is scary; I had a brief network glitch the first time I tried it, so I got a bunch of error messages. It worked a few seconds later when I tried again; but oof … That probably deserves an instructor note.
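For that instructor note, a hedged sketch of a retry wrapper that could make the live demo more forgiving of brief glitches (the retry count, delay, and timeout values are illustrative):

```python
import time
import urllib.request

def fetch_with_retry(url, attempts=3, delay=2.0, timeout=10.0):
    """Try a download a few times before giving up entirely."""
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                return response.read()
        except OSError as error:
            if attempt == attempts:
                raise                 # out of retries: re-raise for the caller
            print(f"Attempt {attempt} failed ({error}); retrying …")
            time.sleep(delay)
```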
