Just writing up a bunch of impressions while they’re fresh in my mind. I’ll look through them in the next few days and see which ones are actionable and e.g. deserve their own issues. Comments welcome!
A few learners were using `line_profiler.py` or `pandas.py` as file names for those examples ➔ that leads to errors, because the code itself runs e.g. `import pandas`, and Python prioritises the local file over the installed module
There’s a specialised error message for this since Python 3.13 (another reason to always upgrade to the latest version, if you can 😉)
In my slides, I used the FFEA case study as an example when transitioning from function-level to line-level profiling. (Roughly: “Sometimes you can tell from the function-level results alone that there’s an issue, like in this example, where the Rod initialiser was called millions of times more often than expected because we were throwing away old rods and creating new ones instead of updating them. Other times you need to dig deeper, and that’s what line-level profiling is for.”) Then, towards the end of the optimisation section in the heat grid example, I could easily refer back (“The first implementation creates a new grid and throws the old one away on every step, kind of like the rods we mentioned earlier. If we stick with two grids and update them instead …”), which I think was helpful didactically: it helps learners see this as a general pattern instead of a one-off example.
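The “reuse, don’t recreate” pattern can be sketched in a few lines (a hypothetical simplification, not the lesson’s actual heat-grid code): keep two buffers and swap them each step, instead of allocating a fresh grid (or fresh Rod objects) on every iteration.

```python
# Two preallocated buffers; each step writes into the scratch buffer,
# then the references are swapped. No per-step allocation.

def step(src, dst):
    """Write one diffusion-like update of src into the preallocated dst."""
    n = len(src)
    for i in range(n):
        left = src[i - 1] if i > 0 else src[i]
        right = src[i + 1] if i < n - 1 else src[i]
        dst[i] = (left + src[i] + right) / 3.0

grid = [0.0] * 10
grid[5] = 1.0                       # initial heat spike
scratch = [0.0] * 10                # allocated once, reused every step

for _ in range(100):
    step(grid, scratch)
    grid, scratch = scratch, grid   # swap references; nothing is thrown away
```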
We briefly discussed how submitting batch jobs to the cluster is already a form of parallelisation, one that is external to Python.
Someone also asked about parallelisation within the linear algebra libraries underlying NumPy. We already have a callout box on that in the NumPy lesson; it might be worth cross-referencing if/when we add a parallelisation episode.
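One way to make that BLAS-level parallelism visible to learners is to cap the thread count via environment variables and compare timings; a hedged sketch follows (which variable actually applies depends on the BLAS build NumPy links against):

```shell
# Cap BLAS/LAPACK thread counts before starting Python.
# Only the variable matching the installed BLAS backend takes effect.
export OMP_NUM_THREADS=1        # OpenMP-based builds
export OPENBLAS_NUM_THREADS=1   # OpenBLAS
export MKL_NUM_THREADS=1        # Intel MKL
```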
Someone else asked whether it’s safe to parallelise code that writes to the same dataframe. The answer is probably just “in general, no; check the documentation for details on how to handle that”, but it’s something we should cover in a parallelisation episode.
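The usual safe pattern is worth a sketch: rather than several workers writing into one shared table, each worker returns its partial result and the main thread combines them afterwards. (This is a hedged illustration; plain dicts stand in for dataframe chunks to keep it dependency-free.)

```python
# Workers only *read* their own chunk and *return* a result;
# the merge happens single-threaded, so there are no write conflicts.
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    """Compute a derived value for one chunk; no shared state is touched."""
    return {key: value * 2 for key, value in chunk.items()}

chunks = [{"a": 1, "b": 2}, {"c": 3, "d": 4}]

with ThreadPoolExecutor() as pool:
    partials = list(pool.map(process_chunk, chunks))

combined = {}
for part in partials:          # single-threaded merge step
    combined.update(part)
print(combined)                # → {'a': 2, 'b': 4, 'c': 6, 'd': 8}
```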
In the pandas/vectorisation exercise:
We could emphasise more strongly how the solution basically just takes the existing `pythagoras` function and replaces `df['f_vertical']` with `vertical`. Maybe switch from `np.sqrt()` to `**0.5` to make this even more explicit?
Talking about “the vertical column” and “the horizontal column” is slightly awkward. 😅 Time to bikeshed column names? (My first thought was latitude/longitude, but that implies a spherical geometry, so Pythagoras is no longer valid; maybe length/width would work?)
Maybe add a small visualisation here, just to illustrate the shape of that dataframe; and that “columns are numpy arrays; all these functions are iterating over rows”? I noticed a few learners struggle with visualising what’s going on in that code example; and I remember it took me quite a while to thoroughly understand that when I first looked through the materials, too.
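The “just swap the arguments” point above can be sketched like this (values are illustrative, and a bare NumPy array stands in for a dataframe column): with `**0.5` instead of `np.sqrt()`, the very same function body works unchanged on scalars and on whole columns.

```python
import numpy as np

def pythagoras(vertical, horizontal):
    # Pure arithmetic, so this works element-wise on arrays too.
    return (vertical ** 2 + horizontal ** 2) ** 0.5

print(pythagoras(3, 4))                  # scalar call → 5.0

vertical = np.array([3.0, 5.0])          # stand-ins for df columns
horizontal = np.array([4.0, 12.0])
print(pythagoras(vertical, horizontal))  # whole-column call → [ 5. 13.]
```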
Doing a live demo of the parallelised downloads example is scary; I had a brief network glitch the first time I tried it and got a bunch of error messages. It worked a few seconds later on the second attempt, but oof … That probably deserves an instructor note.
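For that instructor note, a small retry wrapper might be worth showing. Below is a hedged sketch: `flaky_download` is a stand-in that fails once and then succeeds, simulating a transient glitch (the lesson’s real example fetches actual URLs).

```python
# Retry each download a few times so a brief network glitch
# doesn't derail the live demo.
from concurrent.futures import ThreadPoolExecutor

failures = {"file_a": 1, "file_b": 0}   # file_a fails once, then succeeds

def flaky_download(name):
    """Stand-in for a real download; raises once to simulate a glitch."""
    if failures[name] > 0:
        failures[name] -= 1
        raise ConnectionError(f"transient glitch while fetching {name}")
    return f"contents of {name}"

def download_with_retry(name, attempts=3):
    for attempt in range(attempts):
        try:
            return flaky_download(name)
        except ConnectionError:
            if attempt == attempts - 1:
                raise               # give up after the last attempt

with ThreadPoolExecutor() as pool:
    results = list(pool.map(download_with_retry, ["file_a", "file_b"]))
print(results)                      # → ['contents of file_a', 'contents of file_b']
```

(A real version would probably also sleep briefly between attempts.)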