Merged
2 changes: 1 addition & 1 deletion episodes/optimisation-data-structures-algorithms.md
@@ -220,7 +220,7 @@ The only limitation is that where two objects are equal they must have the same

## Sets

-Sets are dictionaries without the values (both are declared using `{}`), a collection of unique keys equivalent to the mathematical set. *Modern CPython now uses a set implementation distinct from that of it's dictionary, however they still behave much the same in terms of performance characteristics.*
+Sets are dictionaries without the values (both are declared using `{}`), a collection of unique keys equivalent to the mathematical set. *Modern CPython now uses a set implementation distinct from that of its dictionary, however they still behave much the same in terms of performance characteristics.*

Sets are used for eliminating duplicates and checking for membership, and will normally outperform lists especially when the list cannot be maintained sorted.
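
As a minimal sketch of the two uses named above (eliminating duplicates and checking membership), the snippet below uses made-up data. The variable names are hypothetical and do not come from the lesson.

```python
# Illustrative data only; `readings` is a hypothetical name.
readings = [3, 1, 3, 7, 1, 9, 7]

# Building a set collapses duplicate values into a single key.
unique_readings = set(readings)

# sorted() is used here only to display the keys in a stable order,
# since sets do not guarantee any ordering.
print(sorted(unique_readings))

# Membership in a set is a hash lookup (constant time on average),
# whereas membership in a list is a linear scan.
print(7 in unique_readings)
print(7 in readings)
```

Both membership checks give the same answer. The difference is only in how the cost grows as the collection gets larger.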

8 changes: 4 additions & 4 deletions episodes/profiling-functions.md
@@ -33,7 +33,7 @@ Function-level profiling analyses where time is being spent with respect to func
This allows functions that occupy a disproportionate amount of the total runtime to be quickly identified and investigated.

<!-- We will be covering -->
-In this episode we will cover the usage of the function-level profiler `cProfile`, how it's output can be visualised with `snakeviz` and how the output can be interpreted.
+In this episode we will cover the usage of the function-level profiler `cProfile`, how its output can be visualised with `snakeviz` and how the output can be interpreted.


::::::::::::::::::::::::::::::::::::: callout
@@ -124,7 +124,7 @@ python -m cProfile -o out.prof my_script.py input.csv
*No additional changes to your code are required, it's really that simple!*
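
For cases where only part of a program needs profiling, the standard library also allows `cProfile` to be driven from inside Python. The following is a sketch under stated assumptions rather than part of the lesson: `build_squares()` is a hypothetical workload, and the statistics are written to a temporary `out.prof` file in the same format that `-o out.prof` produces, then loaded back with `pstats`.

```python
import cProfile
import io
import os
import pstats
import tempfile


def build_squares(n):
    """Hypothetical workload, present only so there is something to profile."""
    return [i * i for i in range(n)]


# Profile a single call and keep its return value.
profiler = cProfile.Profile()
squares = profiler.runcall(build_squares, 50_000)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write the statistics to file, analogous to passing `-o out.prof`.
    prof_path = os.path.join(tmp_dir, "out.prof")
    profiler.dump_stats(prof_path)

    # Load the file back and capture a report sorted by cumulative time,
    # limited to the first 5 entries.
    buffer = io.StringIO()
    stats = pstats.Stats(prof_path, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)

report = buffer.getvalue()
print(report)
```

The exact timings in the report will differ between machines and runs, so no expected output is shown.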

<!-- TODO should the remainder of this section be in a call-out, it's unnecessary -->
-If you instead, don't specify output to file (e.g. remove `-o out.prof` from the command), `cProfile` will produce output to console similar to that shown below:
+If you don't specify output to file (e.g. remove `-o out.prof` from the command), `cProfile` will produce output to console similar to that shown below:

```output
28 function calls in 4.754 seconds
@@ -153,7 +153,7 @@ The columns have the following definitions:
| `percall` | The average tottime per function call (`tottime`/`ncalls`). |
| `cumtime` | The total time spent in the given function, including child function calls. |
| `percall` | The average cumtime per function call (`cumtime`/`ncalls`). |
-| `filename:lineno(function)` | The location of the given function's definition and it's name. |
+| `filename:lineno(function)` | The location of the given function's definition and its name. |

This output can often exceed the terminal's buffer length for large programs and can be unwieldy to parse, so the package `snakeviz` is often utilised to provide an interactive visualisation of the data when exported to file.

@@ -402,7 +402,7 @@ The value of `cities` should be a positive integer, this algorithm has poor scal

The hotspot only becomes visible when an argument of `5` or greater is passed.

-You should see that `distance()` (from `travellingsales.py:11`) becomes the largest box (similarly it's parent in the call-stack `total_distance()`) showing that it scales poorly with the number of cities. With 5 cities, `distance()` has a cumulative time of `~35%` the runtime, this increases to `~60%` with 9 cities.
+You should see that `distance()` (from `travellingsales.py:11`) becomes the largest box (similarly its parent in the call-stack `total_distance()`) showing that it scales poorly with the number of cities. With 5 cities, `distance()` has a cumulative time of `~35%` the runtime, this increases to `~60%` with 9 cities.

Other boxes within the diagram correspond to the initialisation of imports, or initialisation of cities. These have constant or linear scaling, so their cost barely increases with the number of cities.

6 changes: 3 additions & 3 deletions episodes/profiling-introduction.md
@@ -53,8 +53,8 @@ Profiling should be a relatively quick and inexpensive process. If there are no
<!-- Everyone benefits (why) -->
Even professional programmers make oversights that can lead to poor performance, and can be identified through profiling.

-For example Grand Theft Auto Online, which has allegedly earned over $7bn since it's 2013 release, was notorious for it's slow loading times.
-8 years after it's release [a 'hacker'](https://nee.lv/2021/02/28/How-I-cut-GTA-Online-loading-times-by-70/) had enough, they reverse engineered and profiled the code to enable a 70% speedup!
+For example Grand Theft Auto Online, which has allegedly earned over $7bn since its 2013 release, was notorious for slow loading times.
+8 years after its release [a 'hacker'](https://nee.lv/2021/02/28/How-I-cut-GTA-Online-loading-times-by-70/) had enough, they reverse engineered and profiled the code to enable a 70% speedup!

*How much revenue did that unnecessary bottleneck cost, through user churn?*

@@ -130,7 +130,7 @@ Function-level profiling analyses where time is being spent with respect to func
This allows functions that occupy a disproportionate amount of the total runtime to be quickly identified and investigated.

<!-- We will be covering -->
-In this course we will cover the usage of the function-level profiler `cProfile` and how it's output can be visualised with `snakeviz`.
+In this course we will cover the usage of the function-level profiler `cProfile` and how its output can be visualised with `snakeviz`.

### Line-Level Profiling
<!-- Context -->
4 changes: 2 additions & 2 deletions episodes/profiling-lines.md
@@ -248,7 +248,7 @@ The `-r` argument passed to `kernprof` (or `line_profiler`) enables rich output,

If you're more familiar with writing Python inside Jupyter notebooks you can, as with `snakeviz`, use `line_profiler` directly from inside notebooks. However, it is still necessary for the code you wish to profile to be placed within a function.

-First `line_profiler` must be installed and it's extension loaded.
+First `line_profiler` must be installed and its extension loaded.

```py
!pip install line_profiler
@@ -290,7 +290,7 @@ Download and profile <a href="files/bubblesort/bubblesort.py" download>the Pytho

> Bubblesort is a basic sorting algorithm, it is not considered to be efficient so in practice other sorting algorithms are typically used.
>
-> The array to be sorted is iterated, with a pair-wise sort being applied to each element and it's neighbour.
+> The array to be sorted is iterated, with a pair-wise sort being applied to each element and its neighbour.
> This can cause elements to rise (or sink) multiple positions in a single pass, hence the name bubblesort.
> This iteration continues until the array is fully iterated with no elements being swapped.
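
The description above can be sketched as follows. This is an illustrative implementation written for this explanation and is not necessarily the same as the downloadable `bubblesort.py`. The function and variable names are assumptions.

```python
def bubblesort(values):
    """Return a sorted copy of `values` using repeated pair-wise swaps."""
    items = list(values)  # copy, so the caller's sequence is left unchanged
    swapped = True
    # Keep making passes until a full pass completes with no swaps.
    while swapped:
        swapped = False
        for i in range(len(items) - 1):
            # Pair-wise sort of each element and its neighbour.
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
    return items


print(bubblesort([5, 2, 9, 1]))
```

Each pass costs time proportional to the length of the array, and many passes may be needed, which is why the algorithm is not considered efficient.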

2 changes: 1 addition & 1 deletion learners/reference.md
@@ -8,7 +8,7 @@ Array
: A data-structure that stores a collection of elements, typically of the same type, arranged in contiguous memory locations. Arrays enable elements to be iterated in order or directly accessed according to their index. Arrays do not support appending or removing items, Python's lists wrap an array to provide support for greater flexibility such as appending.

Benchmarking
-: The process of running a program in order to assess it's overall performance. This can be useful to confirm the impact of optimisations.
+: The process of running a program in order to assess its overall performance. This can be useful to confirm the impact of optimisations.

Bottleneck
: The component with the lowest throughput, which limits performance of the overall program. Improving the throughput of this component should improve the overall performance.