Commit e2f1668

committed
Logical break at IO -> performance
1 parent c907053 commit e2f1668

19 files changed

Lines changed: 1164 additions & 1016 deletions

docs/index.md renamed to docs/01-intro.md

Lines changed: 12 additions & 8 deletions
@@ -40,28 +40,31 @@ After installation, open a terminal.
4040
Create an environment (one time):
4141

4242
```bash
43-
conda create -n quarto.env python=3.10
43+
conda create -n PythonCourse python=3.10
4444
```
4545

4646
Activate it:
4747

4848
```bash
49-
conda activate quarto.env
49+
conda activate PythonCourse
5050
```
5151

5252
From now on, whenever you work on this course, start by activating:
5353

5454
```bash
55-
conda activate quarto.env
55+
## Windows only (PowerShell)
56+
& "$HOME\miniconda3\shell\condabin\conda-hook.ps1"
57+
## all
58+
conda activate PythonCourse
5659
```
5760

5861
---
5962

6063
# Install the packages we’ll use
6164

6265
```bash
63-
pip install -c jupyterlab notebook nbclient ipykernel
64-
pip install -c numpy pandas matplotlib seaborn scipy statsmodels scikit-learn
66+
pip install jupyterlab notebook nbclient ipykernel
67+
pip install numpy pandas matplotlib seaborn scipy statsmodels scikit-learn
6568
```
6669

6770
Notes:
@@ -83,18 +86,19 @@ jupyter lab
8386

8487
A browser window should open. Create a new notebook using the kernel:
8588

86-
**Python (quarto.env)**
89+
**Python 3 (ipykernel)**
8790

8891
---
8992

9093
# Choosing an editor (later)
9194

92-
For notebooks, JupyterLab is great. For writing reusable code (scripts/packages), use an IDE:
95+
For notebooks and scripts, JupyterLab is great. For writing reusable packages, use an IDE:
9396

9497
- **VS Code** (recommended): general purpose, very popular
9598
- **PyCharm**: powerful Python IDE, lots of features
9699

97-
For this course: we’ll mostly use JupyterLab, and optionally VS Code later.
100+
For this course: we’ll mostly use JupyterLab, and you can even stick with it for the final project.
101+
But if you turn the final project into a Python package (optional), I recommend using an IDE.
98102

99103
---
100104

docs/02-basics.md

Lines changed: 155 additions & 15 deletions
@@ -1,5 +1,5 @@
11
---
2-
title: "Python basics"
2+
title: "Python basics and NumPy vectors"
33
---
44

55
# Goal of this section
@@ -10,6 +10,8 @@ By the end of this section you will be able to:
1010
- Understand the difference between Python lists and NumPy arrays
1111
- Add comments to your code
1212
- Use `help()` to look up documentation
13+
- Perform vectorized numerical operations
14+
- Use logical conditions to select values
1315

1416
These are the building blocks for everything that follows.
1517

@@ -32,12 +34,21 @@ print(x)
3234
print(name)
3335
```
3436

35-
In Jupyter notebooks, simply writing the variable name will also show its value:
37+
In Jupyter notebooks, simply writing the variable name as the last statement of the cell will also show its value:
3638

3739
```python
3840
x
3941
```
4042

43+
What happens if another statement follows the variable name?
44+
45+
Try:
46+
47+
```python
48+
x
49+
a = 14
50+
```
51+
4152
---
4253

4354
# Comments
@@ -65,8 +76,8 @@ genes
6576
You can access elements using square brackets `[]`:
6677

6778
```python
68-
genes[0]
69-
genes[1]
79+
print(genes[0])
80+
print(genes[1])
7081
```
7182

7283
Python starts counting at **0**, not 1.
@@ -84,6 +95,20 @@ First import NumPy:
8495
import numpy as np
8596
```
8697

98+
This means:
99+
- Import the `numpy` library
100+
- Make it available as `np`
101+
102+
Later we can call functions like:
103+
104+
```python
105+
np.array(x)
106+
```
107+
108+
---
109+
110+
# Creating arrays
111+
87112
Create a numeric vector:
88113

89114
```python
@@ -124,7 +149,9 @@ arr + 1
124149
```
125150

126151
This adds 1 to every element.
127-
This is called **vectorized computation** and is extremely important for performance.
152+
This is called **vectorized computation**.
153+
154+
It is much faster than Python loops because the operations are executed in optimized compiled code (mostly written in C) and work on entire blocks of memory at once.
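
As a minimal sketch of the difference, the following compares an explicit Python loop with the vectorized expression. The array `arr` here is a small stand-in defined only for this example; both versions give the same result, but in the vectorized one the loop runs inside NumPy.

```python
import numpy as np

arr = np.arange(5)  # array([0, 1, 2, 3, 4])

# Loop version: visit every element in Python
looped = np.array([value + 1 for value in arr])

# Vectorized version: one expression, the loop runs inside NumPy
vectorized = arr + 1

print(vectorized)                          # [1 2 3 4 5]
print(np.array_equal(looped, vectorized))  # True
```

In Jupyter, running `%timeit` on each version with a much larger array is one way to see the speed difference for yourself.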
128155

129156
---
130157

@@ -157,13 +184,7 @@ np.linspace(0, 100, 11)
157184
Python has built-in documentation.
158185

159186
```python
160-
help(np.arange)
161-
```
162-
163-
In Jupyter you can also use:
164-
165-
```python
166-
np.arange?
187+
help(np.arange) # or in Jupyter: ?np.arange
167188
```
168189

169190
Look at:
@@ -172,7 +193,7 @@ Look at:
172193

173194
---
174195

175-
# Simple calculations
196+
# Vectorized mathematics
176197

177198
Create a vector:
178199

@@ -210,6 +231,119 @@ np.std(v)
210231

211232
---
212233

234+
## Mathematical background
235+
236+
For a vector with values:
237+
238+
\[
239+
x_1, x_2, x_3, \dots, x_n
240+
\]
241+
242+
### Mean
243+
244+
The **mean** (average) is:
245+
246+
\[
247+
\mu = \frac{1}{n} \sum_{i=1}^{n} x_i
248+
\]
249+
250+
This means:
251+
- Add up all values
252+
- Divide by the number of values
253+
254+
---
255+
256+
### Standard deviation
257+
258+
The **standard deviation** measures how much the values vary around the mean.
259+
260+
Population standard deviation:
261+
262+
\[
263+
\sigma = \sqrt{
264+
\frac{1}{n}
265+
\sum_{i=1}^{n} (x_i - \mu)^2
266+
}
267+
\]
268+
269+
Steps:
270+
1. Subtract the mean from each value
271+
2. Square the differences
272+
3. Compute the mean of those squared differences
273+
4. Take the square root
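
The steps above can be sketched directly in NumPy. The values in `values` are made up for this example; the point is that the hand-computed result matches `np.std`, which divides by `n` (the population formula) by default.

```python
import numpy as np

values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # example values

n = len(values)
mu = values.sum() / n            # mean: add up all values, divide by n

diffs = values - mu              # 1. subtract the mean from each value
squared = diffs ** 2             # 2. square the differences
variance = squared.sum() / n     # 3. mean of the squared differences
sigma = np.sqrt(variance)        # 4. take the square root

print(mu, sigma)                          # 5.0 2.0
print(np.isclose(sigma, np.std(values)))  # True: np.std uses ddof=0 by default
```

Passing `ddof=1` to `np.std` would instead divide by `n - 1`, giving the sample standard deviation.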
274+
275+
---
276+
277+
# Logical comparisons
278+
279+
We can compare vectors to values:
280+
281+
```python
282+
x = np.arange(1, 11)
283+
284+
print(x == 5)
285+
print(x < 5)
286+
print(x >= 5)
287+
x != 5
288+
```
289+
290+
These comparisons return **boolean arrays** (True / False for each element).
291+
292+
---
293+
294+
# Using logical results to select values
295+
296+
We can use boolean arrays to subset data:
297+
298+
```python
299+
x[x < 5]
300+
x[x >= 5]
301+
x[x != 5]
302+
```
303+
304+
This is a very common pattern in data analysis. But would that also work with a list? Try it!
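
For comparison after you have tried it, here is a small sketch of the list case. The names `values`, `small_list`, and `small_arr` are chosen only for this example. Comparing a list to a number is not element-wise, so a list needs a comprehension where an array uses a boolean mask.

```python
import numpy as np

values = list(range(1, 11))   # plain Python list: 1..10
x = np.array(values)

# Comparing a list to a number is not element-wise; Python raises TypeError
try:
    values < 5
    list_comparison_works = True
except TypeError:
    list_comparison_works = False

print(list_comparison_works)  # False

# List equivalent: a list comprehension with a condition
small_list = [v for v in values if v < 5]

# NumPy equivalent: boolean mask
small_arr = x[x < 5]

print(small_list)  # [1, 2, 3, 4]
print(small_arr)   # [1 2 3 4]
```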
305+
306+
307+
---
308+
309+
310+
# Working with two string vectors
311+
312+
Create some gene vectors:
313+
314+
```python
315+
marker_genes = np.array(["Gata1", "Spi1", "Runx1"])
316+
detected_genes = np.array(["Runx1", "Actb", "Gata1"])
317+
```
318+
319+
Find common values:
320+
321+
```python
322+
common = np.intersect1d(marker_genes, detected_genes)
323+
common
324+
```
325+
326+
Find values in `marker_genes` that are NOT in `detected_genes`:
327+
328+
```python
329+
result = marker_genes[~np.isin(marker_genes, detected_genes)]
330+
result
331+
```
332+
333+
Once again, this does not work with plain Python lists, which support neither boolean-mask indexing nor the `~` operator.
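
Two alternatives are sketched below, as options rather than the course's prescribed approach: `np.setdiff1d`, which returns the sorted unique values of the first array that are absent from the second, and Python's built-in `set` difference, which is the usual route when the data are plain lists.

```python
import numpy as np

marker_genes = np.array(["Gata1", "Spi1", "Runx1"])
detected_genes = np.array(["Runx1", "Actb", "Gata1"])

# Mask approach: keeps the original order of marker_genes
mask = np.isin(marker_genes, detected_genes)   # [ True False  True]
missing_mask = marker_genes[~mask]

# np.setdiff1d: sorted, unique values of the first array not in the second
missing_setdiff = np.setdiff1d(marker_genes, detected_genes)

# Plain Python: convert to sets and use set difference (order is not kept)
missing_set = set(marker_genes.tolist()) - set(detected_genes.tolist())

print(missing_mask)     # ['Spi1']
print(missing_setdiff)  # ['Spi1']
print(missing_set)      # {'Spi1'}
```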
334+
335+
---
336+
337+
# Exercise
338+
339+
1. Create a vector from 10 to 50 in steps of 2.
340+
2. Select only values larger than 25.
341+
3. Create a second vector from 30 to 70.
342+
4. Find the values common to both vectors.
343+
5. Find the values in the first vector that are not in the second.
344+
345+
---
346+
213347
# Why this matters for bioinformatics
214348

215349
Gene expression data is usually stored as:
@@ -219,8 +353,14 @@ Gene expression data is usually stored as:
219353

220354
These are large numerical tables.
221355
NumPy arrays allow us to:
356+
222357
- store them efficiently
223358
- compute statistics quickly
224-
- prepare them for plotting and clustering
359+
- filter genes and samples
360+
- find overlaps between gene sets
361+
- prepare data for plotting and clustering
362+
363+
These operations are performed thousands of times in real workflows.
364+
Using NumPy vectorized operations and logical indexing is usually both faster and clearer than writing explicit Python loops.
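
As a rough illustration of gene filtering, the sketch below uses a tiny expression table invented for this example (rows are genes, columns are samples) and an arbitrary threshold of 1.0. It keeps only genes whose mean expression exceeds the threshold.

```python
import numpy as np

# Toy data, invented for illustration: rows are genes, columns are samples
genes = np.array(["Gata1", "Spi1", "Runx1", "Actb"])
expression = np.array([
    [0.0, 0.0, 1.0],
    [5.0, 7.0, 6.0],
    [2.0, 3.0, 4.0],
    [9.0, 8.0, 10.0],
])

gene_means = expression.mean(axis=1)  # one mean per gene (row)
keep = gene_means > 1.0               # boolean mask over genes

print(genes[keep])             # ['Spi1' 'Runx1' 'Actb']
print(expression[keep].shape)  # (3, 3)
```

The same mask filters both the gene names and the rows of the table, so the two stay aligned.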
225365

226-
In the next section, we will learn how to make plots from these vectors.
366+
In the next section, we will learn how to plot data in Python.
