sccbioinformatics
diff --git a/docs/index.md b/docs/01-intro.md
docs/index.md renamed to docs/01-intro.md
Lines changed: 12 additions & 8 deletions
diff --git a/docs/02-basics.md b/docs/02-basics.md
Lines changed: 155 additions & 15 deletions
@@ -40,28 +40,31 @@ After installation, open a terminal.
 Create an environment (one time):
 
 ```bash
-conda create -n quarto.env python=3.10
+conda create -n PythonCourse python=3.10
 ```
 
 Activate it:
 
 ```bash
-conda activate quarto.env
+conda activate PythonCourse
 ```
 
 From now on, whenever you work on this course, start by activating:
 
 ```bash
-conda activate quarto.env
+# Windows (PowerShell) only: load the conda hook first
+& "$HOME\miniconda3\shell\condabin\conda-hook.ps1"
+# All platforms
+conda activate PythonCourse
 ```
 
 ---
 
 # Install the packages we’ll use
 
 ```bash
-pip install -c jupyterlab notebook nbclient ipykernel
-pip install -c numpy pandas matplotlib seaborn scipy statsmodels scikit-learn
+pip install jupyterlab notebook nbclient ipykernel
+pip install numpy pandas matplotlib seaborn scipy statsmodels scikit-learn
 ```
 
 Notes:
@@ -83,18 +86,19 @@ jupyter lab
 
 A browser window should open. Create a new notebook using the kernel:
 
-**Python (quarto.env)**
+**Python 3 (ipykernel)**
 
 ---
 
 # Choosing an editor (later)
 
-For notebooks, JupyterLab is great. For writing reusable code (scripts/packages), use an IDE:
+For notebooks and scripts, JupyterLab is great. For writing reusable packages, use an IDE:
 
 - **VS Code** (recommended): general purpose, very popular
 - **PyCharm**: powerful Python IDE, lots of features
 
-For this course: we’ll mostly use JupyterLab, and optionally VS Code later.
+For this course we’ll mostly use JupyterLab, and you can stick with it for the final project.
+If you choose to turn the final project into a Python package (optional), I recommend using an IDE.
 
 ---
 
 
@@ -1,5 +1,5 @@
 ---
-title: "Python basics"
+title: "Python basics and NumPy vectors"
 ---
 
 # Goal of this section
@@ -10,6 +10,8 @@ By the end of this section you will be able to:
 - Understand the difference between Python lists and NumPy arrays
 - Add comments to your code
 - Use `help()` to look up documentation
+- Perform vectorized numerical operations
+- Use logical conditions to select values
 
 These are the building blocks for everything that follows.
 
@@ -32,12 +34,21 @@ print(x)
 print(name)
 ```
 
-In Jupyter notebooks, simply writing the variable name will also show its value:
+In Jupyter notebooks, simply writing the variable name as the last statement of the cell will also show its value:
 
 ```python
 x
 ```
 
+What happens if another statement follows the variable name?
+
+Try:
+
+```python
+x
+a = 14
+```
+
 ---
 
 # Comments
@@ -65,8 +76,8 @@ genes
 You can access elements using square brackets `[]`:
 
 ```python
-genes[0]
-genes[1]
+print(genes[0])
+print(genes[1])
 ```
 
 Python starts counting at **0**, not 1.
@@ -84,6 +95,20 @@ First import NumPy:
 import numpy as np
 ```
 
+This means:
+
+- Import the `numpy` library
+- Make it available as `np`
+
+Later we can call functions like:
+
+```python
+np.array(x)
+```
+
+---
+
+# Creating arrays
+
 Create a numeric vector:
 
 ```python
@@ -124,7 +149,9 @@ arr + 1
 ```
 
 This adds 1 to every element.  
-This is called **vectorized computation** and is extremely important for performance.
+This is called **vectorized computation**.
+
+It is much faster than a Python loop because the work runs in optimized, compiled code (mostly written in C) that processes whole blocks of memory at once.
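
The speed difference can be checked with a rough timing comparison. This is a minimal sketch, not part of the course material: the array size and the variable names (`values`, `looped`, `vectorized`) are chosen here for illustration, and the exact timings depend on your machine.

```python
import time
import numpy as np

values = np.arange(200_000)  # illustrative size, not from the course

# Plain Python loop: one interpreted addition per element
start = time.perf_counter()
looped = [v + 1 for v in values]
loop_seconds = time.perf_counter() - start

# Vectorized: a single expression that runs in compiled code
start = time.perf_counter()
vectorized = values + 1
vector_seconds = time.perf_counter() - start

print(f"loop:       {loop_seconds:.4f} s")
print(f"vectorized: {vector_seconds:.4f} s")
```

Both versions produce the same numbers; only the time taken differs.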
 
 ---
 
@@ -157,13 +184,7 @@ np.linspace(0, 100, 11)
 Python has built-in documentation.
 
 ```python
-help(np.arange)
-```
-
-In Jupyter you can also use:
-
-```python
-np.arange?
+help(np.arange)   # or in Jupyter: ?np.arange
 ```
 
 Look at:
@@ -172,7 +193,7 @@ Look at:
 
 ---
 
-# Simple calculations
+# Vectorized mathematics
 
 Create a vector:
 
@@ -210,6 +231,119 @@ np.std(v)
 
 ---
 
+## Mathematical background
+
+For a vector with values:
+
+\[
+x_1, x_2, x_3, \dots, x_n
+\]
+
+### Mean
+
+The **mean** (average) is:
+
+\[
+\mu = \frac{1}{n} \sum_{i=1}^{n} x_i
+\]
+
+This means:
+
+- Add up all values
+- Divide by the number of values
+
+---
+
+### Standard deviation
+
+The **standard deviation** measures how much the values vary around the mean.
+
+Population standard deviation:
+
+\[
+\sigma = \sqrt{
+\frac{1}{n}
+\sum_{i=1}^{n} (x_i - \mu)^2
+}
+\]
+
+Steps:
+
+1. Subtract the mean from each value
+2. Square the differences
+3. Compute the mean of those squared differences
+4. Take the square root
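
The steps above can be followed by hand in NumPy and compared with `np.std`. This is a small sketch with made-up example values; the intermediate names (`differences`, `squared`, `variance`) are only for illustration. By default `np.std` uses `ddof=0`, which is the population formula shown above.

```python
import numpy as np

# Made-up example values, chosen so the result is a round number
v = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean = v.sum() / v.size        # add up all values, divide by n
differences = v - mean         # step 1: subtract the mean
squared = differences ** 2     # step 2: square the differences
variance = squared.mean()      # step 3: mean of the squared differences
std = np.sqrt(variance)        # step 4: square root

print(mean)        # 5.0
print(std)         # 2.0
print(np.std(v))   # 2.0, same as the manual calculation
```

Passing `ddof=1` to `np.std` divides by `n - 1` instead, which gives the sample standard deviation.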
+
+---
+
+# Logical comparisons
+
+We can compare vectors to values:
+
+```python
+x = np.arange(1, 11)
+
+print(x == 5)
+print(x < 5)
+print(x >= 5)
+x != 5
+```
+
+These comparisons return **boolean arrays** (True / False for each element).
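
Because a boolean array is an ordinary NumPy array, it can be stored, counted, and combined. The following sketch goes one small step beyond the text above; the names `mask` and `in_range` are illustrative.

```python
import numpy as np

x = np.arange(1, 11)

mask = x < 5
print(mask.dtype)   # bool
print(mask.sum())   # True counts as 1, so this counts the matches: 4

# Combine conditions element-wise with & (and) and | (or).
# Each comparison needs its own parentheses.
in_range = (x > 3) & (x < 8)
print(in_range)
```

Here `in_range` is True only for the values 4, 5, 6 and 7.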
+
+---
+
+# Using logical results to select values
+
+We can use boolean arrays to subset data:
+
+```python
+print(x[x < 5])
+print(x[x >= 5])
+x[x != 5]
+```
+
+This is a very common pattern in data analysis. But would that also work with a list? Try it!
+
+
+---
+
+
+# Working with two string vectors
+
+Create some gene vectors:
+
+```python
+marker_genes = np.array(["Gata1", "Spi1", "Runx1"])
+detected_genes = np.array(["Runx1", "Actb", "Gata1"])
+```
+
+Find common values:
+
+```python
+common = np.intersect1d(marker_genes, detected_genes)
+common
+```
+
+Find values in `marker_genes` that are NOT in `detected_genes`:
+
+```python
+result = marker_genes[~np.isin(marker_genes, detected_genes)]
+result
+```
+
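+The boolean indexing step would again fail with a plain Python list. `np.intersect1d` and `np.isin` themselves accept lists, but they always return NumPy arrays.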
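
For comparison, plain Python lists can answer the same two questions with built-in sets and a list comprehension. This is a sketch of one possible approach, not the method used in the course; `missing` and `detected` are illustrative names.

```python
marker_genes = ["Gata1", "Spi1", "Runx1"]
detected_genes = ["Runx1", "Actb", "Gata1"]

# Common values: set intersection, sorted for a stable order
common = sorted(set(marker_genes) & set(detected_genes))

# Values in marker_genes that are not in detected_genes,
# keeping the original order of marker_genes
detected = set(detected_genes)
missing = [gene for gene in marker_genes if gene not in detected]

print(common)    # ['Gata1', 'Runx1']
print(missing)   # ['Spi1']
```

The NumPy version is shorter and scales better to long gene lists, which is why the course uses arrays.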
+
+---
+
+# Exercise
+
+1. Create a vector from 10 to 50 in steps of 2.
+2. Select only values larger than 25.
+3. Create a second vector from 30 to 70.
+4. Find the values common to both vectors.
+5. Find the values in the first vector that are not in the second.
+
+---
+
 # Why this matters for bioinformatics
 
 Gene expression data is usually stored as:
@@ -219,8 +353,14 @@ Gene expression data is usually stored as:
 
 These are large numerical tables.  
 NumPy arrays allow us to:
+
 - store them efficiently
 - compute statistics quickly
-- prepare them for plotting and clustering
+- filter genes and samples
+- find overlaps between gene sets
+- prepare data for plotting and clustering
+
+Real workflows repeat these operations thousands of times.
+NumPy's vectorized operations and logical indexing are usually both the fastest and the clearest way to express them.
 
-In the next section, we will learn how to make plots from these vectors.
+In the next section, we will learn how to plot data in Python.