|
1 | 1 | # Process & Scheduler Internals |
2 | 2 |
|
3 | 3 | ## 1. Overview |
4 | | -The Kernel employs a preemptive, multi-class scheduler supporting SMP (Symmetric Multiprocessing). It handles task switching, CPU load balancing, and real-time constraints. |
| 4 | +The kernel uses a preemptive, multi-class scheduler with SMP (Symmetric Multiprocessing) support. It handles task switching, CPU load balancing, and real-time constraints through a modular scheduling-class architecture. |
5 | 5 |
|
6 | 6 | ## 2. Process Model (`process_t`) |
7 | 7 | The `process_t` structure is the core entity, containing the execution context, memory map, and resource limits. |
8 | 8 |
|
9 | | -### Key Fields |
10 | | -- **Context**: `esp`, `eip`, `eflags`, general registers. |
11 | | -- **Memory**: `page_directory` (CR3) and `vm_region_t` array for VMM. |
12 | | -- **Scheduling**: `priority`, `time_slice`, `vruntime` (CFS), `rt_budget` (Real-Time). |
13 | | -- **Identity**: `pid`, `uid`, `gid`, `caps` (Capabilities). |
14 | | -- **Resources**: File descriptors (`fds`), Signals (`pending_signals`). |
| 9 | +### Data Structures |
| 10 | +```c |
| 11 | +// Core Process Control Block |
| 12 | +typedef struct process { |
| 13 | + uint32_t pid; // Process ID |
| 14 | + uint32_t state; // RUNNING, SLEEPING, ZOMBIE |
| 15 | + uint32_t flags; // KERNEL_THREAD, REALTIME |
| 16 | + |
| 17 | + // Execution Context |
| 18 | + regs_t context; // Saved registers |
| 19 | + uintptr_t kernel_stack; // Kernel stack pointer (ESP0) |
| 20 | + page_directory_t* cr3; // Page Directory (Physical Address) |
| 21 | + |
| 22 | + // Scheduling |
| 23 | + struct sched_entity se; // Scheduling entity info |
| 24 | + struct sched_class* class; // Pointer to handling class |
| 25 | + |
| 26 | + // Links |
| 27 | + struct process* parent; |
| 28 | + struct list_head children; |
| 29 | + struct list_head siblings; |
| 30 | + |
| 31 | + // Resources |
| 32 | + file_descriptor_t* fds[MAX_FDS]; |
| 33 | + signal_state_t signals; |
| 34 | +} process_t; |
| 35 | + |
| 36 | +// Scheduling Entity (Generic) |
| 37 | +struct sched_entity { |
| 38 | + uint32_t priority; // Static priority |
| 39 | + uint64_t vruntime; // Virtual Runtime (CFS) |
| 40 | + uint64_t time_slice; // Remaining slice (RR) |
| 41 | + uint64_t deadline; // Absolute deadline (EDF) |
| 42 | + struct list_head run_list; // Link to runqueue |
| 43 | +}; |
| 44 | +``` |
15 | 45 |
|
16 | 46 | ### Lifecycle Diagram |
17 | 47 | ```ascii |
18 | | -+---------+ fork() +-------+ exec() +---------+ |
19 | | -| Parent | -------------> | Child | -------------> | Running | |
20 | | -+---------+ +-------+ +---------+ |
21 | | - | | |
22 | | - | wait() | exit() |
23 | | - v v |
24 | | - +---------+ +---------+ |
25 | | - | Running | <------------ | Zombie | |
26 | | - +---------+ +---------+ |
| 48 | +          fork()           scheduler_pick() |
| 49 | +   +---> [ NEW ] ---------------------> [ READY ] <----+ |
| 50 | +   |                                     ^    |        | |
| 51 | +Create                                   |    | schedule() |
| 52 | +   |                                     |    v        | preempt() |
| 53 | +   +---------------------------------- [ RUNNING ] ----+ |
| 54 | +                                            | |
| 55 | +                                            | block() / wait() |
| 56 | +                                            v |
| 57 | +                                       [ BLOCKED ] |
27 | 58 | ``` |
28 | 59 |
|
29 | | -### Lifecycle Steps |
30 | | -1. **Creation**: `process_create` allocates the structure and a new Page Directory. |
31 | | -2. **Fork**: `process_fork` performs a copy-on-write clone of the parent. |
32 | | -3. **Exec**: `process_exec` replaces the memory image with an ELF binary. |
33 | | -4. **Exit**: `process_exit` releases resources but keeps the structure as a zombie until `process_wait`. |
| 60 | +## 3. Scheduling Classes Implementation |
| 61 | +Each scheduling policy is implemented as a `struct sched_class`: a table of function pointers (a vtable) through which the core scheduler invokes policy-specific behavior. |
34 | 62 |
|
35 | | -## 3. Scheduling Classes |
36 | | -The scheduler supports multiple classes, prioritized in order: |
| 63 | +### Scheduler Class Interface |
| 64 | +```c |
| 65 | +struct sched_class { |
| 66 | + const char* name; |
| 67 | + |
| 68 | + // Core API |
| 69 | + void (*enqueue_task)(struct runqueue* rq, process_t* p); |
| 70 | + void (*dequeue_task)(struct runqueue* rq, process_t* p); |
| 71 | + void (*yield_task)(struct runqueue* rq); |
| 72 | + |
| 73 | + // Selection |
| 74 | + process_t* (*pick_next_task)(struct runqueue* rq); |
| 75 | + |
| 76 | + // Events |
| 77 | + void (*task_tick)(struct runqueue* rq, process_t* p); |
| 78 | + void (*task_fork)(process_t* p); |
| 79 | +}; |
| 80 | +``` |
37 | 81 |
|
38 | 82 | ### Priority Matrix |
39 | | -| Class ID | Name | Priority Range | Algorithm | Use Case | |
| 83 | +| Class ID | Name | Priority | Algorithm | Implementation | |
40 | 84 | | :--- | :--- | :--- | :--- | :--- | |
41 | | -| 0 | `SCHED_CLASS_DEADLINE` | N/A (Earliest Deadline) | EDF | Hard Real-Time (Audio, Control) | |
42 | | -| 1 | `SCHED_CLASS_RT` | 0-99 | FIFO/RR | Soft Real-Time (UI, Input) | |
43 | | -| 2 | `SCHED_CLASS_CFS` | 100-139 | Red-Black Tree | Normal User Tasks | |
44 | | -| 3 | `SCHED_CLASS_IDLE` | 140 | Round-Robin | System Idle Loop | |
45 | | - |
46 | | -### A. Deadline (`SCHED_CLASS_DEADLINE`) |
47 | | -- **Algorithm**: Earliest Deadline First (EDF). |
48 | | -- **Parameters**: `runtime`, `period`, `deadline`. |
49 | | - |
50 | | -### B. Real-Time (`SCHED_CLASS_RT`) |
51 | | -- **Algorithm**: Fixed Priority Preemptive. |
52 | | -- **Logic**: Higher priority tasks always preempt lower ones. Round-robin for equal priority. |
53 | | - |
54 | | -### C. CFS (Completely Fair Scheduler - `SCHED_CLASS_CFS`) |
55 | | -- **Algorithm**: Red-Black Tree (or sorted list) based on `vruntime`. |
56 | | -- **Logic**: Tasks with the lowest virtual runtime are picked next. `vruntime` increases as the task runs, weighted by priority. |
57 | | - |
58 | | -## 4. SMP (Symmetric Multiprocessing) |
59 | | -The `smp_rally` subsystem handles multicore initialization and management. |
60 | | - |
61 | | -### Architecture |
62 | | -- **BSP (Bootstrap Processor)**: Boots the system and wakes up APs (Application Processors). |
63 | | -- **AP Initialization**: |
64 | | - 1. BSP sends INIT IPI. |
65 | | - 2. BSP sends SIPI (Start-up IPI) with a trampoline vector. |
66 | | - 3. AP jumps to protected mode and enables paging. |
67 | | - 4. AP calls `scheduler_init` for its local queue. |
68 | | - |
69 | | -### Load Balancing |
70 | | -- **Affinity**: `cpu_mask` restricts which CPUs a process can run on. |
71 | | -- **Migration**: `scheduler_balance_load` moves tasks from busy queues to idle queues. |
72 | | - |
73 | | -## 5. Context Switching |
74 | | -The low-level mechanism (`switch_task`, `context_switch`) saves and restores state. |
75 | | - |
76 | | -### Flow |
77 | | -1. **Interrupt**: Timer or System Call triggers the scheduler. |
78 | | -2. **Save**: Current registers are pushed to the kernel stack. |
79 | | -3. **Pick**: The scheduler selects the next process. |
80 | | -4. **Switch**: |
81 | | - - `CR3` is updated (TLB flush). |
82 | | - - Kernel stack pointer (`ESP0` in TSS) is updated. |
83 | | - - Registers are popped from the new task's stack. |
84 | | -5. **Return**: `IRET` jumps to the new task's `EIP`. |
85 | | - |
86 | | -## 6. Configuration & Tuning |
87 | | -System parameters can be adjusted in `sched_config.h`: |
| 85 | +| 0 | `SCHED_DEADLINE` | Top | EDF | Red-Black Tree by Deadline | |
| 86 | +| 1 | `SCHED_RT` | High | FIFO/RR | Array of Linked Lists (O(1)) | |
| 87 | +| 2 | `SCHED_CFS` | Normal | CFS | Red-Black Tree by vruntime | |
| 88 | +| 3 | `SCHED_IDLE` | Low | Round-Robin | Single Linked List | |
| 89 | + |
| 90 | +## 4. Core Scheduler Implementation |
| 91 | + |
| 92 | +### The Runqueue (`runqueue_t`) |
| 93 | +Each CPU has its own runqueue to minimize lock contention. |
| 94 | + |
| 95 | +```c |
| 96 | +typedef struct runqueue { |
| 97 | + spinlock_t lock; |
| 98 | + uint32_t nr_running; |
| 99 | + process_t* curr; |
| 100 | + |
| 101 | + // Class-specific sub-queues |
| 102 | + struct rb_root cfs_root; // CFS Red-Black Tree |
| 103 | + struct list_head rt_queues[MAX_RT_PRIO]; // Real-Time queues, one list per priority (see sched_config.h) |
| 104 | + struct rb_root dl_root; // Deadline Red-Black Tree (keyed by deadline) |
| 105 | + |
| 106 | + uint64_t clock; // Monotonic clock |
| 107 | +} runqueue_t; |
| 108 | +``` |
| 109 | + |
| 110 | +### The `schedule()` Function |
| 111 | +`schedule()` is the main entry point. It is called from interrupt handlers (for example, after a timer tick) or when a task voluntarily yields. |
| 112 | + |
| 113 | +```c |
| 114 | +void schedule(void) { |
| 115 | + int cpu = get_cpu_id(); |
| 116 | + runqueue_t* rq = &runqueues[cpu]; |
| 117 | + |
| 118 | + spin_lock(&rq->lock); |
| 119 | + |
| 120 | + process_t* prev = rq->curr; |
| 121 | + |
| 122 | + // 1. Check for stack overflow (Debug) |
| 123 | + check_stack_canary(prev); |
| 124 | + |
| 125 | + // 2. Put previous task back to queue (if still runnable) |
| 126 | + if (prev->state == TASK_RUNNING) { |
| 127 | + prev->class->enqueue_task(rq, prev); |
| 128 | + } |
| 129 | + |
| 130 | + // 3. Pick next task |
| 131 | + process_t* next = pick_next_task(rq); |
| 132 | + |
| 133 | + // 4. Context switch (skipped if the same task was picked); prev resumes here when rescheduled |
| 134 | + if (prev != next) { |
| 135 | + rq->curr = next; |
| 136 | + context_switch(prev, next); |
| 137 | + } |
| 138 | + |
| 139 | + spin_unlock(&rq->lock); |
| 140 | +} |
| 141 | +``` |
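`schedule()` calls `pick_next_task(rq)` without showing it. A minimal sketch of one way it could work follows, assuming the core scheduler walks the classes in Priority Matrix order (class ID 0 first) and takes the first task offered. The `sched_classes` table, the per-class `*_next` fields, and the `pick_*` helpers are hypothetical stand-ins for the real sub-queues, not part of the documented structures.

```c
#include <stddef.h>

/* Minimal stand-ins for the document's types (assumptions for this sketch). */
typedef struct process { int pid; } process_t;
struct runqueue;

struct sched_class {
    const char *name;
    process_t *(*pick_next_task)(struct runqueue *rq);
};

/* Hypothetical: each field holds the best candidate of one class, or NULL. */
struct runqueue {
    process_t *dl_next;
    process_t *rt_next;
    process_t *cfs_next;
    process_t *idle_task;
};

static process_t *pick_dl(struct runqueue *rq)   { return rq->dl_next; }
static process_t *pick_rt(struct runqueue *rq)   { return rq->rt_next; }
static process_t *pick_cfs(struct runqueue *rq)  { return rq->cfs_next; }
static process_t *pick_idle(struct runqueue *rq) { return rq->idle_task; }

/* Ordered by class ID from the Priority Matrix: index 0 is consulted first. */
static const struct sched_class sched_classes[] = {
    { "deadline", pick_dl   },
    { "rt",       pick_rt   },
    { "cfs",      pick_cfs  },
    { "idle",     pick_idle },
};

/* Return the first task offered by the highest-priority non-empty class. */
process_t *pick_next_task(struct runqueue *rq)
{
    size_t n = sizeof(sched_classes) / sizeof(sched_classes[0]);
    for (size_t i = 0; i < n; i++) {
        process_t *p = sched_classes[i].pick_next_task(rq);
        if (p != NULL)
            return p;
    }
    return NULL; /* not reached if the idle class always has a task */
}
```

With this ordering, a runnable real-time task always wins over a CFS task, and the idle task runs only when every other class is empty.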
| 142 | +
|
| 143 | +## 5. Context Switch Mechanism |
| 144 | +The low-level context switch is architecture-specific (x86 example). It must save callee-saved registers and switch the stack pointer. |
| 145 | +
|
| 146 | +### Assembly Implementation (`switch.S`) |
| 147 | +```nasm |
| 148 | +; void switch_to(process_t* prev, process_t* next); |
| 149 | +global switch_to |
| 150 | +switch_to: |
| 151 | + ; [ESP+4] = prev |
| 152 | + ; [ESP+8] = next |
| 153 | + |
| 154 | + mov eax, [esp+4] ; EAX = prev |
| 155 | + mov edx, [esp+8] ; EDX = next |
| 156 | + |
| 157 | + ; 1. Save Previous Context |
| 158 | + push ebp |
| 159 | + push ebx |
| 160 | + push esi |
| 161 | + push edi |
| 162 | + |
| 163 | + ; Save ESP into prev's saved-ESP field (OFFSET_ESP) |
| 164 | + mov [eax + OFFSET_ESP], esp |
| 165 | + |
| 166 | + ; 2. Restore Next Context |
| 167 | + ; Load ESP from next's saved-ESP field (OFFSET_ESP) |
| 168 | + mov esp, [edx + OFFSET_ESP] |
| 169 | + |
| 170 | + ; 3. Load Page Directory (only if different: writing CR3 flushes non-global TLB entries) |
| 171 | + mov ecx, [edx + OFFSET_CR3] |
| 172 | + mov ebx, cr3 |
| 173 | + cmp ecx, ebx |
| 174 | + je .same_pd |
| 175 | + mov cr3, ecx |
| 176 | +.same_pd: |
| 177 | + |
| 178 | + ; 4. Restore Registers |
| 179 | + pop edi |
| 180 | + pop esi |
| 181 | + pop ebx |
| 182 | + pop ebp |
| 183 | + |
| 184 | + ret |
| 185 | +``` |
| 186 | + |
| 187 | +## 6. Tick Handler Integration |
| 188 | +The timer interrupt calls `scheduler_tick()` to update runtime statistics and trigger preemption. |
88 | 189 |
|
89 | 190 | ```c |
90 | | -#define SCHED_TICK_HZ 1000 // Timer interrupt frequency |
91 | | -#define MIN_GRANULARITY_NS 100000 // Min slice for CFS |
92 | | -#define LOAD_BALANCE_INTERVAL 50 // Balance every 50 ticks |
| 191 | +void scheduler_tick(void) { |
| 192 | + int cpu = get_cpu_id(); |
| 193 | + runqueue_t* rq = &runqueues[cpu]; |
| 194 | + |
| 195 | + spin_lock(&rq->lock); |
| 196 | + process_t* curr = rq->curr; // Read the current task under the lock |
| 197 | + |
| 198 | + // Update generic stats |
| 199 | + rq->clock++; |
| 200 | + |
| 201 | + // Class-specific tick handling |
| 202 | + // e.g., CFS updates vruntime, RT decrements time_slice |
| 203 | + curr->class->task_tick(rq, curr); |
| 204 | + |
| 205 | + spin_unlock(&rq->lock); |
| 206 | +} |
93 | 207 | ``` |
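As an illustration of a class-specific `task_tick`, here is a hedged sketch of what a round-robin handler might do with `time_slice`: decrement it each tick and request a reschedule when it runs out. The `need_resched` flag, the `rr_task_tick` name, and `BASE_TIME_SLICE_TICKS` are assumptions for this example, and the `rq` argument of the real interface is dropped for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

#define BASE_TIME_SLICE_TICKS 100 /* hypothetical: slice length in ticks */

/* Trimmed-down versions of the document's structures. */
struct sched_entity {
    uint64_t time_slice;          /* remaining slice (RR), in ticks */
};

typedef struct process {
    struct sched_entity se;
    bool need_resched;            /* hypothetical flag checked on interrupt return */
} process_t;

/* Called once per timer tick for the running round-robin task.
 * Returns true when the slice expired and the task should be preempted. */
bool rr_task_tick(process_t *p)
{
    if (p->se.time_slice > 0)
        p->se.time_slice--;

    if (p->se.time_slice == 0) {
        p->se.time_slice = BASE_TIME_SLICE_TICKS; /* refill for the next turn */
        p->need_resched = true;                   /* ask for schedule() */
        return true;
    }
    return false;
}
```

Setting a flag instead of calling `schedule()` directly keeps the tick handler short; the actual switch can then happen on the interrupt return path.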
94 | 208 |
|
95 | | -## API Reference |
96 | | -- `scheduler_yield()`: Voluntarily give up CPU. |
97 | | -- `scheduler_set_affinity(pid, mask)`: Bind process to specific cores. |
98 | | -- `scheduler_set_deadline(...)`: Promote a task to Deadline class. |
99 | | -- `process_fork(parent)`: Create a child process. |
| 209 | +## 7. SMP Load Balancing |
| 210 | +The scheduler periodically balances load across CPUs using work stealing. |
| 211 | +
|
| 212 | +### Migration Logic |
| 213 | +1. **Trigger**: Timer interrupt (every `BALANCE_INTERVAL_MS`, 200 ms by default) or when a CPU goes idle. |
| 214 | +2. **Find Busiest**: Scan the other CPUs' runqueues to find the one with the highest load. |
| 215 | +3. **Affinity Check**: Skip any task that is not allowed to run on the current CPU (`p->cpu_mask`). |
| 216 | +4. **Steal**: Move eligible tasks from the busiest runqueue to the current one. |
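The migration logic above can be sketched as follows. This is a simplified model, not the kernel's implementation: runqueues are fixed-size arrays, load is just `nr_running`, locking is omitted, and `NR_CPUS`, `MAX_TASKS`, `find_busiest`, and `steal_task` are hypothetical names. A task is stolen only when the source queue holds at least two more tasks than the current one, so that a move actually reduces the imbalance.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS   4 /* hypothetical CPU count */
#define MAX_TASKS 8 /* hypothetical fixed queue capacity for this sketch */

typedef struct process {
    uint32_t pid;
    uint32_t cpu_mask; /* bit n set => task may run on CPU n */
} process_t;

typedef struct runqueue {
    uint32_t nr_running;
    process_t *tasks[MAX_TASKS];
} runqueue_t;

static runqueue_t runqueues[NR_CPUS];

/* Find the busiest other CPU that has at least two more tasks than
 * this_cpu. Returns its index, or -1 if no queue is worth stealing from. */
int find_busiest(int this_cpu)
{
    int busiest = -1;
    uint32_t threshold = runqueues[this_cpu].nr_running + 1;

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (cpu != this_cpu && runqueues[cpu].nr_running > threshold) {
            threshold = runqueues[cpu].nr_running;
            busiest = cpu;
        }
    }
    return busiest;
}

/* Move one task that is allowed on this_cpu from the busiest queue to
 * this_cpu's queue. Returns the migrated task, or NULL if none moved.
 * A real implementation would hold both runqueue locks here. */
process_t *steal_task(int this_cpu)
{
    int src = find_busiest(this_cpu);
    if (src < 0)
        return NULL;

    runqueue_t *from = &runqueues[src];
    runqueue_t *to = &runqueues[this_cpu];
    if (to->nr_running >= MAX_TASKS)
        return NULL;

    for (uint32_t i = 0; i < from->nr_running; i++) {
        process_t *p = from->tasks[i];
        if (!(p->cpu_mask & (1u << this_cpu)))
            continue; /* affinity forbids this CPU */

        /* Remove by swapping in the last entry, then append to our queue. */
        from->tasks[i] = from->tasks[from->nr_running - 1];
        from->nr_running--;
        to->tasks[to->nr_running++] = p;
        return p;
    }
    return NULL;
}
```

Checking affinity before removing the task avoids a migrate-then-undo cycle for tasks pinned to other CPUs.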
| 217 | +
|
| 218 | +## 8. Configuration (`sched_config.h`) |
| 219 | +Compile-time tunables for time slices, priority ranges, CFS, and load balancing. |
| 220 | +
|
| 221 | +```c |
| 222 | +#ifndef SCHED_CONFIG_H |
| 223 | +#define SCHED_CONFIG_H |
| 224 | +
|
| 225 | +// Time Slices |
| 226 | +#define BASE_TIME_SLICE_MS 100 |
| 227 | +#define MIN_TIME_SLICE_MS 10 |
| 228 | +
|
| 229 | +// Priorities |
| 230 | +#define MAX_PRIO 140 |
| 231 | +#define MAX_RT_PRIO 100 |
| 232 | +
|
| 233 | +// CFS Parameters |
| 234 | +#define CFS_LATENCY_TARGET_NS 20000000ULL // 20ms |
| 235 | +#define CFS_MIN_GRANULARITY 4000000ULL // 4ms |
| 236 | +
|
| 237 | +// Load Balancing |
| 238 | +#define BALANCE_INTERVAL_MS 200 |
| 239 | +#define MIGRATION_COST_NS 500000ULL |
| 240 | +
|
| 241 | +#endif |
| 242 | +``` |
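The document does not say how the two CFS parameters interact. One plausible reading, borrowed from the general CFS design rather than from this codebase, is that the latency target is divided evenly among runnable tasks and the result is clamped from below by the minimum granularity. The sketch below assumes equal task weights; `cfs_time_slice_ns` is a hypothetical helper.

```c
#include <stdint.h>

/* Values copied from sched_config.h above. */
#define CFS_LATENCY_TARGET_NS 20000000ULL /* 20ms */
#define CFS_MIN_GRANULARITY    4000000ULL /* 4ms  */

/* Slice for one task when nr_running equally weighted tasks share the CPU:
 * an even share of the latency target, but never below the minimum
 * granularity, so heavy load does not cause excessive context switching. */
uint64_t cfs_time_slice_ns(uint32_t nr_running)
{
    if (nr_running == 0)
        return CFS_LATENCY_TARGET_NS;

    uint64_t slice = CFS_LATENCY_TARGET_NS / nr_running;
    return slice < CFS_MIN_GRANULARITY ? CFS_MIN_GRANULARITY : slice;
}
```

Under these values, two runnable tasks would get 10 ms each, while five or more tasks would each get the 4 ms floor, at which point the full rotation takes longer than the 20 ms target.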