1K LoC OS Overview
What We'll Implement
We plan to implement the following major features in a minimal operating system:
- Multitasking: Switch between processes to allow multiple applications to share the CPU.
- Exception handler: Handle events requiring OS intervention (e.g., illegal instructions).
- Paging: Provide an isolated memory address space for each application.
- System calls: Allow applications to call kernel functions.
- Device drivers: Abstract hardware functionalities (e.g., disk read/write).
- File system: Manage files on disk.
- Command-line shell: Provide a user interface for humans.
What We Won't Implement
We will not implement:
- Interrupt handling: We will use a polling method (busy waiting) instead of interrupts.
- Timer processing: Preemptive multitasking is not included; instead, we use cooperative multitasking (each process voluntarily yields the CPU).
- Inter-process communication: No pipes, UNIX domain sockets, or shared memory.
- Multi-processor support: Only a single processor is supported.
Development Environment
macOS Required Dev Tools
llvmlldqemu
Install with Homebrew:
brew install llvm lld qemuTIP
Check if clang supports 32-bit RISC-V CPU:
clang -print-targets | grep riscv32Source Code Structure
The final file structure will look like this:
├── disk/ # File system contents
├── common.c # Kernel/user common library: printf, memset, ...
├── common.h # Kernel/user common library: definitions of structs and constants
├── kernel.c # Kernel: process management, system calls, device drivers, file system
├── kernel.h # Kernel: definitions of structs and constants
├── kernel.ld # Kernel: linker script (memory layout definition)
├── shell.c # Command-line shell
├── user.c # User library: functions for system calls
├── user.h # User library: definitions of structs and constants
├── user.ld # User: linker script (memory layout definition)
└── run.sh # Build scriptRISC-V
An operating system abstracts away CPU differences, just as web browsers hide differences between Windows/macOS/Linux. We will write an OS for 32-bit RISC-V.
RISC-V Specification
- The RISC-V Instruction Set Manual Volume I: Unprivileged ISA
- The RISC-V Instruction Set Manual Volume II: Privileged Architecture
QEMU virt Machine
We will support the QEMU virt machine. Reference docs:
RISC-V Assembly 101
The RISC-V ISA (Instruction Set Architecture) defines the instructions that the CPU can execute. Below is a brief overview.
Assembly Language Basics
addi a0, a1, 123- Instruction:
addi(add immediate) - Operands:
a0,a1, and123 - This adds the value
123to the contents of registera1and stores the result in registera0.
Registers
Registers are fast, limited-memory "variables" inside the CPU. Some common RISC-V registers:
pc(program counter): Points to the next instructionx0(alias:zero): Always reads as zerox1(alias:ra): Return addressx2(alias:sp): Stack pointerx5-x7** (alias:t0-t2): Temporary registersx8(alias:fp): Stack frame pointerx10-x11(alias:a0-a1): Function arguments / return valuesx12-x17(alias:a2-a7): Function argumentsx18-x27(alias:s0-s11): Saved registersx28-x31(alias:t3-t6): Temporary registers
Memory Access
Because registers are limited, data is usually kept in memory. Instructions lw (load word) and sw (store word) are used to access memory.
lw a0, (a1) # Load a 32-bit word from address in a1 into a0
sw a0, (a1) # Store a 32-bit word from a0 to the address in a1Branch Instructions
Branch instructions (e.g., bnez, beq, blt) change the control flow:
bnez a0, <label> # If a0 != 0, jump to <label>
# Continue here if a0 == 0
<label>:
# If a0 != 0, execution continues hereFunction Calls
Use jal ("jump and link") and ret ("return"):
li a0, 123 # Load 123 into a0
jal ra, <label> # Jump to <label> and save return address in ra
# Execution continues here after function returns...
<label>:
addi a0, a0, 1 # a0 = a0 + 1
ret # Return to address in ra- Function arguments go in
a0–a7. - Return value is placed in
a0.
Stack
The stack is a LIFO memory space for function calls and local variables. It grows downward, and the sp (stack pointer) indicates the top.
Push operation (save value on stack):
addi sp, sp, -4 # Move sp down by 4 bytes
sw a0, (sp) # Store a0 on the stackPop operation (retrieve value from stack):
lw a0, (sp) # Load a0 from the stack
addi sp, sp, 4 # Move sp up by 4 bytesCPU Modes
- M-mode (machine mode): Used by OpenSBI (like BIOS)
- S-mode (supervisor mode): Used by the kernel
- U-mode (user mode): Used by user applications
Privileged Instructions
Privileged instructions are only available to supervisor or machine modes. Examples:
csrr rd, csr: Read a Control and Status Register (CSR) intord.csrw csr, rs: Writersto a CSR.csrrw rd, csr, rs: Read from CSR intord, then writersto CSR.sret: Return from trap handler.sfence.vma: Clear TLB (Translation Lookaside Buffer).
Inline Assembly
Sometimes we embed assembly in C code (inline assembly). For example:
uint32_t value;
__asm__ __volatile__ ("csrr %0, sepc" : "=r"(value));- This reads from the
sepcCSR into the C variablevalue.
Another example:
__asm__ __volatile__ ("csrw sscratch, %0" : : "r"(123));- This writes
123into thesscratchCSR.
How to Write Inline Assembly
Inline assembly in C takes the form:
__asm__ __volatile__ (
"assembly code"
: output_operands
: input_operands
: clobbered_registers
);__volatile__tells the compiler not to optimize out the assembly code.- Output operands (
"=r"(value)) and input operands ("r"(expr)) can be used to pass values in/out of assembly. %0,%1, ... refer to the operands in the order they appear.
TIP
Inline assembly is a compiler-specific extension (not part of standard C). For real-world examples, see:
Booting the Kernel
When a computer is turned on, the CPU initializes itself and starts executing the OS. OS initializes the hardware and starts the applications. This process is called "booting".
What happens before the OS starts? In PCs, BIOS (or UEFI in modern PCs) initializes the hardware, displays the splash screen and loads the OS from the disk. In QEMU 'virt' machine, OpenSBI is the equivalent of BIOS/UEFI.
Supervisor Binary Interface (SBI)
The Supervisor Binary Interface (SBI) is an API for OS kernels, but defines what the firmware (OpenSBI) provides to an OS.
The SBI specification is published on GitHub (https://github.com/riscv-non-isa/riscv-sbi-doc/releases). It defines useful features such as displaying characters on the debug console (e.g. serial port), reboot/shutdown and timer settings.
A famous SBI implementation is OpenSBI (https://github.com/riscv-software-src/opensbi). In QEMU, OpenSBI starts by default, performs hardware-specific initialization and boots the kernel.
Let's boot OpenSBI
QEMU takes various options to start the virtual machine. Here are the options used in the script:
-machine virt: Start a virt machine. You can check other supported machines with the -machine '?' option.
-bios default: Use the default firmware (OpenSBI in this case).
-nographic: Start QEUM without a GUI window.
-serial mon:stdio: Connect QEMU's standard I/O to the virtual machine's serial port. Specifying mon: allows switch to the QEMU monitor by pressing Ctrl+A, then C.
--no-reboot: If the virtual machine crashes, stop the emulator without rebooting (useful for debugging).OpenSBI displays the OpenSBI version, platform name, features, number of HARTs(CPU cores) and more for debugging purposes.
TIP
Press Ctrl+A then C to switch to the QEMU debug console (QEMU monitor). You can exit QEMU by q command in the monitor.
Linker script
A linker script is a file which defines the memory layout of executable files. Based on the layout, the linker assigns memory addresses to functions and variables.
Here are the key points of the linker script:
- The entry point of the kernel is the boot function.
- The base address is 0x80200000 (the default address for the QEMU virt machine).
- The .text.boot section is always placed at the beginning of the kernel.
- Each section is placed in the order of .text, .rodata, .data and .bss.
- The kernel stack comes after the .bss section and its size is 128KB.
.text, .rodata, .data and .bss sections mentioned above are data areas with specific roles:
- .text: contains the code of the program.
- .rodata: contains constant data that is read-only.
- .data: contains read/write data.
- .bss: contains read/write data with an initial value of zero.
Kernel debugging
Exception handling
Life of an exception
In RISC-V, an exception will be handled as follows:
- The CPU checks the medeleg register to determine which operation mode should handle the exception. OpenSBI has already configured to hande U-Mode/S-mode exceptions in S-Mode's handler.
- CPU saves its state (registers) into various CSRs.
- The value of the stvec register is set to the program counter, jumping to the kernel's exception handler.
- The exception hander saves general-purpose registers (i.e. the program state), and handles the exception.
- The exception handler restores the saved execution state and calls the sret instruction to resume execution from the point where the exception occured.
The CSRs updated in step 2 are mainly as follows. The kernel's exception determines necessary actions based on the CSRs:
- scause: type of exception. The kernel reads this to identify the type of exception.
- stval: additional information about the exception (e.g. memory address that caused the exception). Depends on the type of the exception.
- sepc: program counter at the point where the exception occurred.
- sstatus: operation mode (u-mode/s-mode) when the exception has occurred.
to check where the error occured from the memory address:
- error: PANIC: kernel.c:94: unexpected trap scause=00000002, stval=00000000, sepc=8020012e
- check: llvm-addr2line -e kernel.elf 8020015e