B2 · Microarchitecture
Spec reference: Section B2 - Computer Architecture
Key idea: Understand how a CPU processes instructions through fetch-decode-execute cycles, and the techniques used to increase performance.
The instruction cycle (Fetch-Decode-Execute)
The CPU processes instructions in a continuous cycle:
- Fetch: The Control Unit fetches the next instruction from memory at the address stored in the Program Counter (PC). The instruction is copied to the Instruction Register (IR). The PC is incremented to point to the next instruction.
- Decode: The Control Unit decodes the instruction to determine what operation is required and what data is needed.
- Execute: The ALU or other components carry out the instruction (e.g. add two numbers, write to memory, jump to a new address).
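The three stages above can be sketched in code. This is a minimal Python model of the cycle for a toy machine; the instruction format, opcodes, and accumulator design are invented for illustration, not any real instruction set.

```python
# Minimal sketch of the fetch-decode-execute cycle for a toy machine.
# The instruction format and opcodes here are invented for illustration.

memory = [
    ("LOAD", 5),     # put 5 in the accumulator
    ("ADD", 3),      # add 3 to the accumulator
    ("HALT", None),  # stop
]

pc = 0           # Program Counter: address of the next instruction
accumulator = 0
running = True

while running:
    # Fetch: copy the instruction at the PC into the "instruction register"
    instruction = memory[pc]
    pc += 1                      # PC now points at the next instruction
    # Decode: split the instruction into operation and operand
    opcode, operand = instruction
    # Execute: the ALU (or control logic) carries out the operation
    if opcode == "LOAD":
        accumulator = operand
    elif opcode == "ADD":
        accumulator += operand
    elif opcode == "HALT":
        running = False

print(accumulator)  # 8
```

Note how the PC is incremented during the fetch stage, before execution: a jump instruction would then overwrite the PC during execute.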
Execution speed
Factors affecting execution speed
| Factor | Effect |
|---|---|
| Clock speed (GHz) | More cycles per second means more instructions per second |
| Number of cores | More cores allow parallel processing of multiple threads |
| Cache size | Larger cache reduces fetching from slower RAM |
| Instruction set | RISC instructions execute faster; CISC can do more per instruction |
| Bus width | Wider buses transfer more data per cycle |
| RAM speed | Faster RAM reduces wait time when the CPU requests data |
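The first two factors in the table combine multiplicatively. As a rough back-of-envelope sketch (the clock speed, instructions-per-cycle figure, and core count below are illustrative assumptions, and real throughput depends on the workload):

```python
# Rough estimate of peak instructions per second from clock speed,
# instructions per cycle (IPC), and core count. Figures are illustrative.

clock_hz = 3.0e9   # 3 GHz clock: 3 billion cycles per second
ipc = 2            # assumed average of 2 instructions completed per cycle
cores = 4          # assumed quad-core CPU, all cores busy

instructions_per_second = clock_hz * ipc * cores
print(instructions_per_second)  # 24000000000.0
```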
Methods of increasing execution speed
- Pipelining: Overlapping stages of multiple instructions.
- Cache: Storing frequently used data closer to the CPU.
- Multi-core processors: Running multiple instruction streams simultaneously.
- Overclocking: Running the CPU above its rated clock speed (at the risk of instability and heat).
Instruction sets
An instruction set is the complete set of instructions a CPU can execute. Two main approaches:
| Feature | CISC (Complex Instruction Set Computer) | RISC (Reduced Instruction Set Computer) |
|---|---|---|
| Instructions | Many complex instructions | Few simple instructions |
| Execution | May take multiple clock cycles | Usually one clock cycle per instruction |
| Code size | Smaller programs (fewer instructions needed) | Larger programs (more instructions needed) |
| Hardware | Complex, more transistors | Simpler, fewer transistors |
| Examples | Intel x86 (PCs) | ARM (phones, tablets, Apple Silicon) |
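The "code size" row of the table can be made concrete with two toy programs. Both compute 6 × 7; the "CISC" version uses one invented complex instruction while the "RISC" version builds the result from simple adds. All instruction names here are made up for illustration.

```python
# Sketch of the CISC/RISC trade-off using two invented toy programs
# that both compute 6 * 7.

cisc_program = [("MUL", 6, 7)]                # 1 complex instruction
risc_program = [("CLR",)] + [("ADD", 6)] * 7  # 8 simple instructions

def run(program):
    acc = 0
    for instr in program:
        if instr[0] == "MUL":
            acc = instr[1] * instr[2]  # one instruction does all the work
        elif instr[0] == "CLR":
            acc = 0
        elif instr[0] == "ADD":
            acc += instr[1]
    return acc

print(run(cisc_program), len(cisc_program))  # 42 1
print(run(risc_program), len(risc_program))  # 42 8
```

The RISC program is longer, but each of its instructions is simple enough to complete in one clock cycle on simpler hardware.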
Pipelining
Pipelining allows the CPU to work on multiple instructions at the same time by overlapping the stages of the fetch-decode-execute cycle.
Without pipelining:

```
Instruction 1: Fetch → Decode → Execute
Instruction 2:                           Fetch → Decode → Execute
```

With pipelining:

```
Instruction 1: Fetch → Decode → Execute
Instruction 2:         Fetch → Decode → Execute
Instruction 3:                  Fetch → Decode → Execute
```

Each stage of a different instruction runs at the same time. This significantly increases throughput.
Hazards that can disrupt pipelining:
- Data hazard: An instruction needs the result of a previous one that has not yet completed.
- Control hazard: A branch instruction changes the program flow, making prefetched instructions wrong.
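The throughput gain is easy to quantify. Assuming an idealised three-stage pipeline with one clock cycle per stage and no hazards or stalls (a simplification of real pipelines), the cycle counts compare as follows:

```python
# Cycles needed to run n instructions through an idealised 3-stage
# pipeline, assuming one clock cycle per stage and no hazards or stalls.

STAGES = 3  # Fetch, Decode, Execute

def cycles_unpipelined(n):
    # Each instruction occupies the CPU for all three stages in turn.
    return n * STAGES

def cycles_pipelined(n):
    # The first instruction takes STAGES cycles to fill the pipeline;
    # after that, one instruction completes every cycle.
    return STAGES + (n - 1)

print(cycles_unpipelined(10))  # 30
print(cycles_pipelined(10))    # 12
```

For long instruction streams the pipelined throughput approaches one instruction per cycle; hazards reduce this by forcing the pipeline to stall or flush.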
Cache
Cache is ultra-fast memory built into or very close to the CPU. It stores copies of frequently accessed data and instructions so the CPU does not have to wait for slower RAM.
| Level | Speed | Size | Location |
|---|---|---|---|
| L1 | Fastest | Smallest (32-64 KB) | Inside the CPU core |
| L2 | Fast | Medium (256 KB to a few MB) | Inside or close to CPU core |
| L3 | Slower than L1/L2 | Largest (4-32 MB) | Shared across all cores |
When the CPU needs data, it checks L1 first, then L2, then L3, then RAM. A cache hit means the data was found in cache. A cache miss means it had to be fetched from RAM.
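The lookup order can be sketched as a chain of dictionaries. The addresses and contents below are invented for illustration; real caches work on fixed-size blocks of memory, not named values.

```python
# Sketch of the CPU's memory lookup order: L1 → L2 → L3 → RAM.
# Cache contents are invented for illustration.

l1  = {"a": 1}
l2  = {"a": 1, "b": 2}
l3  = {"a": 1, "b": 2, "c": 3}
ram = {"a": 1, "b": 2, "c": 3, "d": 4}

def read(address):
    # Check each cache level in order of speed.
    for name, level in (("L1", l1), ("L2", l2), ("L3", l3)):
        if address in level:
            return level[address], f"{name} hit"
    # Missed at every level: fall back to slow RAM.
    return ram[address], "cache miss (fetched from RAM)"

print(read("a"))  # (1, 'L1 hit')
print(read("d"))  # (4, 'cache miss (fetched from RAM)')
```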
Multi-processing and multi-threading
Multi-processing
Using two or more CPUs (or cores) to execute multiple processes simultaneously. Each core handles a separate process.
Benefit: True parallel execution. Multiple tasks complete faster.
Multi-threading
A single core (or process) switches rapidly between multiple threads of execution. The OS schedules which thread runs at each moment.
Benefit: Better use of the CPU when threads are waiting (e.g. waiting for input/output).
| Feature | Multi-processing | Multi-threading |
|---|---|---|
| Hardware used | Multiple cores/CPUs | Single or multiple cores |
| Execution | True parallelism | Rapid switching (concurrency) |
| Memory | Separate memory per process | Shared memory between threads |
| Use case | Running multiple independent applications | Web server handling many requests |
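The multi-threading benefit above can be demonstrated with Python's standard `threading` module. Each thread simulates waiting on input/output with `time.sleep`; because the waits overlap, four "requests" finish in roughly the time of one, not four in sequence.

```python
# Sketch of multi-threading for I/O-bound work: while one thread waits
# (simulated with time.sleep), another thread can run.
import threading
import time

results = []

def fetch(name):
    time.sleep(0.1)      # stands in for waiting on input/output
    results.append(name)

threads = [threading.Thread(target=fetch, args=(f"request-{i}",))
           for i in range(4)]

start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# The four 0.1 s waits overlap, so total time is close to 0.1 s
# rather than the 0.4 s a sequential version would take.
print(len(results))
```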
CPU architecture types
Embedded and mobile CPU architecture
- Designed for low power consumption and small physical size.
- ARM architecture (RISC-based) is dominant in phones and tablets.
- May have limited cooling, so thermal management is critical.
- Battery life is prioritised over raw performance.
Microcomputer CPU architecture
- Balanced between performance and cost.
- Used in desktop PCs and laptops.
- Intel x86 and AMD64 are the dominant architectures.
- Designed for general-purpose tasks.
Server CPU architecture
- Designed for sustained heavy workloads.
- More cores (e.g. 32-128) to handle many simultaneous requests.
- Supports large amounts of ECC (Error Correcting Code) RAM.
- Optimised for reliability and uptime, not just peak speed.
- Examples: Intel Xeon, AMD EPYC.
Summary
| Term | Meaning |
|---|---|
| Fetch-Decode-Execute | The cycle by which a CPU processes instructions |
| Pipelining | Overlapping instruction stages to increase throughput |
| Cache | Fast memory holding recently used data near the CPU |
| RISC | Architecture with few, simple, fast instructions |
| CISC | Architecture with many complex instructions |
| Multi-threading | Rapid switching between threads on a single core |
| Cache hit | Required data found in cache, no RAM access needed |