B2 · Microarchitecture
Spec reference: Section B2 - Computer Architecture
Key idea: Understand how a CPU processes instructions through fetch-decode-execute cycles, and the techniques used to increase performance.
The instruction cycle (Fetch-Decode-Execute)
The CPU processes instructions in a continuous cycle:
- Fetch: The Control Unit fetches the next instruction from memory at the address stored in the Program Counter (PC). The instruction is copied to the Instruction Register (IR). The PC is incremented to point to the next instruction.
- Decode: The Control Unit decodes the instruction to determine what operation is required and what data is needed.
- Execute: The ALU or other components carry out the instruction (e.g. add two numbers, write to memory, jump to a new address).
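The three stages above can be sketched in code. This is a minimal Python model of the cycle for a toy machine; the instruction format, opcodes, and accumulator design are invented for illustration, not any real instruction set.

```python
# Minimal sketch of the fetch-decode-execute cycle for a toy machine.
# The instruction format and opcodes here are invented for illustration.

memory = [
    ("LOAD", 5),     # put 5 in the accumulator
    ("ADD", 3),      # add 3 to the accumulator
    ("HALT", None),  # stop
]

pc = 0           # Program Counter: address of the next instruction
accumulator = 0
running = True

while running:
    # Fetch: copy the instruction at the PC into the "instruction register"
    instruction = memory[pc]
    pc += 1                      # PC now points at the next instruction
    # Decode: split the instruction into operation and operand
    opcode, operand = instruction
    # Execute: the ALU (or control logic) carries out the operation
    if opcode == "LOAD":
        accumulator = operand
    elif opcode == "ADD":
        accumulator += operand
    elif opcode == "HALT":
        running = False

print(accumulator)  # 8
```

Note how the PC is incremented during the fetch stage, before execution: a jump instruction would then overwrite the PC during execute.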
Execution speed
Factors affecting execution speed
| Factor | Effect |
|---|---|
| Clock speed (GHz) | More cycles per second means more instructions per second |
| Number of cores | More cores allow parallel processing of multiple threads |
| Cache size | Larger cache reduces fetching from slower RAM |
| Instruction set | RISC instructions execute faster; CISC can do more per instruction |
| Bus width | Wider buses transfer more data per cycle |
| RAM speed | Faster RAM reduces wait time when the CPU requests data |
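The first two factors in the table combine multiplicatively. As a rough back-of-envelope sketch (the clock speed, instructions-per-cycle figure, and core count below are illustrative assumptions, and real throughput depends on the workload):

```python
# Rough estimate of peak instructions per second from clock speed,
# instructions per cycle (IPC), and core count. Figures are illustrative.

clock_hz = 3.0e9   # 3 GHz clock: 3 billion cycles per second
ipc = 2            # assumed average of 2 instructions completed per cycle
cores = 4          # assumed quad-core CPU, all cores busy

instructions_per_second = clock_hz * ipc * cores
print(instructions_per_second)  # 24000000000.0
```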
Methods of increasing execution speed
- Pipelining: Overlapping stages of multiple instructions.
- Cache: Storing frequently used data closer to the CPU.
- Multi-core processors: Running multiple instruction streams simultaneously.
- Overclocking: Running the CPU above its rated clock speed (at the risk of instability and heat).
Instruction sets
An instruction set is the complete set of instructions a CPU can execute. Two main approaches:
| Feature | CISC (Complex Instruction Set Computer) | RISC (Reduced Instruction Set Computer) |
|---|---|---|
| Instructions | Many complex instructions | Few simple instructions |
| Execution | May take multiple clock cycles | Usually one clock cycle per instruction |
| Code size | Smaller programs (fewer instructions needed) | Larger programs (more instructions needed) |
| Hardware | Complex, more transistors | Simpler, fewer transistors |
| Examples | Intel x86 (PCs) | ARM (phones, tablets, Apple Silicon) |
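The "code size" row of the table can be made concrete with two toy programs. Both compute 6 × 7; the "CISC" version uses one invented complex instruction while the "RISC" version builds the result from simple adds. All instruction names here are made up for illustration.

```python
# Sketch of the CISC/RISC trade-off using two invented toy programs
# that both compute 6 * 7.

cisc_program = [("MUL", 6, 7)]                # 1 complex instruction
risc_program = [("CLR",)] + [("ADD", 6)] * 7  # 8 simple instructions

def run(program):
    acc = 0
    for instr in program:
        if instr[0] == "MUL":
            acc = instr[1] * instr[2]  # one instruction does all the work
        elif instr[0] == "CLR":
            acc = 0
        elif instr[0] == "ADD":
            acc += instr[1]
    return acc

print(run(cisc_program), len(cisc_program))  # 42 1
print(run(risc_program), len(risc_program))  # 42 8
```

The RISC program is longer, but each of its instructions is simple enough to complete in one clock cycle on simpler hardware.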
Pipelining
Pipelining allows the CPU to work on multiple instructions at the same time by overlapping the stages of the fetch-decode-execute cycle.
Without pipelining:

```
Instruction 1: Fetch → Decode → Execute
Instruction 2:                           Fetch → Decode → Execute
```

With pipelining:

```
Instruction 1: Fetch → Decode → Execute
Instruction 2:         Fetch → Decode → Execute
Instruction 3:                  Fetch → Decode → Execute
```

Each stage of a different instruction runs at the same time. This significantly increases throughput.
Hazards that can disrupt pipelining:
- Data hazard: An instruction needs the result of a previous one that has not yet completed.
- Control hazard: A branch instruction changes the program flow, making prefetched instructions wrong.
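The throughput gain is easy to quantify. Assuming an idealised three-stage pipeline with one clock cycle per stage and no hazards or stalls (a simplification of real pipelines), the cycle counts compare as follows:

```python
# Cycles needed to run n instructions through an idealised 3-stage
# pipeline, assuming one clock cycle per stage and no hazards or stalls.

STAGES = 3  # Fetch, Decode, Execute

def cycles_unpipelined(n):
    # Each instruction occupies the CPU for all three stages in turn.
    return n * STAGES

def cycles_pipelined(n):
    # The first instruction takes STAGES cycles to fill the pipeline;
    # after that, one instruction completes every cycle.
    return STAGES + (n - 1)

print(cycles_unpipelined(10))  # 30
print(cycles_pipelined(10))    # 12
```

For long instruction streams the pipelined throughput approaches one instruction per cycle; hazards reduce this by forcing the pipeline to stall or flush.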
Cache
Cache is ultra-fast memory built into or very close to the CPU. It stores copies of frequently accessed data and instructions so the CPU does not have to wait for slower RAM.
| Level | Speed | Size | Location |
|---|---|---|---|
| L1 | Fastest | Smallest (32-64 KB) | Inside the CPU core |
| L2 | Fast | Medium (256 KB to a few MB) | Inside or close to CPU core |
| L3 | Slower than L1/L2 | Largest (4-32 MB) | Shared across all cores |
When the CPU needs data, it checks L1 first, then L2, then L3, then RAM. A cache hit means the data was found in cache. A cache miss means it had to be fetched from RAM.
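The lookup order can be sketched as a chain of dictionaries. The addresses and contents below are invented for illustration; real caches work on fixed-size blocks of memory, not named values.

```python
# Sketch of the CPU's memory lookup order: L1 → L2 → L3 → RAM.
# Cache contents are invented for illustration.

l1  = {"a": 1}
l2  = {"a": 1, "b": 2}
l3  = {"a": 1, "b": 2, "c": 3}
ram = {"a": 1, "b": 2, "c": 3, "d": 4}

def read(address):
    # Check each cache level in order of speed.
    for name, level in (("L1", l1), ("L2", l2), ("L3", l3)):
        if address in level:
            return level[address], f"{name} hit"
    # Missed at every level: fall back to slow RAM.
    return ram[address], "cache miss (fetched from RAM)"

print(read("a"))  # (1, 'L1 hit')
print(read("d"))  # (4, 'cache miss (fetched from RAM)')
```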
Multi-processing and multi-threading
Multi-processing
Using two or more CPUs (or cores) to execute multiple processes simultaneously. Each core handles a separate process.
Benefit: True parallel execution. Multiple tasks complete faster.
Multi-threading
A single core (or process) switches rapidly between multiple threads of execution. The OS schedules which thread runs at each moment.
Benefit: Better use of the CPU when threads are waiting (e.g. waiting for input/output).
| Feature | Multi-processing | Multi-threading |
|---|---|---|
| Hardware used | Multiple cores/CPUs | Single or multiple cores |
| Execution | True parallelism | Rapid switching (concurrency) |
| Memory | Separate memory per process | Shared memory between threads |
| Use case | Running multiple independent applications | Web server handling many requests |
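The multi-threading benefit above can be demonstrated with Python's standard `threading` module. Each thread simulates waiting on input/output with `time.sleep`; because the waits overlap, four "requests" finish in roughly the time of one, not four in sequence.

```python
# Sketch of multi-threading for I/O-bound work: while one thread waits
# (simulated with time.sleep), another thread can run.
import threading
import time

results = []

def fetch(name):
    time.sleep(0.1)      # stands in for waiting on input/output
    results.append(name)

threads = [threading.Thread(target=fetch, args=(f"request-{i}",))
           for i in range(4)]

start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# The four 0.1 s waits overlap, so total time is close to 0.1 s
# rather than the 0.4 s a sequential version would take.
print(len(results))
```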
CPU architecture types
Embedded and mobile CPU architecture
- Designed for low power consumption and small physical size.
- ARM architecture (RISC-based) is dominant in phones and tablets.
- May have limited cooling, so thermal management is critical.
- Battery life is prioritised over raw performance.
Microcomputer CPU architecture
- Balanced between performance and cost.
- Used in desktop PCs and laptops.
- Intel x86 and AMD64 are the dominant architectures.
- Designed for general-purpose tasks.
Server CPU architecture
- Designed for sustained heavy workloads.
- More cores (e.g. 32-128) to handle many simultaneous requests.
- Supports large amounts of ECC (Error Correcting Code) RAM.
- Optimised for reliability and uptime, not just peak speed.
- Examples: Intel Xeon, AMD EPYC.
Summary
| Term | Meaning |
|---|---|
| Fetch-Decode-Execute | The cycle by which a CPU processes instructions |
| Pipelining | Overlapping instruction stages to increase throughput |
| Cache | Fast memory holding recently used data near the CPU |
| RISC | Architecture with few, simple, fast instructions |
| CISC | Architecture with many complex instructions |
| Multi-threading | Rapid switching between threads on a single core |
| Cache hit | Required data found in cache, no RAM access needed |