B2 · Microarchitecture

Spec reference: Section B2 - Computer Architecture
Key idea: Understand how a CPU processes instructions through fetch-decode-execute cycles, and the techniques used to increase performance.


The instruction cycle (Fetch-Decode-Execute)

The CPU processes instructions in a continuous cycle:

  1. Fetch: The Control Unit fetches the next instruction from memory at the address stored in the Program Counter (PC). The instruction is copied to the Instruction Register (IR). The PC is incremented to point to the next instruction.
  2. Decode: The Control Unit decodes the instruction to determine what operation is required and what data is needed.
  3. Execute: The ALU or other components carry out the instruction (e.g. add two numbers, write to memory, jump to a new address).
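The three steps above can be sketched as a toy simulator. This is a minimal illustration, assuming a made-up instruction format of (opcode, operand) pairs; it is not any real CPU's instruction set.

```python
# Minimal sketch of the fetch-decode-execute cycle.
# The (opcode, operand) format and the LOAD/ADD/JUMP opcodes are
# illustrative assumptions, not a real instruction set.

def run(program):
    pc = 0          # Program Counter: address of the next instruction
    acc = 0         # Accumulator: holds the ALU's working value
    while pc < len(program):
        ir = program[pc]          # Fetch: copy the instruction into the IR
        pc += 1                   # Increment the PC to the next instruction
        opcode, operand = ir      # Decode: split into operation and data
        if opcode == "LOAD":      # Execute: carry out the operation
            acc = operand
        elif opcode == "ADD":
            acc += operand
        elif opcode == "JUMP":
            pc = operand          # A jump overwrites the incremented PC
    return acc

result = run([("LOAD", 5), ("ADD", 3)])   # acc ends as 8
```

Note how the PC is incremented during the fetch step, which is why a jump instruction must overwrite it afterwards.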

Execution speed

Factors affecting execution speed

| Factor | Effect |
| --- | --- |
| Clock speed (GHz) | More cycles per second means more instructions per second |
| Number of cores | More cores allow parallel processing of multiple threads |
| Cache size | Larger cache reduces fetching from slower RAM |
| Instruction set | RISC instructions execute faster; CISC can do more per instruction |
| Bus width | Wider buses transfer more data per cycle |
| RAM speed | Faster RAM reduces wait time when the CPU requests data |
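The relationship between clock speed, core count and throughput can be made concrete with a back-of-the-envelope calculation. This assumes an idealised CPU with a fixed number of instructions per cycle (IPC) and perfect scaling across cores, which real CPUs never achieve.

```python
# Rough sketch relating clock speed and core count to throughput.
# Assumes an idealised CPU with constant IPC and perfect core
# scaling - an illustrative simplification, not a real-world model.

def instructions_per_second(clock_ghz, cores, ipc):
    # cycles/second x instructions/cycle x number of cores
    return clock_ghz * 1e9 * ipc * cores

single = instructions_per_second(3.0, 1, 1)   # 3 billion per second
quad = instructions_per_second(3.0, 4, 1)     # 4x, assuming ideal scaling
```

In practice, factors such as memory stalls and poorly parallelised software mean real throughput falls well short of this ideal figure.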

Methods of increasing execution speed

  • Pipelining: Overlapping stages of multiple instructions.
  • Cache: Storing frequently used data closer to the CPU.
  • Multi-core processors: Running multiple instruction streams simultaneously.
  • Overclocking: Running the CPU above its rated clock speed (at the risk of instability and heat).

Instruction sets

An instruction set is the complete set of instructions a CPU can execute. Two main approaches:

| Feature | CISC (Complex Instruction Set Computer) | RISC (Reduced Instruction Set Computer) |
| --- | --- | --- |
| Instructions | Many complex instructions | Few simple instructions |
| Execution | May take multiple clock cycles | Usually one clock cycle per instruction |
| Code size | Smaller programs (fewer instructions needed) | Larger programs (more instructions needed) |
| Hardware | Complex, more transistors | Simpler, fewer transistors |
| Examples | Intel x86 (PCs) | ARM (phones, tablets, Apple Silicon) |

Pipelining

Pipelining allows the CPU to work on multiple instructions at the same time by overlapping the stages of the fetch-decode-execute cycle.

Without pipelining:

Instruction 1: Fetch → Decode → Execute
Instruction 2:                           Fetch → Decode → Execute

With pipelining:

Instruction 1: Fetch → Decode → Execute
Instruction 2:         Fetch  → Decode → Execute
Instruction 3:                   Fetch → Decode → Execute

Each stage of a different instruction runs at the same time. This significantly increases throughput.
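The throughput gain from the diagrams above can be quantified. This sketch assumes a hazard-free 3-stage pipeline where every stage takes exactly one clock cycle.

```python
# Sketch comparing cycle counts with and without pipelining.
# Assumes a 3-stage pipeline (Fetch, Decode, Execute), one cycle
# per stage, and no data or control hazards.

STAGES = 3

def cycles_without_pipelining(n_instructions):
    # Each instruction must finish all stages before the next starts.
    return n_instructions * STAGES

def cycles_with_pipelining(n_instructions):
    # The first instruction fills the pipeline; after that, one
    # instruction completes every cycle.
    return STAGES + (n_instructions - 1)

print(cycles_without_pipelining(10))  # 30
print(cycles_with_pipelining(10))     # 12
```

As the number of instructions grows, the pipelined cycle count approaches one instruction per cycle, which is why pipelining is so effective for long instruction streams.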

Hazards that can disrupt pipelining:

  • Data hazard: An instruction needs the result of a previous one that has not yet completed.
  • Control hazard: A branch instruction changes the program flow, making prefetched instructions wrong.

Cache

Cache is ultra-fast memory built into or very close to the CPU. It stores copies of frequently accessed data and instructions so the CPU does not have to wait for slower RAM.

| Level | Speed | Size | Location |
| --- | --- | --- | --- |
| L1 | Fastest | Smallest (32-64 KB) | Inside the CPU core |
| L2 | Fast | Medium (256 KB to a few MB) | Inside or close to CPU core |
| L3 | Slower than L1/L2 | Largest (4-32 MB) | Shared across all cores |

When the CPU needs data, it checks L1 first, then L2, then L3, then RAM. A cache hit means the data was found in cache. A cache miss means it had to be fetched from RAM.
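The L1 → L2 → L3 → RAM lookup order can be sketched as follows, assuming each level is modelled as a simple dictionary from address to data. This is an illustration of the search order only, not how cache hardware works.

```python
# Sketch of the cache lookup order: L1, then L2, then L3, then RAM.
# Modelling each level as a dict is an illustrative assumption.

def read(address, l1, l2, l3, ram):
    for level_name, level in (("L1", l1), ("L2", l2), ("L3", l3)):
        if address in level:
            return level[address], f"{level_name} hit"
    # Cache miss at every level: fetch from slower RAM. A real CPU
    # would now also copy the data into the caches for next time.
    return ram[address], "miss"

l1, l2, l3 = {0x10: "a"}, {0x20: "b"}, {}
ram = {0x30: "c"}
print(read(0x10, l1, l2, l3, ram))  # ('a', 'L1 hit')
print(read(0x30, l1, l2, l3, ram))  # ('c', 'miss')
```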


Multi-processing and multi-threading

Multi-processing

Using two or more CPUs (or cores) to execute multiple processes simultaneously. Each core handles a separate process.

Benefit: True parallel execution. Multiple tasks complete faster.

Multi-threading

A single core (or process) switches rapidly between multiple threads of execution. The OS schedules which thread runs at each moment.

Benefit: Better use of the CPU when threads are waiting (e.g. waiting for input/output).

| Feature | Multi-processing | Multi-threading |
| --- | --- | --- |
| Hardware used | Multiple cores/CPUs | Single or multiple cores |
| Execution | True parallelism | Rapid switching (concurrency) |
| Memory | Separate memory per process | Shared memory between threads |
| Use case | Running multiple independent applications | Web server handling many requests |
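The benefit of multi-threading when threads are waiting on input/output can be sketched with Python's `threading` module, using `time.sleep` to stand in for a slow I/O operation. The task and timings here are illustrative assumptions.

```python
# Sketch showing why multi-threading helps when threads wait on I/O.
# time.sleep stands in for a slow input/output operation; while a
# thread sleeps, the CPU is free to run the other threads.
import threading
import time

def slow_io_task(results, index):
    time.sleep(0.2)          # Simulated I/O wait
    results[index] = index * 2

def run_threaded(n):
    results = [None] * n
    threads = [threading.Thread(target=slow_io_task, args=(results, i))
               for i in range(n)]
    for t in threads:
        t.start()            # All threads begin their waits at once
    for t in threads:
        t.join()             # Total wall time is roughly ONE wait, not n waits
    return results

print(run_threaded(5))       # [0, 2, 4, 6, 8]
```

Run sequentially, five 0.2-second waits would take about a second; threaded, the waits overlap, matching the point above about making better use of the CPU during I/O.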

CPU architecture types

Embedded and mobile CPU architecture

  • Designed for low power consumption and small physical size.
  • ARM architecture (RISC-based) is dominant in phones and tablets.
  • May have limited cooling, so thermal management is critical.
  • Battery life is prioritised over raw performance.

Microcomputer CPU architecture

  • Balanced between performance and cost.
  • Used in desktop PCs and laptops.
  • Intel x86 and AMD64 are the dominant architectures.
  • Designed for general-purpose tasks.

Server CPU architecture

  • Designed for sustained heavy workloads.
  • More cores (e.g. 32-128) to handle many simultaneous requests.
  • Supports large amounts of ECC (Error Correcting Code) RAM.
  • Optimised for reliability and uptime, not just peak speed.
  • Examples: Intel Xeon, AMD EPYC.

Summary

| Term | Meaning |
| --- | --- |
| Fetch-Decode-Execute | The cycle by which a CPU processes instructions |
| Pipelining | Overlapping instruction stages to increase throughput |
| Cache | Fast memory holding recently used data near the CPU |
| RISC | Architecture with few, simple, fast instructions |
| CISC | Architecture with many complex instructions |
| Multi-threading | Rapid switching between threads on a single core |
| Cache hit | Required data found in cache, no RAM access needed |
