C2 · Text and Image Representation
Spec reference: Sections C2 and C3 - How Data is Represented
Key idea: Understand how text is encoded using character sets and how images are stored as binary data.
Text representation
Character sets
A character set is a mapping between characters (letters, numbers, symbols) and binary codes. Every character stored on a computer has a unique binary number assigned to it.
ASCII
ASCII (American Standard Code for Information Interchange) uses 7 bits, giving 128 possible characters (0-127).
| Range | Characters |
|---|---|
| 0-31, 127 | Control characters (non-printable, e.g. newline; 127 is delete) |
| 32-126 | Printable characters (letters, digits, symbols) |
| 65-90 | Uppercase A-Z |
| 97-122 | Lowercase a-z |
| 48-57 | Digits 0-9 |
Extended ASCII uses 8 bits (256 characters) and adds accented characters and symbols.
Limitation: Standard ASCII only covers unaccented English letters, digits, and common symbols; Extended ASCII adds a few Western European characters. Neither can represent Chinese, Arabic, emoji, or most world languages.
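The mapping and its limit can be checked with a short Python sketch. It relies only on the built-in `ord`, `chr`, and `format` functions; the helper name `ascii_bits` is made up for this illustration.

```python
def ascii_bits(ch):
    """Return the 7-bit binary pattern for a single ASCII character."""
    code = ord(ch)                  # numeric code of the character
    if code > 127:
        raise ValueError(f"{ch!r} is outside the 7-bit ASCII range")
    return format(code, "07b")      # zero-padded to 7 binary digits

# Show the code and bit pattern for a few printable characters.
for ch in "Aa0 ":
    print(repr(ch), ord(ch), ascii_bits(ch))

# The mapping works in both directions: code 65 maps back to 'A'.
print(chr(65))

# A character outside the ASCII range cannot be encoded as ASCII.
try:
    "\u00e9".encode("ascii")        # e with an acute accent
except UnicodeEncodeError as err:
    print("Not ASCII:", err.reason)
```

Note that upper and lower case letters differ by exactly 32 (65 for A, 97 for a), which is a single bit in the binary pattern.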
Unicode
Unicode was designed to represent every character in every language.
| Encoding | Size per character | Characters supported |
|---|---|---|
| UTF-8 | Variable (1-4 bytes) | Over 1.1 million code points |
| UTF-16 | 2 or 4 bytes | Same range as UTF-8 |
| UTF-32 | 4 bytes fixed | Same range |
UTF-8 is backward compatible with ASCII: the first 128 Unicode code points are identical to ASCII, and UTF-8 stores each of them in a single byte. This is one reason UTF-8 is the dominant encoding on the web.
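As a rough illustration of the sizes in the table, the sketch below counts the bytes each encoding needs for four sample characters. The sample characters and the helper name `encoded_sizes` are chosen for this example only; the codec names are Python's standard ones.

```python
# Sample characters written as escapes: A, e-acute, euro sign, grinning-face emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

# The "-be" codec variants are used so that no byte-order mark is added to the count.
codecs = ["utf-8", "utf-16-be", "utf-32-be"]

def encoded_sizes(ch):
    """Return the number of bytes each encoding needs for one character."""
    return {codec: len(ch.encode(codec)) for codec in codecs}

for ch in samples:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))

# Plain English text produces identical bytes in ASCII and UTF-8.
print("Hello".encode("ascii") == "Hello".encode("utf-8"))
```

UTF-8 grows from 1 to 4 bytes across these samples, UTF-16 uses 2 bytes until the emoji needs 4, and UTF-32 always uses 4.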
Implications of Unicode:
- Files can be larger than ASCII because non-ASCII characters use more than one byte (in UTF-8, plain English text stays the same size).
- Universal: one standard supports all world languages, emoji, and special symbols.
- Ensures text displays correctly when shared across different systems and countries.
Image representation
How bitmap/raster images are stored
A bitmap image is a grid of pixels. Each pixel stores a colour value as a binary number.
Key terms:
| Term | Definition |
|---|---|
| Pixel | The smallest individual element in a bitmap image |
| Resolution | The number of pixels in the image (width x height), e.g. 1920x1080 |
| Bit depth (colour depth) | Number of bits used to represent the colour of each pixel |
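To make the three terms concrete, here is a minimal sketch of a 1-bit bitmap held as a grid of numbers. The 5 x 5 pixel data is invented for illustration, with 1 standing for black and 0 for white.

```python
# A 5 x 5 image with a bit depth of 1: each pixel is a single bit.
image = [
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]

width = len(image[0])    # pixels per row
height = len(image)      # number of rows
bit_depth = 1            # bits stored per pixel

# Each row becomes a string of bits, one bit per pixel.
rows = ["".join(str(pixel) for pixel in row) for row in image]
for row in rows:
    print(row)

total_bits = width * height * bit_depth
print(f"Resolution {width}x{height}, {total_bits} bits of pixel data")
```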
Bit depth and colour
| Bit depth | Colours available | Notes |
|---|---|---|
| 1 bit | 2 (black and white) | Minimal storage |
| 8 bit | 256 | Greyscale or limited colour |
| 24 bit | 16,777,216 (true colour) | Red, Green, Blue - 8 bits each |
| 32 bit | True colour + transparency | 24-bit RGB plus an 8-bit alpha channel |
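The colour counts in the table follow from the rule that n bits can hold 2 to the power n different values. The sketch below checks this and splits one 24-bit pixel into its red, green, and blue channels; the function name and the sample pixel value are assumptions made for the example.

```python
def colours_available(bit_depth):
    """Number of distinct colours a pixel of the given bit depth can hold."""
    return 2 ** bit_depth

for depth in (1, 8, 24):
    print(f"{depth}-bit: {colours_available(depth):,} colours")

# One 24-bit pixel (an orange shade) stored as a single number.
pixel = 0xFF8000

# Shift and mask to pull out each 8-bit channel.
red = (pixel >> 16) & 0xFF
green = (pixel >> 8) & 0xFF
blue = pixel & 0xFF
print(red, green, blue)
```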
Calculating file size
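File size (bits) = width x height x bit depth
Example: A 100 x 100 pixel image with 24-bit colour depth:
100 x 100 x 24 = 240,000 bits = 30,000 bytes = approximately 29.3 KB (taking 1 KB as 1,024 bytes)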
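The worked example can be reproduced with a few lines of Python. This is a sketch of the raw pixel-data calculation only, and the function name `image_size_bits` is hypothetical.

```python
def image_size_bits(width, height, bit_depth):
    """Size of the raw pixel data in bits: one colour value per pixel."""
    return width * height * bit_depth

size_bits = image_size_bits(100, 100, 24)
size_bytes = size_bits / 8       # 8 bits in a byte
size_kb = size_bytes / 1024      # taking 1 KB as 1,024 bytes

print(f"{size_bits:,} bits = {size_bytes:,.0f} bytes = {size_kb:.1f} KB")
```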
Impact of resolution
Higher resolution means more pixels, which means:
- Greater detail and clarity.
- Larger file size.
- More processing power needed to display or edit.
Impact of bit depth
Higher bit depth means more colours per pixel, which means:
- More realistic colour representation.
- Larger file size.
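The effect of both factors on file size can be compared directly. The sketch below uses a hypothetical helper, `image_size_bytes`, and made-up image dimensions to show how the raw pixel data scales.

```python
def image_size_bytes(width, height, bit_depth):
    """Raw pixel data in bytes for a bitmap image."""
    return width * height * bit_depth / 8

base = image_size_bytes(100, 100, 24)            # starting point
double_res = image_size_bytes(200, 200, 24)      # width and height both doubled
lower_depth = image_size_bytes(100, 100, 8)      # bit depth cut from 24 to 8

print(f"100x100 at 24-bit: {base:,.0f} bytes")
print(f"200x200 at 24-bit: {double_res:,.0f} bytes ({double_res / base:.0f}x larger)")
print(f"100x100 at 8-bit:  {lower_depth:,.0f} bytes ({base / lower_depth:.0f}x smaller)")
```

Doubling both dimensions gives four times as many pixels, so the size grows with the pixel count, while the size changes in direct proportion to bit depth.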
Compression
Because bitmap images can be very large, compression is used to reduce file size.
| Type | Description | Quality | File formats |
|---|---|---|---|
| Lossless | Removes redundant data without losing any quality. The original can be perfectly restored. | No quality loss | PNG, GIF (BMP files are usually stored uncompressed) |
| Lossy | Permanently removes some data to achieve much smaller file sizes. Quality degrades. | Some quality loss | JPEG |
Exam point
Always state that lossy compression cannot be reversed. Once data is discarded, the original cannot be restored. Lossless compression can always be perfectly decompressed.
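The difference can be demonstrated with a small sketch. The lossless half uses Python's standard `zlib` module (the DEFLATE method, which PNG also uses). The lossy half is a toy example that simply drops the lower 4 bits of each pixel value; it is not how JPEG works, but it shows why discarded data cannot be recovered. The pixel data is invented for illustration.

```python
import zlib

# Hypothetical 8-bit greyscale pixel data with long runs of repeated values.
pixels = bytes([200] * 500 + [50] * 500 + list(range(24)))

# Lossless: compress, then decompress, and compare with the original.
compressed = zlib.compress(pixels)
restored = zlib.decompress(compressed)
print(len(pixels), "->", len(compressed), "bytes")
print("Lossless copy identical to original:", restored == pixels)

# Lossy (toy example): keep only the top 4 bits of every pixel value.
quantised = bytes(p & 0xF0 for p in pixels)
print("Distinct values before:", len(set(pixels)), "after:", len(set(quantised)))

# The dropped bits are gone, so the original values cannot be rebuilt.
print("Lossy copy identical to original:", quantised == pixels)
```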
Summary
| Term | Meaning |
|---|---|
| ASCII | 7-bit character encoding supporting 128 characters |
| Unicode (UTF-8) | Universal variable-width encoding supporting all world languages |
| Pixel | The smallest element of a bitmap image |
| Bit depth | Number of bits used per pixel to represent colour |
| Lossless compression | Reduces file size with no quality loss |
| Lossy compression | Reduces file size by permanently discarding data |