AS Computer Science: Unicode (UTF-8 focus) and Images
Unicode vs Encoding
• Unicode: a character set that gives every character a unique number (its code point), e.g. 'A' = U+0041, '€' = U+20AC.
• Encoding: rules to store code points as bytes. We compare UTF-8, UTF-16, UTF-32.
UTF-8 patterns
Range              Pattern                               Bytes
U+0000–U+007F      0xxxxxxx                              1
U+0080–U+07FF      110xxxxx 10xxxxxx                     2
U+0800–U+FFFF      1110xxxx 10xxxxxx 10xxxxxx            3
U+10000–U+10FFFF   11110xxx 10xxxxxx 10xxxxxx 10xxxxxx   4
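The table can be turned into code almost line by line. The following is a minimal sketch, not a production encoder: the function name utf8_encode is made up for this handout, and it rejects the surrogate range U+D800–U+DFFF because those values are not valid characters.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one code point by hand, following the four rows of the table."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogates are not encoded in UTF-8")

    if code_point <= 0x7F:                      # 0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:                     # 110xxxxx 10xxxxxx
        return bytes([
            0b11000000 | (code_point >> 6),
            0b10000000 | (code_point & 0b111111),
        ])
    if code_point <= 0xFFFF:                    # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0b11100000 | (code_point >> 12),
            0b10000000 | ((code_point >> 6) & 0b111111),
            0b10000000 | (code_point & 0b111111),
        ])
    return bytes([                              # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        0b11110000 | (code_point >> 18),
        0b10000000 | ((code_point >> 12) & 0b111111),
        0b10000000 | ((code_point >> 6) & 0b111111),
        0b10000000 | (code_point & 0b111111),
    ])


print(utf8_encode(0x20AC).hex(" ").upper())     # E2 82 AC
```

Each continuation byte carries six payload bits, which is why the code shifts by 6, 12 and 18 and masks with 0b111111.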
Examples (hex bytes)
• 'A' U+0041 → 41
• 'é' U+00E9 → C3 A9
• '£' U+00A3 → C2 A3
• '€' U+20AC → E2 82 AC
• '你' U+4F60 → E4 BD A0
• 'ش' U+0634 → D8 B4
• '😀' U+1F600 → F0 9F 98 80
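Hand-worked answers like these can be checked with Python's built-in UTF-8 codec. This is only a checking aid: the helper names to_utf8_hex and from_utf8_hex are invented here, and in an exam the working still has to be shown by hand.

```python
def to_utf8_hex(char: str) -> str:
    """Return the UTF-8 bytes of one character as spaced, upper-case hex."""
    return char.encode("utf-8").hex(" ").upper()


def from_utf8_hex(hex_bytes: str) -> str:
    """Decode spaced hex bytes (e.g. 'C3 A9') back into text."""
    return bytes.fromhex(hex_bytes).decode("utf-8")


print(to_utf8_hex("\u00E9"))                 # C3 A9
print(to_utf8_hex("\u20AC"))                 # E2 82 AC
print(f"U+{ord(from_utf8_hex('F0 9F 98 80')):04X}")   # U+1F600
```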
Compare encodings
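• UTF-8: 1–4 bytes per code point; ASCII stays 1 byte; the dominant encoding on the web.
• UTF-16: 2 or 4 bytes per code point; uses surrogate pairs for U+10000 and above; files often start with a byte order mark (BOM) to show little-endian (LE) or big-endian (BE).
• UTF-32: always 4 bytes per code point; simplest indexing; largest files.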
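The size differences are easy to see by encoding the same character three ways. A small sketch using Python's standard codecs follows; the function name encoded_sizes is made up, and the "-le" codec variants are chosen so that no BOM is included in the counts.

```python
def encoded_sizes(text: str) -> dict:
    """Bytes needed for text in each encoding, without a BOM."""
    return {
        "UTF-8": len(text.encode("utf-8")),
        "UTF-16": len(text.encode("utf-16-le")),
        "UTF-32": len(text.encode("utf-32-le")),
    }


print(encoded_sizes("A"))            # {'UTF-8': 1, 'UTF-16': 2, 'UTF-32': 4}
print(encoded_sizes("\u20AC"))       # {'UTF-8': 3, 'UTF-16': 2, 'UTF-32': 4}
print(encoded_sizes("\U0001F600"))   # {'UTF-8': 4, 'UTF-16': 4, 'UTF-32': 4}

# The plain "utf-16" codec prepends a 2-byte BOM, so 'A' takes 4 bytes.
print(len("A".encode("utf-16")))     # 4
```

Note that the euro sign is smaller in UTF-16 than in UTF-8, so "UTF-8 is always smallest" is not a safe claim.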
Practice (Unicode)
1. Encode to UTF-8: (a) '£' U+00A3; (b) 'ह' U+0939; (c) '🙂' U+1F642.
2. Decode: (a) C3 A7; (b) E6 97 A5; (c) F0 9F 8E 89.
3. A file stores the word Café (é = U+00E9) in UTF-8. How many bytes does it occupy?
Images: vector vs bitmap
• Vector: paths, strokes, fills (SVG/PDF). Scales to any size without loss of quality.
• Bitmap: pixels in a grid. Resolution = width × height in pixels; bit depth = bits per pixel (bpp).
• Uncompressed size ≈ w × h × bpp/8 bytes.
• Lossless (PNG/GIF) vs Lossy (JPEG). Metadata (EXIF) adds bytes.
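The size formula can be sketched as a short function. The name uncompressed_bytes and the 800 × 600 image are illustrative only, and the calculation assumes no compression, no metadata and no row padding.

```python
def uncompressed_bytes(width: int, height: int, bpp: int) -> int:
    """Raw pixel data size in bytes: w x h x bpp / 8, rounded up to a whole byte."""
    total_bits = width * height * bpp
    return (total_bits + 7) // 8


size = uncompressed_bytes(800, 600, 24)
print(size)                 # 1440000 bytes
print(size / 1024)          # 1406.25 KiB
print(uncompressed_bytes(800, 600, 1))   # 60000 bytes for a 1-bit image
```

Dividing by 8 converts bits to bytes; state whether the answer uses 1000 or 1024 when converting to larger units.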
Practice (Images)
1. Choose a format and justify: (a) school logo; (b) holiday photo; (c) UI icons.
2. Calculate uncompressed size: (a) 1024 × 768 at 24bpp; (b) 3840 × 2160 at 24bpp.
3. A banner 2560 × 720 at 24bpp is saved as JPEG (lossy). Explain why the file on disk is usually
much smaller than the uncompressed size.
Exam reminders
Show units, state assumptions (“ignore compression”), and use the correct encoding for size questions
(UTF-8 is variable length).