Lesson 2
How Base64 Encoding Works
Bits, groups of six, and the 64-character alphabet.
Base64 is elegant because it reuses ideas you already know: bits, powers of two, and a lookup table (the alphabet). There is no dictionary and no compression step, just deterministic bit reshuffling plus character substitution.
From bytes to a bit stream
A byte is eight bits. Three bytes give 24 bits:
Byte1: ########
Byte2: ########
Byte3: ########
→ concatenated in order: 24 bits
Base64 regroups those 24 bits into four sextets (six bits each):
[######][######][######][######]
   ^       ^       ^       ^
 idx 0   idx 1   idx 2   idx 3
Each six-bit value ranges 0–63, which indexes one of 64 symbols. That index count is literally where “base 64” gets its name: you are writing numbers in base 64, using printable ASCII characters instead of digits 0–63.
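To make the regrouping concrete, here is a minimal Python sketch (Python is our choice; the lesson itself is language-neutral). The helper name `to_sextets` is hypothetical. It packs three bytes into one 24-bit integer and reads four six-bit fields back out with shifts and a six-bit mask:

```python
def to_sextets(chunk: bytes) -> list[int]:
    """Split exactly three bytes (24 bits) into four six-bit values, each 0-63."""
    if len(chunk) != 3:
        raise ValueError("expected exactly 3 bytes")
    # Concatenate the bytes in order into one 24-bit integer (first byte = highest bits).
    n = (chunk[0] << 16) | (chunk[1] << 8) | chunk[2]
    # Read bits 23-18, 17-12, 11-6 and 5-0; the mask keeps only six bits each time.
    return [(n >> shift) & 0b111111 for shift in (18, 12, 6, 0)]

print(to_sextets(b"Man"))  # [19, 22, 5, 46]
```

Because the mask keeps only the low six bits after each shift, every result lands in 0–63, which is exactly the range the alphabet covers.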
The standard alphabet (RFC 4648)
Traditional Base64 assigns:
0–25 → A–Z
26–51 → a–z
52–61 → 0–9
62 → '+'
63 → '/'
Decoding walks the inverse map: symbol → six bits → reconstruct original eight-bit groups.
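A short Python sketch of both directions, assuming only the standard library's `string` constants; `ALPHABET`, `DECODE_MAP`, and `symbol_to_bits` are illustrative names, not part of any library. The forward table is just the four ranges above joined in order, and the inverse map is a dictionary built from it:

```python
import string

# Forward table, in index order: A-Z (0-25), a-z (26-51), 0-9 (52-61), then + (62) and / (63).
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
assert len(ALPHABET) == 64

# Inverse map used when decoding: symbol -> six-bit value.
DECODE_MAP = {symbol: index for index, symbol in enumerate(ALPHABET)}

def symbol_to_bits(symbol: str) -> str:
    """Return the six bits a single Base64 symbol stands for, as a binary string."""
    return format(DECODE_MAP[symbol], "06b")

print(ALPHABET[0], ALPHABET[26], ALPHABET[52], ALPHABET[63])  # A a 0 /
print(DECODE_MAP["k"], symbol_to_bits("k"))                   # 36 100100
```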
Mini example by hand (two bytes → four chars with padding intuition)
A manual trace is rarely needed day-to-day, but it cements intuition. Take the ASCII string "Hi":
H = 0x48 (01001000)    i = 0x69 (01101001)
Concatenate the 16 bits: 0100100001101001.
Append zero bits on the right until the length is divisible by six (two here, giving 18 bits):
010010 | 000110 | 100100
  18   |   6    |   36
  S    |   G    |   k
Those three sextets hold all 16 input bits plus the two fill bits. A Base64 group always has four characters, so with no data left for a fourth sextet the encoder emits the padding character = in its place (detailed next lesson), and "Hi" encodes to SGk=. Libraries handle this mechanically, but seeing the splits explains why lengths grow and why one corrupted character garbles only the one or two bytes its six bits overlap.
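The same hand trace can be replayed in a few lines of Python. `trace` is a hypothetical helper that mirrors the manual steps literally (bit string, zero fill, six-bit split, lookup, `=` fill to a multiple of four), and the result is compared with the standard library's `base64.b64encode`:

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def trace(data: bytes) -> tuple[list[int], str]:
    """Encode by following the manual steps literally, returning the sextets and the text."""
    # 1. Write every byte as eight bits and join them into one bit string.
    bits = "".join(format(byte, "08b") for byte in data)
    # 2. Append zero bits on the right until the length is a multiple of six.
    bits += "0" * (-len(bits) % 6)
    # 3. Cut the bit string into six-bit groups and read each as an integer 0-63.
    sextets = [int(bits[i:i + 6], 2) for i in range(0, len(bits), 6)]
    # 4. Look each value up in the alphabet, then fill with '=' to a multiple of four.
    text = "".join(ALPHABET[v] for v in sextets)
    text += "=" * (-len(text) % 4)
    return sextets, text

sextets, text = trace(b"Hi")
print(sextets, text)              # [18, 6, 36] SGk=
print(base64.b64encode(b"Hi"))    # b'SGk='
```

Working on a string of '0' and '1' characters is slow, but it matches the manual trace one-to-one; practical encoders use integer shifts instead.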
Why output grows ~33%
Each trio of bytes (24 bits) becomes four encoded characters carrying the same information as printable text, so the output is 4/3 the size of the input, about 33% larger. Exactly: n input bytes produce ceil(4n/3) characters before padding, and 4 × ceil(n/3) characters once padding is added.
When n mod 3 != 0, padding appends 3 - (n mod 3) = symbols: two when one byte is left over, one when two are left over. That overhead is expected; Base64 trades size for compatibility.
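Both length rules can be checked against Python's standard library. `unpadded_length` and `padded_length` are illustrative helper names; they use integer arithmetic because (4n + 2) // 3 equals ceil(4n / 3) for non-negative n and avoids floating point:

```python
import base64

def unpadded_length(n: int) -> int:
    # ceil(4n / 3): one character per six input bits, rounding up.
    return (4 * n + 2) // 3

def padded_length(n: int) -> int:
    # 4 * ceil(n / 3): whole four-character groups once '=' padding is added.
    return 4 * ((n + 2) // 3)

for n in range(10):
    encoded = base64.b64encode(bytes(n))  # n zero bytes, which encode to 'A's
    assert len(encoded) == padded_length(n)
    assert len(encoded.rstrip(b"=")) == unpadded_length(n)
    # Columns: input bytes, characters before padding, number of '=' added.
    print(n, unpadded_length(n), padded_length(n) - unpadded_length(n))
```

The last column cycles through 0, 2, 1 as n increases, matching the padding rule above.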
Decoding symmetry
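Decode:
- Strip any trailing = padding.
- Map each alphabet character → six-bit value (only indices 62 and 63, + and /, differ between alphabet variants, so they must match the variant the encoder used; variants come later).
- Concatenate the bits.
- Regroup into bytes (eight-bit chunks), discarding the leftover fill bits the encoder added.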
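Putting the decode steps together, here is a deliberately small, strict Python decoder for the standard alphabet. It is a teaching sketch, not a replacement for `base64.b64decode`; the name `decode` and its error messages are our own, and it does not check that the discarded fill bits are zero:

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
DECODE_MAP = {symbol: index for index, symbol in enumerate(ALPHABET)}

def decode(text: str) -> bytes:
    """Minimal strict decoder for padded, standard-alphabet Base64."""
    if len(text) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    # Strip trailing padding; more than two '=' can never be valid.
    body = text.rstrip("=")
    if len(text) - len(body) > 2:
        raise ValueError("too much padding")
    # Map each symbol to its six-bit value; anything outside the alphabet is rejected.
    try:
        values = [DECODE_MAP[ch] for ch in body]
    except KeyError as exc:
        raise ValueError(f"invalid symbol {exc.args[0]!r}") from None
    # Concatenate the six-bit groups into one bit string.
    bits = "".join(format(v, "06b") for v in values)
    # Regroup into whole bytes; the 2 or 4 leftover fill bits are dropped unchecked.
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))

print(decode("SGk="))  # b'Hi'
assert decode("TWFu") == base64.b64decode("TWFu")
```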
If the string length is malformed or invalid symbols appear (for example, whitespace or characters from another alphabet variant), a strict decoder fails. Worse, a lax parser may silently skip or accept them, so two decoders can disagree about the same input.
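Python's standard library shows both behaviors in one function: `base64.b64decode` discards non-alphabet characters by default, while `validate=True` makes it raise `binascii.Error` instead. A small demonstration (decoders in other languages make their own choices here, so check their documentation):

```python
import base64
import binascii

messy = "SG k="  # a stray space inside otherwise valid Base64

# Default (validate=False): characters outside the alphabet are discarded before decoding.
print(base64.b64decode(messy))  # b'Hi'

# validate=True: the same input is rejected instead of silently cleaned up.
try:
    base64.b64decode(messy, validate=True)
except binascii.Error as err:
    print("strict decoder rejected it:", err)

# A malformed length (missing padding) fails under the default setting too.
try:
    base64.b64decode("SGk")
except binascii.Error as err:
    print("bad length:", err)
```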
Key takeaway
Base64 rechunks bits into groups of six and substitutes each chunk with one of sixty-four predetermined printable characters. Once you visualize the regrouping, padding and URL-safe substitutions become straightforward instead of mysterious.
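As a closing exercise, the whole idea fits in one short Python function. `encode` is a hypothetical name for this sketch: it zero-fills a short final chunk, emits one character per meaningful sextet, fills the rest of the group with `=`, and is then checked against `base64.b64encode` on random input:

```python
import base64
import os
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode(data: bytes) -> str:
    """Sketch of a standard-alphabet encoder working three bytes at a time."""
    out = []
    for start in range(0, len(data), 3):
        chunk = data[start:start + 3]
        # Zero-fill a short final chunk so it still forms a 24-bit integer.
        n = int.from_bytes(chunk + b"\x00" * (3 - len(chunk)), "big")
        sextets = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
        # A chunk of k bytes carries k + 1 meaningful sextets; the rest become '='.
        meaningful = len(chunk) + 1
        out.extend(ALPHABET[v] for v in sextets[:meaningful])
        out.append("=" * (4 - meaningful))
    return "".join(out)

sample = os.urandom(32)
assert encode(sample) == base64.b64encode(sample).decode("ascii")
print(encode(b"Hi"))  # SGk=
```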