Lesson 1

What Is URL Encoding?

Reserved characters, ASCII safety, and why URLs need encoding.

URLs are designed as compact text addresses. To keep parsing predictable, specifications define reserved characters (like ?, #, /, &) that have structural meaning in a URI. Anything that could collide with that structure—or that cannot be reliably transmitted as plain text—must be represented differently. Percent-encoding is the usual answer: sequences like %20 stand in for raw bytes inside the URL character stream.

Modern practice follows RFC 3986 for generic URI syntax and related standards for schemes such as https. The encoding step is often called “URL encoding,” but technically you are percent-encoding octets, not inventing a new alphabet.

Reserved vs unreserved characters

Unreserved characters carry no syntax role, so they never need percent-encoding and can appear literally as data:

A–Z  a–z  0–9  -  _  .  ~

Reserved characters have syntax roles. RFC 3986 splits them into general delimiters (the first seven below) and sub-delimiters (the rest):

: / ? # [ ] @ ! $ & ' ( ) * + , ; =
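To see the two groups side by side, here is a minimal sketch using Python's standard `urllib.parse.quote`. Passing `safe=""` asks it to leave only the unreserved set literal and escape everything else, which is stricter than any single URL component requires. The constant names are illustrative, not part of any specification or library API.

```python
from urllib.parse import quote

# RFC 3986 character classes (constant names are illustrative).
UNRESERVED = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_.~"
)
GEN_DELIMS = ":/?#[]@"
SUB_DELIMS = "!$&'()*+,;="

# With safe="", quote() keeps only unreserved characters literal.
unreserved_out = quote(UNRESERVED, safe="")
reserved_out = quote(GEN_DELIMS + SUB_DELIMS, safe="")

print(unreserved_out == UNRESERVED)  # True
print(reserved_out)
# %3A%2F%3F%23%5B%5D%40%21%24%26%27%28%29%2A%2B%2C%3B%3D
```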

Whether a reserved character must be encoded depends on which component it appears in (scheme, authority, path, query, fragment). For example, / separates path segments but may appear literally inside a query. As a learner, remember the guiding idea: if a character that carries data could be mistaken for a delimiter, encode it.
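The same character can be structure in one place and data in another. A minimal sketch with Python's `urllib.parse.quote`, whose `safe` parameter lists characters to leave literal (its default is `"/"`): keeping `/` treats the string as a multi-segment path, while `safe=""` treats it as one opaque segment. The sample string is made up for illustration.

```python
from urllib.parse import quote

raw = "docs/guide v2"

# '/' is structure: the caller means two path segments.
as_path = quote(raw, safe="/")
# '/' is data: the whole string is a single opaque segment.
as_segment = quote(raw, safe="")

print(as_path)     # docs/guide%20v2
print(as_segment)  # docs%2Fguide%20v2
```

Neither output is "more correct" in isolation; the right choice depends on what the slash means to the application.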

Why non-ASCII text becomes percent bytes

URI syntax allows only a subset of US-ASCII. Characters outside ASCII (café, 北京, emojis) are first converted to bytes, normally UTF-8, and each of those bytes is written as %HH, where HH is the byte's value in two hexadecimal digits.

space   → %20 (or + in form-encoded query strings)
é (U+00E9; UTF-8 bytes C3 A9) → %C3%A9
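The byte-by-byte process can be sketched in a few lines of Python. `percent_encode` is a hypothetical helper written for this lesson, not a library function; it escapes every byte that is not unreserved, which matches the standard library's `urllib.parse.quote(..., safe="")` and is stricter than most components require.

```python
from urllib.parse import quote

# RFC 3986 unreserved characters: always safe to leave literal.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_.~"
)

def percent_encode(text: str) -> str:
    """Hypothetical helper: UTF-8 encode, then write non-unreserved bytes as %HH."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch in UNRESERVED:
            out.append(ch)
        else:
            out.append(f"%{byte:02X}")  # two uppercase hex digits
    return "".join(out)

print(percent_encode("café"))  # caf%C3%A9
print(percent_encode("a b"))   # a%20b
# Cross-check against the standard library with nothing marked safe.
print(percent_encode("北京") == quote("北京", safe=""))  # True
```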

Decoders turn each %HH triplet back into one byte, then interpret the resulting byte sequence (typically as UTF-8) to rebuild Unicode text.
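The reverse direction can be sketched the same way. `percent_decode` is another hypothetical helper for illustration; it collects bytes first and decodes UTF-8 only at the end, because a single character such as é spans two %HH triplets. A malformed sequence like a stray `%` is copied through unchanged, which is also what `urllib.parse.unquote` does.

```python
import string
from urllib.parse import unquote

HEX_DIGITS = frozenset(string.hexdigits)

def percent_decode(text: str) -> str:
    """Hypothetical helper: turn each %HH into one byte, then decode as UTF-8."""
    buf = bytearray()
    i = 0
    while i < len(text):
        triplet = text[i:i + 3]
        if (len(triplet) == 3 and triplet[0] == "%"
                and triplet[1] in HEX_DIGITS and triplet[2] in HEX_DIGITS):
            buf.append(int(triplet[1:], 16))
            i += 3
        else:
            # Anything else (including a stray '%') is copied as its UTF-8 bytes.
            buf.extend(text[i].encode("utf-8"))
            i += 1
    return buf.decode("utf-8")  # raises UnicodeDecodeError on invalid UTF-8

print(percent_decode("caf%C3%A9"))  # café
print(percent_decode("100%25"))     # 100%
print(percent_decode("caf%C3%A9") == unquote("caf%C3%A9"))  # True
```

One design difference: this sketch fails loudly on invalid UTF-8, whereas `unquote` defaults to `errors="replace"` and substitutes U+FFFD.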

“Encoding” is not encryption

Percent-encoding changes how data is represented, not who can read it. Anyone can decode it. Treat encoding as transport hygiene, like escaping in HTML, not as confidentiality.

https://example.com/search?q=café
⇒ sent as https://example.com/search?q=caf%C3%A9, though many tools display the decoded form
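To underline that nothing is hidden, a short sketch with Python's `urllib.parse`: building the query with `quote` and reading it back with `urlsplit` plus `parse_qs` takes one call each, and no key is involved.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Build the URL: quote() turns café into its percent-encoded UTF-8 bytes.
url = "https://example.com/search?q=" + quote("café")
print(url)  # https://example.com/search?q=caf%C3%A9

# Anyone holding the URL can reverse it; no secret is needed.
params = parse_qs(urlsplit(url).query)
print(params["q"][0])  # café
```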

Common situations that need encoding

Spaces in filenames or search terms pasted into URLs
Ampersands intended as literal data (Tom & Jerry) inside query parameter values
Slashes or question marks when they belong to opaque identifiers, not separators
Internationalised hostnames, handled via Punycode (IDNA): a related but separate mechanism from % escapes in paths and queries.
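Two of the situations above can be sketched with Python's standard library. `urlencode` escapes `&` inside a value so it cannot split the query string; note it uses form-style encoding by default, so spaces become `+`. Python's built-in `"idna"` codec converts a hostname to its ASCII form; it implements the older IDNA 2003 rules, so treat it as illustrative rather than a full modern IDNA implementation. The parameter names and hostname are made up.

```python
from urllib.parse import urlencode

# '&' inside a value is escaped so it cannot split the query string.
query = urlencode({"show": "Tom & Jerry", "page": 2})
print(query)  # show=Tom+%26+Jerry&page=2

# Hostnames use IDNA/Punycode instead of % escapes.
host = "café.example".encode("idna").decode("ascii")
print(host)  # xn--caf-dma.example
```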

Understanding when structure ends and opaque data begins is the core skill. The following lessons tighten the % rules and typical library behaviour.
