UTF-8 is a variable-width character encoding that can represent all Unicode code points. ASCII characters use 1 byte; other characters use 2–4 bytes. It is the dominant encoding on the web.

What is the difference between UTF-8 and UTF-16?

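UTF-8 uses 1–4 bytes per code point and is compact for ASCII-heavy text. UTF-16 uses 2 or 4 bytes per code point and is more compact for text dominated by characters in the range U+0800–U+FFFF, such as Chinese, Japanese, and Korean, which take 3 bytes in UTF-8 but 2 bytes in UTF-16. JavaScript strings are sequences of UTF-16 code units.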
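
The size trade-off can be checked directly. The following is a minimal sketch using Python's built-in codecs; the sample strings and the helper name encoded_sizes are illustrative choices, and "utf-16-le" is used so that no byte order mark is added to the count.

```python
# Compare encoded sizes of the same text in UTF-8 and UTF-16.
# The sample strings are arbitrary illustrations, not benchmark data.
samples = {
    "ascii": "hello",
    "japanese": "こんにちは",
    "emoji": "😀",
}

def encoded_sizes(text: str) -> tuple:
    """Return (UTF-8 byte count, UTF-16 byte count) for text."""
    # "utf-16-le" fixes the byte order, so no BOM is prepended.
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

for name, text in samples.items():
    utf8_len, utf16_len = encoded_sizes(text)
    print(f"{name}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")
```

ASCII text doubles in size under UTF-16, the Japanese sample shrinks by a third, and a supplementary-plane emoji takes 4 bytes in both encodings.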

UTF-8 / UTF-16 Encoder & Decoder

UTF-8 and UTF-16 are Unicode Transformation Formats and two of the most widely used character encodings. UTF-8 encodes each Unicode code point using 1 to 4 bytes; UTF-16 uses 2 or 4 bytes. Both can represent every character in the Unicode standard, which covers over 140,000 characters from the world's writing systems.

UTF-8 Encoding Rules

Code points 0–127 (ASCII): 1 byte (0xxxxxxx)
Code points 128–2047: 2 bytes (110xxxxx 10xxxxxx)
Code points 2048–65535: 3 bytes (1110xxxx 10xxxxxx 10xxxxxx)
Code points 65536–1114111: 4 bytes (11110xxx 10xxxxxx 10xxxxxx 10xxxxxx)
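
The four rules above translate almost line for line into bit operations. The sketch below is one possible hand-written encoder, shown only to illustrate the bit patterns; the function name utf8_encode_code_point is made up for this example, and in practice Python's own str.encode("utf-8") does the same job.

```python
def utf8_encode_code_point(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 using the bit patterns above."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        # Surrogates (U+D800..U+DFFF) are reserved for UTF-16 and cannot be encoded.
        raise ValueError(f"not an encodable code point: {cp:#x}")
    if cp <= 0x7F:
        # 0..127: 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:
        # 128..2047: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:
        # 2048..65535: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 65536..1114111: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])

# Cross-check one character from each length class against the built-in codec.
for ch in "Aé€😀":
    encoded = utf8_encode_code_point(ord(ch))
    assert encoded == ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {encoded.hex(' ')}")
```

Each continuation byte carries 6 payload bits, which is why the shifts step down by 6 and every mask is 0x3F.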

UTF-16 and JavaScript

JavaScript strings are sequences of UTF-16 code units. The length property counts UTF-16 code units, not visual characters. Supplementary characters (code points above U+FFFF), which include most emoji, use two UTF-16 code units (a surrogate pair), which is why some emoji have a .length of 2 in JavaScript.
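
The surrogate-pair arithmetic can be sketched as follows. The example is written in Python rather than JavaScript, and the helper name to_utf16_code_units is hypothetical; the number of code units it returns corresponds to what JavaScript's .length would report for the same text.

```python
def to_utf16_code_units(text: str) -> list:
    """Split text into UTF-16 code units, using surrogate pairs above U+FFFF."""
    units = []
    for ch in text:              # Python iterates by code point, not by code unit
        cp = ord(ch)
        if cp <= 0xFFFF:
            units.append(cp)     # fits in a single 16-bit code unit
        else:
            offset = cp - 0x10000                    # 20-bit value
            units.append(0xD800 + (offset >> 10))    # high surrogate: top 10 bits
            units.append(0xDC00 + (offset & 0x3FF))  # low surrogate: bottom 10 bits
    return units

units = to_utf16_code_units("😀")
print([f"{u:04X}" for u in units], "code units:", len(units))
```

Python's own len("😀") is 1 because Python counts code points, so the two languages disagree on the length of the same string.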

Frequently Asked Questions

Why is UTF-8 the most popular encoding?

UTF-8 is backward-compatible with ASCII (the first 128 characters are encoded identically), is self-synchronising (a decoder can find the next character boundary from any byte position), and is compact for languages using the Latin script. It became the dominant web encoding around 2008 and now accounts for over 97% of web pages.
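
Two of these properties are easy to demonstrate. The sketch below relies on the fact that UTF-8 continuation bytes always have the form 10xxxxxx; the function name next_char_start is an assumption made for this example.

```python
def next_char_start(data: bytes, index: int) -> int:
    """Return the first position at or after index that begins a character."""
    # Continuation bytes always look like 10xxxxxx, so skip past them.
    while index < len(data) and (data[index] & 0xC0) == 0x80:
        index += 1
    return index

data = "a€b".encode("utf-8")          # bytes: 61 e2 82 ac 62
print(next_char_start(data, 2))        # index 2 lands inside the 3-byte "€"

# ASCII compatibility: pure ASCII text has identical bytes in both encodings.
print("hello".encode("ascii") == "hello".encode("utf-8"))
```

Because lead bytes and continuation bytes never share a bit pattern, a reader that starts in the middle of a stream loses at most one character before resynchronising.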

What is a BOM?

A Byte Order Mark (BOM) is the Unicode character U+FEFF placed at the start of a file or stream to signal the encoding and byte order. UTF-8 has no byte-order ambiguity, so its BOM (EF BB BF) is optional and rarely needed. UTF-16 files whose byte order is not declared by other means should include one (FF FE for little-endian or FE FF for big-endian).
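
BOM sniffing can be sketched with the constants in Python's standard codecs module. The function name detect_bom is hypothetical, and the sketch deliberately checks only UTF-8 and UTF-16; a UTF-32 little-endian BOM (FF FE 00 00) also begins with FF FE, so it would be misreported here.

```python
import codecs

def detect_bom(data: bytes):
    """Return (encoding name, BOM length) if data starts with a known BOM, else None."""
    # Only UTF-8 and UTF-16 are checked; UTF-32 is left out of this sketch.
    candidates = (
        (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
        (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
        (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    )
    for bom, name in candidates:
        if data.startswith(bom):
            return name, len(bom)
    return None

raw = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
found = detect_bom(raw)
if found is not None:
    name, bom_len = found
    print(name, raw[bom_len:].decode(name))
```

Stripping the BOM before decoding, as in the usage lines, avoids leaving a stray U+FEFF at the start of the decoded text.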
