
UTF-8 / UTF-16 Encoder & Decoder online

View the UTF-8 and UTF-16 byte representation of any text — runs in your browser

by Chunky Munster

UTF-8 and UTF-16, two of the Unicode Transformation Formats, are the most widely used character encodings. UTF-8 encodes each Unicode code point in 1 to 4 bytes; UTF-16 uses one or two 16-bit code units, that is, 2 or 4 bytes. Both can represent every character in the Unicode standard, which covers more than 140,000 characters from the world's writing systems.
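To make the byte counts concrete, here is a minimal sketch that measures a string the way the counters on this page describe: code points, UTF-8 bytes, and UTF-16 bytes. The helper name measure is hypothetical and is not taken from this tool's source. It assumes a runtime with the standard TextEncoder global (modern browsers and Node.js).

```javascript
// Hypothetical helper: count code points, UTF-8 bytes, and UTF-16 bytes.
function measure(text) {
  // Array.from iterates a string by code point, not by UTF-16 code unit.
  const codePoints = Array.from(text).length;
  // TextEncoder always produces UTF-8.
  const utf8Bytes = new TextEncoder().encode(text).length;
  // .length counts UTF-16 code units, and each code unit is 2 bytes.
  const utf16Bytes = text.length * 2;
  return { codePoints, utf8Bytes, utf16Bytes };
}

// One sample for each UTF-8 length: 1, 2, 3, and 4 bytes.
const samples = ["A", "\u00E9", "\u20AC", "\u{1F600}"];
for (const sample of samples) {
  console.log(sample, measure(sample));
}
```

The four samples (A, é, €, and a grinning-face emoji) take 1, 2, 3, and 4 bytes in UTF-8. In UTF-16 the first three take 2 bytes each and the emoji takes 4.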

UTF-8 Encoding Rules
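
The standard UTF-8 bit layout can be sketched as a hand-written encoder for a single code point. This is an illustration only: the function names encodeCodePoint and toHex are hypothetical, and in practice the built-in TextEncoder does this work.

```javascript
// Hypothetical helper: encode one Unicode code point into UTF-8 bytes.
//
//   U+0000  to U+007F    0xxxxxxx
//   U+0080  to U+07FF    110xxxxx 10xxxxxx
//   U+0800  to U+FFFF    1110xxxx 10xxxxxx 10xxxxxx
//   U+10000 to U+10FFFF  11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
function encodeCodePoint(cp) {
  const isSurrogate = cp >= 0xd800 && cp <= 0xdfff;
  if (!Number.isInteger(cp) || cp < 0 || cp > 0x10ffff || isSurrogate) {
    throw new RangeError("not a Unicode scalar value: " + cp);
  }
  if (cp <= 0x7f) {
    return [cp];
  }
  if (cp <= 0x7ff) {
    return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)];
  }
  if (cp <= 0xffff) {
    return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
  }
  return [
    0xf0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3f),
    0x80 | ((cp >> 6) & 0x3f),
    0x80 | (cp & 0x3f),
  ];
}

// Format bytes as upper-case hex pairs separated by spaces.
function toHex(bytes) {
  return bytes.map((b) => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");
}

console.log(toHex(encodeCodePoint(0x20ac))); // E2 82 AC (the euro sign)
```

The leading byte announces the sequence length through its high bits, and every following byte starts with 10, so a decoder can always tell a lead byte from a continuation byte.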

UTF-16 and JavaScript

JavaScript strings are sequences of UTF-16 code units. The length property counts those code units, not code points or visual characters. Supplementary characters (code points above U+FFFF), which include most emoji, take two UTF-16 code units (a surrogate pair), which is why a single emoji often has a .length of 2 or more in JavaScript.
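
The difference between code units and code points is easy to check in a console. The sketch below uses only standard String methods. The helper toSurrogatePair is a hypothetical name, added to show the arithmetic behind a surrogate pair.

```javascript
const grin = "\u{1F600}"; // U+1F600 GRINNING FACE, above U+FFFF

console.log(grin.length);                      // 2 (UTF-16 code units)
console.log(Array.from(grin).length);          // 1 (code points)
console.log(grin.charCodeAt(0).toString(16));  // d83d (high surrogate)
console.log(grin.charCodeAt(1).toString(16));  // de00 (low surrogate)
console.log(grin.codePointAt(0).toString(16)); // 1f600

// Hypothetical helper: split a supplementary code point into its surrogate pair.
function toSurrogatePair(cp) {
  if (!Number.isInteger(cp) || cp < 0x10000 || cp > 0x10ffff) {
    throw new RangeError("not a supplementary code point: " + cp);
  }
  const offset = cp - 0x10000;        // 20 bits of payload
  const high = 0xd800 + (offset >> 10);   // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff);  // bottom 10 bits
  return [high, low];
}

// Emoji built from several code points joined by U+200D are longer still.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";
console.log(family.length); // 8
```

To count what a reader would see as one character, code points are still not enough for sequences like the family emoji. Intl.Segmenter with grapheme granularity is the standard tool for that where it is available.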

Frequently Asked Questions

Why is UTF-8 the most popular encoding?

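UTF-8 is backward-compatible with ASCII (the first 128 code points are encoded as the same single bytes), is self-synchronising (a decoder can find the start of a character from any byte position, because continuation bytes always have the form 10xxxxxx), and is compact for languages written in the Latin script. It became the most common encoding on the web around 2008 and now accounts for over 97% of web pages.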

What is a BOM?

A Byte Order Mark (BOM) is the Unicode character U+FEFF placed at the start of a file to signal the encoding and byte order. UTF-8 has no byte order, so UTF-8 files rarely need a BOM (when present, it is the bytes EF BB BF). UTF-16 files should include one (FF FE for little-endian or FE FF for big-endian) unless the byte order is declared some other way.
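
As a rough illustration, a BOM can be recognised by comparing the first bytes of a file against those signatures. The function name detectBom is hypothetical, and the sketch deliberately ignores UTF-32, whose little-endian BOM (FF FE 00 00) also begins with FF FE.

```javascript
// Hypothetical helper: report which BOM, if any, starts a byte sequence.
// Accepts a Uint8Array or a plain array of byte values.
function detectBom(bytes) {
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return "UTF-8";
  }
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    return "UTF-16LE"; // assumption: UTF-32LE is not considered here
  }
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    return "UTF-16BE";
  }
  return null; // no BOM found
}

// "A" in little-endian UTF-16, preceded by a BOM.
console.log(detectBom(new Uint8Array([0xff, 0xfe, 0x41, 0x00]))); // UTF-16LE
// Plain UTF-8 text without a BOM.
console.log(detectBom(new TextEncoder().encode("hello")));        // null
```

A null result does not identify the encoding. It only means the file has to be decoded using a declared or assumed encoding instead.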