UTF-8 and UTF-16 are Unicode Transformation Formats and the two most widely used Unicode encodings. UTF-8 encodes each code point in 1 to 4 bytes; UTF-16 uses one or two 16-bit code units, that is, 2 or 4 bytes. Both can represent every character in the Unicode standard, which covers over 140,000 characters from the world's writing systems.
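The size difference is easy to check empirically. The following is a minimal Python sketch, not a reference implementation; the helper name encoded_sizes and the sample characters are chosen here purely for illustration.

```python
def encoded_sizes(ch: str) -> tuple:
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    # "utf-16-le" is used so that Python does not prepend a BOM.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))

# Illustrative samples: U+0041, U+00E9, U+20AC, U+1F600
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    utf8_len, utf16_len = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")
```

Under these assumptions, the four samples take 1, 2, 3 and 4 bytes in UTF-8, and 2, 2, 2 and 4 bytes in UTF-16.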
JavaScript strings are defined as sequences of UTF-16 code units. The length property counts those code units, not code points or visual characters. Supplementary characters (code points above U+FFFF), which include most emoji, take two UTF-16 code units (a surrogate pair), which is why a single emoji such as U+1F600 has a .length of 2 in JavaScript. Emoji composed of several code points report larger values still.
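The surrogate-pair arithmetic can be sketched outside JavaScript as well. Python strings count code points rather than code units, so the sketch below derives the UTF-16 code unit count by encoding the text; the function names utf16_length and surrogate_pair are hypothetical helpers introduced for this example.

```python
def utf16_length(text: str) -> int:
    """Count UTF-16 code units, the quantity JavaScript's length reports."""
    # Each code unit is 2 bytes; "utf-16-le" avoids adding a BOM.
    return len(text.encode("utf-16-le")) // 2

def surrogate_pair(code_point: int) -> tuple:
    """Split a supplementary code point (above U+FFFF) into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the 20-bit offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the 20-bit offset
    return high, low

grinning = "\U0001F600"
print(len(grinning))            # number of code points
print(utf16_length(grinning))   # number of UTF-16 code units
print([hex(unit) for unit in surrogate_pair(ord(grinning))])
```

For U+1F600 this gives one code point, two code units, and the pair D83D DE00, matching the JavaScript escape "\uD83D\uDE00".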
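UTF-8 is backward-compatible with ASCII (the first 128 characters are encoded identically, one byte each), is self-synchronising (a decoder can find the start of a character from any byte offset, because continuation bytes are distinguishable from leading bytes), and is compact for text that is mostly ASCII, such as English and many other Latin-script languages. It became the dominant web encoding in 2008 and now accounts for over 97% of web pages.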
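Both properties can be illustrated in a few lines. This is a sketch under simple assumptions (well-formed UTF-8 input, and the hypothetical helper names is_continuation and char_start), relying on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx.

```python
def is_continuation(byte: int) -> bool:
    """True for UTF-8 continuation bytes, which always match 10xxxxxx."""
    return (byte & 0xC0) == 0x80

def char_start(data: bytes, index: int) -> int:
    """Step back from index to the first byte of the character containing it.

    Assumes data is well-formed UTF-8.
    """
    while index > 0 and is_continuation(data[index]):
        index -= 1
    return index

# ASCII text produces the same bytes in both encodings.
print("hello".encode("utf-8") == "hello".encode("ascii"))

data = "a\u20acb".encode("utf-8")   # bytes: 61 E2 82 AC 62
print(char_start(data, 3))          # offset 3 is inside the 3-byte euro sign
```

Starting from offset 3, the scan moves back over the two continuation bytes and stops at offset 1, where the euro sign begins.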
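A Byte Order Mark (BOM) is the Unicode character U+FEFF placed at the start of a file or stream to signal the encoding and byte order. UTF-8 has no byte-order ambiguity, so UTF-8 files rarely need a BOM (which would appear as the bytes EF BB BF). UTF-16 files whose byte order is not declared by other means should include one: FF FE for little-endian or FE FF for big-endian. Data explicitly labelled UTF-16LE or UTF-16BE should omit it.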
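A reader can use these byte signatures to guess the encoding of incoming data. The sketch below uses the BOM constants from Python's standard codecs module; the function name sniff_bom is hypothetical, and the sketch deliberately ignores UTF-32, whose little-endian BOM (FF FE 00 00) also begins with FF FE.

```python
import codecs

# Known BOM byte sequences and the encoding each one signals.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]

def sniff_bom(data: bytes) -> tuple:
    """Return (encoding name, BOM length), or (None, 0) if no BOM is present."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0

raw = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
encoding, bom_length = sniff_bom(raw)
if encoding is not None:
    # Skip the BOM bytes before decoding the remainder.
    print(encoding, raw[bom_length:].decode(encoding))
```

When no BOM is found, the caller has to fall back on other information, such as a declared charset or a default of UTF-8.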