BOM
Byte Order Mark
A special Unicode character (U+FEFF) placed at the start of a file or stream to signal its encoding and byte order.
Technical detail
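The BOM is the code point U+FEFF placed at the very start of a text stream. Its byte sequence reveals the encoding: EF BB BF in UTF-8, FE FF in big-endian UTF-16, and FF FE in little-endian UTF-16. It belongs to the Unicode standard, which assigns a unique code point (U+0000 to U+10FFFF) to every character across all writing systems. UTF-8 uses 1 to 4 bytes per character: ASCII characters take 1 byte, while most CJK ideographs take 3 bytes. UTF-16 uses 2 or 4 bytes per character and is the internal string format in JavaScript and Java. In UTF-8 the BOM is optional, because byte order is not an issue there and the mark serves only as an encoding signature. Declaring the encoding properly prevents mojibake (garbled text) when files cross system boundaries.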
Example
```javascript
// UTF-8 encode/decode
const encoder = new TextEncoder();
const decoder = new TextDecoder('utf-8');
const bytes = encoder.encode('Hello 世界');
// → Uint8Array [72, 101, ..., 228, 184, 150, 231, 149, 140]
decoder.decode(bytes); // 'Hello 世界'
```
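To tie the example to the BOM itself, here is a minimal sketch that inspects the leading bytes of a buffer for the UTF-8 and UTF-16 BOM sequences (EF BB BF, FE FF, FF FE). The helper name `detectBom` and its return shape are hypothetical, chosen for illustration only. UTF-32 is deliberately ignored, so a UTF-32LE stream (which starts with FF FE 00 00) would be reported as UTF-16LE.

```javascript
// Hypothetical helper: identify a BOM from the first bytes of a buffer.
// Returns the detected encoding label and the BOM length in bytes.
function detectBom(bytes) {
  const [b0, b1, b2] = bytes;
  if (b0 === 0xef && b1 === 0xbb && b2 === 0xbf) {
    return { encoding: 'utf-8', length: 3 };
  }
  if (b0 === 0xfe && b1 === 0xff) {
    return { encoding: 'utf-16be', length: 2 };
  }
  if (b0 === 0xff && b1 === 0xfe) {
    return { encoding: 'utf-16le', length: 2 };
  }
  return { encoding: null, length: 0 }; // no BOM found
}

// UTF-8 BOM followed by the bytes for 'Hi'
const withBom = new Uint8Array([0xef, 0xbb, 0xbf, 0x48, 0x69]);
const bom = detectBom(withBom);

// TextDecoder drops a leading BOM by default...
const text = new TextDecoder('utf-8').decode(withBom);
// ...and keeps it as U+FEFF when ignoreBOM is set to true.
const raw = new TextDecoder('utf-8', { ignoreBOM: true }).decode(withBom);
```

Here `bom.encoding` is `'utf-8'`, `text` is `'Hi'`, and `raw` carries an extra leading U+FEFF. That extra character is a common cause of a stray invisible character at the start of parsed files.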
Related terms
Plain Text
Rich Text
Line Ending
Word Count
Case Conversion
Slug
Whitespace
String Interpolation
Escape Character
Unicode
ASCII
Lorem Ipsum
Truncation
Stemming
Tokenization
N-gram
Readability Score
String Distance
Text Encoding
Diacritics
Ligature
Kerning
Leading
CJK
RTL
Normalization (Text)
Grep
Transliteration
ROT13
Text Diff