BOM - Glossary - Peasy Audio

The byte order mark (BOM) is the Unicode character U+FEFF placed at the start of a file to indicate its encoding and byte order.

Technical details

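The BOM is part of the Unicode standard, which assigns a unique code point (U+0000 to U+10FFFF) to every character across all writing systems. The BOM itself is the code point U+FEFF. It is encoded as the bytes EF BB BF in UTF-8, FE FF in big-endian UTF-16, and FF FE in little-endian UTF-16, so a reader can infer the encoding and byte order from the first bytes of a file. UTF-8 uses 1 to 4 bytes per character: ASCII characters take 1 byte, while most CJK ideographs take 3 bytes. UTF-16 uses 2 or 4 bytes per character and is the internal string format in JavaScript and Java. Declaring the encoding correctly prevents mojibake (garbled text) when files cross system boundaries.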

Example

```javascript
// UTF-8 encode/decode
const encoder = new TextEncoder();
const decoder = new TextDecoder('utf-8');

const bytes = encoder.encode('Hello 世界');
// → Uint8Array [72, 101, ..., 228, 184, 150, 231, 149, 140]

decoder.decode(bytes);  // 'Hello 世界'
```
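The example above shows plain UTF-8 encoding and decoding but not the BOM itself. As a minimal sketch, a BOM can be recognized by checking the first bytes of a buffer. The `detectBom` helper below is hypothetical (it is not part of any library or of this page's tools), and it only covers UTF-8 and UTF-16. It does not handle UTF-32, whose little-endian BOM starts with the same two bytes as UTF-16 LE.

```javascript
// Hypothetical helper (an illustration, not a library API):
// identify a BOM by inspecting the leading bytes of a Uint8Array.
function detectBom(bytes) {
  if (bytes.length >= 3 && bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF) {
    return 'utf-8';
  }
  if (bytes.length >= 2 && bytes[0] === 0xFE && bytes[1] === 0xFF) {
    return 'utf-16be';
  }
  if (bytes.length >= 2 && bytes[0] === 0xFF && bytes[1] === 0xFE) {
    return 'utf-16le';
  }
  return null; // no BOM recognized
}

// UTF-8 BOM (EF BB BF) followed by the ASCII bytes for 'hi'
const withBom = new Uint8Array([0xEF, 0xBB, 0xBF, 0x68, 0x69]);

detectBom(withBom);  // 'utf-8'

// TextDecoder strips a leading BOM by default;
// passing ignoreBOM: true keeps it in the output as U+FEFF.
const stripped = new TextDecoder('utf-8').decode(withBom);
const kept = new TextDecoder('utf-8', { ignoreBOM: true }).decode(withBom);

stripped.length;  // 2
kept.length;      // 3
```

Checking the UTF-8 signature first and returning `null` when nothing matches keeps the helper safe to call on arbitrary input, including empty buffers.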


Related terms

Plain Text, Rich Text, Line Ending, Word Count, Case Conversion, Slug, Whitespace, String Interpolation, Escape Character, Unicode, ASCII, Lorem Ipsum, Truncation, Stemming, Tokenization, N-gram, Readability Score, String Distance, Text Encoding, Diacritics, Ligature, Kerning, Leading, CJK, RTL, Normalization (Text), Grep, Transliteration, ROT13, Text Diff