For comparison, SCSU was adopted as the standard Unicode compression scheme; it achieves a byte-per-code-point ratio similar to that of language-specific code pages. In practice, however, SCSU has not been widely adopted, because it is not suitable for MIME “text” media types: for example, SCSU cannot be used directly in email and similar protocols. SCSU also requires a complicated encoder design for good performance. Zip, bzip2, and other industry-standard compression algorithms usually compress larger amounts of Unicode text more efficiently.
Both SCSU and BOCU-1 are IANA-registered charsets.
All numbers in this section are hexadecimal, and all ranges are inclusive.
Code points from U+0000 to U+0020 are encoded in BOCU-1 as the corresponding byte value. All other code points (that is, U+0021 to U+10FFFF) are encoded as the difference between the code point and a normalized version of the most recently encoded code point that was not an ASCII space (U+0020). The initial state, used before any such code point has been encoded, is U+0040. The normalization mapping is as follows:
The difference between the current code point and the normalized previous code point is encoded as follows:
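Putting the pieces together, the Python sketch below encodes a string into BOCU-1. The byte ranges and trail-byte table are an assumption based on our reading of ICU's BOCU-1 converter, not taken from this section: differences from -40 to 3F become a single byte from 50 to CF, and larger differences become a lead byte followed by one to three trail bytes, each trail byte carrying one of 243 values (20 carried by selected C0 control bytes, the rest by bytes 21 to FF). The names `encode_bocu1`, `_pack_diff`, and `_trail_byte` are hypothetical.

```python
# Minimal BOCU-1 encoder sketch. Byte ranges and the trail-byte table are
# assumptions based on our reading of ICU's BOCU-1 converter.

ASCII_PREV = 0x40     # initial state; also the normalized value of U+0000..U+007F
MIDDLE = 0x90         # lead byte for a difference of 0
TRAIL_COUNT = 243     # number of distinct trail-byte values

# Trail values 0..19 are carried by these C0 control bytes;
# values 20..242 are carried by bytes 0x21..0xFF.
TRAIL_CONTROLS = [0x01, 0x02, 0x03, 0x04, 0x05, 0x06,
                  0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18, 0x19,
                  0x1C, 0x1D, 0x1E, 0x1F]

# Largest positive / most negative difference reachable with 1, 2, 3 bytes.
REACH_POS_1, REACH_NEG_1 = 0x3F, -0x40
REACH_POS_2, REACH_NEG_2 = 0x2910, -0x2911
REACH_POS_3, REACH_NEG_3 = 0x2DD0B, -0x2DD0C


def normalize_prev(cp: int) -> int:
    """Normalized 'previous code point' used for the next difference."""
    if 0x3040 <= cp <= 0x309F:   # Hiragana
        return 0x3070
    if 0x4E00 <= cp <= 0x9FA5:   # CJK Unified Ideographs
        return 0x7711
    if 0xAC00 <= cp <= 0xD7A3:   # Hangul syllables
        return 0xC1D1
    return (cp & ~0x7F) + ASCII_PREV


def _trail_byte(t: int) -> int:
    """Map a trail value 0..242 to its byte (value 20 -> 0x21, 242 -> 0xFF)."""
    return TRAIL_CONTROLS[t] if t < len(TRAIL_CONTROLS) else t + 0x0D


def _pack_diff(diff: int) -> bytes:
    """Encode one code point difference as 1 to 4 bytes."""
    if REACH_NEG_1 <= diff <= REACH_POS_1:
        return bytes([MIDDLE + diff])                              # 0x50..0xCF
    if diff > 0:
        if diff <= REACH_POS_2:
            diff, lead, count = diff - (REACH_POS_1 + 1), 0xD0, 1  # leads 0xD0..0xFA
        elif diff <= REACH_POS_3:
            diff, lead, count = diff - (REACH_POS_2 + 1), 0xFB, 2  # leads 0xFB..0xFD
        else:
            diff, lead, count = diff - (REACH_POS_3 + 1), 0xFE, 3  # lead 0xFE
    else:
        if diff >= REACH_NEG_2:
            diff, lead, count = diff - REACH_NEG_1, 0x50, 1        # leads 0x25..0x4F
        elif diff >= REACH_NEG_3:
            diff, lead, count = diff - REACH_NEG_2, 0x25, 2        # leads 0x22..0x24
        else:
            diff, lead, count = diff - REACH_NEG_3, 0x22, 3        # lead 0x21
    trails = []
    for _ in range(count):
        # Floor division, so negative offsets still yield remainders 0..242.
        diff, m = divmod(diff, TRAIL_COUNT)
        trails.append(_trail_byte(m))
    # The remaining quotient adjusts the lead byte; trail bytes follow,
    # most significant first.
    return bytes([lead + diff] + trails[::-1])


def encode_bocu1(text: str) -> bytes:
    """Encode a string as BOCU-1 bytes (no reset bytes, no signature)."""
    prev = ASCII_PREV
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp <= 0x20:
            out.append(cp)                 # C0 controls and space: byte as-is
            if cp != 0x20:
                prev = normalize_prev(cp)  # always 0x40 here; space keeps prev
            continue
        out += _pack_diff(cp - prev)
        prev = normalize_prev(cp)
    return bytes(out)
```

For example, with this sketch `encode_bocu1("AB")` gives `b"\x91\x92"`: starting from the initial state 40, the differences are 1 and 2, which land just above the single-byte midpoint 90. Keeping the lead byte monotonic in the difference is what lets the trail bytes be computed like digits in base 243.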