Is iso-8859-1 a proper subset of utf-8?
The character repertoire of ISO-8859-1 (the first 256 characters of Unicode) is a proper subset of that of UTF-8 (every Unicode character).
However, the characters U+0080 to U+00FF are encoded differently in the two encodings.
- ISO-8859-1 assigns each of these characters a single byte from 80 to FF.
- UTF-8 encodes the same characters as two-byte sequences C2 80 to C3 BF.
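If you want to see the difference for yourself, here is a minimal Python sketch. The helper name `encode_both` is just something I made up for illustration; the codec names are the ones Python's standard library ships with.

```python
def encode_both(cp: int) -> tuple[bytes, bytes]:
    """Return (ISO-8859-1 bytes, UTF-8 bytes) for one code point."""
    ch = chr(cp)
    return ch.encode("iso-8859-1"), ch.encode("utf-8")

# First, middle, and last code point of the U+0080..U+00FF range.
for cp in (0x80, 0xE9, 0xFF):
    latin1, utf8 = encode_both(cp)
    print(f"U+{cp:04X}  ISO-8859-1: {latin1.hex().upper()}  UTF-8: {utf8.hex().upper()}")

# Reading UTF-8 bytes as if they were ISO-8859-1 turns each of these
# characters into two characters (the classic "mojibake").
mojibake = "\u00e9".encode("utf-8").decode("iso-8859-1")
print(mojibake)
```

So the byte sequences are only interchangeable for the ASCII range (00 to 7F); anything above that has to be transcoded.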
What about iso-8859-n?
These are 15 different encodings that contain a total of 614 distinct characters. Some of these characters occur in multiple "parts" of ISO 8859, and some don't. You'll have to be more specific.
I see that your question is tagged ISO-8859-2. The characters that are in -2 that aren't in -1 are:
Ă㥹ĆćČčĎďĐđĘęĚěĹ弾ŁłŃńŇňŐőŔŕŘřŚśŞşŠšŢţŤťŮůŰűŹźŻżŽžˇ˘˙˛˝
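A list like the one above can be reproduced with a short script. This is only a sketch: it assumes Python's built-in `iso-8859-2` and `iso-8859-1` codecs, and `chars_not_in_latin1` is a hypothetical helper name, not a library function.

```python
def chars_not_in_latin1(codec: str) -> str:
    """Characters a single-byte codec can produce that ISO-8859-1 cannot."""
    all_bytes = bytes(range(256))
    latin1 = set(all_bytes.decode("iso-8859-1"))
    # errors="ignore" skips any byte values the codec leaves undefined.
    decoded = set(all_bytes.decode(codec, errors="ignore"))
    # Sort by code point so the output is stable.
    return "".join(sorted(decoded - latin1))

extra = chars_not_in_latin1("iso-8859-2")
print(len(extra), extra)
```

Passing a different codec name (for example `"iso-8859-15"`) should give the corresponding list for another part of ISO 8859.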
What about windows-1252?
Windows-1252 is just like ISO-8859-1 except that it replaces the rarely used control characters in the 0x80-0x9F range with printable characters (five positions in that range are left unassigned). The characters that are in windows-1252 but not in ISO-8859-1 are:
ŒœŠšŸŽžƒˆ˜–—‘’‚“”„†‡•…‰‹›€™
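To check which bytes in the 0x80-0x9F range are affected, you can decode them one at a time. This sketch relies on Python's `cp1252` codec, which (as far as I know) raises an error for the byte values that windows-1252 leaves unassigned; other implementations may map those bytes to control characters instead.

```python
printable = {}   # byte value -> character assigned by windows-1252
undefined = []   # byte values Python's cp1252 codec cannot decode

for b in range(0x80, 0xA0):
    try:
        printable[b] = bytes([b]).decode("cp1252")
    except UnicodeDecodeError:
        undefined.append(b)

print("".join(printable.values()))
print([f"{b:02X}" for b in undefined])

# The same bytes decoded as ISO-8859-1 are all C1 control characters.
controls = bytes(range(0x80, 0xA0)).decode("iso-8859-1")
```

Outside that range the two encodings agree byte for byte.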