Friday, August 22, 2014

Character Set Special Characters

Is iso-8859-1 a proper subset of utf-8?
The character repertoire of ISO-8859-1 (the first 256 characters of Unicode) is a proper subset of that of UTF-8 (every Unicode character).
However, the characters U+0080 to U+00FF are encoded differently in the two encodings.
  • ISO-8859-1 assigns each of these characters a single byte from 80 to FF.
  • UTF-8 encodes the same characters as two-byte sequences C2 80 to C3 BF.
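To make the difference concrete, here is a minimal Python sketch that encodes a few code points from the U+0080 to U+00FF range both ways. The helper name encode_both is made up for this illustration; only the standard str.encode method is used.

```python
def encode_both(code_point):
    """Return (ISO-8859-1 bytes, UTF-8 bytes) for a single code point."""
    ch = chr(code_point)
    return ch.encode("iso-8859-1"), ch.encode("utf-8")

# First, a middle, and last code point of the range discussed above.
for cp in (0x80, 0xE9, 0xFF):
    latin1, utf8 = encode_both(cp)
    print(f"U+{cp:04X}: iso-8859-1={latin1.hex()} utf-8={utf8.hex()}")
```

Every code point in this range takes one byte in ISO-8859-1 and two bytes in UTF-8, so text containing them is not byte-compatible between the two encodings, even though plain ASCII text is.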
What about iso-8859-n?
These are 15 different encodings that contain a total of 614 distinct characters. Some of these characters occur in multiple "parts" of ISO 8859, and some don't. You'll have to be more specific.
I see that your question is tagged ISO-8859-2. The characters that are in -2 that aren't in -1 are:
ĂăĄąĆćČčĎďĐđĘęĚěĹĺĽľŁłŃńŇňŐőŔŕŘřŚśŞşŠšŢţŤťŮůŰűŹźŻżŽžˇ˘˙˛˝
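A list like this can be reproduced rather than memorized. The sketch below assumes Python's bundled iso-8859-1 and iso-8859-2 codecs and uses a hypothetical helper, repertoire, that decodes every possible byte value and collects the resulting characters.

```python
def repertoire(codec):
    """Return the set of characters a single-byte codec can decode."""
    # errors="ignore" drops any byte value the codec leaves unassigned.
    return set(bytes(range(256)).decode(codec, errors="ignore"))

# Characters present in ISO-8859-2 but absent from ISO-8859-1,
# sorted by Unicode code point.
only_in_latin2 = sorted(repertoire("iso-8859-2") - repertoire("iso-8859-1"))
print(len(only_in_latin2), "".join(only_in_latin2))
```

Swapping in another codec name (for example "iso-8859-15") gives the corresponding difference for that part of ISO 8859.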
What about windows-1252?
Windows-1252 is just like ISO-8859-1 except that it replaces most of the rarely used control characters in the 0x80-0x9F range with printable characters (five positions in that range are left unassigned). The characters that are in windows-1252 but not in ISO-8859-1 are:
ŒœŠšŸŽžƒˆ˜–—‘’‚“”„†‡•…‰‹›€™
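The byte-level difference can be checked with a short Python sketch. It assumes Python's cp1252 codec, which raises UnicodeDecodeError for the byte values windows-1252 does not define; the variable names mapped and unassigned are only for illustration.

```python
mapped = {}      # byte value -> printable character in windows-1252
unassigned = []  # byte values windows-1252 leaves undefined

for b in range(0x80, 0xA0):
    try:
        mapped[b] = bytes([b]).decode("cp1252")
    except UnicodeDecodeError:
        unassigned.append(b)

print("".join(mapped.values()))
print([hex(b) for b in unassigned])

# The same byte decodes to a C1 control character under ISO-8859-1.
print(repr(bytes([0x80]).decode("iso-8859-1")))
```

This is why text that is really windows-1252 but labelled ISO-8859-1 tends to lose its curly quotes, dashes, and euro signs: those bytes turn into invisible control characters instead.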
http://stackoverflow.com/questions/10021594/character-set-special-characters
