Out of Memory: Character Set Special Characters

Friday, August 22, 2014

Character Set Special Characters

Is iso-8859-1 a proper subset of utf-8?

The character repertoire of ISO-8859-1 (the first 256 characters of Unicode) is a proper subset of that of UTF-8 (every Unicode character).

However, the characters U+0080 to U+00FF are encoded differently in the two encodings.

ISO-8859-1 assigns each of these characters a single byte, from 0x80 to 0xFF.
UTF-8 encodes the same characters as two-byte sequences, from C2 80 to C3 BF.
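
To see the difference in bytes, here is a minimal Python sketch. The helper names `encodings_for` and `hex_bytes` are made up for illustration; the only library calls are `chr` and `str.encode` with the standard codec names.

```python
def encodings_for(code_point):
    """Return (ISO-8859-1 bytes, UTF-8 bytes) for one code point below U+0100."""
    ch = chr(code_point)
    return ch.encode("iso-8859-1"), ch.encode("utf-8")


def hex_bytes(data):
    """Format a bytes object as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in data)


# U+007F is ASCII and encodes identically; everything from U+0080 up differs.
for cp in (0x7F, 0x80, 0xBF, 0xC0, 0xFF):
    latin1, utf8 = encodings_for(cp)
    print(f"U+{cp:04X}  ISO-8859-1: {hex_bytes(latin1)}  UTF-8: {hex_bytes(utf8)}")
```

The practical consequence is that a file saved as ISO-8859-1 is not valid UTF-8 as soon as it contains any byte above 0x7F, even though every character in it exists in Unicode.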

What about iso-8859-n?

These are 15 different encodings that contain a total of 614 distinct characters. Some of these characters occur in multiple "parts" of ISO 8859, and some don't. You'll have to be more specific.

I see that your question is tagged ISO-8859-2. The characters that are in -2 that aren't in -1 are:

ĂăĄąĆćČčĎďĐđĘęĚěĹĺĽľŁłŃńŇňŐőŔŕŘřŚśŞşŠšŢţŤťŮůŰűŹźŻżŽžˇ˘˙˛˝
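
A list like this can be rebuilt by decoding every byte value with each codec and taking the set difference. The following is a sketch, assuming Python's built-in `iso-8859-1` and `iso-8859-2` codecs; `repertoire` is a hypothetical helper name.

```python
def repertoire(encoding):
    """Collect every character a single-byte encoding can produce."""
    chars = set()
    for b in range(256):
        try:
            chars.add(bytes([b]).decode(encoding))
        except UnicodeDecodeError:
            pass  # byte value is unassigned in this encoding
    return chars


# Characters in ISO-8859-2 that ISO-8859-1 does not have, in code point order.
only_in_latin2 = sorted(repertoire("iso-8859-2") - repertoire("iso-8859-1"))
print("".join(only_in_latin2))
print(len(only_in_latin2))
```

The same function works for any other part of ISO 8859 that your Python build ships a codec for; swap in the codec name you care about.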

What about windows-1252?

Windows-1252 is just like ISO-8859-1 except that it replaces the rarely used control characters in the 0x80-0x9F range with printable characters. The characters that are in windows-1252 but not in ISO-8859-1 are:

ŒœŠšŸŽžƒˆ˜–—‘’‚“”„†‡•…‰‹›€™
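
To check which bytes are affected, one option is to decode each byte value with both codecs and keep the ones that disagree. This is a sketch that assumes Python's `cp1252` codec, which leaves five bytes in that range unassigned; `cp1252_overrides` is an illustrative name.

```python
def cp1252_overrides():
    """Map each byte that windows-1252 decodes differently from ISO-8859-1
    to its windows-1252 character."""
    overrides = {}
    for b in range(256):
        raw = bytes([b])
        try:
            ch = raw.decode("cp1252")
        except UnicodeDecodeError:
            continue  # unassigned in Python's cp1252 (0x81, 0x8D, 0x8F, 0x90, 0x9D)
        if ch != raw.decode("iso-8859-1"):
            overrides[b] = ch
    return overrides


overrides = cp1252_overrides()
print(len(overrides), "bytes decode differently")
print(f"range: 0x{min(overrides):02X} to 0x{max(overrides):02X}")

# The same bytes read as curly quotes in one encoding, control characters in the other.
sample = b"\x93quoted\x94"
print(sample.decode("cp1252"))
print(repr(sample.decode("iso-8859-1")))
```

This is why text labeled ISO-8859-1 but actually written as windows-1252 tends to lose its quotes and dashes when decoded strictly as ISO-8859-1.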

http://stackoverflow.com/questions/10021594/character-set-special-characters
