Character sets and the number of characters which can be stored
Introduction
We know that each character is stored as a numeric code, and the set of all possible characters for a particular computer system is known as the character set. There are a number of different character sets that could be used.
ASCII
A common character set is known as ASCII (pronounced 'ass key'). This stands for American Standard Code for Information Interchange. It is made up of the characters that we use in the western world and other English-speaking countries (abcde, ABCDE, 12345, !"£$ and so on). There are two versions of ASCII. Standard ASCII uses 7 bits per code. Extended ASCII uses 8 bits per code. With 8-bit ASCII, there are a total of 2^8 (or 256) different codes, e.g.
0000 0000
0000 0001
0000 0010
0000 0011
0000 0100
and so on, all the way up to 255, or 1111 1111 in binary. Note that there are 256 combinations, but the biggest number you can store is 255. This is because the codes are numbered starting from zero, not one.
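The counting above can be checked with a short sketch in Python, using the built-in ord() and chr() functions (which convert between a character and its code):

```python
# How many codes do n bits give, and what is the biggest code?
for bits in (7, 8, 16):
    print(bits, "bits give", 2 ** bits, "codes, numbered 0 to", 2 ** bits - 1)

# ord() returns a character's code; chr() goes the other way.
print(ord("A"))   # 65
print(ord("a"))   # 97
print(chr(65))    # 'A'
```

Running this confirms that 8 bits give 256 codes, numbered 0 to 255, and that 'A' and 'a' have different codes (65 and 97).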
That's fine for English, but what about all of the languages that don't use the Latin script (the common letters, numbers and symbols that we use in English)? What about Chinese characters, Japanese, Arabic, Russian, Greek, Thai, Runic, Bengali, Tamil, Lao, Khmer, Tibetan, Ethiopian, Cherokee, Mongolian, Klingon (yes, Klingon has its own writing) and all of the other languages? What about specialist symbols, for example in Maths? How can we represent these symbols if ASCII only gives us 256 different codes?
Unicode
We can represent more characters if we use more bits to represent each character. If we used 16 bits instead of 8 bits then we would have a total of 2^16 (or 65,536) different code patterns. Each one of those codes could be used for a letter, number or symbol in another language, or for specialist purposes such as Maths symbols. Unicode uses the ASCII codes for the Latin script but uses the other codes for other languages and specialist purposes.
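We can see this in Python, where ord() gives a character's Unicode code point. The Latin letters keep their ASCII codes, while characters from other scripts get codes well beyond 255 (the example characters here are just a small illustrative sample):

```python
# Unicode keeps the ASCII codes for the Latin script but assigns
# higher code points to other scripts and symbols.
for ch in ("A", "Ω", "中", "∑"):
    print(repr(ch), "has code point", ord(ch))

# 'A' is 65, the same as in ASCII. The Greek letter, the Chinese
# character and the Maths summation symbol are all above 255, so
# they could never fit in an 8-bit character set.
```

Note that modern Unicode actually defines far more than 65,536 code points, but the principle is the same: more bits per character means room for more symbols.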