What about the characters in other languages?
Introduction
We know that characters are represented by codes, and the complete set of characters that a particular computer system can represent is known as its character set. There are a number of different character sets that could be used.
ASCII
A common character set is known as ASCII (pronounced 'ass key'). This stands for American Standard Code for Information Interchange. It covers the characters used in English: lower-case and upper-case letters, digits and common punctuation (abcde, ABCDE, 12345, !"£$ and so on). There are two versions of ASCII. Standard ASCII uses 7 bits per code, which gives 2^7 (or 128) codes. Extended ASCII uses 8 bits per code. With 8-bit ASCII, there are a total of 2^8 (or 256) different combinations of codes, e.g.
0000 0000
0000 0001
0000 0010
0000 0011
0000 0100
and so on, all the way up to 1111 1111 in binary, which is 255 in decimal. Note that there are 256 combinations, but the biggest number you can store is 255. This is because the first code is 0, not 1.
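The counting above can be checked with a few lines of Python. This is only a sketch for illustration; the variable names are made up for this example and are not part of any standard.

```python
# Number of distinct patterns that 8 bits can make.
BITS = 8
total = 2 ** BITS

# The first five patterns and the very last one, written as 8 binary digits.
first_five = [format(n, "08b") for n in range(5)]
last = format(total - 1, "08b")

# The code for the letter A, in decimal and in binary.
code_for_A = ord("A")
binary_for_A = format(code_for_A, "08b")

print(total)                      # how many patterns there are
print(first_five)                 # the patterns listed above
print(last)                       # the pattern for the biggest number
print(code_for_A, binary_for_A)   # the code for A
```

Counting starts at 0, so the last pattern stands for total - 1 rather than total.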
That's fine for English, but what about all of the languages that don't use the Latin script (the common letters, numbers and symbols that we use in English)? What about Chinese characters, Japanese, Arabic, Russian, Greek, Thai, Runic, Bengali, Tamil, Lao, Khmer, Tibetan, Ethiopian, Cherokee, Mongolian and all of the other languages? What about specialist symbols, for example in Science, Maths and Music? How can we represent these symbols if ASCII only gives us 256 different combinations?
Unicode
We can represent more characters if we use more bits for each character. If we used 16 bits instead of 8 bits, each code would look like 0000 0000 0000 0000. To work out the number of unique codes that are possible with 16 bits, we calculate 2^16 and find that there are 65,536 different code patterns. Compare that to just 256 different patterns for 8-bit ASCII. Each one of those extra codes could be used for a letter, number or symbol in another language, or for specialist purposes such as Maths symbols. Unicode uses the same codes as standard ASCII for the Latin symbols but uses the other codes available to it for all those other languages and for specialist purposes. In fact, most computers now use Unicode (and therefore also use ASCII, because ASCII is a subset of Unicode).
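To see both ideas in practice, here is a short Python sketch. It relies on Python's built-in ord(), which returns the Unicode code of a character, and on the standard unicodedata module for the official character names. The sample characters and variable names are chosen just for this example.

```python
import unicodedata

# How many patterns 8 bits and 16 bits can make.
patterns_8 = 2 ** 8
patterns_16 = 2 ** 16
print(patterns_8, patterns_16)

# One Latin, one Greek, one Russian (Cyrillic) and one Japanese character.
samples = ["A", "Ω", "д", "あ"]

# ord() gives the Unicode code for each character.
codes = {ch: ord(ch) for ch in samples}

# Standard (7-bit) ASCII only has room for codes 0 to 127.
fits_in_ascii = {ch: code < 128 for ch, code in codes.items()}

for ch in samples:
    # Print the official name rather than the character itself,
    # so the output works on any screen.
    print(unicodedata.name(ch), codes[ch], fits_in_ascii[ch])
```

The letter A keeps the code 65 that it has in ASCII, while the other three characters have codes far too big for ASCII.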
Q1. Apart from ASCII, name one other character set.
Q2. How is this character set different to ASCII?
Q3. If you have 4 bits, 0000, how do you calculate the total number of different possible combinations of 0s and 1s i.e. 0000, 0001, 0010 and so on?
Q4. If you have 6 bits, 00 0000, how do you calculate the total number of different possible combinations of 0s and 1s?
Q5. If you have 12 bits, 0000 0000 0000, how do you calculate the total number of different possible combinations of 0s and 1s?
Q6. If you have n bits, how do you calculate the total number of different possible combinations of 0s and 1s?
Extension work
Try to find a listing of the Unicode codes for a language like Hindi, Chinese, Japanese or any other language that does not use the Latin script.
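One possible way to make a small listing yourself is with Python's standard unicodedata module. The sketch below lists the codes for the vowels of Devanagari, the script used to write Hindi. The function name list_code_points is made up for this example, and the range 0x0905 to 0x0914 was picked by hand as a sample, so treat it as a starting point rather than a complete listing.

```python
import unicodedata

def list_code_points(first, last):
    """Return (code label, official name) pairs for every code from first to last."""
    rows = []
    for code in range(first, last + 1):
        # The second argument is used when a code has no character assigned.
        name = unicodedata.name(chr(code), "(unassigned)")
        rows.append((f"U+{code:04X}", name))
    return rows

# Devanagari vowels: codes 0905 to 0914 in hexadecimal.
devanagari_vowels = list_code_points(0x0905, 0x0914)

for label, name in devanagari_vowels:
    print(label, name)
```

Unicode codes are usually written in hexadecimal with U+ in front, which is why the labels look like U+0905. Changing the two numbers passed to list_code_points lets you explore a different script.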