Character encoding

contributed by Mushtaq Ahmad
A blind person cannot read text with his eyes, but if he is trained to read Braille, he can read with his fingers. In Braille, letters and symbols are represented by predefined patterns of dots raised on paper. Because the dots are raised, a person can feel them. In technical terms, each character is encoded using the Braille character encoding system. Similarly, another character encoding system, Morse code, was invented to allow messages to be transmitted over the telegraph. Each symbol is represented by a series of long and short presses of a telegraph key.
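As a small illustration of what "encoding" means here, the following Python sketch maps a few letters to their Morse code patterns, writing a short press as a dot and a long press as a dash. The table is deliberately incomplete, and the names MORSE and to_morse are made up for this example.

```python
# Partial Morse table: only four letters, enough for a short demonstration.
MORSE = {"E": ".", "T": "-", "S": "...", "O": "---"}


def to_morse(message: str) -> str:
    """Encode a message letter by letter, separating letters with spaces."""
    return " ".join(MORSE[letter] for letter in message.upper())


print(to_morse("sos"))  # ... --- ...
```

A letter missing from the table raises a KeyError, which mirrors the general rule that an encoding can only represent the symbols it defines.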

A computer does not understand sentences, words, or letters. It only understands ones and zeros. To enable a computer to work with characters and other symbols, each symbol has to be encoded as a sequence of ones and zeros. There are numerous character encoding systems, but the two most important are the American Standard Code for Information Interchange (ASCII) and Unicode. ASCII is a 7-bit code that represents 128 different symbols; its various 8-bit extensions raise this to 256. This was sufficient during the infancy of computing, when computers were not mainstream, but it soon became clear that ASCII was not enough. For example, it cannot represent the more than 40,000 Chinese characters. To address this problem, the international community came up with Unicode. It was originally designed as a 16-bit code able to encode 65,536 symbols, and has since been extended to a code space of more than 1.1 million code points.
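The idea can be tried out in Python, whose strings are sequences of Unicode characters. The sketch below prints the ones and zeros behind a character and checks whether a piece of text fits in ASCII. It assumes UTF-8 as the way of storing Unicode characters as bytes; that is one common choice and not something this article prescribes. The helper names to_bits and fits_in_ascii are made up for the example.

```python
def to_bits(text: str, encoding: str = "utf-8") -> str:
    """Return the encoded bytes of `text` as space-separated groups of 8 bits."""
    return " ".join(f"{byte:08b}" for byte in text.encode(encoding))


def fits_in_ascii(text: str) -> bool:
    """Return True if every character of `text` can be encoded as ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(ord("A"))               # 65, the number assigned to "A"
print(to_bits("A", "ascii"))  # 01000001
print(fits_in_ascii("A"))     # True

print(ord("中"))              # 20013, a Chinese character's Unicode number
print(fits_in_ascii("中"))    # False
print(to_bits("中"))          # 11100100 10111000 10101101 (three bytes in UTF-8)
```

The letter "A" needs a single byte, while the Chinese character needs three bytes in UTF-8 and cannot be written in ASCII at all.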