contributed by Mushtaq Ahmad
A blind person cannot read text by sight, but someone trained in braille can read with their fingers. In braille, letters and symbols are represented by predefined patterns of dots raised on paper, so a reader can feel them. In technical terms, each character is encoded with the braille character encoding system. Similarly, another character encoding system, Morse code, was invented to allow messages to be transmitted over the telegraph. Each symbol is represented by a series of long and short presses of a telegraph key.
A computer does not understand sentences, words, or letters. It only understands ones and zeros. To enable a computer to work with characters and other symbols, each symbol has to be encoded as a sequence of ones and zeros. There are many character encoding systems, but the two most important are the American Standard Code for Information Interchange (ASCII) and Unicode. Standard ASCII is a 7-bit code that represents 128 symbols, and its 8-bit extensions raise that to 256. This was sufficient in the early days, when computers were not mainstream, but it soon became clear that ASCII was not enough. For example, it cannot represent the more than 40,000 Chinese characters. To address this problem, the international community came up with Unicode. Early versions of Unicode were 16-bit and could encode just over 65,000 symbols; the current standard defines a code space of more than one million code points.
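As a minimal sketch of what "encoding a symbol into ones and zeros" looks like in practice, the Python snippet below converts each character of a string to its numeric code and then to an 8-bit binary string. The helper name `to_bits` is an assumption made for illustration, and the fixed 8-bit width only suits characters whose code is below 256.

```python
def to_bits(text):
    """Return the 8-bit binary form of each character's numeric code."""
    # ord() gives the numeric code; format(..., "08b") pads it to 8 binary digits.
    return [format(ord(ch), "08b") for ch in text]


print(to_bits("Hi"))  # ['01001000', '01101001']
```

Here `H` has the code 72 and `i` has the code 105, which is why they map to those two bit patterns.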
American Standard Code for Information Interchange (ASCII) is a character encoding system used in computers and other electronic systems that handle text. With the introduction of Unicode, the use of ASCII has declined over the years. Today, Unicode is the most widely used character encoding system.
Getting ASCII value of a character
In C language
#include <stdio.h>
int main(void) { char a = 'a'; printf("%d\n", a); return 0; }
In C++
cout << (int)'a';
In Java
System.out.println((int)'a');
JavaScript
alert("a".charCodeAt(0));
Perl
print ord "a";
PHP
print ord('a');
Python
print(ord('a'))
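Going the other way is just as common. As a small sketch in Python 3, `chr` turns a code back into a character, and a simple range check tells whether a character fits in standard 7-bit ASCII. The function name `is_ascii_char` is a hypothetical helper used only for illustration.

```python
def is_ascii_char(ch):
    """Return True if the single character ch has a code in the range 0-127."""
    return ord(ch) < 128


print(chr(97))                                   # a
print([ord(c) for c in "abc"])                   # [97, 98, 99]
print(is_ascii_char("a"), is_ascii_char("é"))    # True False
```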
Extended ASCII Table
| Dec | Hex | Unicode | Binary | HTML-char | Char | Description |
|---|---|---|---|---|---|---|
| 0 | 0 | U+0000 | 00000000 | | NUL | Null char |
| 1 | 1 | U+0001 | 00000001 | | SOH | Start Of Heading |
| 2 | 2 | U+0002 | 00000010 | | STX | Start Of Text |
| 3 | 3 | U+0003 | 00000011 | | ETX | End Of Text |
| 4 | 4 | U+0004 | 00000100 | | EOT | End Of Transmission |
| 5 | 5 | U+0005 | 00000101 | | ENQ | Enquiry |
| 6 | 6 | U+0006 | 00000110 | | ACK | Acknowledgement |
| 7 | 7 | U+0007 | 00000111 | | BEL | Bell |
| 8 | 8 | U+0008 | 00001000 | | BS | Backspace |
| 9 | 9 | U+0009 | 00001001 | | HT | Horizontal Tab |
| 10 | A | U+000a | 00001010 | | LF | Line Feed |
| 11 | B | U+000b | 00001011 | | VT | Vertical Tab |
| 12 | C | U+000c | 00001100 | | FF | Form Feed |
| 13 | D | U+000d | 00001101 | | CR | Carriage Return |
| 14 | E | U+000e | 00001110 | | SO | Shift Out |
| 15 | F | U+000f | 00001111 | | SI | Shift In |
| 16 | 10 | U+0010 | 00010000 | | DLE | Data Link Escape |
| 17 | 11 | U+0011 | 00010001 | | DC1 | Device Control 1 (XON) |
| 18 | 12 | U+0012 | 00010010 | | DC2 | Device Control 2 |
| 19 | 13 | U+0013 | 00010011 | | DC3 | Device Control 3 (XOFF) |
| 20 | 14 | U+0014 | 00010100 | | DC4 | Device Control 4 |
| 21 | 15 | U+0015 | 00010101 | | NAK | Negative Acknowledgement |
| 22 | 16 | U+0016 | 00010110 | | SYN | Synchronous Idle |
| 23 | 17 | U+0017 | 00010111 | | ETB | End of Transmission Block |
| 24 | 18 | U+0018 | 00011000 | | CAN | Cancel |
| 25 | 19 | U+0019 | 00011001 | | EM | End Of Medium |
| 26 | 1A | U+001a | 00011010 | | SUB | Substitute |
| 27 | 1B | U+001b | 00011011 | | ESC | Escape |
| 28 | 1C | U+001c | 00011100 | | FS | File Separator |
| 29 | 1D | U+001d | 00011101 | | GS | Group Separator |
| 30 | 1E | U+001e | 00011110 | | RS | Record Separator |
| 31 | 1F | U+001f | 00011111 | | US | Unit Separator |
| 32 | 20 | U+0020 | 00100000 | | Space | |
| 33 | 21 | U+0021 | 00100001 | | ! | |
| 34 | 22 | U+0022 | 00100010 | `&quot;` | " | |
| 35 | 23 | U+0023 | 00100011 | | # | |
| 36 | 24 | U+0024 | 00100100 | | $ | |
| 37 | 25 | U+0025 | 00100101 | | % | |
| 38 | 26 | U+0026 | 00100110 | `&amp;` | & | |
| 39 | 27 | U+0027 | 00100111 | | ' | |
| 40 | 28 | U+0028 | 00101000 | | ( | |
| 41 | 29 | U+0029 | 00101001 | | ) | |
| 42 | 2A | U+002a | 00101010 | | * | |
| 43 | 2B | U+002b | 00101011 | | + | |
| 44 | 2C | U+002c | 00101100 | | , | |
| 45 | 2D | U+002d | 00101101 | | - | |
| 46 | 2E | U+002e | 00101110 | | . | |
| 47 | 2F | U+002f | 00101111 | | / | |
| 48 | 30 | U+0030 | 00110000 | | 0 | |
| 49 | 31 | U+0031 | 00110001 | | 1 | |
| 50 | 32 | U+0032 | 00110010 | | 2 | |
| 51 | 33 | U+0033 | 00110011 | | 3 | |
| 52 | 34 | U+0034 | 00110100 | | 4 | |
| 53 | 35 | U+0035 | 00110101 | | 5 | |
| 54 | 36 | U+0036 | 00110110 | | 6 | |
| 55 | 37 | U+0037 | 00110111 | | 7 | |
| 56 | 38 | U+0038 | 00111000 | | 8 | |
| 57 | 39 | U+0039 | 00111001 | | 9 | |
| 58 | 3A | U+003a | 00111010 | | : | |
| 59 | 3B | U+003b | 00111011 | | ; | |
| 60 | 3C | U+003c | 00111100 | `&lt;` | < | |
| 61 | 3D | U+003d | 00111101 | | = | |
| 62 | 3E | U+003e | 00111110 | `&gt;` | > | |
| 63 | 3F | U+003f | 00111111 | | ? | |
| 64 | 40 | U+0040 | 01000000 | | @ | |
| 65 | 41 | U+0041 | 01000001 | | A | |
| 66 | 42 | U+0042 | 01000010 | | B | |
| 67 | 43 | U+0043 | 01000011 | | C | |
| 68 | 44 | U+0044 | 01000100 | | D | |
| 69 | 45 | U+0045 | 01000101 | | E | |
| 70 | 46 | U+0046 | 01000110 | | F | |
| 71 | 47 | U+0047 | 01000111 | | G | |
| 72 | 48 | U+0048 | 01001000 | | H | |
| 73 | 49 | U+0049 | 01001001 | | I | |
| 74 | 4A | U+004a | 01001010 | | J | |
| 75 | 4B | U+004b | 01001011 | | K | |
| 76 | 4C | U+004c | 01001100 | | L | |
| 77 | 4D | U+004d | 01001101 | | M | |
| 78 | 4E | U+004e | 01001110 | | N | |
| 79 | 4F | U+004f | 01001111 | | O | |
| 80 | 50 | U+0050 | 01010000 | | P | |
| 81 | 51 | U+0051 | 01010001 | | Q | |
| 82 | 52 | U+0052 | 01010010 | | R | |
| 83 | 53 | U+0053 | 01010011 | | S | |
| 84 | 54 | U+0054 | 01010100 | | T | |
| 85 | 55 | U+0055 | 01010101 | | U | |
| 86 | 56 | U+0056 | 01010110 | | V | |
| 87 | 57 | U+0057 | 01010111 | | W | |
| 88 | 58 | U+0058 | 01011000 | | X | |
| 89 | 59 | U+0059 | 01011001 | | Y | |
| 90 | 5A | U+005a | 01011010 | | Z | |
| 91 | 5B | U+005b | 01011011 | | [ | |
| 92 | 5C | U+005c | 01011100 | | \ | |
| 93 | 5D | U+005d | 01011101 | | ] | |
| 94 | 5E | U+005e | 01011110 | | ^ | |
| 95 | 5F | U+005f | 01011111 | | _ | |
| 96 | 60 | U+0060 | 01100000 | | \` | |
| 97 | 61 | U+0061 | 01100001 | | a | |
| 98 | 62 | U+0062 | 01100010 | | b | |
| 99 | 63 | U+0063 | 01100011 | | c | |
| 100 | 64 | U+0064 | 01100100 | | d | |
| 101 | 65 | U+0065 | 01100101 | | e | |
| 102 | 66 | U+0066 | 01100110 | | f | |
| 103 | 67 | U+0067 | 01100111 | | g | |
| 104 | 68 | U+0068 | 01101000 | | h | |
| 105 | 69 | U+0069 | 01101001 | | i | |
| 106 | 6A | U+006a | 01101010 | | j | |
| 107 | 6B | U+006b | 01101011 | | k | |
| 108 | 6C | U+006c | 01101100 | | l | |
| 109 | 6D | U+006d | 01101101 | | m | |
| 110 | 6E | U+006e | 01101110 | | n | |
| 111 | 6F | U+006f | 01101111 | | o | |
| 112 | 70 | U+0070 | 01110000 | | p | |
| 113 | 71 | U+0071 | 01110001 | | q | |
| 114 | 72 | U+0072 | 01110010 | | r | |
| 115 | 73 | U+0073 | 01110011 | | s | |
| 116 | 74 | U+0074 | 01110100 | | t | |
| 117 | 75 | U+0075 | 01110101 | | u | |
| 118 | 76 | U+0076 | 01110110 | | v | |
| 119 | 77 | U+0077 | 01110111 | | w | |
| 120 | 78 | U+0078 | 01111000 | | x | |
| 121 | 79 | U+0079 | 01111001 | | y | |
| 122 | 7A | U+007a | 01111010 | | z | |
| 123 | 7B | U+007b | 01111011 | | { | |
| 124 | 7C | U+007c | 01111100 | | \| | |
| 125 | 7D | U+007d | 01111101 | | } | |
| 126 | 7E | U+007e | 01111110 | | ~ | |
| 127 | 7F | U+007f | 01111111 | | DEL | Delete |
| 128 | 80 | U+20ac | 10000000 | `&euro;` | € | Euro sign |
| 129 | 81 | U+0081 | 10000001 | | | Unused |
| 130 | 82 | U+201a | 10000010 | `&sbquo;` | ‚ | Low quotation mark |
| 131 | 83 | U+0192 | 10000011 | `&fnof;` | ƒ | Latin small 'f' with a hook |
| 132 | 84 | U+201e | 10000100 | `&bdquo;` | „ | Double low quotation mark |
| 133 | 85 | U+2026 | 10000101 | `&hellip;` | … | Horizontal ellipsis |
| 134 | 86 | U+2020 | 10000110 | `&dagger;` | † | Dagger |
| 135 | 87 | U+2021 | 10000111 | `&Dagger;` | ‡ | Double dagger |
| 136 | 88 | U+02c6 | 10001000 | `&circ;` | ˆ | Modifier letter circumflex accent |
| 137 | 89 | U+2030 | 10001001 | `&permil;` | ‰ | Per mille sign |
| 138 | 8A | U+0160 | 10001010 | `&Scaron;` | Š | S with caron |
| 139 | 8B | U+2039 | 10001011 | `&lsaquo;` | ‹ | Left-pointing angle quotation mark |
| 140 | 8C | U+0152 | 10001100 | `&OElig;` | Œ | Capital ligature OE |
| 141 | 8D | U+008d | 10001101 | | | Unused |
| 142 | 8E | U+017d | 10001110 | | Ž | Z with caron |
| 143 | 8F | U+008f | 10001111 | | | Unused |
| 144 | 90 | U+0090 | 10010000 | | | Unused |
| 145 | 91 | U+2018 | 10010001 | `&lsquo;` | ‘ | Left quotation mark |
| 146 | 92 | U+2019 | 10010010 | `&rsquo;` | ’ | Right quotation mark |
| 147 | 93 | U+201c | 10010011 | `&ldquo;` | “ | Left double quotation mark |
| 148 | 94 | U+201d | 10010100 | `&rdquo;` | ” | Right double quotation mark |
| 149 | 95 | U+2022 | 10010101 | `&bull;` | • | Bullet |
| 150 | 96 | U+2013 | 10010110 | `&ndash;` | – | En dash |
| 151 | 97 | U+2014 | 10010111 | `&mdash;` | — | Em dash |
| 152 | 98 | U+02dc | 10011000 | `&tilde;` | ˜ | Small tilde |
| 153 | 99 | U+2122 | 10011001 | `&trade;` | ™ | Trade mark sign |
| 154 | 9A | U+0161 | 10011010 | `&scaron;` | š | s with a caron |
| 155 | 9B | U+203a | 10011011 | `&rsaquo;` | › | Right-pointing angle quotation mark |
| 156 | 9C | U+0153 | 10011100 | `&oelig;` | œ | Small ligature oe |
| 157 | 9D | U+009d | 10011101 | | | Unused |
| 158 | 9E | U+017e | 10011110 | | ž | z with a caron |
| 159 | 9F | U+0178 | 10011111 | `&Yuml;` | Ÿ | Capital Y with diaeresis |
| 160 | A0 | U+00a0 | 10100000 | `&nbsp;` | | Non-breaking space |
| 161 | A1 | U+00a1 | 10100001 | `&iexcl;` | ¡ | Inverted exclamation mark |
| 162 | A2 | U+00a2 | 10100010 | `&cent;` | ¢ | Cent sign |
| 163 | A3 | U+00a3 | 10100011 | `&pound;` | £ | Pound sign |
| 164 | A4 | U+00a4 | 10100100 | `&curren;` | ¤ | Currency sign |
| 165 | A5 | U+00a5 | 10100101 | `&yen;` | ¥ | Yen sign |
| 166 | A6 | U+00a6 | 10100110 | `&brvbar;` | ¦ | Broken bar |
| 167 | A7 | U+00a7 | 10100111 | `&sect;` | § | Section sign |
| 168 | A8 | U+00a8 | 10101000 | `&uml;` | ¨ | Spacing diaeresis - umlaut |
| 169 | A9 | U+00a9 | 10101001 | `&copy;` | © | Copyright sign |
| 170 | AA | U+00aa | 10101010 | `&ordf;` | ª | Feminine ordinal indicator |
| 171 | AB | U+00ab | 10101011 | `&laquo;` | « | Left double angle quote |
| 172 | AC | U+00ac | 10101100 | `&not;` | ¬ | Not sign |
| 173 | AD | U+00ad | 10101101 | `&shy;` | | Soft hyphen |
| 174 | AE | U+00ae | 10101110 | `&reg;` | ® | Registered trade mark sign |
| 175 | AF | U+00af | 10101111 | `&macr;` | ¯ | Spacing macron - overline |
| 176 | B0 | U+00b0 | 10110000 | `&deg;` | ° | Degree sign |
| 177 | B1 | U+00b1 | 10110001 | `&plusmn;` | ± | Plus-or-minus sign |
| 178 | B2 | U+00b2 | 10110010 | `&sup2;` | ² | Superscript two / squared |
| 179 | B3 | U+00b3 | 10110011 | `&sup3;` | ³ | Superscript three / cubed |
| 180 | B4 | U+00b4 | 10110100 | `&acute;` | ´ | Acute accent - spacing acute |
| 181 | B5 | U+00b5 | 10110101 | `&micro;` | µ | Micro sign |
| 182 | B6 | U+00b6 | 10110110 | `&para;` | ¶ | Pilcrow sign - paragraph sign |
| 183 | B7 | U+00b7 | 10110111 | `&middot;` | · | Middle dot - Georgian comma |
| 184 | B8 | U+00b8 | 10111000 | `&cedil;` | ¸ | Spacing cedilla |
| 185 | B9 | U+00b9 | 10111001 | `&sup1;` | ¹ | Superscript one |
| 186 | BA | U+00ba | 10111010 | `&ordm;` | º | Masculine ordinal indicator |
| 187 | BB | U+00bb | 10111011 | `&raquo;` | » | Right double angle quotes |
| 188 | BC | U+00bc | 10111100 | `&frac14;` | ¼ | Fraction one quarter |
| 189 | BD | U+00bd | 10111101 | `&frac12;` | ½ | Fraction one half |
| 190 | BE | U+00be | 10111110 | `&frac34;` | ¾ | Fraction three quarters |
| 191 | BF | U+00bf | 10111111 | `&iquest;` | ¿ | Inverted question mark |
| 192 | C0 | U+00c0 | 11000000 | `&Agrave;` | À | |
| 193 | C1 | U+00c1 | 11000001 | `&Aacute;` | Á | |
| 194 | C2 | U+00c2 | 11000010 | `&Acirc;` | Â | |
| 195 | C3 | U+00c3 | 11000011 | `&Atilde;` | Ã | |
| 196 | C4 | U+00c4 | 11000100 | `&Auml;` | Ä | |
| 197 | C5 | U+00c5 | 11000101 | `&Aring;` | Å | |
| 198 | C6 | U+00c6 | 11000110 | `&AElig;` | Æ | |
| 199 | C7 | U+00c7 | 11000111 | `&Ccedil;` | Ç | |
| 200 | C8 | U+00c8 | 11001000 | `&Egrave;` | È | |
| 201 | C9 | U+00c9 | 11001001 | `&Eacute;` | É | |
| 202 | CA | U+00ca | 11001010 | `&Ecirc;` | Ê | |
| 203 | CB | U+00cb | 11001011 | `&Euml;` | Ë | |
| 204 | CC | U+00cc | 11001100 | `&Igrave;` | Ì | |
| 205 | CD | U+00cd | 11001101 | `&Iacute;` | Í | |
| 206 | CE | U+00ce | 11001110 | `&Icirc;` | Î | |
| 207 | CF | U+00cf | 11001111 | `&Iuml;` | Ï | |
| 208 | D0 | U+00d0 | 11010000 | `&ETH;` | Ð | |
| 209 | D1 | U+00d1 | 11010001 | `&Ntilde;` | Ñ | |
| 210 | D2 | U+00d2 | 11010010 | `&Ograve;` | Ò | |
| 211 | D3 | U+00d3 | 11010011 | `&Oacute;` | Ó | |
| 212 | D4 | U+00d4 | 11010100 | `&Ocirc;` | Ô | |
| 213 | D5 | U+00d5 | 11010101 | `&Otilde;` | Õ | |
| 214 | D6 | U+00d6 | 11010110 | `&Ouml;` | Ö | |
| 215 | D7 | U+00d7 | 11010111 | `&times;` | × | |
| 216 | D8 | U+00d8 | 11011000 | `&Oslash;` | Ø | |
| 217 | D9 | U+00d9 | 11011001 | `&Ugrave;` | Ù | |
| 218 | DA | U+00da | 11011010 | `&Uacute;` | Ú | |
| 219 | DB | U+00db | 11011011 | `&Ucirc;` | Û | |
| 220 | DC | U+00dc | 11011100 | `&Uuml;` | Ü | |
| 221 | DD | U+00dd | 11011101 | `&Yacute;` | Ý | |
| 222 | DE | U+00de | 11011110 | `&THORN;` | Þ | Latin capital letter THORN |
| 223 | DF | U+00df | 11011111 | `&szlig;` | ß | Latin letter sharp s - ess-zed |
| 224 | E0 | U+00e0 | 11100000 | `&agrave;` | à | |
| 225 | E1 | U+00e1 | 11100001 | `&aacute;` | á | |
| 226 | E2 | U+00e2 | 11100010 | `&acirc;` | â | |
| 227 | E3 | U+00e3 | 11100011 | `&atilde;` | ã | |
| 228 | E4 | U+00e4 | 11100100 | `&auml;` | ä | |
| 229 | E5 | U+00e5 | 11100101 | `&aring;` | å | |
| 230 | E6 | U+00e6 | 11100110 | `&aelig;` | æ | |
| 231 | E7 | U+00e7 | 11100111 | `&ccedil;` | ç | |
| 232 | E8 | U+00e8 | 11101000 | `&egrave;` | è | |
| 233 | E9 | U+00e9 | 11101001 | `&eacute;` | é | |
| 234 | EA | U+00ea | 11101010 | `&ecirc;` | ê | |
| 235 | EB | U+00eb | 11101011 | `&euml;` | ë | |
| 236 | EC | U+00ec | 11101100 | `&igrave;` | ì | |
| 237 | ED | U+00ed | 11101101 | `&iacute;` | í | |
| 238 | EE | U+00ee | 11101110 | `&icirc;` | î | |
| 239 | EF | U+00ef | 11101111 | `&iuml;` | ï | |
| 240 | F0 | U+00f0 | 11110000 | `&eth;` | ð | Latin letter eth |
| 241 | F1 | U+00f1 | 11110001 | `&ntilde;` | ñ | |
| 242 | F2 | U+00f2 | 11110010 | `&ograve;` | ò | |
| 243 | F3 | U+00f3 | 11110011 | `&oacute;` | ó | |
| 244 | F4 | U+00f4 | 11110100 | `&ocirc;` | ô | |
| 245 | F5 | U+00f5 | 11110101 | `&otilde;` | õ | |
| 246 | F6 | U+00f6 | 11110110 | `&ouml;` | ö | |
| 247 | F7 | U+00f7 | 11110111 | `&divide;` | ÷ | |
| 248 | F8 | U+00f8 | 11111000 | `&oslash;` | ø | |
| 249 | F9 | U+00f9 | 11111001 | `&ugrave;` | ù | |
| 250 | FA | U+00fa | 11111010 | `&uacute;` | ú | |
| 251 | FB | U+00fb | 11111011 | `&ucirc;` | û | |
| 252 | FC | U+00fc | 11111100 | `&uuml;` | ü | |
| 253 | FD | U+00fd | 11111101 | `&yacute;` | ý | |
| 254 | FE | U+00fe | 11111110 | `&thorn;` | þ | Latin small letter THORN |
| 255 | FF | U+00ff | 11111111 | `&yuml;` | ÿ | |
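
The numeric columns of the table are different notations for the same value, so for the standard ASCII rows (0 to 127) they can be derived mechanically. The sketch below uses a hypothetical helper, `numeric_columns`, to format a code as decimal, hexadecimal, a `U+` label in the table's lowercase style, and 8-bit binary.

```python
def numeric_columns(code):
    """Return (dec, hex, U+ label, binary) strings for an ASCII code in 0-127."""
    if not 0 <= code <= 127:
        raise ValueError("code must be between 0 and 127")
    return (
        str(code),                    # decimal
        format(code, "X"),            # uppercase hexadecimal, no padding
        "U+" + format(code, "04x"),   # four hex digits, lowercase as in the table
        format(code, "08b"),          # eight binary digits
    )


print(numeric_columns(65))  # ('65', '41', 'U+0041', '01000001')
print(numeric_columns(10))  # ('10', 'A', 'U+000a', '00001010')
```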
Unicode is an international standard for the encoding, representation, and handling of text in all major languages used around the world. It was developed in conjunction with the Universal Character Set (UCS) standard and has now replaced ASCII as the most widely used character encoding scheme. Beyond assigning a code to each character, it covers character properties (such as uppercase and lowercase), rules for collation, rendering, bidirectional display (e.g. Arabic is written right-to-left), normalization, and much more.
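Some of these properties can be inspected directly. As a brief sketch, Python's standard `unicodedata` module exposes a character's name, general category, bidirectional class, and normalization; the sample characters below are chosen only for illustration.

```python
import unicodedata

print(unicodedata.name("A"))                 # LATIN CAPITAL LETTER A
print(unicodedata.category("A"))             # Lu (uppercase letter)
print(unicodedata.category("a"))             # Ll (lowercase letter)
print(unicodedata.bidirectional("\u0627"))   # AL (Arabic letter, right-to-left)

# Normalization: "e" followed by a combining acute accent composes
# into the single precomposed character U+00E9 under NFC.
composed = unicodedata.normalize("NFC", "e\u0301")
print(composed == "\u00e9", len(composed))   # True 1
```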
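The Unicode Consortium has defined the Unicode Transformation Format (UTF), a family of standards for encoding the Unicode character set in binary. The most common members are UTF-8 and UTF-16. UTF-8 is built from 8-bit units and uses one to four bytes per character: ASCII characters take a single byte with the same value as in ASCII, most accented letters used in European languages take two bytes, and other characters take three or four. UTF-16 is built from 16-bit units and uses two or four bytes per character, so it takes twice the memory of UTF-8 for ASCII text, although it is more compact for some scripts, such as Chinese. Because even ASCII characters occupy two bytes, UTF-16 is not compatible with ASCII.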
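To make the size trade-off concrete, here is a small sketch that counts the bytes a character occupies in each encoding, using Python's built-in `str.encode`. The helper name `encoded_sizes` is an assumption for illustration, and UTF-16 is encoded big-endian without a byte order mark so that only the character itself is counted.

```python
def encoded_sizes(ch):
    """Return (UTF-8 byte count, UTF-16 byte count) for the string ch."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


for ch in ["a", "é", "€", "😀"]:
    print(repr(ch), encoded_sizes(ch))
# 'a' (1, 2)
# 'é' (2, 2)
# '€' (3, 2)
# '😀' (4, 4)

# The UTF-8 bytes of an ASCII character equal its ASCII byte.
print("a".encode("utf-8") == "a".encode("ascii"))  # True
```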
Today Unicode is supported by all major programming languages and database systems.