ASCII: The American Standard Code for Information Interchange
In May of 1961, the American Standards Association's X3.2 subcommittee met to discuss standardising a new system for encoding characters in telecommunications and computing devices. The resulting 7-bit standard, first published in 1963, provided at most 128 (2⁷) code points, to be assigned in an organised manner so that devices could communicate with one another unambiguously. The initial published standard was revised several times, however, and some of the changes made in those revisions are as important as the original standard when looking at ASCII's impact on computing in the present day. This page aims to present some key information about the initial standard and its early revisions in order to contextualise them in a way that will hopefully be informative and interesting, even if you know nothing whatsoever about the topic beforehand.
Each ASCII character is encoded in a byte containing 7 bits of data, each bit holding a binary value of either 0 or 1. (Today, a byte is almost universally taken to mean 8 bits.) I don't want to get overly bogged down in how computers encode data in general here, but I will briefly touch on bit indexing, if only to explain that the bits of a byte are numbered, in effect, from right to left. In other words, the "first" bit, known as the least significant bit (LSb), is the one written at the end of the number. Unsurprisingly, the bit written first (the one on the far left) is known as the most significant bit (MSb). For example, in the ASCII character 1001100 (which happens to be the letter L), the bits are numbered as follows:
bit 7 (MSb) | bit 6 | bit 5 | bit 4 | bit 3 | bit 2 | bit 1 (LSb) |
---|---|---|---|---|---|---|
1 | 0 | 0 | 1 | 1 | 0 | 0 |
(Note: Yes, if you're wondering, the ASCII indexes bits from 1, not 0. If you weren't wondering, don't worry about it.)
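To make the 1-based numbering concrete, here is a small Python sketch that reads bit n of a 7-bit code the way the standard counts, from bit 1 (LSb) to bit 7 (MSb). The function name `bit` is made up for this illustration; the only trick is shifting by n − 1 to convert the standard's 1-based index into Python's 0-based shift count.

```python
def bit(code: int, n: int) -> int:
    """Return bit n of a 7-bit code, numbered 1 (LSb) to 7 (MSb) as in ASCII."""
    if not 1 <= n <= 7:
        raise ValueError("ASCII bits are numbered 1 to 7")
    # Shift right by n - 1 because the standard counts bits from 1, not 0.
    return (code >> (n - 1)) & 1


L = 0b1001100  # the letter L from the table above
assert L == ord("L")  # still the same code in modern ASCII

# Print the bits from MSb (bit 7) down to LSb (bit 1), matching the table.
for n in range(7, 0, -1):
    print(f"bit {n}: {bit(L, n)}")
```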
The complete array of ASCII code points (each code point representing one character) is laid out in a table that helps to visualise how the characters are organised. Remember the bit order when reading the tables: bits 5 to 7, written on the left, select the column, and bits 1 to 4, written on the right, select the row.
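The column-and-row reading can be expressed with two bit operations: shifting right by 4 keeps bits 5 to 7 (the column), and masking with 1111 keeps bits 1 to 4 (the row). A minimal sketch, where `table_position` is a hypothetical helper name:

```python
def table_position(code: int) -> tuple[str, str]:
    """Return the (column, row) labels of a 7-bit code in the ASCII table."""
    if not 0 <= code <= 0b1111111:
        raise ValueError("not a 7-bit code")
    column = code >> 4   # drop bits 1-4, leaving bits 5-7
    row = code & 0b1111  # keep only bits 1-4
    return f"{column:03b}", f"{row:04b}"


print(table_position(0b1001100))  # ('100', '1100'): the letter L
```

So the letter L sits in column 100, row 1100.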
Not every code point is a printable glyph: several are "control characters", which have special functions that I'll explain later.
American Standard Code for Information Interchange (ASA X3.4-1963)
The original publication of the ASCII defined the following 100 code points, leaving 28 undefined for future standardisation:
bits 5 to 7 → / bits 1 to 4 ↓ | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 |
---|---|---|---|---|---|---|---|---|
0000 | NULL | DC0 | ƀ | 0 | @* | P | | |
0001 | SOM | DC1 | ! | 1 | A | Q | | |
0010 | EOA | DC2 | " | 2 | B | R | | |
0011 | EOM | DC3 | # | 3 | C | S | | |
0100 | EOT | DC4 (STOP) | $ | 4 | D | T | | |
0101 | WRU | ERR | % | 5 | E | U | | |
0110 | RU | SYNC | & | 6 | F | V | | |
0111 | BELL | LEM | ' | 7 | G | W | | |
1000 | FE0 | S0 | ( | 8 | H | X | | |
1001 | HT/SK | S1 | ) | 9 | I | Y | | |
1010 | LF | S2 | * | :* | J | Z | | |
1011 | VTAB | S3 | + | ;* | K | [* | | |
1100 | FF | S4 | , | < | L | \* | | ACK |
1101 | CR | S5 | - | = | M | ]* | | ① |
1110 | SO | S6 | . | > | N | ↑* | | ESC |
1111 | SI | S7 | / | ? | O | ←* | | DEL |
All blank positions in this table represent codes that are theoretically available for use but not standardised.
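As a sanity check on the "100 defined, 28 undefined" count, the 1963 table can be written out as data and counted. This is a sketch of my own transcription of the table into Python lists, not anything the standard itself provides; `COLUMNS` and `lookup_1963` are hypothetical names, and the space (shown as ƀ) is stored as an ordinary " ".

```python
# ASA X3.4-1963 as eight columns (bits 7-5 = 000 ... 111). Each column holds
# 16 rows (bits 4-1 = 0000 ... 1111); None marks an unassigned position.
COLUMNS = [
    ["NULL", "SOM", "EOA", "EOM", "EOT", "WRU", "RU", "BELL",
     "FE0", "HT/SK", "LF", "VTAB", "FF", "CR", "SO", "SI"],
    ["DC0", "DC1", "DC2", "DC3", "DC4 (STOP)", "ERR", "SYNC", "LEM",
     "S0", "S1", "S2", "S3", "S4", "S5", "S6", "S7"],
    list(" !\"#$%&'()*+,-./"),
    list("0123456789:;<=>?"),
    list("@ABCDEFGHIJKLMNO"),
    list("PQRSTUVWXYZ[\\]↑←"),
    [None] * 16,                               # column 110: nothing assigned
    [None] * 12 + ["ACK", "①", "ESC", "DEL"],  # column 111: last four rows only
]


def lookup_1963(code: int):
    """Return the 1963 assignment for a 7-bit code, or None if unassigned."""
    if not 0 <= code <= 0b1111111:
        raise ValueError("not a 7-bit code")
    return COLUMNS[code >> 4][code & 0b1111]


defined = sum(cell is not None for column in COLUMNS for cell in column)
print(defined, 128 - defined)  # 100 28
print(lookup_1963(0b1001100), lookup_1963(0b1111100))  # L ACK
```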
* The document also suggests some non-standard functions that could hypothetically be assigned to some code points:
"For those applications requiring use of the sterling monetary system or duodecimal arithmetic, the digits 10 and 11 can replace the two graphics immediately following the digit 9."
In other words, : and ; could stand in for the digits 10 and 11.
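This substitution works neatly because the digits 0 to 9 occupy consecutive codes from 0110000 to 0111001, so simply counting on gives 0111010 (:) for 10 and 0111011 (;) for 11. Below is a hypothetical base-12 formatter built on that idea; `to_duodecimal` is a name invented for illustration, since the standard only describes the substitution, not any such routine.

```python
def to_duodecimal(n: int) -> str:
    """Format a non-negative integer in base 12, using ':' for 10 and ';' for 11."""
    if n < 0:
        raise ValueError("negative numbers are not handled in this sketch")
    zero = 0b0110000  # the code for the digit 0
    digits = []
    while True:
        n, d = divmod(n, 12)
        # Digits 0-11 map onto consecutive codes, so 10 -> ':' and 11 -> ';'.
        digits.append(chr(zero + d))
        if n == 0:
            break
    return "".join(reversed(digits))


print(to_duodecimal(23))   # 1;
print(to_duodecimal(144))  # 100
```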
Control characters
The ASCII includes a number of “control characters”, which it organises (somewhat loosely) into four categories:
- Transmission controls
- Format effectors
- Device controls
- Information separators
However, it doesn't clearly state which characters belong to which group, beyond explaining that each group is largely kept together in the code table. Context more or less fills in the gaps, though.
Abbr. | Full name | Type (inferred) |
---|---|---|
SOM | Start of message | Transmission control |
EOA | End of address | Transmission control |
EOM | End of message | Transmission control |
EOT | End of transmission | Transmission control |
WRU | "Who are you?" | Transmission control |
RU | "Are you...?" | Transmission control |
FE0 | Format effector 0 | Format effector |
HT/SK | Horizontal tabulation / skip (punched card) | Format effector |
LF | Line feed | Format effector |
VTAB | Vertical tabulation | Format effector |
FF | Form feed | Format effector |
CR | Carriage return | Format effector |
SO | Shift out | Format effector |
SI | Shift in | Format effector |
DC0–4 | Device control 0–4 | Device control |
ERR | Error | Transmission control |
SYNC | Synchronous idle | Transmission control |
LEM | Logical end of media | Transmission control |
S0–7 | Separator 0–7 | Information separator |
ƀ | Word separator / space (generally non-printing) | Format effector |
ACK | Acknowledge | Transmission control |
① | Described as an unassigned control character | N/A |
ESC | Escape | Format effector |
DEL | Delete | Not strictly a control character at all; rather, it represents every position in a row of a punched card or perforated tape being punched out: 1111111 |
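The DEL row can be modelled in a few lines, assuming (as on perforated tape) that each 1 bit corresponds to a punched hole. Because punching can only add holes, never remove them, overpunching behaves like a bitwise OR, and OR-ing any 7-bit code with 1111111 always yields 1111111. That is why a mistyped character on tape could be "deleted" by punching it out completely. `overpunch` is a hypothetical name used only for this sketch.

```python
DEL = 0b1111111  # every hole in the row punched


def overpunch(code: int) -> int:
    """Model punching out every remaining hole in a tape row holding `code`.

    Punching only ever adds holes, so it acts like a bitwise OR with DEL.
    """
    return code | DEL


# Whatever character was on the tape, overpunching turns it into DEL.
assert all(overpunch(code) == DEL for code in range(128))
print(f"{overpunch(0b1001100):07b}")  # 1111111
```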