ASCII: The American Standard Code for Information Interchange

In May of 1961, the American Standards Association's X3.2 subcommittee met to discuss standardising a new system for encoding characters in telecommunications and computing devices. The resulting 7-bit standard, first published in 1963, allowed up to 128 code points to be assigned in an organised manner, so that devices could communicate with one another unambiguously. The initial standard was revised several times, however, and some of the changes made in those revisions are as important as the original when looking at ASCII's impact on computing today. This page presents some key information about the initial standard and its early revisions, with enough context that it will hopefully be informative and interesting even if you know nothing whatsoever about the topic beforehand.

Each character of the ASCII is encoded in a byte containing 7 bits of data, each bit being represented by a binary value of either 0 or 1. (Today, a byte is almost always understood to consist of 8 bits.) I don't want to get overly bogged down in how computers encode data in general here, but I will briefly acknowledge the concept of bit indexing, if only to explain that the bits of a byte are numbered, in effect, from right to left. In other words, the "first" bit, which is known as the least significant bit (LSb), is the one written at the end of the number. The bit written first (the one on the far left) is known as the most significant bit (MSb). For example, in the ASCII character 1001100 (which happens to be the letter L), the bits are numbered as follows:

bit 7 (MSb)  bit 6  bit 5  bit 4  bit 3  bit 2  bit 1 (LSb)
1            0      0      1      1      0      0

(Note: Yes, if you're wondering, the ASCII indexes bits from 1, not 0. If you weren't wondering, don't worry about it.)
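If it helps to see the numbering spelled out, here is a minimal Python sketch of it. The function name `numbered_bits` is my own invention for illustration, not anything from the standard; it simply labels each bit of a 7-bit code with its ASCII bit number, counting from 1 at the LSb.

```python
def numbered_bits(code: int) -> dict[int, int]:
    """Map ASCII bit numbers (7 = MSb down to 1 = LSb) to the bit values of a 7-bit code."""
    if not 0 <= code < 128:
        raise ValueError("a 7-bit code must be in the range 0 to 127")
    # Bit n (counting from 1) is reached by shifting right n - 1 places.
    return {n: (code >> (n - 1)) & 1 for n in range(7, 0, -1)}


# The letter L from the example above.
for bit_number, value in numbered_bits(0b1001100).items():
    print(f"bit {bit_number}: {value}")
```

The bits come out in the same order as they are written, from bit 7 on the left down to bit 1 on the right.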

The complete array of ASCII code points (each code point representing one character) is defined in a table which helps to visualise how the characters are organised. Remember the bit order when reading the tables, i.e. bits 5 to 7 are the ones on the left, and bits 1 to 4 are the ones on the right.

Not every character is a glyph: several are "control characters" which have special functions that I'll explain later.
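To make the table layout concrete, here is a small Python sketch that splits a 7-bit code into its column (bits 5 to 7) and row (bits 1 to 4), and puts the two back together again. The helper names `table_position` and `code_from_position` are hypothetical, chosen just for this illustration.

```python
def table_position(code: int) -> tuple[str, str]:
    """Return (column, row) for a 7-bit code: bits 5 to 7, then bits 1 to 4."""
    if not 0 <= code < 128:
        raise ValueError("a 7-bit code must be in the range 0 to 127")
    column = code >> 4      # the three bits on the left (bits 5 to 7)
    row = code & 0b1111     # the four bits on the right (bits 1 to 4)
    return format(column, "03b"), format(row, "04b")


def code_from_position(column: str, row: str) -> int:
    """Rebuild the 7-bit code from its column and row, given as binary strings."""
    return (int(column, 2) << 4) | int(row, 2)


column, row = table_position(0b1001100)   # the letter L again
print(column, row)                        # 100 1100
```

So the letter L should turn up in column 100, row 1100 of the table.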


American Standard Code for Information Interchange (ASA X3.4-1963)

The original publication of the ASCII defined the following 100 code points, leaving 28 undefined for future standardisation:

bits 5 to 7 → (columns)
bits 1 to 4 ↓ (rows)

      000    001         010  011  100  101  110  111
0000  NULL   DC0         ƀ    0    @*   P
0001  SOM    DC1         !    1    A    Q
0010  EOA    DC2         "    2    B    R
0011  EOM    DC3         #    3    C    S
0100  EOT    DC4 (STOP)  $    4    D    T
0101  WRU    ERR         %    5    E    U
0110  RU     SYNC        &    6    F    V
0111  BELL   LEM         '    7    G    W
1000  FE0    S0          (    8    H    X
1001  HT/SK  S1          )    9    I    Y
1010  LF     S2          *    :*   J    Z
1011  VTAB   S3          +    ;*   K    [*
1100  FF     S4          ,    <    L    \*        ACK
1101  CR     S5          -    =    M    ]*
1110  SO     S6          .    >    N    ↑*        ESC
1111  SI     S7          /    ?    O    ←*        DEL

All blank positions in this table represent codes that are theoretically available for use but not standardised.

* The document also suggests some non-standard functions that could hypothetically be assigned to some code points:

"The five graphics immediately following the letter Z can be replaced by the additional letters required for complete expression of certain European alphabets. Further, the single position preceding the letter A can be used for those alphabets requiring 32 characters. In most cases, only three additional letters will be required.

For those applications requiring use of the sterling monetary system or duodecimal arithmetic, the digits 10 and 11 can replace the two graphics immediately following the digit 9."
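The suggestion about the digits 10 and 11 fits the layout neatly: in column 011, the value of each digit is simply the number held in bits 1 to 4, and that pattern carries on into the two positions after 9. The quoted text doesn't spell this arithmetic out, so treat the following Python sketch as my own reading of it; `digit_value` is a hypothetical helper, not something the standard defines.

```python
DIGIT_COLUMN = 0b011  # bits 5 to 7 for the column containing 0 to 9


def digit_value(code: int) -> int:
    """Read bits 1 to 4 of a code in column 011 as a number."""
    if not 0 <= code < 128 or code >> 4 != DIGIT_COLUMN:
        raise ValueError("code is not in column 011")
    return code & 0b1111


NINE = 0b0111001                 # the digit 9
print(digit_value(NINE))         # 9
print(digit_value(NINE + 1))     # 10 (the position normally holding the colon)
print(digit_value(NINE + 2))     # 11 (the position normally holding the semicolon)
```

Note that this sketch happily returns 12 to 15 for the remaining four positions in the column as well; the document only suggests replacing the first two.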

Control characters

The ASCII includes a number of “control characters”, which it organises (somewhat loosely) into four categories:

  1. Transmission controls
  2. Format effectors
  3. Device controls
  4. Information separators

However, it doesn't clearly designate which ones are intended to belong to which groups, other than explaining that they are largely chunked together as they appear in the code block. Context more or less fills in the gaps, though.

Abbr.   Full name                                        Type (inferred)
SOM     Start of message                                 Transmission control
EOA     End of address                                   Transmission control
EOM     End of message                                   Transmission control
WRU     "Who are you?"                                   Transmission control
RU      "Are you...?"                                    Transmission control
FE0     Format effector                                  Format effector
HT/SK   Horizontal tabulation / skip (punched card)      Format effector
LF      Line feed                                        Format effector
VTAB    Vertical tabulation                              Format effector
FF      Form feed                                        Format effector
CR      Carriage return                                  Format effector
SO      Shift out                                        Format effector
SI      Shift in                                         Format effector
DC0–4   Device control 0–4                               Device controls
ERR     Error                                            Transmission control
SYNC    Synchronous idle                                 Transmission control
LEM     Logical end of media                             Transmission control
S0–7    Separator 0–7                                    Information separators
ƀ       Word separator / space (generally non-printing)  Format effector
ACK     Acknowledge                                      Transmission control
(none)  described as an unassigned control character     N/A
ESC     Escape                                           Format effector
DEL     Delete                                           not strictly a control character at all; rather, represents all positions of a row on a punched card or perforated tape being punched out: 1111111
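The usual explanation for giving DEL the code 1111111 is that a punched hole can't be un-punched, so the only way to "erase" a character on tape is to punch out every remaining position in its row. Here is a rough Python sketch of that idea, assuming a punched hole is modelled as a 1 bit; the `overpunch` function is a made-up name for illustration.

```python
DEL = 0b1111111  # every position in the row punched out


def overpunch(existing: int, extra: int) -> int:
    """Model punching more holes into a row: holes can be added, never removed."""
    return existing | extra


# Whatever was in the row before, punching out every position leaves DEL.
for code in (0b1001100, 0b0000000, 0b0110101):
    print(format(overpunch(code, DEL), "07b"))
```

Each line printed is 1111111, whichever code was there to begin with.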

Coming soon: ASA X3.4-1965 and beyond