The letter frequency counts (left-most column) are taken from one of the common books on cryptanalysis and give the number of occurrences per thousand letters of normal English text. Each character is broken down (its "structure") into units: 1 for the minimum signal duration (one dit), 111 (three units' duration) for a dah, and 0 (zero) for each equal unit of silence. The required three units of silence separating characters (000) are added to each one below.
Freq | Letter | Structure | Units | Total |
130 | E | 1000 | 4 | 520 |
92 | T | 111000 | 6 | 552 |
79 | N | 11101000 | 8 | 632 |
76 | R | 1011101000 | 10 | 760 |
75 | O | 11101110111000 | 14 | 1050 |
74 | A | 10111000 | 8 | 592 |
74 | I | 101000 | 6 | 444 |
61 | S | 10101000 | 8 | 488 |
42 | D | 1110101000 | 10 | 420 |
36 | L | 101110101000 | 12 | 432 |
34 | H | 1010101000 | 10 | 340 |
31 | C | 11101011101000 | 14 | 434 |
28 | F | 101011101000 | 12 | 336 |
27 | P | 10111011101000 | 14 | 378 |
26 | U | 1010111000 | 10 | 260 |
25 | M | 1110111000 | 10 | 250 |
19 | Y | 1110101110111000 | 16 | 304 |
16 | G | 111011101000 | 12 | 192 |
16 | W | 101110111000 | 12 | 192 |
15 | V | 101010111000 | 12 | 180 |
10 | B | 111010101000 | 12 | 120 |
5 | X | 11101010111000 | 14 | 70 |
3 | Q | 1110111010111000 | 16 | 48 |
3 | K | 111010111000 | 12 | 36 |
2 | J | 1011101110111000 | 16 | 32 |
1 | Z | 11101110101000 | 14 | 14 |
Totals: frequency 1000; average structure length 11.23 units (unweighted); average by frequency 9.076 units (9076 / 1000).
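The table can be rebuilt mechanically from the dot-dash patterns. The following is a minimal sketch, not part of the original analysis: the names MORSE, FREQ and structure() are chosen here for illustration, the patterns are the standard International Morse letters, and the frequencies are copied from the table above.

```python
# International Morse patterns for A-Z ('.' = dit, '-' = dah).
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".", "F": "..-.",
    "G": "--.", "H": "....", "I": "..", "J": ".---", "K": "-.-", "L": ".-..",
    "M": "--", "N": "-.", "O": "---", "P": ".--.", "Q": "--.-", "R": ".-.",
    "S": "...", "T": "-", "U": "..-", "V": "...-", "W": ".--", "X": "-..-",
    "Y": "-.--", "Z": "--..",
}

# Occurrences per thousand letters, copied from the table.
FREQ = {
    "E": 130, "T": 92, "N": 79, "R": 76, "O": 75, "A": 74, "I": 74, "S": 61,
    "D": 42, "L": 36, "H": 34, "C": 31, "F": 28, "P": 27, "U": 26, "M": 25,
    "Y": 19, "G": 16, "W": 16, "V": 15, "B": 10, "X": 5, "Q": 3, "K": 3,
    "J": 2, "Z": 1,
}


def structure(pattern):
    """Return the 1/0 unit string for a pattern, ending with the 000 letter space."""
    elements = ["1" if mark == "." else "111" for mark in pattern]
    # One unit of silence (0) between elements, three (000) after the letter.
    return "0".join(elements) + "000"


# Units per letter, including the trailing letter space.
units = {letter: len(structure(pattern)) for letter, pattern in MORSE.items()}

weighted_total = sum(FREQ[letter] * units[letter] for letter in FREQ)
weighted_avg = weighted_total / sum(FREQ.values())
unweighted_avg = sum(units.values()) / len(units)

print(weighted_total, round(weighted_avg, 3), round(unweighted_avg, 2))
# prints: 9076 9.076 11.23
```

The unweighted figure treats every letter as equally likely, which is why it applies to random letter groups rather than to English text.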
From the above, if we take five times the average letter length and add the extra space required for word spacing (seven units in total, or 0000000, of which three are already counted at the end of the last letter), we arrive at the normal English word length of 5 x 9.076 + 4 = 49.38 units. This is about 1.2% shorter than the 50 units of the standard word. (By contrast, a random five-letter group averages 5 x 11.23 + 4 = 60.15 units. This is 20.3% longer than the standard word, and about 21.8% longer than the normal English word length.)
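The word-length arithmetic can be checked in a few lines. This is only an illustrative sketch; the constant and function names are invented here, and the two letter averages are taken from the table summary.

```python
LETTER_AVG_ENGLISH = 9.076  # frequency-weighted units per letter (from the table)
LETTER_AVG_RANDOM = 11.23   # unweighted units per letter (from the table)
EXTRA_WORD_SPACE = 4        # 7-unit word space minus the 3 units already counted
STANDARD_WORD = 50          # units in the standard word


def word_units(letter_avg, letters=5):
    """Average length in units of a group of letters plus one word space."""
    return letters * letter_avg + EXTRA_WORD_SPACE


english = word_units(LETTER_AVG_ENGLISH)
random_group = word_units(LETTER_AVG_RANDOM)

# Percentage differences relative to the 50-unit standard word.
english_shortfall = 100 * (STANDARD_WORD - english) / STANDARD_WORD
random_excess = 100 * (random_group - STANDARD_WORD) / STANDARD_WORD

print(round(english, 2), round(random_group, 2))
print(round(english_shortfall, 2), round(random_excess, 1))
```

Both percentages here are measured against the 50-unit standard word; measured against the 49.38-unit English average, the random group comes out slightly longer still.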
A similar analysis of numbers will show that the average length of a number is 17 units (minimum 12, maximum 22), so a group of five numbers takes 5 x 17 + 4 = 89 units, or about 1.78 times as long to transmit as a 50-unit five-letter word.
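The figures for numbers can be reproduced the same way. The sketch below assumes the standard International Morse digits and counts the three-unit letter space with each digit, as the letter table does; DIGITS and digit_units() are names made up for this example.

```python
# International Morse patterns for the ten digits ('.' = dit, '-' = dah).
DIGITS = {
    "1": ".----", "2": "..---", "3": "...--", "4": "....-", "5": ".....",
    "6": "-....", "7": "--...", "8": "---..", "9": "----.", "0": "-----",
}


def digit_units(pattern):
    """Length in units: marks, one-unit gaps between them, plus a 3-unit letter space."""
    marks = sum(1 if mark == "." else 3 for mark in pattern)
    gaps = len(pattern) - 1
    return marks + gaps + 3


lengths = {digit: digit_units(pattern) for digit, pattern in DIGITS.items()}
average = sum(lengths.values()) / len(lengths)

# Five digits plus the 4 extra units that complete the 7-unit word space.
group = 5 * average + 4

print(min(lengths.values()), max(lengths.values()), average)  # 12 22 17.0
print(group, group / 50)                                      # 89.0 1.78
```

The shortest digit is 5 (five dits) and the longest is 0 (five dahs); every digit is treated as equally likely in this average.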
Comparing these calculations will show some of the reasons why receiving speeds vary with the kind of material being sent.
As a matter of interest, we list here the letters from the shortest to the longest by the number of units (less letter space) -- notice that all lengths are odd numbers: 1 - E; 3 - I, T; 5 - A, N, S; 7 - D, H, M, R, U; 9 - B, F, G, K, L, V, W; 11 - C, O, P, X, Z; 13 - J, Q, Y.
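The odd lengths are no accident. As a short derivation (the symbols d and h are introduced here for illustration and do not appear in the original), let a letter contain d dits and h dahs. Its marks occupy d + 3h units, and the d + h - 1 gaps between them occupy one unit each:

```latex
\[
\text{length}
  = \underbrace{d + 3h}_{\text{marks}} + \underbrace{(d + h - 1)}_{\text{gaps}}
  = 2d + 4h - 1
  = 2(d + 2h) - 1 ,
\]
```

which is always odd. For example, C has two dits and two dahs, giving 2(2 + 4) - 1 = 11 units, in agreement with the list above.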
If the same kind of calculation is carried out for several foreign languages, the following results are obtained for the average character length (frequency data from Secret and Urgent, Fletcher Pratt, 1942, Tables II to IV, p. 253 ff.): German 8.640, French 8.694, Spanish 8.286. These are roughly 4 to 9% shorter per character than in English (9.076). There seems little doubt that if the code were somewhat redesigned and adjusted to optimize it for English, a reduction of about 5% could be made.
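The percentage differences follow directly from the averages quoted above. A small sketch, with variable names chosen for this example only:

```python
ENGLISH_AVG = 9.076  # units per character, from the English table

# Average character lengths quoted in the text for other languages.
OTHER_AVGS = {"German": 8.640, "French": 8.694, "Spanish": 8.286}

# Percent by which each language's average character is shorter than English.
shorter_pct = {
    language: round(100 * (ENGLISH_AVG - avg) / ENGLISH_AVG, 1)
    for language, avg in OTHER_AVGS.items()
}

print(shorter_pct)
# {'German': 4.8, 'French': 4.2, 'Spanish': 8.7}
```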
For the original American Morse code: Mr. Ivan Coggeshall made a comparative analysis of American Morse, using the same normal dah lengths and word spacing one unit shorter, and arrived at an average letter (frequency) length of 7.978 (as compared with 9.076) and an average number length of 14. As noted in Chapter 16, American Morse timing is open to considerable variation