
Frequently Asked Questions

Are there useful strategies in decryption?

The lists below are not a definitive account of how letters and words are distributed in the English language, but they do represent the findings of one in-depth analysis of a wide variety of texts. Unless stated otherwise, each list is given in descending order of frequency.


Single letters:
e t o a n i r s h d l c w u m f y g p b v k x q j z
 
Digraphs (pairs of consecutive letters):
th er on an re he in ed nd ha at en es of or nt ea ti to it st io le is ou ar as de rt ve
 
Trigraphs (three consecutive letters):
the and tha ent ion tio for nde has nce edt tis oft sth men
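
To compare a cipher text against the three lists above, the same counts are needed for the cipher text itself. The following is a minimal Python sketch, not part of the original analysis; the function names ngram_counts and top and the sample string are illustrative assumptions. Spaces and punctuation are dropped before counting, so digraphs and trigraphs may run across word boundaries, as entries such as "edt" and "oft" in the trigraph list do.

```python
from collections import Counter

def ngram_counts(text, n):
    """Count overlapping runs of n consecutive letters, ignoring everything else."""
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    grams = ("".join(letters[i:i + n]) for i in range(len(letters) - n + 1))
    return Counter(grams)

def top(text, n, k=5):
    """Return the k most common n-grams, most frequent first."""
    return [gram for gram, _ in ngram_counts(text, n).most_common(k)]

# Illustrative sample only; a real cipher text would be used here.
sample = "the theory that the other anthem is there"
print(top(sample, 1))  # single letters
print(top(sample, 2))  # digraphs
print(top(sample, 3))  # trigraphs
```

On a short text the rankings will be noisy, so the output is best treated as a set of candidates to test rather than a direct match to the lists.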
 
Two letter words:
of to in it is be as at so we he by or on do if me my up an go no us am
 
Three letter words:
the and for are but not you all any can had her was one our out day get has him his how man new now old see two way who boy did its let put say she too use
 
Four letter words:
that with have this will your from they know want been good much some time very when come here just like long make many more only over such take than them well were
 
Starting letters:
t o a w b c d s f m r h i y e g l n p u j k
 
Finishing letters:
e s t d n r y f l o g h a k m p u w
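
When the spaces have been kept in the cipher text, the word-based lists above can be checked against word-level tallies of the cipher text. The sketch below is one possible way to collect them, under the assumption that words are separated by whitespace; word_stats and its max_len parameter are hypothetical names chosen for illustration.

```python
from collections import Counter

def word_stats(ciphertext, max_len=4):
    """Tally short words, plus the first and last letter of every word."""
    # Keep only the letters a-z of each whitespace-separated word.
    words = ["".join(ch for ch in w.lower() if "a" <= ch <= "z")
             for w in ciphertext.split()]
    words = [w for w in words if w]
    # One counter per word length from 2 up to max_len.
    short = {n: Counter(w for w in words if len(w) == n)
             for n in range(2, max_len + 1)}
    first = Counter(w[0] for w in words)
    last = Counter(w[-1] for w in words)
    return short, first, last

# Illustrative cipher text: "the cat and the dog" with every letter shifted by 3.
short, first, last = word_stats("wkh fdw dqg wkh grj")
print(short[3].most_common(3))
print(first.most_common(3))
print(last.most_common(3))
```

In this small example the repeated three letter word "wkh" is an obvious candidate for "the", the first entry in the three letter list.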

All the lists are useful with informal substitution (spaces left in the cipher text), but only the first three lists can be used with formal substitution (spaces removed), because the remaining lists depend on knowing where each word begins and ends.
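
One simple way to put the single letter list to work is to pair the cipher letters, ranked by how often they occur, with the English ordering given earlier. The sketch below assumes a simple substitution cipher; first_guess_key and apply_key are hypothetical helper names, and ENGLISH_ORDER is the single letter list from this page. The resulting key is only a starting guess, to be corrected by hand using the digraph, trigraph and word lists.

```python
from collections import Counter

# Single letter list from this page, most frequent first.
ENGLISH_ORDER = "etoanirshdlcwumfygpbvkxqjz"

def first_guess_key(ciphertext):
    """Map the i-th most common cipher letter to the i-th most common English letter."""
    counts = Counter(ch for ch in ciphertext.lower() if "a" <= ch <= "z")
    ranked = [ch for ch, _ in counts.most_common()]
    return dict(zip(ranked, ENGLISH_ORDER))

def apply_key(ciphertext, key):
    """Substitute every letter that has a guess; leave other characters unchanged."""
    return "".join(key.get(ch, ch) for ch in ciphertext.lower())

# Illustrative use on a toy cipher text.
key = first_guess_key("xxxyyz")
print(apply_key("xyz", key))
```

Letters with similar frequencies are easily swapped by this method, especially in short messages, so individual pairings should be revised whenever they produce unlikely digraphs or words.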

Interestingly, Samuel Morse (1791-1872), the inventor of the Morse code system, needed to know which letters appeared most frequently so that the most common letters could be assigned the simplest representations of dots and dashes. By counting the number of letters in sets of printers' type, he obtained the following results.


12,000 E
9,000 T
8,000 A, I, N, O, S
6,400 H
6,200 R
4,400 D
4,000 L
3,400 U
3,000 C, M
2,500 F
2,000 W, Y
1,700 G, P
1,600 B
1,200 V
800 K
500 Q
400 J, X
200 Z
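
Morse's counts are easier to compare with other frequency tables once they are expressed as percentages. The sketch below simply transcribes the table above into a dictionary (the name MORSE_COUNTS is an assumption for illustration) and divides each count by the total, which comes to 106,400; E then accounts for roughly 11.3% of all letters.

```python
# Counts transcribed from the table above.
MORSE_COUNTS = {
    "E": 12000, "T": 9000,
    "A": 8000, "I": 8000, "N": 8000, "O": 8000, "S": 8000,
    "H": 6400, "R": 6200, "D": 4400, "L": 4000, "U": 3400,
    "C": 3000, "M": 3000, "F": 2500, "W": 2000, "Y": 2000,
    "G": 1700, "P": 1700, "B": 1600, "V": 1200, "K": 800,
    "Q": 500, "J": 400, "X": 400, "Z": 200,
}

total = sum(MORSE_COUNTS.values())

# Share of each letter as a percentage of all counted letters.
shares = {letter: 100 * count / total for letter, count in MORSE_COUNTS.items()}

for letter, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{letter} {share:5.2f}%")
```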