In my C++ code, I take a text file in ordinary English, count how often each letter of the English alphabet occurs in the file, and store the counts in a vector. I then replace each letter, starting with the most frequent one in the text, with the correspondingly ranked letter of English. I use strings like "ETAOINSHRDLUCMFWYPVBGKJQXZ" and "EOTHASINRDLUYMWFGCBPKVJQXZ" to represent the letters of the alphabet from most to least frequent, then go through the letters of the text from most to least frequent (a vector sorted with a greater-than comparison) and replace each one with the letter at the same position in one of the strings above. The accuracy of such a naive approach depends on the size of the file; I want to see if I can make it more accurate while keeping this approach. After I run through the text again to substitute the new letters, I get a new file full of new (not real) words. Here is how the approach does (letter in the text, its count, and the letter it gets replaced with, using the first string):
E 326 E
O 288 T
A 271 A
T 257 O
I 243 I
R 235 N
N 208 S
S 205 H
L 140 R
D 129 D
M 112 L
U 110 U
H 107 C
C 103 M
G 92 F
P 91 W
Y 73 Y
W 58 P
B 53 V
F 51 B
K 29 G
V 22 K
X 15 J
J 6 Q
Q 6 X
Z 1 Z
For a text of moderate length, the result has words like
REANSISF FTARH from LEARNING GOALS
REANS YTU A CAHGERR VY LINAS RIWTKAMA from Learn You a Haskell by Miran Lipovaca
Notice how some words come out pretty close, like LEARN, YOU, or BY. Somewhere along those lines I could maybe brute-force my way to replacing those misspellings with the actual words.
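For context, the approach described above can be sketched roughly like this. It is a minimal sketch, not my exact code: the names `buildMapping` and `substitute` are made up for illustration, it only uses the first ordering string, and it assumes ties in the counts are broken alphabetically.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Reference ordering of English letters, most frequent first (from the post).
const std::string kEnglishOrder = "ETAOINSHRDLUCMFWYPVBGKJQXZ";

// Hypothetical helper: mapping[c - 'A'] is the letter that replaces c.
std::array<char, 26> buildMapping(const std::string& text) {
    std::vector<std::pair<int, char>> counts;
    for (char c = 'A'; c <= 'Z'; ++c) counts.push_back({0, c});

    // Count letters case-insensitively; everything else is ignored.
    for (unsigned char ch : text) {
        if (std::isalpha(ch)) ++counts[std::toupper(ch) - 'A'].first;
    }

    // Most frequent first; stable_sort keeps alphabetical order on ties
    // (an assumption, the tie rule is not stated in the post).
    std::stable_sort(counts.begin(), counts.end(),
                     [](const std::pair<int, char>& a,
                        const std::pair<int, char>& b) {
                         return a.first > b.first;
                     });

    // The i-th most frequent letter of the text gets the i-th letter
    // of the reference ordering.
    std::array<char, 26> mapping{};
    for (std::size_t i = 0; i < counts.size(); ++i) {
        mapping[counts[i].second - 'A'] = kEnglishOrder[i];
    }
    return mapping;
}

// Hypothetical helper: replace every letter using the mapping.
// Output letters are upper case; non-letters are left untouched.
std::string substitute(const std::string& text,
                       const std::array<char, 26>& mapping) {
    std::string out = text;
    for (char& c : out) {
        unsigned char u = static_cast<unsigned char>(c);
        if (std::isalpha(u)) c = mapping[std::toupper(u) - 'A'];
    }
    return out;
}
```

For example, `substitute("AAABBC", buildMapping("AAABBC"))` gives `"EEETTA"`, because A, B, and C are the first, second, and third most frequent letters of that input.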
How, then, could I improve the accuracy so that the output is at least 50% close to the original text? I just need ideas for the time being. Whether it's using a dictionary to find common letter patterns or using maps as dictionaries in C++, any advice would be appreciated. Thanks.
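To make the dictionary idea concrete, here is one possible direction, sketched under several assumptions: the dictionary is a `std::unordered_set` of upper-case words, the text contains only letters and whitespace, and accuracy is measured as the fraction of words found in the dictionary. The names `applyKey`, `dictionaryScore`, and `improveKey` are hypothetical. The sketch starts from some 26-letter key (for instance the frequency-based one) and keeps swapping pairs of letters in it for as long as a swap raises the score. Whole-word hits are a coarse signal, so this may well stall on real text.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_set>
#include <utility>

// Apply a 26-letter key: key[i] replaces the letter 'A' + i.
std::string applyKey(const std::string& text, const std::string& key) {
    std::string out = text;
    for (char& c : out) {
        unsigned char u = static_cast<unsigned char>(c);
        if (std::isalpha(u)) c = key[std::toupper(u) - 'A'];
    }
    return out;
}

// Fraction of whitespace-separated words that appear in the dictionary.
double dictionaryScore(const std::string& text,
                       const std::unordered_set<std::string>& dict) {
    std::istringstream in(text);
    std::string word;
    std::size_t total = 0, hits = 0;
    while (in >> word) {
        ++total;
        if (dict.count(word) > 0) ++hits;
    }
    return total == 0 ? 0.0 : static_cast<double>(hits) / total;
}

// Try swapping every pair of key entries and keep a swap only if it
// strictly raises the score; repeat until a full pass changes nothing.
std::string improveKey(const std::string& text, std::string key,
                       const std::unordered_set<std::string>& dict) {
    double best = dictionaryScore(applyKey(text, key), dict);
    bool improved = true;
    while (improved) {
        improved = false;
        for (std::size_t i = 0; i < key.size(); ++i) {
            for (std::size_t j = i + 1; j < key.size(); ++j) {
                std::swap(key[i], key[j]);
                double score = dictionaryScore(applyKey(text, key), dict);
                if (score > best) {
                    best = score;
                    improved = true;
                } else {
                    std::swap(key[i], key[j]);  // undo a swap that did not help
                }
            }
        }
    }
    return key;
}
```

The loop always terminates because the score only ever goes up and can take finitely many values. Each pass re-scores the whole text for every pair of letters, so for large files it would probably make sense to score only a sample of the text.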