0
votes

I understand this has been asked before, and I have a rough grasp of how to compare frequency tables between the ciphertext and English (the language my program assumes the plaintext is in), but I'm unsure how to turn this into code.

#include <cctype>
#include <string>
#include <vector>

// Counts A-Z occurrences (case-insensitive) across all lines and words.
// freqArg must already hold 26 elements; index 0 is 'A', index 25 is 'Z'.
// Note: the text in `file` is upper-cased in place as a side effect.
void frequencyUpdate(std::vector< std::vector< std::string> > &file, std::vector<int> &freqArg) {
    for (auto &line : file) {
        for (auto &word : line) {
            for (char &ch : word) {
                // Cast to unsigned char first: std::toupper is undefined for negative values.
                ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));

                // Only letters are counted; digits and punctuation are skipped.
                if (ch >= 'A' && ch <= 'Z') {
                    freqArg.at(ch - 'A') += 1;
                }
            }
        }
    }
}

This is how I get the letter frequencies of a given file. Its contents are split into lines and then into words, hence the nested vector of strings, and I use each char's ASCII value minus 65 as the index. The resulting vector of ints holding the frequencies is saved.

This is where I don't know how to proceed. Should I hardcode a const std::vector<int> for the English letter frequencies and then somehow do a comparison? How would I compare efficiently, rather than simply comparing each vector against the other, which is possibly not an efficient method?

This comparison is meant to find the appropriate shift value for decrypting a Caesar cipher text. I don't want to brute-force it by shifting one step at a time until the text is readable. Any advice on how to approach this? Thanks.

3
You will have an element of brute force. That is the nature of your approach. - Captain Giraffe

3 Answers

0
votes

Take your frequency vector and the frequency vector for "typical" English text, and find the cross-correlation.

The highest values of the cross-correlation correspond to the most likely shift values. At that point you'll need to use each one to decrypt, and see whether the output is sensible (i.e. forms real words and coherent sentences).
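
As a rough sketch of the idea above (not necessarily what the answerer had in mind), the circular cross-correlation can be computed by scoring each of the 26 shifts against a table of English letter frequencies. The name `rankShifts` and the table `kEnglishFreq` are made up for illustration, and the percentages are approximate values from commonly published tables. It assumes `cipherCounts` has 26 entries indexed A to Z, as produced by `frequencyUpdate` in the question.

```cpp
#include <algorithm>
#include <array>
#include <numeric>
#include <vector>

// Approximate relative frequencies (percent) of A..Z in English text.
// Assumed values; any reasonable English frequency table would do.
const std::array<double, 26> kEnglishFreq = {
    8.17, 1.49, 2.78, 4.25, 12.70, 2.23, 2.02, 6.09, 6.97, 0.15, 0.77, 4.03, 2.41,
    6.75, 7.51, 1.93, 0.10, 5.99, 6.33, 9.06, 2.76, 0.98, 2.36, 0.15, 1.97, 0.07
};

// Hypothetical helper: returns the shifts 0..25 ordered from most to least likely.
// cipherCounts must hold 26 letter counts (index 0 is 'A'); .at() throws otherwise.
std::vector<int> rankShifts(const std::vector<int>& cipherCounts) {
    std::array<double, 26> score{};
    for (int shift = 0; shift < 26; ++shift) {
        for (int i = 0; i < 26; ++i) {
            // Under this shift, plain letter i shows up as cipher letter (i + shift) % 26.
            score[shift] += kEnglishFreq[i] * cipherCounts.at((i + shift) % 26);
        }
    }

    std::vector<int> order(26);
    std::iota(order.begin(), order.end(), 0);  // 0, 1, ..., 25
    // Highest correlation first; stable_sort keeps ties in ascending shift order.
    std::stable_sort(order.begin(), order.end(),
                     [&score](int a, int b) { return score[a] > score[b]; });
    return order;
}
```

This is only 26 x 26 multiplications, so efficiency is not a concern. The first few entries of the returned vector are the candidate shifts worth trying; subtracting a candidate from each ciphertext letter (mod 26) gives the trial plaintext.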

0
votes

In English, 'e' has the highest frequency, so whichever letter is most frequent in your ciphertext most likely stands for 'e'. If 'e' maps to some letter X, the key is the difference between X and 'e' (mod 26).

If this is not the right key (a ciphertext that is too short can distort the statistics), try matching your most frequent ciphertext letter with the second most frequent letter in English, i.e. 't'.
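
A minimal sketch of this guess, assuming the 26-entry count vector from the question (index 0 is 'A'). The function name `guessShift` and its `plainLetter` parameter are hypothetical, chosen only for this example.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: guesses the Caesar shift by assuming the most frequent
// ciphertext letter stands for plainLetter (an upper-case letter, 'E' by default).
int guessShift(const std::vector<int>& cipherCounts, char plainLetter = 'E') {
    // Position of the largest count; the first one wins if there is a tie.
    auto top = std::max_element(cipherCounts.begin(), cipherCounts.end());
    int cipherIndex = static_cast<int>(top - cipherCounts.begin());

    // Add 26 before the modulo so the result is never negative.
    return (cipherIndex - (plainLetter - 'A') + 26) % 26;
}
```

Calling `guessShift(counts)` tries 'E' first; if the decrypted text is not readable, `guessShift(counts, 'T')` gives the fallback guess.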

0
votes

I would suggest a graph traversal algorithm. Your starting node has no substitutions assigned and has 26 connected nodes, one for each possible letter substitution for the most frequently occurring ciphertext letter. The next node has another 25 connected nodes for the possible letters for the second most frequent ciphertext letter (one less, since you've already used one possible letter). Which destination node you choose should be based on which letters are most likely given a normal frequency distribution for the target language.

At each node, you can test for success by doing your substitutions into the ciphertext, and finding all the resulting words that now match entries in a dictionary file. The more matches you've found, the more likely you've got the correct substitution key.
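
The success test described above could look roughly like this. It is a sketch under assumptions: `countDictionaryMatches` is a made-up name, the dictionary is assumed to be already loaded into a set, and both the candidate words and the dictionary entries are assumed to be upper-cased with punctuation stripped.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: counts how many words of a candidate decryption
// appear in the dictionary. A higher count suggests a better key.
std::size_t countDictionaryMatches(const std::vector<std::string>& words,
                                   const std::unordered_set<std::string>& dictionary) {
    std::size_t matches = 0;
    for (const auto& word : words) {
        if (dictionary.count(word) > 0) {
            ++matches;
        }
    }
    return matches;
}
```

The same check also works for a plain Caesar cipher: decrypt with each candidate shift and keep the one with the most matches.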