Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

So the other day, I was piqued by a ciphertext that was trivially encoded throug

ID: 648415 • Letter: S

Question

So the other day, I was piqued by a ciphertext that was trivially encoded through substitution. Solvable only because we can use our eyes to determine word boundary word detection, etc. Trawling the net, there didn't seem to be any automated way of doing this, other than of course to generate a key the size of the alphabets.

How would one go about programmatically decrypting a substitution cipher given that all spaces have been removed?

It seems like there would be multiple runs of frequency analysis on single letters, then double letters, then finding out word boundaries. How would one go about solving this without resorting to a 26-factorial brute-force method?

Explanation / Answer

I think frequency analysis is the preferred approach for breaking substitution ciphers. The more ciphertext you have, the better it is. My suggestion would be to do the following:

First, you compute two probability distributions:

Once you have these distributions you have a significant knowledge. Instead of trying all possible permutations in brute force, you can now try the most probable ones first. For each permutation, you can check if the probabilities of the symbols in the ciphertext correspond to the probabilities of the symbols in the reference text.

The Kullbach-Leibler (KL) distance is often used to measure the similarity of two probability distributions. You will ideally start your search with the permutation that minimizes the KL distance and then proceed to permutations with a higher KL distance.

For each permutation you then substitute the symbols accordingly and you check if words that frequently appear in your reference text are also found in the decoded text. You keep the best solution which has the most matches, and you continue the search.

The probability to find a better decryption will decrease the longer you search. So you can just stop after some time. You have to decide when exactly the algorithm will stop, but here again you have the probabilites and the KL distance to help you make this decision.

Hire Me For All Your Tutoring Needs
Integrity-first tutoring: clear explanations, guidance, and feedback.
Drop an Email at
drjack9650@gmail.com
Chat Now And Get Quote