Question
When bruteforcing a password (e.g. the common attacks on DES), where you have ciphertext only, you need a way to assess whether a decrypted plaintext is the right one. I believe the EFF DES machine does this by checking if the chars are printable. Of course, this only works for ASCII files, not things like images.
I'd like to measure the entropy (observed 0th-order, byte-level) and see whether it is above a threshold that can be attributed to randomness.
1. Is that a good method?
2. Is there another well-known method?
3. Where can I find tables indicating that, for a randomly generated message of size X, there is probability p that its entropy is below a given value? (Tables of this type are used, for instance, in chi-squared testing.)
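The entropy measure described in the question can be sketched as follows; this is an illustrative snippet, not taken from any particular tool (the function name `byte_entropy` is my own):

```python
import math
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Observed 0th-order byte-level entropy, in bits per byte.

    Ranges from 0.0 (all bytes identical) to 8.0 (all 256 byte
    values equally frequent).
    """
    n = len(data)
    counts = Counter(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

Note that for short messages the observed entropy of truly random bytes falls well below 8.0 simply because not all 256 values can appear, which is exactly why the tables asked about in question 3 would need to depend on the message size X.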
Explanation / Answer
Well, from your previous questions, I'm assuming that you're writing a utility to brute-force decrypt a password-protected file (encrypted with a certain encryption utility), and you're looking for a way to determine whether a trial decryption is plausible.
Normally, when an attacker attempts to decrypt something, he has some idea about what it is (why else would he invest the effort?), and even if that guess doesn't give him an entire plaintext block, his partial information about what it is (be it an IP packet or a Word document) will help him recognize it.
Thomas states that the usual assumption is that you do have a full plaintext block; that may be the assumption you make when you're designing a cipher (actually, you assume the attacker has a lot more than that), but it isn't true for all attackers (for example, yourself).
Your suggestion (run a chi-squared test on the byte frequencies to see whether they are consistent with the bytes being generated randomly, on the idea that decrypting the data with the wrong key will end up with random-looking plaintext) isn't a bad one; whether it is the right one depends a great deal on whether it will actually pick out the plaintext, without the sensitivity needing to be set so high that you are deluged with false hits. This is a real concern: I suspect that (say) ZIP files have fairly even byte distributions, and the chi-squared test would be hard pressed to identify those.
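A minimal sketch of this chi-squared test, assuming plain Python with no statistics library; the 293.25 cutoff is roughly the 95th-percentile critical value for 255 degrees of freedom, and you'd want to tune it to your own false-hit tolerance:

```python
def chi_squared_uniform(data: bytes) -> float:
    """Chi-squared statistic of the byte frequencies against a
    uniform distribution over all 256 byte values."""
    n = len(data)
    expected = n / 256
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    return sum((c - expected) ** 2 / expected for c in counts)

def looks_random(data: bytes, critical: float = 293.25) -> bool:
    # A statistic below the critical value is consistent with random
    # bytes (i.e. a wrong key); a plausible plaintext scores above it.
    return chi_squared_uniform(data) < critical
```

The chi-squared approximation is only reliable when the expected count per bucket is reasonably large, so this needs on the order of a couple of kilobytes of decrypted data to be meaningful.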
Hence, my suggestion would be to rely on a series of tests, each designed to pick up a particular file format (as well as a general purpose test that has a shot at recognizing file formats you didn't anticipate).
Some ideas for these tests might be:
Checking whether the most significant bits of the bytes are mostly the same. This will catch anything in a text format (and yes, a chi-squared test would also catch it, but you'd need to decrypt a lot more data to run that test).
Check to see if the start of the data is consistent with a specific file format (for example, check for the magic number that appears in a ZIP file header or a JPEG file header).
Your chi-squared based test is a decent general-purpose test, which has a good shot at picking up most file formats that aren't heavily compressed. If I were doing this, I'd pick a simpler but related characteristic (such as "is there a byte value that occurs more than N times"); I have no good feel as to which would work better in practice.
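The MSB check from the first idea above could look like this; `msb_mostly_clear` and the 0.95 threshold are my own illustrative choices:

```python
def msb_mostly_clear(data: bytes, threshold: float = 0.95) -> bool:
    """Heuristic: ASCII text has the top bit of nearly every byte
    clear, while random bytes have it set about half the time."""
    if not data:
        return False
    clear = sum(1 for b in data if b < 0x80)
    return clear / len(data) >= threshold
```

For random bytes this check is wrong with overwhelmingly small probability even on a few dozen bytes, which is why it needs far less decrypted data than the chi-squared test.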
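The magic-number check from the second idea is a lookup against known header bytes; the short table below covers only a few common formats and would be extended for whatever formats you expect:

```python
# A few well-known magic numbers (file signatures); extend as needed.
MAGIC_NUMBERS = {
    b"PK\x03\x04": "zip",
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"%PDF-": "pdf",
}

def identify_magic(data: bytes):
    """Return the format name if the data starts with a known magic
    number, else None."""
    for magic, name in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return name
    return None
```

This test only needs the first block of the trial decryption, which makes it very cheap to run inside a brute-force loop.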
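The simpler "is there a byte value that occurs more than N times" characteristic suggested in the last paragraph might be sketched as follows; the default cutoff (three times the uniform expectation) is an untuned guess of mine, not a recommendation from the answer:

```python
def has_dominant_byte(data: bytes, n_times=None) -> bool:
    """Crude plausibility test: does some single byte value occur far
    more often than uniform randomness would predict?"""
    if n_times is None:
        # Under a uniform distribution each value occurs about
        # len(data)/256 times; tripling that is a rough cutoff.
        n_times = max(8, 3 * len(data) // 256)
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    return max(counts) > n_times
```

Compared with the full chi-squared statistic, this only looks at the single most frequent byte, so it is cheaper and easier to reason about, at the cost of ignoring the rest of the distribution.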