Also known as: letter-frequency analysis
Frequency analysis breaks a substitution cipher by exploiting the fact that letters and symbols of a natural language occur with characteristic, uneven frequencies that survive a simple letter-for-letter substitution. [1]
How it works
In a monoalphabetic substitution cipher each plaintext letter is consistently replaced by one ciphertext symbol. The substitution hides which symbol stands for which letter, but it does not change how often each appears: if the most common ciphertext symbol shows up about as often as the most common letter does in the plaintext language, those two probably match. The analyst tabulates symbol frequencies, aligns them with known language statistics, and confirms guesses using common pairs, doublets, and short words. Because it needs only intercepted ciphertext, it is the archetypal ciphertext-only attack.
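The tabulate-and-align step can be sketched in a few lines of Python. This is a minimal illustration, not a complete solver: the function names are hypothetical, and `ENGLISH_ORDER` is an assumed, approximate ranking of English letters (published tables disagree slightly in the lower ranks).

```python
from collections import Counter

# Approximate English letter order, most to least frequent. This ranking is an
# assumption for illustration; published tables differ in the lower ranks.
ENGLISH_ORDER = "etaoinshrdlcumwfgypbvkjxqz"


def symbol_frequencies(ciphertext: str) -> list[tuple[str, float]]:
    """Return (symbol, relative frequency) pairs, most common first."""
    letters = [ch for ch in ciphertext.lower() if ch.isalpha()]
    total = len(letters)
    counts = Counter(letters)
    # An empty input yields an empty list, so no division by zero occurs.
    return [(sym, n / total) for sym, n in counts.most_common()]


def first_guess_key(ciphertext: str) -> dict[str, str]:
    """Map each ciphertext symbol to the English letter of the same rank."""
    ranked = [sym for sym, _ in symbol_frequencies(ciphertext)]
    return dict(zip(ranked, ENGLISH_ORDER))
```

A rank-for-rank mapping like this is only a starting point. On short texts it is usually reliable for the few most common letters at best, which is why the analyst then refines the guesses with common pairs, doublets, and short words.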
Modern ciphers are built specifically to defeat this. Diffusion spreads each plaintext symbol's influence across the whole output, and confusion obscures the relationship between key and ciphertext, so a fixed S-box applied across many rounds, rather than as a single standalone substitution, leaves no exploitable skew in the output statistics. A one-time pad defeats frequency analysis completely: with a truly random key that is never reused, its output is uniform by construction.
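The one-time-pad claim can be checked empirically. The sketch below assumes a pad drawn from Python's `secrets` module and uses the Shannon entropy of the byte histogram as a rough flatness measure; the helper names and the sample plaintext are illustrative only.

```python
import math
import secrets
from collections import Counter


def shannon_entropy_bits(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def one_time_pad(plaintext: bytes) -> bytes:
    """XOR the plaintext with a fresh random key of the same length."""
    key = secrets.token_bytes(len(plaintext))
    return bytes(p ^ k for p, k in zip(plaintext, key))


# Highly repetitive plaintext: only a handful of distinct byte values.
plaintext = b"attack at dawn " * 4096
ciphertext = one_time_pad(plaintext)

print(f"plaintext entropy:  {shannon_entropy_bits(plaintext):.3f} bits/byte")
print(f"ciphertext entropy: {shannon_entropy_bits(ciphertext):.3f} bits/byte")
```

The plaintext figure should come out low because so few byte values occur, while the ciphertext figure should sit close to the 8-bit maximum, leaving a frequency count nothing to align against.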
Relevance to SDR
Frequency analysis does not apply to the encrypted voice GopherTrunk encounters: keyed AES and DES produce statistically flat output by design. Its relevance is to obfuscation, not encryption. In clean-room reverse engineering of an unknown, keyless transform (as in the talker-alias work, issue #773), counting how often each byte value appears is one of the first diagnostics: a non-flat distribution points to a simple substitution and helps recover a fixed lookup table, whereas a flat distribution suggests a stronger construction. The technique is a tool for understanding unkeyed encodings, not for breaking real encryption.
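A byte-count diagnostic of this kind might look like the following sketch. It is not taken from GopherTrunk: the function names are hypothetical, and Pearson's chi-square statistic against a uniform distribution is just one reasonable way to put a number on "flat".

```python
from collections import Counter


def byte_histogram(data: bytes) -> list[int]:
    """Count occurrences of each of the 256 possible byte values."""
    counts = Counter(data)
    return [counts.get(value, 0) for value in range(256)]


def chi_square_vs_uniform(data: bytes) -> float:
    """Pearson chi-square statistic of the byte histogram against uniform.

    With 255 degrees of freedom, uniformly random input gives values
    near 255; a much larger value indicates a skewed distribution.
    """
    if not data:
        raise ValueError("need at least one byte")
    expected = len(data) / 256
    return sum((obs - expected) ** 2 / expected for obs in byte_histogram(data))
```

Where to draw the line between "flat" and "skewed" is a judgment call, and the statistic means little on very small captures, so it is best read alongside the raw histogram. A fixed byte-for-byte lookup table only relabels the histogram bins: the sorted counts of the input and the output are identical, which is why the skew of the underlying data shows through such a transform.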
Sources
[1] Frequency analysis (Wikipedia), for the ciphertext-only attack on substitution ciphers and the use of language letter statistics.