Frequency analysis

Also known as: letter-frequency analysis

Frequency analysis breaks a substitution cipher by exploiting the fact that letters and symbols of a natural language occur with characteristic, uneven frequencies that survive a simple letter-for-letter substitution.¹

Ciphertext symbol counts mirror the plaintext language's letter frequencies, leaking the substitution.
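Stated precisely: write σ for the substitution, which is a bijection on the alphabet, and write n_P(x) and n_C(y) for how often letter x occurs in the plaintext and symbol y occurs in the ciphertext. This notation is introduced here only for illustration. Because each ciphertext symbol is c_i = σ(p_i) and σ is one-to-one:

```latex
n_C\bigl(\sigma(x)\bigr)
  = \bigl|\{\, i : \sigma(p_i) = \sigma(x) \,\}\bigr|
  = \bigl|\{\, i : p_i = x \,\}\bigr|
  = n_P(x)
  \qquad \text{for every letter } x.
```

The ciphertext histogram is therefore the plaintext histogram with its labels permuted. Sorting both by count lines them up.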

How it works

In a monoalphabetic substitution cipher each plaintext letter is consistently replaced by one ciphertext symbol. The substitution hides which symbol stands for which letter, but it does not change how often each appears: if the most common ciphertext symbol shows up about as often as the most common letter does in the plaintext language, those two probably match. The analyst tabulates symbol frequencies, aligns them with known language statistics, and confirms guesses using common pairs, doublets, and short words. Because it needs only intercepted ciphertext, it is the archetypal ciphertext-only attack.
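The following is a minimal Go sketch of the counting and alignment step. It assumes English plaintext, considers only the letters A-Z, and uses an approximate English frequency order (`englishOrder`), which is itself an assumption. The helpers `letterCounts`, `rankByFrequency`, `guessKey`, `applyKey` and `doubledLetters` are hypothetical names written for illustration. They are not part of GopherTrunk.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// englishOrder approximates English letters from most to least frequent.
// Assumption for illustration: published tables disagree slightly in the tail.
const englishOrder = "ETAOINSHRDLCUMWFGYPBVKJXQZ"

// letterCounts tallies the letters A-Z in text, ignoring case and non-letters.
func letterCounts(text string) [26]int {
	var counts [26]int
	for _, r := range strings.ToUpper(text) {
		if r >= 'A' && r <= 'Z' {
			counts[r-'A']++
		}
	}
	return counts
}

// rankByFrequency orders the 26 letters by descending count. The stable
// sort keeps ties in alphabetical order, so the result is deterministic.
func rankByFrequency(counts [26]int) []byte {
	letters := []byte("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
	sort.SliceStable(letters, func(i, j int) bool {
		return counts[letters[i]-'A'] > counts[letters[j]-'A']
	})
	return letters
}

// guessKey pairs the i-th most frequent ciphertext letter with the i-th
// most frequent English letter. This is only a first guess to be refined.
func guessKey(ciphertext string) map[byte]byte {
	ranked := rankByFrequency(letterCounts(ciphertext))
	key := make(map[byte]byte, 26)
	for i, c := range ranked {
		key[c] = englishOrder[i]
	}
	return key
}

// applyKey uppercases the ciphertext and maps each letter through key,
// leaving spaces and punctuation untouched.
func applyKey(ciphertext string, key map[byte]byte) string {
	var b strings.Builder
	for _, r := range strings.ToUpper(ciphertext) {
		if r >= 'A' && r <= 'Z' {
			b.WriteByte(key[byte(r)])
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// doubledLetters counts adjacent identical letters such as "LL" or "EE",
// one of the clues used to confirm or correct the first guess.
func doubledLetters(text string) map[string]int {
	up := strings.ToUpper(text)
	doubles := make(map[string]int)
	for i := 1; i < len(up); i++ {
		if up[i] == up[i-1] && up[i] >= 'A' && up[i] <= 'Z' {
			doubles[up[i-1:i+1]]++
		}
	}
	return doubles
}

func main() {
	// Hypothetical ciphertext: a Caesar shift (a special case of
	// monoalphabetic substitution) of a short English phrase.
	ciphertext := "DWWDFN DW GDZQ"
	ranked := rankByFrequency(letterCounts(ciphertext))
	fmt.Println("most frequent:", string(ranked[:5]))
	fmt.Println("doubled:", doubledLetters(ciphertext))
	fmt.Println("first guess:", applyKey(ciphertext, guessKey(ciphertext)))
}
```

On a message this short, the pure rank alignment gets several letters wrong. That is expected. The doubled-letter counts and short-word checks exist to correct such a first guess, and the alignment becomes reliable only as the ciphertext grows.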

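Modern ciphers are built specifically to defeat this. Confusion obscures the relationship between key and ciphertext, and diffusion spreads each plaintext symbol's influence across the whole output block. An S-box is itself a fixed substitution. However, when it is applied inside many keyed rounds interleaved with mixing layers, rather than once as a standalone substitution, the plaintext's statistics no longer show through and the output distribution is flat. A one-time pad defeats frequency analysis completely: with a truly random key used only once, every ciphertext value is equally likely whatever the plaintext.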
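The one-time-pad claim follows in one line. For illustration, assume n-bit messages combined with the key by bitwise XOR, so that C = M ⊕ K, where K is uniform on {0,1}^n and independent of M. The classical letter pad uses addition mod 26 instead, and the argument is the same:

```latex
\Pr[C = c]
  = \sum_{m} \Pr[M = m]\,\Pr[K = m \oplus c]
  = \sum_{m} \Pr[M = m]\,2^{-n}
  = 2^{-n}.
```

Every ciphertext value is equally likely regardless of the plaintext distribution, so a histogram has no skew to find.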

Relevance to SDR

Frequency analysis does not apply to the encrypted voice GopherTrunk encounters, because keyed AES and DES produce statistically flat output by design. Its relevance is to obfuscation, not encryption. In clean-room reverse engineering of an unknown, keyless transform (as in the talker-alias work, issue #773), counting how often each byte value appears is one of the first diagnostics. A skewed distribution betrays a simple substitution and helps recover its fixed lookup table, whereas a flat distribution points to a stronger construction. The technique is a tool for understanding unkeyed encodings, not for breaking real encryption.
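The sketch below shows that first diagnostic in Go. It builds a 256-bin byte histogram and computes Pearson's chi-squared statistic against a flat distribution. Three inputs are assumptions for illustration: the XOR-with-constant table stands in for an unknown keyless substitution, `crypto/rand` output stands in for a strong cipher, and the sample string is arbitrary. None of them is the actual talker-alias transform from issue #773. The function names are hypothetical.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"strings"
)

// byteHistogram counts how often each of the 256 byte values occurs.
func byteHistogram(data []byte) [256]int {
	var h [256]int
	for _, b := range data {
		h[b]++
	}
	return h
}

// chiSquaredUniform is Pearson's chi-squared statistic of a byte histogram
// against the uniform distribution over 256 values (255 degrees of freedom).
// Flat data scores near 255 and skewed data scores far higher. The test is
// only meaningful once n is large (roughly 5 expected hits per value).
func chiSquaredUniform(h [256]int, n int) float64 {
	if n == 0 {
		return 0
	}
	expected := float64(n) / 256
	var chi float64
	for _, c := range h {
		d := float64(c) - expected
		chi += d * d / expected
	}
	return chi
}

func main() {
	// Hypothetical keyless obfuscation: a fixed byte substitution. XOR with
	// a constant permutes the 256 byte values, so it is one such table.
	plain := []byte(strings.Repeat("TALKER ALIAS UNIT 1234 ", 200))
	obfuscated := make([]byte, len(plain))
	for i, b := range plain {
		obfuscated[i] = b ^ 0x5A
	}

	// Cryptographically random bytes stand in for a strong cipher's output.
	random := make([]byte, len(plain))
	if _, err := rand.Read(random); err != nil {
		panic(err)
	}

	fmt.Printf("substituted: chi2 = %.0f\n", chiSquaredUniform(byteHistogram(obfuscated), len(obfuscated)))
	fmt.Printf("random:      chi2 = %.0f\n", chiSquaredUniform(byteHistogram(random), len(random)))
}
```

The substituted buffer uses only as many distinct byte values as its plaintext, so its statistic lands far above 255. The random buffer typically lands near 255. Because a substitution only relabels bins, the most frequent obfuscated bytes can then be paired with the expected most frequent plaintext bytes to start rebuilding the table.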

Sources

¹ Frequency analysis, Wikipedia: the ciphertext-only attack on substitution ciphers and the use of language letter statistics.
