Right Now | Coded to Last
On Christmas Day, 1801, Thomas Jefferson, then president of both the United States of America and the American Philosophical Society, received a letter from the society’s vice president, Robert Patterson, a frequent correspondent who was a mathematics professor at the University of Pennsylvania. Patterson began by defining four requirements of what he called a “perfect cypher.” It should work in any language, be easy to memorize, and be simple to perform. Most important, an ideal cipher should be “absolutely inscrutable to all unacquainted with the particular key or secret for decyphering.”
Patterson described a technique that met his criteria, and gave an example. “I shall conclude this paper with a specimen of such writing,” he boasted, “which I may safely defy the united ingenuity of the whole human race to decypher to the end of time….” Indeed, by all accounts, neither Jefferson nor anyone else could break Patterson’s challenge cipher for the next two centuries.
Until now. In a recent article in American Scientist, and in a talk at Harvard sponsored by the mathematics department, Lawren Smithline ’94 explained how he had decoded Patterson’s cryptic message in 2007. Smithline, a mathematician at the Center for Communications Research in Princeton, New Jersey (a division of the Institute for Defense Analyses), relied on methods that were available in the early nineteenth century (if beyond that era’s mathematical intuitions), using computers only to speed up the calculations.
A simple substitution cipher consistently replaces each letter of the alphabet with another. The Caesar cipher, for example, shifts each letter a fixed number of places ahead in the alphabet (say, three), wrapping around from the end of the alphabet to the start, so that XYZ becomes ABC. But since at least the fifteenth century, cryptographers have realized that simple substitution ciphers are vulnerable to frequency analysis. For example, “e” is the most frequently used letter in English, so in a sufficiently long substitution cipher, whichever letter appears most often probably stands for “e.” Letter counts thus narrow the most common letters down to a short list of candidates to try.
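To make both ideas concrete, here is a minimal Python sketch of a Caesar shift and of the crude frequency attack described above. The function names are illustrative, and the attack rests on a simplifying assumption: that the single most common ciphertext letter stands for “e.” A real attack would try several candidates from the top of the count, since a short text need not follow the usual frequencies.

```python
import string
from collections import Counter

ALPHABET = string.ascii_lowercase


def caesar(text: str, shift: int) -> str:
    """Shift each letter `shift` places forward, wrapping from z back to a."""
    out = []
    for ch in text.lower():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)  # leave spaces and punctuation alone
    return "".join(out)


def guess_shift(ciphertext: str) -> int:
    """Frequency attack. Assumption: the commonest ciphertext letter stands for 'e'."""
    counts = Counter(ch for ch in ciphertext.lower() if ch in ALPHABET)
    top_letter, _ = counts.most_common(1)[0]
    return (ALPHABET.index(top_letter) - ALPHABET.index("e")) % 26


print(caesar("xyz", 3))  # abc
secret = caesar("meet me here at three", 3)
print(caesar(secret, -guess_shift(secret)))  # meet me here at three
```

Even this short phrase contains seven e’s, so the commonest ciphertext letter (“h”) reveals the shift of three, and a negative shift undoes the encryption.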
In his more sophisticated code, Patterson wrote his message in plain text, without capitals or spaces, but vertically on ruled paper, “in the Chinese manner,” in columns running from left to right. This produces a grid of lowercase letters that reads as gibberish across the rows but as a perfectly clear message down the columns. Next he broke this grid into sections of up to nine lines each, numbered the lines in each section 1, 2, 3, and so on, and reordered them randomly within the section, though every section repeated the same reordered sequence of numbers. He also inserted up to nine arbitrary letters at the beginning of each line, which had no bearing on the message content but drastically increased the inscrutability factor. He filled vacant spaces at the end of each line with similarly random letters.
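The construction can be followed step by step in a short Python sketch. It assumes a single section size k (the length of `perm`), a grid whose height is an exact multiple of k, and one junk-letter count per original line that is reused in every section. The key layout (`perm`, `pads`) and the function names are hypothetical, chosen for illustration rather than taken from Patterson’s letter.

```python
import random
import string

LETTERS = string.ascii_lowercase


def patterson_encrypt(message, perm, pads, sections, rng):
    """Hypothetical key layout: perm[i] is the original line of a section written at
    position i; pads[j] is how many junk letters precede original line j.
    The same perm and pads are reused for every section."""
    k = len(perm)
    rows = k * sections
    letters = [c for c in message.lower() if c in LETTERS]
    cols = -(-len(letters) // rows)  # ceiling division
    # Fill the vacant cells at the bottom of the last column with random letters.
    letters += [rng.choice(LETTERS) for _ in range(rows * cols - len(letters))]
    # Write down each column, left to right: cell (r, c) holds letter c*rows + r.
    grid = ["".join(letters[c * rows + r] for c in range(cols)) for r in range(rows)]
    out = []
    for s in range(sections):
        for i, j in enumerate(perm):
            junk = "".join(rng.choice(LETTERS) for _ in range(pads[j]))
            out.append(junk + grid[s * k + j])
    # Pad every line on the right with random letters to a common width.
    width = max(len(line) for line in out)
    return [line + "".join(rng.choice(LETTERS) for _ in range(width - len(line)))
            for line in out]


def patterson_decrypt(lines, perm, pads):
    """Invert patterson_encrypt with the key: restore line order, drop junk, read columns."""
    k = len(perm)
    rows = len(lines)
    cols = len(lines[0]) - max(pads)  # every line was padded to cols + max(pads)
    grid = [""] * rows
    for s in range(rows // k):
        for i, j in enumerate(perm):
            grid[s * k + j] = lines[s * k + i][pads[j]:pads[j] + cols]
    return "".join(grid[r][c] for c in range(cols) for r in range(rows))


rng = random.Random(1)
lines = patterson_encrypt("In Congress, July Fourth", [2, 0, 1], [1, 0, 2], 2, rng)
print("\n".join(lines))
print(patterson_decrypt(lines, [2, 0, 1], [1, 0, 2]))
```

With two sections of three lines, the phrase becomes six equal-length lines of apparent gibberish; the key holder recovers “incongressjulyfourth” followed by the few random letters that filled out the last column.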
“It will be absolutely impossible, even for one perfectly acquainted with the general system, ever to decypher the writing of another without his key,” Patterson wrote. He estimated the number of possible keys at more than “ninety millions of millions.” The cipher so impressed Jefferson that he forwarded it to his ambassador to France, Robert Livingston, who nonetheless stuck with an older nomenclator code, based on a catalog of numbers representing words or phrases.
The 200-year-old code began to intrigue Smithline when his neighbor Amy Speckart, A.M. ’96, who worked at The Papers of Thomas Jefferson, a decades-long project based at Princeton University and Monticello, told him of Patterson’s letter and its challenge cipher, which the curators could not read. Single-letter frequencies wouldn’t help break the code: Patterson’s cipher rearranged letters rather than replacing them, so their individual frequencies still looked much like ordinary English. But Smithline felt that digraph frequency analysis, which measures the likelihood of specific pairs of letters appearing together, might. He made a 26-by-26 table counting the frequencies of “aa,” “ab,” “ac,” through “zz,” using the 80,000 letters in Jefferson’s State of the Union addresses. Smithline then guessed at five things: the number of rows in a section, two rows that belong next to each other, and the number of extra letters inserted at the start of each of those two rows.
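A 26-by-26 digraph table of this kind takes only a few lines of Python. In the sketch below (the function name is hypothetical), spaces and punctuation are stripped before counting, so pairs also run across word boundaries. That seems appropriate because Patterson’s plaintext had no spaces, but the article does not say how Smithline handled word boundaries, so treat it as an assumption. Any long English text can stand in for Jefferson’s addresses.

```python
import string
from collections import Counter

LETTERS = string.ascii_lowercase


def digraph_table(corpus: str) -> list[list[int]]:
    """Entry [i][j] counts how often letter i is immediately followed by letter j."""
    # Assumption: keep letters only, so pairs span word boundaries (no spaces in the cipher).
    letters = [c for c in corpus.lower() if c in LETTERS]
    counts = Counter(zip(letters, letters[1:]))
    return [[counts[(a, b)] for b in LETTERS] for a in LETTERS]


table = digraph_table("In Congress, July Fourth")
print(table[LETTERS.index("t")][LETTERS.index("h")])  # 1
```

Rows index the first letter and columns the second, so `table[16][20]` holds the count for “qu.”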
The digraph table helped evaluate those guesses. “For instance, the letter pair ‘vj’ is impossible in English, so that excludes any alignment that creates that digraph,” Smithline wrote. “Alternatively, the letter pair ‘qu’ is rare, but when there is a ‘q,’ it must line up with a ‘u.’ When ‘q’ and ‘u’ do line up, that is strong evidence in favor of that alignment.” Finally, he applied dynamic programming (a technique used today in computational biology to find, for example, similar regions in two DNA base sequences) to identify the statistically top-scoring guesses for section size, row pairs, and extra letters. (The dynamic program works even though transcribing the handwritten cipher into typed characters introduced significant errors.) Certain constraints in Patterson’s cipher, Smithline wrote, “reduced the overall computational load to fewer than 100,000 simple sums,” a job “tedious in the nineteenth century, but doable.”
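The following Python sketch shows how a digraph table can score a guess and how a dynamic program can pick the junk-letter counts. It is not Smithline’s code, and several choices are assumptions: add-one smoothing turns counts into log-probabilities, so an impossible pair like “vj” gets a heavy penalty rather than outright exclusion; a guess is scored by its average log-probability per aligned pair, so guesses that discard different numbers of letters stay comparable; and the rows are already in a guessed top-to-bottom order. `best_skips` is a Viterbi-style recurrence along that chain of rows, one simple instance of dynamic programming, not necessarily the recurrence Smithline used. All function names are hypothetical.

```python
import math
import string
from collections import Counter

LETTERS = string.ascii_lowercase


def log_digraph_probs(corpus: str) -> dict[tuple[str, str], float]:
    """log P(next letter | letter) from an English corpus.
    Assumption: add-one smoothing, so unseen pairs such as 'vj' get a large penalty."""
    letters = [c for c in corpus.lower() if c in LETTERS]
    pair_counts = Counter(zip(letters, letters[1:]))
    first_counts = Counter(letters[:-1])
    return {
        (a, b): math.log((pair_counts[(a, b)] + 1) / (first_counts[a] + 26))
        for a in LETTERS
        for b in LETTERS
    }


def pair_score(upper: str, lower: str, skip_upper: int, skip_lower: int,
               logp: dict[tuple[str, str], float]) -> float:
    """Score the guess that `lower` sat directly below `upper` in the plaintext grid,
    after dropping the guessed number of junk letters from the front of each.
    Reading down a column, each letter of `upper` is followed by the one beneath it.
    Assumption: average per aligned pair so different skips stay comparable."""
    pairs = list(zip(upper[skip_upper:], lower[skip_lower:]))
    if not pairs:
        return float("-inf")
    return sum(logp[p] for p in pairs) / len(pairs)


def best_skips(rows: list[str], max_skip: int,
               logp: dict[tuple[str, str], float]) -> list[int]:
    """Dynamic program over rows in a guessed order: choose one skip (0..max_skip)
    per row maximizing the summed adjacent-pair scores. Each row's skip is shared
    by both pairs it belongs to, which is what makes this more than greedy matching."""
    choices = range(max_skip + 1)
    best = {s: 0.0 for s in choices}  # best total so far if the current row skips s
    backpointers = []
    for t in range(1, len(rows)):
        new_best, pointer = {}, {}
        for s in choices:
            scored = {p: best[p] + pair_score(rows[t - 1], rows[t], p, s, logp)
                      for p in choices}
            pointer[s] = max(scored, key=scored.get)
            new_best[s] = scored[pointer[s]]
        best = new_best
        backpointers.append(pointer)
    # Trace back from the best final skip to recover every row's skip.
    s = max(best, key=best.get)
    skips = [s]
    for pointer in reversed(backpointers):
        s = pointer[s]
        skips.append(s)
    return skips[::-1]
```

Scoring the five guesses Smithline made amounts to calling `pair_score` for each candidate section size, pair of rows, and pair of skip counts (summing over sections, since they all share one key) and keeping the highest totals; `best_skips` then settles the skips along a promising chain of rows.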
This analysis allowed Smithline to decrypt the challenge cipher that had held its message inviolate for more than two centuries. Had Jefferson cracked the code, he quite likely would have divined the entire message from its first few words: “In Congress, July Fourth, …” The message was the preamble to the Declaration of Independence, from Jefferson’s own hand.