Cryptanalysis of the Sarah2 PenandPaper Cipher
以下为 快照 页面，建议前往来源网站查看，会有更好的阅读体验。
原文链接: https://www.robertxiao.ca/hacking/sarah2/
Disclaimer: Building strong cryptosystems is way harder than breaking them. Props to the system author for going for it and publishing the system – it’s a nice and simple design which is useful for learning cryptographic concepts. I hope this post helps inform future designs!
I recently heard about the Sarah2 penandpaper cipher, touted as a “strong penandpaper cipher” and posted publicly here: http://lasercalcium.glitch.me/ . The cipher is quite simple in order to make it amenable for penandpaper use. When I first heard about the cipher, I immediately started thinking of ways to break it, and a few hours later I had two functional attacks that fully compromise the key with relatively small numbers of chosenplaintext messages (in the best attack, less than 2000 4character messages). This writeup explains the technical details of the attack. Because the cipher is quite simple, the attack is similarly simple to explain, and I hope it makes for a good illustration of a neat cryptanalytic technique.
The Sarah2 Cipher
Sarah2 encrypts strings of characters from the English alphabet, plus the “_” character for spacing and padding, for a total of 27 letters. The secret key is a permutation of all possible letter pairs (an SBox, in cryptography parlance), mapping every pair of letters to another random pair. Thus, there are (27*27)! = 729! possible secret keys. This equates to a 5886.98bit key – quite a bit larger than the tiny 128bit keys commonly used in AES.
Because the cipher operates on pairs of letters, the input must be of even length, and is padded (with a “_” character) if it is not. Then, encryption proceeds as a series of rounds. In each round, every (nonoverlapping) pair of characters is passed through the secret SBox to yield a new pair of characters. Then, the characters are permuted using a kind of “inverse shuffle”, where the characters in the oddnumbered positions (1, 3, 5, …) are collected and then concatenated with the characters in the evennumbered positions (2, 4, 6, …), so that for example the string “12345678” would become “13572468”. This shuffle breaks up all the pairs to provide diffusion in the cipher.
The permutation is omitted in the final round as it is unnecessary (it would add no security and would make the algorithm harder to execute on pen and paper). Decryption simply inverts these steps – the characters are passed through the SBox in reverse and shuffled each round.
Here’s a worked example to make things clearer. The secret SBox we’ll use is
_ a b c d e f g h i j k l m n o p q r s t u v w x y z _ kz gz qf lj yv bz ph fd hg ye qq ze fo ci rh kx lz ec fk mm yx mc is hm ef aj gk a po ii fw vn am hi bt yf eg tr vt tu xt _s oj ny zz ua os mn fm zx qh _d qe ss en b yj si hy vm gq gd kc px jd ka yt ck lf cw tw qb tx rs zn yo gb ap js nv tv fe fc c xi nx xf au it wo bm _h mj qg lu eq ao ed vk pm lp cm cu nz bf vg zu uz av by bl d yl xo ig lw wx ts nr ge eb fs pr dc g_ ij td wa ad eu gj fz dj hp xy zi eo xq ql e wy pp wq qk iy ty su hv es gu ar kf xv sn na e_ cb pt so ab pk pq uo xx ak cg id f op sd qj lt tj cl hx wh cx qp cc xp ga qa io bv gy ke bb mv ym nj u_ rv mh zj st g sc mr _e gx ir r_ hs _i ft fq w_ f_ ee sh sp qn tc oo jl ov li ro bs ot pi sx bp h xe mg fh ys kq ta rc _a ms tm ne jv ds xw xz ni xb mk tb yz on xa mq uk ya xh vs i qy fb ax gh dh sq iz hj uw _r ek rq zq bq bd md db yi q_ zp za hr vu ll ng fl uh j mt jt sa hq xj vy _c ff or pd ws fu uu pg a_ ln cp mb hu nm ls uy ku gg fi hl ik k yy fj iw pw ju oy yk at rz bx rw ky hh dn vf fn rd pb ww ol kk oa zs _t yp iq xl l wt oi tg qr bj kg ba lg _z ev wd tf ha km oz sy te j_ mi vl nb dl lo jz em hc mf m eh _f _x je fg mz hz zf di sj pn up dy vp tk ei wc yr cf y_ wv xk qv _n wm ug af n _q cr as iv gi xr tn ez lm wp gr jy yd _j rg i_ ca mx da uq ch mo cv gp ho s_ lh o vd yh ve yc nh ma bo kv uc le gn wk ti ko yu ow oc zg _w vo zw ns l_ wf fx la jr p rb nn __ b_ cq dg ob vb us jq fa kw gw lx wu wb il gf _g xs du fy hk ul c_ dv rm q nw ps yb rp ac nl zc rx jc qc vc vq ui ht qo to ey gv np z_ aq rj nq pj sg pu mp r jo k_ gs wn kd vw kn h_ jg _y qx py _l wl tt ou dk tz of wg pz ld kb cn lk qm rf s ox ae ie dd gc he t_ ib no go lq ry vr cz de xd az ip ic hb _p m_ rt wr xg zt xm t zm jh qs vj dp re br rr ix vi cd pv sm qi un ks jk ra kh ia dr zr qd kt qz nc bu u sl lr bh ej xu pc oe nf bi do _o jn ud qu bn lv ah hw if be ub _b lb jj jf zk pl v jb ag tq lc iu pe se dz zb gt fp og zl co ok sr ja kr ay kj sv wj od ea um ih kl w x_ p_ ru ut uv nk ux mw o_ ce zd ue dw er hd pa v_ bc in rl wi zo ai yn ji yg my x qw zh el gm rn fv hn jw im _k nd ct ly uf fr bg nt al vx bw bk d_ oq ep n_ dt jx y aa ew dq vz aw me tp xc dx kp yq vh ri cy qt an va gl df sz uj sk _u zy mu vv _m z ur sf ki jp wz pf zv nu xn hf sb et rk jm _v th oh cj ex ml om yw dm sw cs tl we To use this, find the row for the first character, then the column for the second. For example, encrypting “cf” would yield “bm”. Now let’s take a plaintext, like “ attack_at_dawn
” and encrypt it through one round
We first translate each pair “at”, “ta”, “ck”, “_a”, … and translate via the table, to get “ fmjheqgzzmxohd
“. Then, we “inverse shuffle” this to get “ fjegzxhmhqzmod
“.
In the next round, we translate each pair “fj”, “eg”, “zx”, “hm”, … to get “ cchvcsxwmkjmnh
“. Then inverse shuffle to get “ chcxmjncvswkmh
“. You get the hang of it. Visually, on a fourcharacter plaintext:
So, this is a simple cipher indeed. The author makes three recommendations about the number of rounds to use, based on the attack model:
log 2 n rounds: “I’m only encrypting a handful of messages, and adversaries will never get a chance to make me encrypt anything they choose.”
log 2 n+2 rounds: “I’m encrypting tons of messages, but adversaries will still probably not get a chance to make me encrypt anything they choose.”
2 x log 2 n rounds: “My adversary can encrypt and decrypt things using my key, but they don’t know the key.” My expectation is that this provides a fairly strong level of security, even against attacks involving a fair amount of computational power.
http://lasercalcium.glitch.me/Let’s Do The (Electric) Slide Attack
The author notes that, after log 2 n rounds, a oneletter difference will have propagated through to all ciphertext characters. If you play with the cipher a bit, you might feel like it’s pretty strong; after all, if you encrypt attack_at_dawn
you might get _fjqoaprokfpgu
, but attack_at_down
would yield the completely different wmhkpqfpspcrcb
– and that’s after just 4 rounds. Going for the highestsecurity 2 x log 2 n rounds (8, in this case) seems like it should be very secure, indeed, especially if the SBox is properly random. And, indeed, the author explicitly explains that 2 x log 2 n “provides a fairly strong level of security, even against attacks involving a fair amount of computational power”. However, there’s a superpowerful attack which will break this cipher no matter how many rounds are used!
The key insight is that this cipher, at its core, is the same round operation applied a whole bunch of times. If we add a final permutation onto the encryption operation (which we can do, because the permutation doesn’t depend on the key), an n round Sarah2 encryption operation straightforwardly becomes n iterations of the substitution+permutation round. Let’s call the round function F , such that the encryption operation (with extra permutation) is F applied n times: F n .
Suppose that we have the ability to encrypt anything we want but no access to the key (the third scenario outlined above): that is, for any message a of our choosing, we can encrypt it to get b = F n ( a ). Now suppose that, somehow , we’re able to find (or guess) c such that c = F ( a ). In that case, we can ask for the encryption of c , which is d = F n ( c ) = F n+1 ( a ) = F ( F n ( a )) = F ( b ). This seemingly simple relationship is actually extremely powerful – if we know a single relation c = F ( a ) of the round function, we can obtain a second relationship d = F ( b ) which enables us to learn more about the round function. And with that second relationship, we could obtain a third, and so on and so forth. Note that these are relationships on one round of the cipher – not on the full multiround encryption function. This will allow us to attack the round function directly – no matter how many rounds the encryption uses.
This technique is known as a slide attack , since it visually looks like “sliding” the cipher across itself:
Example of sliding the cipher. a = “abcd”, b = F n ( a ) = “fmba”, c = F ( a ) = “fiwt”, d = F n ( c ) = “qsai”.
If we were able to guess that the SBox mapped ab
to fw
and cd
to it
, we could slide to find that fm
mapped to qa
and ba
mapped to si
– doubling the number of key entries we know.
So, if we can encrypt anything we want, we can effectively reduce this n round cipher down to a single round. Is one round of Sarah2 secure? Unfortunately, no. A single round only performs a single substitution through the secret SBox, and a simple fixed permutation.
Now, the only problem left is to guess a single relationship c = F ( a ). Luckily, we can just apply brute force. If the input a consists only of repeats of a single character pair (e.g. ababababab
), one round will only use one entry of the SBox.
Attack 1 – Long Messages
This leads us to attack idea #1: what if we encrypted a really long message, say, 10000 underscores (_)? Since the message has only a single character pair, we know that the first substitution will turn it into xyxyxyxy....xyxyxy
for some xy, and then the first permutation will turn it into xxxxxxxx...yyyyyyyy
. So, we could also encrypt every xxxxxxxx...yyyyyyyy
for each of the 729 (27*27) pairs of characters x, y. How do we know which one is the right match?
If we tack on an extra permutation step to the first encryption (of ___...___
), then the resulting ciphertext will be just one substition step away from the encryption of the correct xxxx...yyyy
. The ciphertexts are basically random, but for longenough messages they must contain duplicate pairs (by the Pigeonhole principle ). For each candidate pairing of ciphertexts, we can check to see if the duplicates line up – if they don’t, then there’s no way they could be separated by a single substitution step.
For example, in the example below the encryption of ___…___ happens to start with the duplicate pair “ei ei” (although in general the duplicate pairs might occur anywhere in the text). The encryption of xx…yy starts with ddxh…, which is impossible since “ei” could not map to both “dd” and “xh”. But, encrypting the correct kk…zz produces a ciphertext that starts with “gu gu”, which is plausible given the “ei ei” in the original ciphertext.
If __ maps to kz, then encrypting ____..____ will produce the same “pattern” of duplicate pairs (e.g. eiei) as encrypting kkkk..zzzz (e.g. gugu).So, by leveraging these duplicate pairs, we can get two ciphertexts that only differ by a single substition step. Since the messages are very long, we expect that almost all possible letter pairs exist, from which we can easily reconstruct the whole table.
With 10000 characters (5000 letter pairs), the probability that the ciphertext does not contain all 729 letterpair combinations is (728/729)^5000 = 0.001; that is, with 99.9% probability the ciphertext contains every possible letter pair and therefore we get the entire secret SBox. This only requires us to encrypt at most 730 messages of 10000 characters, or around 7,300,000 characters.
Attack 2 – Short Messages
We can also pull off the slide attack even if we restrict ourselves to short messages. Let’s restrict ourselves to 8character (4 letter pair) messages. What we’ll do is obtain the encryptions of every message of the form abababab and every message of the form aaaabbbb – a total of 1431 messages (skipping duplicates of the form aaaaaaaa ).
Let’s let A(ab) denote the encryption of “ abababab
” with an extra permutation (as with the long message attack), and B(ab) denote the encryption of “ aaaabbbb
“. Let S denote the secret substitution function; for example, S(abcd) = fwit for the Sbox above. Then, for any letterpair X , we have S(A(X)) = B(S(X)) – that is, A(X) and B(S(X)) are related through a single substitution (using the slide attack). All we need to do is figure out the pairings between A and B messages.
As with the long message attack, we can do this by looking for messages with duplicated letter pairs. For example, if A(ic) = cfcftlvc and B(gh) = bmbmsmlc , and no other B messages have this pattern (the same pair twice, followed by two different pairs), then we can be certain that S(ic) = gh , S(cf) = bm , S(tl) = sm and S(vc) = lc . Once we learn a new pair, such as S(cf) = bm , we can look at A(cf) and B(bm) to get even more pairs, and this way iteratively derive the entire key.
The probability that any A message has no duplicated letter pairs is (728/729)^4 = 0.9945, so the probability that every A message has no duplicate pairs is 0.9945^729 = 0.018. Therefore, this particular attack strategy succeeds over 98% of the time. We could improve this by using longer messages (e.g. with 16 characters it would succeed 99.9% of the time). We could also use a more bruteforce approach where we guess pairings, propagating the resulting new pairings and continuing until we get a full key or hit a contradiction and backtrack.
This attack only requires 1431*8 = 11448 characters to be encrypted, or around 54,434 bits of information. Given that the key itself is around 5887 bits in size, this equates to only needing around 9 bits of information per bit of key – not a bad ratio! The theoretically optimal attack would require observing 5887 bits of encrypted (or decrypted) data, so this is pretty close.
Improving the Cipher
As we can see, the slide attack is a pretty clever attack that works whenever you iterate the same function over and over again in a cipher. So, one of the best ways of dealing with this is simply to add a little bit of perround variability. The AES cipher and its predecessor DES both use the concept of subkeys – the main key is used to derive a series of subkeys using a key schedule , and each round uses a different subkey. This way, each round function is slightly different, which is enough to defeat the slide attack.
In Sarah2, this might be achieved by using the SBox to derive some perround modifications to use. As an illustrative (and perhaps not very strong) example, you could interpret a letter pair like “kz” to mean a Caesar shift (mapping k to z, l to a, m to b, etc.), and apply one Caesar shift per round according to a particular entry in the SBox (e.g. the “__” entry for round 1, the “aa” entry for round 2, etc.).
The LC4 penandpaper cipher uses a similar secret SBox, albeit a much smaller one (6×6). It encrypts characters one at a time (a stream cipher ), and moves the entries of the SBox around after encrypting each character. This ensures that the SBox changes over the course of encryption rather than remaining static. A similar approach of shifting the entries in the Sarah2 SBox after each round could help mitigate the slide attack.
Summary and Conclusions
Above, I’ve outlined two attacks on Sarah2 which reveal the entire secret key, under the assumption that the attacker can encrypt messages of their choice. While these attacks don’t break the cipher if the attacker can only passively observe messages, they should be taken as a strong indication that the Sarah2 cipher is not safe or secure. The slide attack principle underpinning these attacks means that the number of rounds is immaterial; these attacks work even if you were to do thousands of rounds, demonstrating that “confusion and diffusion” are not the only requirements for building solid cryptosystems.
The aim of this post has been to illustrate a practical slide attack on a simple, understandable cipher, and I hope it serves as a useful tool for learning more about this type of attack. The attacks aren’t hard to implement – I strongly recommend implementing them yourself if you want to understand the slide attack better! For reference, my implementations in Python can be found in this GitHub repo .
猜你喜欢

39
Patterson's Cipher for Jefferson  Challenge Solved After 200 Years In 1803, President Thomas Jefferson sent Robert R. Livingston, minister in Paris, a cipher based on new principles which he found "the easies...

3
wizpen A Pigpen cipher inspired font, originally created for a D&D wizard spellbook. Unlike Pigpen, characters in this font can share vertical str...

79
Contribute to PaperLauncher development by creating an account on GitHub.

31
Run Javascript programs on pieces of paper, using a projector and camera. Physically hold programs in your hand, and see them come to life, as if by magic.

48
Introduction I read a lot of deep learning papers, typically a few/week. I've read probably several thousands of papers. My general problem with papers in machine learning or deep learning is that often they si...

38
Rock Paper Scissors Lizard Spock Game is an extended version of the classical Rock Paper Scissors game first mentioned in the Season 2 of “Big Bang Theory” by Sheldon Cooper.

43
This week I attended linux.conf.au (for the first time) in Christchurch, New Zealand. It’s a week long conference covering Linux, open open source software and hardware, privacy,...

30
Cristiano Calcagno, Dino Distefano, and Peter W. O’Hearn (three of the Facebook engineers behindInfer) and Hongseok Yang of KAIST received this year’s ACM SIGPLAN Most Influen...

25
README.md SGXROP: Practical Enclave Malware with Intel SGX This repository contains the implementations of the paper "Practical Enclave Malware with I...

12
This is my notes on the paper: Amazon Aurora: Design Considerations for High Throughput CloudNative Relational Databases . What's A...
 上一页
 下一页
关于极客头条
聚合每日国内外有价值，有趣的链接。