About
This post is part of a series in which I’ll post writeups for all challenges of the OverTheWire Advent Bonanza 2019 CTF. To see the intro, click here
Overview
We’ve got a program, which seems to just hang when executed. It also contains the flag in some way. The file command yields the following:
$ file chal chal: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=1ffb2265dbdb56983b756ab3693df8ef228d6b60, not stripped
A quick look at SSE
The challenge name clearly hints at SSE being used. But some of you might be wondering, what SSE means. It stands for Streaming SIMD Extensions and it’s a processor extension which adds 128 bit registers and better floating point arithmetic. There are 16 of those 128 bit registers, and each of them can be used like the following:
- As 2x 64 bit floats (double)
- As 2x 64 bit ints (quad words)
- As 4x 32 bit floats (float)
- As 4x 32 bit ints (double words)
- As 8x 16 bit ints (words)
- As 16x 8 bit ints (bytes)
We only need the 4x 32 bit integers, because that’s the only way the program uses them. Keep in mind that every operation on a 128 bit register affects each 32 bit integer individually!
Reversing
So let’s start reversing! The first surprise happens when opening the program in Ghidra: It can’t decompile the main function!

So we need to actually reverse the assembly code by hand. And it’s huge! Here’s the function graph:

Good thing is that it doesn’t call any additional functions. So after one hour of reversing, I got the following pseudo-code:
xmm3 = 1, 0, 0, 0 xmm2 = 0, 1, 0, 0 xmm1 = 0, 0, 1, 0 xmm0 = 0, 0, 0, 1 for(int i = 0; i < 0x112210f47de98115; i++){ xmm3_t = 4 * xmm0 + 3 * xmm1 + 2 * xmm2 + 1 * xmm3 xmm2_t = 8 * xmm0 + 7 * xmm1 + 6 * xmm2 + 5 * xmm3 xmm1_t = 12 * xmm0 + 11 * xmm1 + 10 * xmm2 + 9 * xmm3 xmm0_t = 16 * xmm0 + 15 * xmm1 + 14 * xmm2 + 13 * xmm3 xmm3 = xmm3_t xmm2 = xmm2_t xmm1 = xmm1_t xmm0 = xmm0_t for(int ii = 0; i < 0x3e8; i++) { xmm3 -= 0 if xmm3 < 0x96433d else 0x96433d xmm2 -= 0 if xmm2 < 0x96433d else 0x96433d xmm1 -= 0 if xmm1 < 0x96433d else 0x96433d xmm0 -= 0 if xmm0 < 0x96433d else 0x96433d } } //flag is a array of 16 words (32 bytes) flag = <encrypted data> flag[0] ^= xmm3[0] & 0xffff flag[1] ^= xmm3[1] & 0xffff flag[2] ^= xmm3[2] & 0xffff flag[3] ^= xmm3[3] & 0xffff flag[4] ^= xmm2[0] & 0xffff flag[5] ^= xmm2[1] & 0xffff flag[6] ^= xmm2[2] & 0xffff flag[7] ^= xmm2[3] & 0xffff flag[8] ^= xmm1[0] & 0xffff flag[9] ^= xmm1[1] & 0xffff flag[10] ^= xmm1[2] & 0xffff flag[11] ^= xmm1[3] & 0xffff flag[12] ^= xmm0[0] & 0xffff flag[13] ^= xmm0[1] & 0xffff flag[14] ^= xmm0[2] & 0xffff flag[15] ^= xmm0[3] & 0xffff printf("AOTW{%s}\n", flag)
Still a lot, but now it’s readable! But there’s still a problem: The for loop, which runs 0x112210f47de98115 times. So we now know why the program hangs when run. But still, we can now analyse the program way easier!
Analyzing some math
You might have realized the pseuo-code in the loop looks like a Matrix multiplication! We can see that in xmm0 to xmm3, the following matrix is stored:
M=\begin{bmatrix} xmm3_1 & xmm3_2 & xmm3_3 & xmm3_4 \\ xmm2_1 & xmm2_2 & xmm2_3 & xmm2_4 \\ xmm1_1 & xmm1_2 & xmm1_3 & xmm1_4 \\ xmm0_1 & xmm0_2 & xmm0_3 & xmm0_4 \end{bmatrix}At the start, M is the identity matrix I:
M=I=\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}And each loop iteration, the following calculation is performed:
M=\begin{bmatrix} 1 & 2 & 3 & 4 \\ 4 & 5 & 6 & 7 \\ 8 & 9 & 10 & 11 \\ 12 & 13 & 14 & 15 \\ \end{bmatrix}*MBut what’s with the code after the multiplication? Let’s look at it a bit closer:
for(int ii = 0; i < 0x3e8; i++) { xmm3 -= 0 if xmm3 < 0x96433d else 0x96433d xmm2 -= 0 if xmm2 < 0x96433d else 0x96433d xmm1 -= 0 if xmm1 < 0x96433d else 0x96433d xmm0 -= 0 if xmm0 < 0x96433d else 0x96433d }
We can actually rephrase that:
for(int ii = 0; i < 0x3e8; i++) { if xmm3 >= 0x96433d: xmm3 -= 0x96433d if xmm2 >= 0x96433d: xmm2 -= 0x96433d if xmm1 >= 0x96433d: xmm1 -= 0x96433d if xmm0 >= 0x96433d: xmm0 -= 0x96433d }
Each iteration, it checks if any element in the matrix is greater than 0x96433d (9847613000), and if it is, it subtracts 0x96433d. Let’s see how much it can subtract at most in 0x3e8 (1000) iterations:
9847613*1000=9847613000>4294967296That’s more than the integer limit (4294967296)! So after the loop every element in the matrix is below 0x96433d, so it’s a mod operation! Let’s put that into our equation from above:
M=\Bigg(\begin{bmatrix} 1 & 2 & 3 & 4 \\ 4 & 5 & 6 & 7 \\ 8 & 9 & 10 & 11 \\ 12 & 13 & 14 & 15 \\ \end{bmatrix}*M\Bigg)\mod 9847613The loop runs 0x112210f47de98115 (a lot) times. You might have noticed that we can write this as an exponentiation:
M=\begin{bmatrix} 1 & 2 & 3 & 4 \\ 4 & 5 & 6 & 7 \\ 8 & 9 & 10 & 11 \\ 12 & 13 & 14 & 15 \\ \end{bmatrix}^{\mathtt{0x112210f47de98115}} \mod 9847613We only need to solve this now…
The “expmod” algorithm
You might wonder why we take this sidetrack and look into the expmod algorithm. But it’ll all make sense in the end. Let’s at first write down our situation:
r=b^e \mod mWe don’t wan’t to wast computational power and calculate a huge number, just to take it mod m. So let’s break the above equation down a little: We now the following equations hold:
a^{b+c}=a^b*a^c a^{b*c}={a^b}^c={a^c}^bAlso we now that we can write every number in binary like this, were a stands for the bits of n:
n=a_0*2^{0}+a_1*2^{1}+…+a_i*2^{i}Let’s combine all of those equations, but ignoring the mod m (and now a stands for the bits of e):
b^e=b^{a_0*2^{0}+a_1*2^{1}+…+a_i*2^{i}}=e^{a_0*2^{0}}*e^{a_1*2^{1}}*…*e^{a_i*2^{i}}={b^{2^0}}^{a0}*{b^{2^1}}^{a_1}*…*{b^{2^i}}^{a_i}Because we know that the following equations hold:
n^0=1 n*1=nWe can actually ignore every bit which is 0, because:
a_i=0 \implies {b^{2^i}}^{a_i}={b^{2^i}}^0=1 \implies b^e=…*{b^{2^i}}^{a_i}*…=…*1*…=…*…And because of the following equation:
b^{2^i}=b^{2^{i-1}*2}={b^{2^{i-1}}}^2We can see that we can just square the base for the last bit to get the base for this bit! This results in the following pseudo-code:
function expmod(b, e, m): r = 1 while e > 0: if e & 1 == 1: r = (r * b) mod m e = e >> 1 b = (b * b) mod m return r
This code has complexity O(log e) instead of O(e) (for regular pow + mod)! This is great! But what does it have to do with this challenge? Remember from above, that for expmod to work, the following equations must hold:
a^{b+c}=a^b*a^c a^{b*c}={a^b}^c={a^c}^bAs it turns out, those equations also hold for matrices! So we can use expmod on matrices, which allows to solve this monstrosity from above in about 61 steps:
\log_2 \mathtt{0x112210f47de98115} \approx 60,098709106 \approx 61Implementing modexp
So let’s start by writing an modexp implementation in python:
def expmod(b, e, m): r = 1 while e > 0: if e & 1 == 1: r = (r * b) % m e = e >> 1 b = (b ** 2) % m return r
Not many changes had to been made to the pseudo-code. Let’s try it out with an RSA example which I found here:
m = 1976620216402300889624482718775150 e = 65537 N = 145906768007583323230186939349070635292401872375357164399581871019873438799005358938369571402670149802121818086292467422828157022922076746906543401224889672472407926969987100581290103199317858753663710862357656510507883714297115637342788911463535102712032765166518411726859837988672111837205085526346618740053 r = expmod(m, e, N) print(r) #Should be 35052111338673026690212423937053328511880760811579981620642802346685810623109850235943049080973386241113784040794704193978215378499765413083646438784740952306932534945195080183861574225226218879827232453912820596886440377536082465681750074417459151485407445862511023472235560823053497791518928820272257787786 assert r == 35052111338673026690212423937053328511880760811579981620642802346685810623109850235943049080973386241113784040794704193978215378499765413083646438784740952306932534945195080183861574225226218879827232453912820596886440377536082465681750074417459151485407445862511023472235560823053497791518928820272257787786 print("Works!")
$ python solver.py 35052111338673026690212423937053328511880760811579981620642802346685810623109850235943049080973386241113784040794704193978215378499765413083646438784740952306932534945195080183861574225226218879827232453912820596886440377536082465681750074417459151485407445862511023472235560823053497791518928820272257787786 Works!
Now here’s the cool part: We don’t need to change anything to make it work with matrices, because python doesn’t care if the type is an integer, a matrix, or even a completely different class, as long as it implements the ‘multiply’ operator! So we can move on solving the giant equation.
Solving the equation
To do this we first store our parameters somewhere. I use the numpy library for the matrices handling:
import numpy as np b = np.matrix([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]) e = 0x112210f47de98115 m = 9847613
We then run our expmod algorithm:
r = expmod(b, e, m) print(r)
Then we extract the XOR key:
from struct import pack p16 = lambda x: pack("<H", x) key = b"" for i in range(4 * 4): key += p16(r.item(i) & 0xffff) print("XOR key:") print(key.hex())
After that we decrypt the flag:
p8 = lambda x: pack("<B", x) flag_enc = bytes.fromhex("fc14eb09bcaee7474fe37cc152a5028e8971c88d9623016d71405aeafd461d23") flag_dec = b"" for i in range(len(flag_enc)): flag_dec += p8(flag_enc[i] ^ key[i]) flag = flag_dec.decode("ascii")
And then finally we print it:
print("Flag:") print("AOTW{%s}" % flag)
When we run it we get:
$ python solver.py expmod result: [[7938225 4160415 382605 6452408] [8295223 6860620 5426017 3991414] [8652221 9560825 621816 1530420] [9009219 2413417 5665228 8917039]] XOR key: b1209f7b8dd6b87437934caf61cb76e7bd05f9e2f87c345a437869d3cc712f10 Flag: AOTW{M4tr1x_3xp0n3nti4t1on_5728391723}
The flag is:
AOTW{M4tr1x_3xp0n3nti4t1on_5728391723}
Here’s the complete solver script:
#!/usr/bin/python def expmod(b, e, m): r = 1 while e > 0: if e & 1 == 1: r = (r * b) % m e = e >> 1 b = (b ** 2) % m return r import numpy as np #params b = np.matrix([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]) e = 0x112210f47de98115 m = 9847613 #expmod time! r = expmod(b, e, m) print("expmod result:") print(r) #extract the xor key from struct import pack p16 = lambda x: pack("<H", x) key = b"" for i in range(4 * 4): key += p16(r.item(i) & 0xffff) print("XOR key:") print(key.hex()) #decrypt the flag p8 = lambda x: pack("<B", x) flag_enc = bytes.fromhex("fc14eb09bcaee7474fe37cc152a5028e8971c88d9623016d71405aeafd461d23") flag_dec = b"" for i in range(len(flag_enc)): flag_dec += p8(flag_enc[i] ^ key[i]) flag = flag_dec.decode("ascii") #print the flag print("Flag:") print("AOTW{%s}" % flag)
Cool, a Great overview about the Christmas Challenge. Looking forward to your next posting. CU