OverTheWire Advent Bonanza 2019: [Day 10] ChristmasSSE Keygen

About

This post is part of a series in which I’ll post writeups for all challenges of the OverTheWire Advent Bonanza 2019 CTF. To see the intro, click here

Overview

We’ve got a program, which seems to just hang when executed. It also contains the flag in some way. The file command yields the following:

$ file chal
chal: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=1ffb2265dbdb56983b756ab3693df8ef228d6b60, not stripped

A quick look at SSE

The challenge name clearly hints at SSE being used. But some of you might be wondering, what SSE means. It stands for Streaming SIMD Extensions and it’s a processor extension which adds 128 bit registers and better floating point arithmetic. There are 16 of those 128 bit registers, and each of them can be used like the following:

  • As 2x 64 bit floats (double)
  • As 2x 64 bit ints (quad words)
  • As 4x 32 bit floats (float)
  • As 4x 32 bit ints (double words)
  • As 8x 16 bit ints (words)
  • As 16x 8 bit ints (bytes)

We only need the 4x 32 bit integers, because that’s the only way the program uses them. Keep in mind that every operation on a 128 bit register affects each 32 bit integer individually!

Reversing

So let’s start reversing! The first surprise happens when opening the program in Ghidra: It can’t decompile the main function!

Well shit...

So we need to actually reverse the assembly code by hand. And it’s huge! Here’s the function graph:

Well, that's huge....

Good thing is that it doesn’t call any additional functions. So after one hour of reversing, I got the following pseudo-code:

xmm3 = 1, 0, 0, 0
xmm2 = 0, 1, 0, 0
xmm1 = 0, 0, 1, 0
xmm0 = 0, 0, 0, 1

for(int i = 0; i < 0x112210f47de98115; i++){
	xmm3_t = 4 * xmm0 + 3 * xmm1 + 2 * xmm2 + 1 * xmm3
	xmm2_t = 8 * xmm0 + 7 * xmm1 + 6 * xmm2 + 5 * xmm3
	xmm1_t = 12 * xmm0 + 11 * xmm1 + 10 * xmm2 + 9 * xmm3
	xmm0_t = 16 * xmm0 + 15 * xmm1 + 14 * xmm2 + 13 * xmm3
	
	xmm3 = xmm3_t
	xmm2 = xmm2_t
	xmm1 = xmm1_t
	xmm0 = xmm0_t
	
	for(int ii = 0; i < 0x3e8; i++) {
		xmm3 -= 0 if xmm3 < 0x96433d else 0x96433d
		xmm2 -= 0 if xmm2 < 0x96433d else 0x96433d
		xmm1 -= 0 if xmm1 < 0x96433d else 0x96433d
		xmm0 -= 0 if xmm0 < 0x96433d else 0x96433d
	}
} 

//flag is a array of 16 words (32 bytes)
flag = <encrypted data>

flag[0] ^= xmm3[0] &amp; 0xffff
flag[1] ^= xmm3[1] &amp; 0xffff
flag[2] ^= xmm3[2] &amp; 0xffff
flag[3] ^= xmm3[3] &amp; 0xffff
flag[4] ^= xmm2[0] &amp; 0xffff
flag[5] ^= xmm2[1] &amp; 0xffff
flag[6] ^= xmm2[2] &amp; 0xffff
flag[7] ^= xmm2[3] &amp; 0xffff
flag[8] ^= xmm1[0] &amp; 0xffff
flag[9] ^= xmm1[1] &amp; 0xffff
flag[10] ^= xmm1[2] &amp; 0xffff
flag[11] ^= xmm1[3] &amp; 0xffff
flag[12] ^= xmm0[0] &amp; 0xffff
flag[13] ^= xmm0[1] &amp; 0xffff
flag[14] ^= xmm0[2] &amp; 0xffff
flag[15] ^= xmm0[3] &amp; 0xffff

printf("AOTW{%s}\n", flag)

Still a lot, but now it’s readable! But there’s still a problem: The for loop, which runs 0x112210f47de98115 times. So we now know why the program hangs when run. But still, we can now analyse the program way easier!

Analyzing some math

You might have realized the pseuo-code in the loop looks like a Matrix multiplication! We can see that in xmm0 to xmm3, the following matrix is stored:

M=\begin{bmatrix} xmm3_1 & xmm3_2 & xmm3_3 & xmm3_4 \\ xmm2_1 & xmm2_2 & xmm2_3 & xmm2_4 \\ xmm1_1 & xmm1_2 & xmm1_3 & xmm1_4 \\ xmm0_1 & xmm0_2 & xmm0_3 & xmm0_4 \end{bmatrix}

At the start, M is the identity matrix I:

M=I=\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}

And each loop iteration, the following calculation is performed:

M=\begin{bmatrix} 1 & 2 & 3 & 4 \\ 4 & 5 & 6 & 7 \\ 8 & 9 & 10 & 11 \\ 12 & 13 & 14 & 15 \\ \end{bmatrix}*M

But what’s with the code after the multiplication? Let’s look at it a bit closer:

for(int ii = 0; i < 0x3e8; i++) {
	xmm3 -= 0 if xmm3 < 0x96433d else 0x96433d
	xmm2 -= 0 if xmm2 < 0x96433d else 0x96433d
	xmm1 -= 0 if xmm1 < 0x96433d else 0x96433d
	xmm0 -= 0 if xmm0 < 0x96433d else 0x96433d
}

We can actually rephrase that:

for(int ii = 0; i < 0x3e8; i++) {
	if xmm3 >= 0x96433d: xmm3 -= 0x96433d
	if xmm2 >= 0x96433d: xmm2 -= 0x96433d
	if xmm1 >= 0x96433d: xmm1 -= 0x96433d
	if xmm0 >= 0x96433d: xmm0 -= 0x96433d
}

Each iteration, it checks if any element in the matrix is greater than 0x96433d (9847613000), and if it is, it subtracts 0x96433d. Let’s see how much it can subtract at most in 0x3e8 (1000) iterations:

9847613*1000=9847613000>4294967296

That’s more than the integer limit (4294967296)! So after the loop every element in the matrix is below 0x96433d, so it’s a mod operation! Let’s put that into our equation from above:

M=\Bigg(\begin{bmatrix} 1 & 2 & 3 & 4 \\ 4 & 5 & 6 & 7 \\ 8 & 9 & 10 & 11 \\ 12 & 13 & 14 & 15 \\ \end{bmatrix}*M\Bigg)\mod 9847613

The loop runs 0x112210f47de98115 (a lot) times. You might have noticed that we can write this as an exponentiation:

M=\begin{bmatrix} 1 & 2 & 3 & 4 \\ 4 & 5 & 6 & 7 \\ 8 & 9 & 10 & 11 \\ 12 & 13 & 14 & 15 \\ \end{bmatrix}^{\mathtt{0x112210f47de98115}} \mod 9847613

We only need to solve this now…

The “expmod” algorithm

You might wonder why we take this sidetrack and look into the expmod algorithm. But it’ll all make sense in the end. Let’s at first write down our situation:

r=b^e \mod m

We don’t wan’t to wast computational power and calculate a huge number, just to take it mod m. So let’s break the above equation down a little: We now the following equations hold:

a^{b+c}=a^b*a^c a^{b*c}={a^b}^c={a^c}^b

Also we now that we can write every number in binary like this, were a stands for the bits of n:

n=a_0*2^{0}+a_1*2^{1}+…+a_i*2^{i}

Let’s combine all of those equations, but ignoring the mod m (and now a stands for the bits of e):

b^e=b^{a_0*2^{0}+a_1*2^{1}+…+a_i*2^{i}}=e^{a_0*2^{0}}*e^{a_1*2^{1}}*…*e^{a_i*2^{i}}={b^{2^0}}^{a0}*{b^{2^1}}^{a_1}*…*{b^{2^i}}^{a_i}

Because we know that the following equations hold:

n^0=1 n*1=n

We can actually ignore every bit which is 0, because:

a_i=0 \implies {b^{2^i}}^{a_i}={b^{2^i}}^0=1 \implies b^e=…*{b^{2^i}}^{a_i}*…=…*1*…=…*…

And because of the following equation:

b^{2^i}=b^{2^{i-1}*2}={b^{2^{i-1}}}^2

We can see that we can just square the base for the last bit to get the base for this bit! This results in the following pseudo-code:

function expmod(b, e, m):
	r = 1
	
	while e > 0:
		if e &amp; 1 == 1:
			r = (r * b) mod m
		
		e = e >> 1
		b = (b * b) mod m

	return r

This code has complexity O(log e) instead of O(e) (for regular pow + mod)! This is great! But what does it have to do with this challenge? Remember from above, that for expmod to work, the following equations must hold:

a^{b+c}=a^b*a^c a^{b*c}={a^b}^c={a^c}^b

As it turns out, those equations also hold for matrices! So we can use expmod on matrices, which allows to solve this monstrosity from above in about 61 steps:

\log_2 \mathtt{0x112210f47de98115} \approx 60,098709106 \approx 61

Implementing modexp

So let’s start by writing an modexp implementation in python:

def expmod(b, e, m):
	r = 1
	
	while e > 0:
		if e &amp; 1 == 1:
			r = (r * b) % m
		
		e = e >> 1
		b = (b ** 2) % m
	
	return r

Not many changes had to been made to the pseudo-code. Let’s try it out with an RSA example which I found here:

m = 1976620216402300889624482718775150
e = 65537
N = 145906768007583323230186939349070635292401872375357164399581871019873438799005358938369571402670149802121818086292467422828157022922076746906543401224889672472407926969987100581290103199317858753663710862357656510507883714297115637342788911463535102712032765166518411726859837988672111837205085526346618740053

r = expmod(m, e, N)
print(r)

#Should be 35052111338673026690212423937053328511880760811579981620642802346685810623109850235943049080973386241113784040794704193978215378499765413083646438784740952306932534945195080183861574225226218879827232453912820596886440377536082465681750074417459151485407445862511023472235560823053497791518928820272257787786
assert r == 35052111338673026690212423937053328511880760811579981620642802346685810623109850235943049080973386241113784040794704193978215378499765413083646438784740952306932534945195080183861574225226218879827232453912820596886440377536082465681750074417459151485407445862511023472235560823053497791518928820272257787786

print("Works!")
$ python solver.py 
35052111338673026690212423937053328511880760811579981620642802346685810623109850235943049080973386241113784040794704193978215378499765413083646438784740952306932534945195080183861574225226218879827232453912820596886440377536082465681750074417459151485407445862511023472235560823053497791518928820272257787786
Works!

Now here’s the cool part: We don’t need to change anything to make it work with matrices, because python doesn’t care if the type is an integer, a matrix, or even a completely different class, as long as it implements the ‘multiply’ operator! So we can move on solving the giant equation.

Solving the equation

To do this we first store our parameters somewhere. I use the numpy library for the matrices handling:

import numpy as np

b = np.matrix([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
e = 0x112210f47de98115
m = 9847613

We then run our expmod algorithm:

r = expmod(b, e, m)
print(r)

Then we extract the XOR key:

from struct import pack

p16 = lambda x: pack("<H", x)

key = b""

for i in range(4 * 4):
	key += p16(r.item(i) &amp; 0xffff)

print("XOR key:")
print(key.hex())

After that we decrypt the flag:

p8 = lambda x: pack("<B", x)

flag_enc = bytes.fromhex("fc14eb09bcaee7474fe37cc152a5028e8971c88d9623016d71405aeafd461d23")
flag_dec = b""

for i in range(len(flag_enc)):
	flag_dec += p8(flag_enc[i] ^ key[i])

flag = flag_dec.decode("ascii")

And then finally we print it:

print("Flag:")
print("AOTW{%s}" % flag)

When we run it we get:

$ python solver.py 
expmod result:
[[7938225 4160415  382605 6452408]
 [8295223 6860620 5426017 3991414]
 [8652221 9560825  621816 1530420]
 [9009219 2413417 5665228 8917039]]
XOR key:
b1209f7b8dd6b87437934caf61cb76e7bd05f9e2f87c345a437869d3cc712f10
Flag:
AOTW{M4tr1x_3xp0n3nti4t1on_5728391723}

The flag is:

AOTW{M4tr1x_3xp0n3nti4t1on_5728391723}

Here’s the complete solver script:

#!/usr/bin/python

def expmod(b, e, m):
	r = 1
	
	while e > 0:
		if e &amp; 1 == 1:
			r = (r * b) % m
		
		e = e >> 1
		b = (b ** 2) % m
	
	return r


import numpy as np

#params
b = np.matrix([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]])
e = 0x112210f47de98115
m = 9847613

#expmod time!
r = expmod(b, e, m)
print("expmod result:")
print(r)

#extract the xor key
from struct import pack

p16 = lambda x: pack("<H", x)

key = b""

for i in range(4 * 4):
	key += p16(r.item(i) &amp; 0xffff)

print("XOR key:")
print(key.hex())

#decrypt the flag
p8 = lambda x: pack("<B", x)

flag_enc = bytes.fromhex("fc14eb09bcaee7474fe37cc152a5028e8971c88d9623016d71405aeafd461d23")
flag_dec = b""

for i in range(len(flag_enc)):
	flag_dec += p8(flag_enc[i] ^ key[i])

flag = flag_dec.decode("ascii")

#print the flag
print("Flag:")
print("AOTW{%s}" % flag)

One thought on “OverTheWire Advent Bonanza 2019: [Day 10] ChristmasSSE Keygen

Leave a Reply

Your email address will not be published. Required fields are marked *