This challenge was part of the 2022 US Cyber Open. I played the 2023 game this year and decided to go back and look at some of last year’s challenges. I found one that caught my eye and it looked like a problem for z3. This was perfect because I have been looking for some more introductory challenges requiring z3 so I could get some practice with it.
Radiation Leak
Description
For the past month, your keylogger has been successfully stealing the passcodes for this admin portal (https://metaproblems.com/6f5fa97da7de2648c1531316ebe26954/portal/) for a shady mining company, but they’ve finally managed to remove your access. They change the code daily. Can you figure out a way to log in now?
Here are the codes (https://metaproblems.com/6f5fa97da7de2648c1531316ebe26954/leaked_tokens.txt) for the previous 30 days and the script they use to generate them (https://metaproblems.com/6f5fa97da7de2648c1531316ebe26954/token_generator.py)
Solution
Problem Analysis
First we check out the admin portal.
Since they mention we get a previous list of codes I decided not to look into the captcha. Next we download the previous 30 days of codes and the
code generation script. I took a peak at the codes to get a sense of format. We can see that each code is a set of 6 words seperated by a dash.
Now we can dive into the generation script. First the random
library is imported. The uses it to initialize a 64 byte random seed. The script
also defines a bit mask of 64 1’s. The state_1
variable is set to a random 64-bit value and state_2
gets set using the previous values
as defined in this line:
state_2 = (state_1 + random.getrandbits(64)) & mask
The next bit of code here:
with open("bip39.txt") as f:
bip = [i for i in f.read().split("\n") if i]
defines a list bip
that opens the file bip39.txt
and splits the entries by newline and saves each line as an entry in the list.
This is the first little roadblock, as we are not provided with this file. I decided to google bip39 word list
and this yielded
good results. I realized that the codes were meant to be like the bitcoin recovery phrases. I’m suprised I didn’t remember that
from the filename as I have done a few challenges using bip39 in the past. One of the gogle results says
The Bitcoin Improvement Proposal 39 wordlist (or 'BIP39' for short) is a standardized set of words for the recovery and backup of a bitcoin or cryptocurrency wallet.
So after poking around a few websites, I found the file on github here.
The generate_token()
function looked pretty meaty, so looked at the convert_to_string()
function next.
It takes in an integer parameter called token
. It takes the last 11 bits of the token and saves it to a list.
It continues this process taking 11 bits at a time a total of 6 times. Each of these 11 bit chunks is used as an index
for the bip39
wordlist and combines them with a dash creating the passphrase. Essentially, this function just takes a large number
and uses an algorithm to generate a passphrase from it. In other words, if we get the number, we get the passpharse.
Ok, so the script uses the generate_token()
function to get theses large numbers and it turens those into passphrases.
Time to look at the generate_token()
function.
def generate_token():
global seed, state_1, state_2, mask
result = seed & mask
seed >>= 64
inc = ((result * state_1 + state_2) ^ seed) & mask
seed |= inc << (7 * 64)
return result
This function takes the last 64 bits of the seed and saves it to result
. Those 64 bits are then removed from the seed. Then the
inc
variable is defined as a mathematical combination of result
, seed
, mask
, and the two state variables. Then seed
is
updated using inc
. Finally result
is returned from the function. Ok so what we have on a high level is two variables
holding a “state” which in this case is a 64-bit value, and a seed which gets updated using the states. There are a few important
things to note at this point. First off, the states are never updated. They are defined once a the beginning of the script but
stay static from that point forward. The next thing is that the “tokens” that are turned into passphrases are just the last
64 bits of the value stored in seed
which gets updated each time based on the state variables. Finally we need to note what
we know and what we don’t. We are given the first 30 passphrases that are generated. Since each passphrase is associated with
the result
gotten from a call to generate_token()
, we are given the first 30 “results”. Recall that result
is always the
last 64 bits of the seed. Essentially, seed starts completely random, but each of the first 30 times a passphrase is generated,
bits of the state_1
and state_2
variables are incorperated into the new seed value, and a part of that value gets leaked to
us in the form of a passphrase. This leaves us with a general gameplan. First, turn the leaked passphrases back into the number
they are associated with. Next, we need to use these leaks to figure out what state variables could’ve led to such leaks. Finally,
once we recover the state variables and initial seed, we can simply use the existing code to generate what the next passphrase would
be and use it to login to the admin panel.
Step 1
First at the top of my script I loaded the wordlist into a python list with the same method seen in the challenge script.
with open("bip39.txt") as f:
bip = [i for i in f.read().split("\n") if i]
Then I wrote a function gen_binary_token(passphrase)
which takes in a passphrase and converts it to a 64 bit bitstring:
def gen_binary_token(passphrase):
words = passphrase.split('-')
words = words[::-1] # Reverse order to concatenate binStrings correctly
bin_token = ''
for word in words:
val = bip.index(word)
bin_chunk = bin(val)[2:].rjust(11, '0')
bin_token += bin_chunk
bin_token.rjust(64, '0')
return bin_token
The first line creates a list of each individual word in the passphrase. The words are then reversed because the last 11 bits
of the token are used to created the first word of the passphrase. Then for each word we get its index value in the wordlist
using pythons .index()
function. Then that number is converted to binary and formatted to have leading zeros so that it is
11 bits long. Then that binary string appended to the bin_token
variable which will hold to full binary string. After this
is done for every word, the bin_token
variable is returned. I made two more helper functions for step one, gen_dataset()
and parse_leaked_tokens()
.
def gen_dataset(tokens):
data = []
for token in tokens:
data.append(gen_binary_token(token))
return data
This one takes a list of passphrases and returns a list of the binary tokens for those phrases.
def parse_leaked_tokens(file):
tokens = []
for line in file:
tokens.append(line.rstrip())
return tokens
This function just takes in a file and makes a list of passphrases
Finally there is the driver code for step 1:
leak = open('leaked_tokens.txt', 'r')
tokens = parse_leaked_tokens(leak)
leak.close()
data = gen_dataset(tokens)
Now we have our set of leaks and we’re ready for step 2!
Step 2
This is where we need to use z3. It would be an absolute nightmare to try and manually relate the leaked information about the state variables to the leaked bits of the seed. This is a perfect problem for the z3 constrain solver. If you are unfamiliar, z3 allows us to feed it a list of constraints on one or more variables that we don’t know. Then it uses some black magic boolean algebra stuff behind the scenes to tell us if there is a solution to what is essentially a HUGE system of equations. If there is a solution, it can tell what it is, or in the case there is more the one, tell us all possible solutions. This is a great use of z3 because nothing that happens is too computationally difficult, its mostly just bit manipulation and basic arithmetic. We import z3 into python with the line
from z3 import*
Using the library, I made a function recover_vars(data)
. This will take in our dataset made in step 1 and generate constraints
using the equaions from the generate_token()
function given to us. Then it will feed these constraints to z3 and tell it to find
our unknowns, which here are state_1
, state_2
, and seed
. This the complete function:
def recover_vars(data):
s = Solver()
int_data = [int(x, 2) for x in data]
state_1 = BitVec('state_1', 64)
state_2 = BitVec('state_2', 64)
seeds = [BitVec(f'seed_{i}', 64 * 8) for i in range(30)]
print("Generating constraints...")
s.add(state_2 == (state_1 + BitVec('rand_64', 64)) & mask)
for i, result in enumerate(int_data):
inc = BitVec(f'inc_{i}', 64*8)
s.add(result == seeds[i] & mask)
if i != len(int_data)-1:
s.add(inc == ZeroExt(64*7, result * state_1 + state_2) ^ LShR(seeds[i], 64))
s.add(seeds[i+1] == (LShR(seeds[i], 64) | inc << (7*64)))
print("Contraints generated!")
print("Solving...")
print(s.check())
print("Solving complete!")
model = s.model()
s1 = model[state_1].as_long()
s2 = model[state_2].as_long()
seed = model[seeds[0]].as_long()
return (seed, s1, s2)
First we create a z3 solver object s
. Next I convert all the bitstrings that came from the leaked passphrases, “10001011…” for
example, to their integer form in the int_data
variable. The z3 engine allows us to specify variables of different types. For
our problem, it fits to use the bit vector type BitVec()
because we will be doing alot of bit level operations like AND, OR,
and bit shifts. We define the state variables as z3 variables using BitVec('state_1', 64)
and BitVec('state_2', 64)
. This
creates two z3 bitvectors called ‘state_1’ and ‘state_2’ which are both 64 bits. Next I create a list of z3 variables seeds
.
Since the seed variable gets updated every time a new token is generated, we make a different variable for each iteration of
seed, each 64 bytes long with a name in the form of ‘seed_i’ where i is the iteration number. We can use the .add()
methon of
the z3 solver object to add a constraint to the solver. The first one we add is state_2 == (state_1 + BitVec('rand_64', 64)
.
We got this from when state_2
was defined in the original problem file. The equation tells z3 that state_2
must be equal to
state_1
plus some unknown 64 bit value. Then for each leaked token value, we define a variable inc_i
which is another 64 byte
value. We add the constraint that the lower 64 bits of the ith seed value is equal to our leak. We also add a constraint on
the inc_i
variable declaring that it must be created using the given combination of state variables and current seed. Then
we add the constraint that the next seed iteration will be equal to the current seed shifted left by 64 bits and bitwise AND
with inc_i
. This will result in the solver object s
having all the relationships between variables in mathematical form.
Then we call s.check()
which will attempt to solve for the variables and either spit out sat
for satifiable, meaning there
is at least one solution and z3 has solved the problem, or unsat
meaning there is not possible variable values that satifies
all the given constraints. The third possiblility is that the program hangs here indefinitely because there is simply too much
math and our computer isn’t powerfull enough. Luckily for us, this problem is simple enough that z3 solves it pretty quick.
The next bit of code uses s.model()
to actually pull the solutions from z3. It uses .as_long()
to interpret the z3 solved
variables as integers. Then the initial seed, state_1
, and state_2
are returned from the function as a tuple. The information
is stored in the main driver code like this:
seed, state_1, state_2 = recover_vars(data)
Step 3
Now that we have the seed and state variables that the website is using to create passphrases, we can predict what the 31’st value
generated will be. I made a function generate_nth_token(seed, state_1, state_2, n)
that will take in the initial values and return
the nth number that is generated using those values.
def generate_nth_token(seed, state_1, state_2, n):
for _ in range(n):
result = seed & mask
seed >>= 64
inc = ((result * state_1 + state_2) ^ seed) & mask
seed |= inc << (7 * 64)
return result
To make this, I basically copy and pasted the part of the code from the challenge script that generated the first 30 phrases, but
instead of generating 30 values, it generates n
values and returns the nth one from the function. This means we can get the next
token and use the convert_to_string()
function to get the passphrase the website generated like so:
next_token = generate_nth_token(seed, state_1, state_2, 31)
passphrase = convert_to_string(next_token)
print(f"Next passphrase: {passphrase}")
This completes of finished solve.py
script:
from z3 import *
mask = (1 << 64) - 1
with open("bip39.txt") as f:
bip = [i for i in f.read().split("\n") if i]
def gen_binary_token(passphrase):
words = passphrase.split('-')
words = words[::-1] # Reverse order to concatenate binStrings correctly
bin_token = ''
for word in words:
val = bip.index(word)
bin_chunk = bin(val)[2:].rjust(11, '0')
bin_token += bin_chunk
bin_token.rjust(64, '0')
return bin_token
def gen_dataset(tokens):
data = []
for token in tokens:
data.append(gen_binary_token(token))
return data
def parse_leaked_tokens(file):
tokens = []
for line in file:
tokens.append(line.rstrip())
return tokens
def recover_vars(data):
s = Solver()
int_data = [int(x, 2) for x in data]
state_1 = BitVec('state_1', 64)
state_2 = BitVec('state_2', 64)
seeds = [BitVec(f'seed_{i}', 64 * 8) for i in range(30)]
print("Generating constraints...")
s.add(state_2 == (state_1 + BitVec('rand_64', 64)) & mask)
for i, result in enumerate(int_data):
inc = BitVec(f'inc_{i}', 64*8)
s.add(result == seeds[i] & mask)
if i != len(int_data)-1:
s.add(inc == ZeroExt(64*7, result * state_1 + state_2) ^ LShR(seeds[i], 64))
s.add(seeds[i+1] == (LShR(seeds[i], 64) | inc << (7*64)))
print("Contraints generated!")
print("Solving...")
print(s.check())
print("Solving complete!")
model = s.model()
s1 = model[state_1].as_long()
s2 = model[state_2].as_long()
seed = model[seeds[0]].as_long()
return (seed, s1, s2)
def convert_to_string(token):
r = token
n = []
for i in range(6):
n.append(token & 0x7FF)
token >>= 11
return "-".join([bip[i] for i in n])
def generate_nth_token(seed, state_1, state_2, n):
for _ in range(n):
result = seed & mask
seed >>= 64
inc = ((result * state_1 + state_2) ^ seed) & mask
seed |= inc << (7 * 64)
return result
def main():
leak = open('leaked_tokens.txt', 'r')
tokens = parse_leaked_tokens(leak)
leak.close()
data = gen_dataset(tokens)
seed, state_1, state_2 = recover_vars(data)
print(f"State_1: {state_1}")
print(f"State_2: {state_2}")
print(f"Seed: {seed}")
next_token = generate_nth_token(seed, state_1, state_2, 31)
passphrase = convert_to_string(next_token)
print(f"Next passphrase: {passphrase}")
if __name__ == '__main__':
main()
Step 4
Now when we run the program this is the output:
Generating constraints...
Contraints generated!
Solving...
sat
Solving complete!
State_1: 336801044331229251
State_2: 10569036738506978277
Seed: 8839961978802441455258413878340373062039297925414226365427229710156238321689408417320578245874950247135846156941309250835199373484350046890129651526955751
Next passphrase: tool-unveil-ranch-soldier-coast-cover
As we can see the next passphrase will be tool-unveil-ranch-soldier-coast-cover
and sure enough when we enter it as a passphrase
and solve the captha the website spits out the flag.
Flag
flag{shouldnt_have_used_my_own_number_generator}