This is a CTF problem from UMBC’s 2018 Spring CTF. There were only two solves on this challenge including our own team, despite it not being too difficult, so I figured this would be a good problem to do a writeup for.
We are given two files:
One is a binary, and the other is a libc file. So, we copy in our exploit initialization script, mark the binary executable with
chmod +x ROPU, and run it.
Enter the payload!
The program seems to be very simple. It reads some input into a buffer and calls
printf on that buffer. We can verify this by opening the binary in IDA and viewing the
Great! We can see that the input function is
gets, which we know is insecure.
gets doesn’t stop reading input until a
\n character is entered, which it replaces with a null-byte. The buffer is 0x20 bytes large according to IDA, so we can easily overflow the input buffer and control the instruction pointer.
To find the exact offset of the return address on the stack, I use the following lines in my script.
p.sendline(cyclic(100, n=8 if x64 else 4))
This generates a De Bruijn sequence of characters that we can use to identify offsets and sends it to the stdin of the program. After running it, we get the following output.
After the program read the input, it continued and eventually segfaulted. Note that the binary didn’t actually run the
ret instruction. In 64-bit binaries, the binary actually segfaults before it runs the segfaulting instruction. In this case, we segfault because we are trying to pop
faaaaaaa into the $rip. This string is part of the De Bruijn sequence that we inputted. Using pwntools, we can figure out the offset in the sequence where this string occurs, and create a string exactly that large.
buf = cyclic_find('faaaaaaa', n=8) * 'a'
Now that we can control the instruction pointer, we need to figure out how we can run an arbitrary sequence of instructions to open a shell.
checksec on the binary, we notice that NX is enabled and there is no stack canary.
$ checksec ROPU
Having NX enabled means we can’t just write shellcode and jump to it on the stack. Therefore, we’ll have to construct a ROP chain.
Unfortunately, the binary is so small that we’d have to come up with a clever ROP chain to use the gadgets within the binary to give us a shell. Since we are lazy, and since the binary is dynamically linked, let’s instead see if we can construct a ROP chain using the numerous gadgets in the libc file.
In order to construct a ROP chain using those gadgets, we need to leak the address of a function in the libc file from the GOT. This is because on a system with ASLR enabled, we can’t just jump to a hard-coded address in the shared object file. For a detailed explanation, see Dynamic Linking with Shared Objects.
Let’s say we found out that
printf is located at address
0x08048bca. Let’s also assume that in the libc file, there is a shell function 0x30 bytes away from the beginning of the
printf function. Then, we know that the shell function is located at
0x08048bca + 0x30 in the randomized memory too. This is because the whole address isn’t randomized - only the base
0x08048 is. The offsets between functions in the shared object will remain constant between the file and when it’s loaded into memory at runtime.
In order to leak an address from the GOT, let’s call the
puts function with argument the address we want to leak. Since we are calling
puts, we can be sure that GOT[‘puts’] will have an entry, so we’ll just leak that.
buf = cyclic_find('faaaaaaa', n=8) * 'a' # 40 bytes
There is a small mistake above though. In a 64-bit binary, arguments are passed into registers before the stack (see here). Therefore, we will need to first pop the argument (GOT[‘puts’]) into $rdi, and only then can we call
I used ROPgadget to search for gadgets within the binary that pop into $rdi
ROPgadget --binary ROPU | less
In the output, I found the following:
So, by ropping to address 0x0000000000400783, we can pop the argument for
puts into $rdi, and then call
pop_rdi = 0x0000000000400783 # pop rdi ; ret
Now, when the program runs, it will print “Enter the payload!”, read in our ROP chain, call
printf to echo back our input like usual, and finally call
puts to print the value in GOT[‘puts’] to stdout. We can capture that output and use it to calculate the base address of libc as follows.
# Receive the GOT address of puts
In line two, I received 6 bytes from the process - which is the number of bytes of any address in a 64-bit binary. You might think it should be 8 bytes, but even 64-bit CPUs only use 48-bit addresses today. Before I can unpack the 6 bytes into an integer, I have to pad it with bytes until it is 64-bits long for the
u64 function to decode it.
Now that we have the libc base, we can jump to any gadget we want to in the libc file! But wait - we already sent our payload! Where do we send the new addresses in the libc file that we want to jump to? Well, we can send another payload by just running the binary again from
main. The binary hasn’t quit yet, so the offsets and base addresses will all remain the same.
Using a tool called one_gadget, we can search for a single gadget that gives us a shell in the libc.
$ one_gadget "libc-2.23.so"
Note that while
one_gadget found several for us, it does impose constraints on each gadget that must be met before the gadget is called. I will use the first gadget since it is the simplest - but before I can use it, I need to ensure that $rax = 0. I do this by searching for gadgets that pop values into $rax using ROPgadget as I did before. I couldn’t find a simple one in the binary itself, but when I ran it on the libc file, I found a simple
pop $rax; ret; so I decided to use that. Immediately after, we can use the
one_gadget gadget to give us a shell.
shell = 0x45216 # Condition: rax = NULL
Line 9 is unnecessary for the exploit to work, but it does let the program exit gracefully once the shell completes.
Here is our final exploit code. After running it, we get the following.
Awesome, we got a shell!
To understand why we need to leak an address, we need to understand how code from shared objects is run. When ASLR is enabled on an operating system, every time a program runs, the address of each function it uses from a shared object is randomized.
Any program that wants to use code from a shared object needs to know where the code is located so it can be jumped to. However, if that location changes every time the program is run, then the program can’t hard code a location to jump to (i.e. it can’t say “oh you want to call
printf? That code is always located at address 0x08048cba, just jump there”). So there needs to be a way to get the address at runtime. The image below explains how this is accomplished.
The dynamic linker fills in a data structure with the addresses of each function in the shared object. This data structure is known as the Global Offset Table, and might look like the following.
||Random virtual address of
||Random virtual address of
All the program needs to do is read the appropriate entry in the GOT and jump to it. But how does it know which entry in the GOT to read? It does this using something called the PLT.
- When a binary calls a function from a shared object, it jumps to some code in the PLT.
- The PLT is a table full of chunks of instructions appended together. Each entry in the PLT does the following:
- Read an address from the GOT.
- Jump to that address.
So the PLT looks something like this.
||Code to lookup the
||Code to lookup the
Awesome, we now know how programs jump to random function addresses! They just use the code in the PLT, which reads the random address from the GOT and jumps to it.
How does the GOT get its values filled in? The dynamic linker takes care of that for us, so we don’t care too much. However, it IS important to realize that the values in the GOT are initially empty. The GOT is only filled with the appropriate addresses after the first time that the function has been called. Before that, each entry in the GOT contains the address of something called a resolver.
If it’s the first time that a function has been called, the PLT code reads the address of the resolver from the GOT and jumps to it. The resolver fills in the GOT entry and then jumps to the real function address. After this initialization, we never need to call the resolver for this function again since the GOT entry has been filled. So really, our diagram should look like this.