Reversing Crackme Challenges

You may have noticed that I like to program many things in Ruby. I really do like many aspects of this language, and it’s usually the main language that I use at work as well. Lately I’ve been getting the feeling that a programmer can get judged and immediately pigeon holed based on using a language a lot at work. For example if you use Ruby you must be “only” a web programmer, if you use a garbage collected language, you mustn’t be able to manage memory, or handle programming in languages like C or C++, let alone assembly.

So I like to take some of my free time and play hacker wargames like smashthestack.org, or reverse engineer crackmes, so I decided to describe my process for solving those types of challanges here.

Crackmes

“Crackmes” are essentially a compiled binary without source code, which asks you for a password or key, you enter the wrong password you lose, you enter the right one and you win and move onto the next. You need to reverse engineer the binary in order to discover the password, and these range from simple to tedious and difficult as the creator of the crackme piles on layers of obfuscation, misdirection, encryption, and anti-debugging techniques to stop you, not to mention you need a good handle on assembly and keeping track of what is in various memory locations during runtime.

Crackme0x00a

We start out simple, and so I start with my simplest techniques. crackme0x00a wants a password:

  ./crackme0x00a
  IOLI Crackme Level 0x09
  Password: blah
  Password Incorrect!

So first we think that perhaps the password is simply in the binary, and we can find it by running the program strings on it to list every string in the program. I think it’s safe to say that will work on this one, but I have a more direct approach that I usually take, and that is run the program through ltrace. ltrace outputs all calls to library functions that the program makes, so here we’re looking for something like a call to strcmp

  ltrace ./crackme0x00a
  __libc_start_main(0x80484e4, 1, 0xffa24724, 0x8048570, 0x80485e0 <unfinished ...>
  printf("Enter password: ") = 16
  __isoc99_scanf(0x8048651, 0xffa24663, 0x8049ff4, 0x8048591, -1Enter password: blah
  ) = 1
  strcmp("g00dJ0B!", "blah") = 1
  puts("Wrong!"Wrong!) = 7

Simple, we found the call to strcmp the password is g00dJ0B!. You see how we can see each call, with parameters and return value. This was all that was needed to beat this challenge.

Crackme0x00b

Let’s move onto the next one, with the same approach

ltrace ./crackme0x00b
__libc_start_main(0x8048494, 1, 0xff82ec54, 0x8048500, 0x8048570 <unfinished ...>
printf("Enter password: ") = 16
__isoc99_scanf(0x80485e1, 0xff82eb4c, 0, 0xf77f249c, 0xff82ebf4Enter password: blah
) = 1
wcscmp(0x804a040, 0xff82eb4c) = 1
puts("Wrong!"Wrong!) = 7

So the difference here is the password is being compared with wcscmp, which has the prototype int wcscmp (const wchar_t* wcs1, const wchar_t* wcs2); It compares wide strings where each char is 16-bit.

On Linux most of the time the code and data sections get loaded around the address 0x08040000, and local variables are stored in the stack, which begins at the top of memory and grows upwards from 0xffffffff.

The arguments to wcscmp are 0x804a040, which I’m guessing is in the initialized data section, and is the secret password, and 0xff82eb4c which ltrace shows is the location on the stack that it wrote our input to.

The trick to this challenge, is that if we were simply to run the strings program on the binary, it would have worked in the previous challenge, but not this one because each character in a wchar_t string is 32-bits long with the most significant bits zeroed out, and terminated by a a 32-bit 0x0000 value.

The quickest way I know to find out the bytes at 0x804a040 is to load the binary into my debugger radare2 and just print it out.

[0x080483e0]> px 64 @0x804a040
- offset -   0 1  2 3  4 5  6 7  8 9  A B  C D  E F  0123456789ABCDEF
0x0804a040  7700 0000 3000 0000 7700 0000 6700 0000  w...0...w...g...
0x0804a050  7200 0000 6500 0000 6100 0000 7400 0000  r...e...a...t...
0x0804a060  0000 0000 4743 433a 2028 5562 756e 7475  ....GCC: (Ubuntu
0x0804a070  2f4c 696e 6172 6f20 342e 362e 312d 3975  /Linaro 4.6.1-9u

The password is w0wgreat.

Crackme0x01

Ok, let’s see what’s new in the next one, and hopefully get to use radare2 some more. First let’s see what happens with ltrace.

ltrace ./crackme0x01
__libc_start_main(0x80483e4, 1, 0xffdd85c4, 0x8048460, 0x80484d0 <unfinished ...>
printf("IOLI Crackme Level 0x01\n"IOLI Crackme Level 0x01) = 24
printf("Password: ") = 10
scanf(0x804854c, 0xffdd8524, 0xf77a6ff4, 0xf7636dd5, 0xf77c8660

Password: blah
) = 0
printf("Invalid Password!\n"Invalid Password!) = 18

So usually I just write in blah for a password, but we can see here that scanf has returned 0 in response to my string. scanf returns the number of things it has parsed according to its format string. So I’m guessing its format string was not "%s" this time as it was before. Maybe it was looking for a number. Let’s load it into r2, dissassemble the main function and find out.

╒ (fcn) sym.main 113
│           ; arg int arg_149ah    @ ebp+0x149a
│           ; var int local_4h     @ ebp-0x4
│           0x080483e4      55             push ebp
│           0x080483e5      89e5           mov ebp, esp
│           0x080483e7      83ec18         sub esp, 0x18
│           0x080483ea      83e4f0         and esp, 0xfffffff0
│           0x080483ed      b800000000     mov eax, 0
│           0x080483f2      83c00f         add eax, 0xf
│           0x080483f5      83c00f         add eax, 0xf
│           0x080483f8      c1e804         shr eax, 4
│           0x080483fb      c1e004         shl eax, 4
│           0x080483fe      29c4           sub esp, eax
│           0x08048400      c70424288504.  mov dword [esp], str.IOLI_Crackme_Level_0x01_n
│           0x08048407      e810ffffff     call sym.imp.printf
│           0x0804840c      c70424418504.  mov dword [esp], str.Password:
│           0x08048413      e804ffffff     call sym.imp.printf
│           0x08048418      8d45fc         lea eax, [ebp - local_4h]
│           0x0804841b      89442404       mov dword [esp + 4], eax
│           0x0804841f      c704244c8504.  mov dword [esp], 0x804854c  ; "%d"
│           0x08048426      e8e1feffff     call sym.imp.scanf
│           0x0804842b      817dfc9a1400.  cmp dword [ebp - local_4h], 0x149a
│       ┌─< 0x08048432      740e           je 0x8048442
│       │   0x08048434      c704244f8504.  mov dword [esp], str.Invalid_Password__n
│       │   0x0804843b      e8dcfeffff     call sym.imp.printf
│      ┌──< 0x08048440      eb0c           jmp 0x804844e
│      │└─> 0x08048442      c70424628504.  mov dword [esp], str.Password_OK_:__n
│      │    0x08048449      e8cefeffff     call sym.imp.printf
│      └──> 0x0804844e      b800000000     mov eax, 0
│           0x08048453      c9             leave
╘           0x08048454      c3             ret

So, woo, radare is the best. It has added helpful identifiers for us rather then make us look up a bunch of addresses. For example str.Invalid_Password__n is an identifier standing in for the address of the string `“Invalid Password\n”.

The assembly listing is easy to follow thanks to the analysis radare2 has done, and the symbols it has added for us, it even graphically shows branches and loops.

The first thing to know when you look at the assembly listing of a function created by a compiler, is that there is going to be 3 main sections: the function prologue, the body, and the function epilogue. If you already know this, you will probably want to skip down a little ways.

The Function Prologue

Like I mentioned before, the stack is an area of memory which in Linux begins at 0xffffffff, and grows upwards in memory, and the CPU register esp points to the top of the stack. To get this analogy you have to think of memory as a something like a container that holds plates. The bottom of the container is 0xffffff, the top of it is 0x000000. When you place plates onto the stack over and over, the pile grows upwards. That is pushing things onto the stack.

To remove things from the stack, you “pop” them off of it. You can’t pop a plate out of the middle or the bottom, you can only pop a plate off the top. It is a LIFO stack, last in is first out.

An x86 CPU has push and pop instructions for this, and guess what, compilers don’t use them very much, prefering to instead just do arithmetic on esp or ebp in order to get values from the stack or manipulate its size.

In a running program, we use the stack as a “function call stack”, which is a sequence of “stack frames”, one per function called.

So let’s explain how that’s laid out so we can get back to the crackme. Take this example nested function calls:

int h(c){
  int d = 10;
  return c + d;
}
int g(b){
  return h(b + 1);
}
int f(a){
  return g(a + 1);
}
f(0);

|      10          | Local variable d  (esp) (ebp - 4)
|   0xffffffef     | Saved ebp frame pointer
| h Return Address |
|       2          | Argument to h
|   0xfffffff7     | Saved ebp frame pointer
| g Return Address |
|       1          | Argument to g
|   0xffffffff     | Saved ebp frame pointer
| f Return Address |
|       0          | Argument to f

The calling convention used here is called cdecl, to call a function we put its arguments on the stack in reverse order, and use the call instruction. call automatically pushes the address of the next instruction after it onto the stack, so we will know where to return to after the function call ends.

Once inside the function, space is made to hold the local variables, usually by subtracting the local variables size in bytes from esp, raising the stack up higher, sometimes this size is aligned on a 16 byte boundry by using the and instruction. This is why there is junk data in uninitialized local variables, they just hold whatever garbage happened to be on the stack.

main() like any other function has a prologue.

push ebp                 ; Push the previous base pointer to the stack
mov ebp, esp             ; Make our new base pointer the top of the stack.
sub esp, 0x18            ; Increase the stack space by 0x18 bytes
and esp, 0xfffffff0      ; Align the stack to 16 byte boundry.
mov eax, 0               ; eax = 0
add eax, 0xf             ; eax = 0xf
add eax, 0xf             ; eax = 0x1e
shr eax, 4               ; eax = 0x1
shl eax, 4               ; eax = 0x10 A shift right then left also aligns
sub esp, eax             ; Increase stack space by another 0x10 bytes.

So we save the old base pointer (base pointer, frame pointer, same thing) then start subtracting from esp to increase the space for local variables.

It makes space twice, using two different methods for aligning it to a 16-byte boundry. You could ask, why doesn’t it just do this all in one subraction. I have no idea why it does it this way, but I do know that when you compile with optimizations off, the compiler often does redundant things. You might also ask why it reserves so much stack space for the one 32-bit local variable that actually exists in this main function.

Your guess is as good as mine. In the assembly listing we can see that the only local variable used is called local_4h, which is the value at ebp - 0x4, the int sized local variable that scanf parses our input into.

The Function Epilogue

mov eax, 0
leave
ret

The leave instruction is the same as mov esp, ebp pop ebp. This effectively undoes all the stack resizing the prologue did, and then restores the base pointer to what it was before main was called. Yep, there is another hidden function that calls main()

There is a register called eip that holds the address of the current instruction, but you can’t directly assign a value to it except to use a branching instruction or ret. After leave is executed the next thing on the stack is the return address of the caller, ret pops that address into the eip register and execution continues from there.

A function’s return value is always stored in the eax register, so mov eax, 0 is the same as return 0 at the end of main().

Back to scanf

lea eax, [ebp - local_4h]
mov dword [esp + 4], eax
mov dword [esp], 0x804854c  ; "%d"
call sym.imp.scanf
cmp dword [ebp - local_4h], 0x149a
je 0x8048442

The first instruction, lea stands for load effective address, and is good at doing pointer arithmetic, calculating the address of elements inside arrays, and things like that. [ebp - 4] is the address of the local int sized value on the stack we want scanf to write to.

local_4h is just a symbol radare uses to remind us what is in a memory location, and it allows us to rename it if we want, it just equals 4 in this case.

0x804854c is the address of the constant string “%d” located in the data section. So this code is just moving the value 0x804854c to the top of the stack, and the address of our local variable 4 bytes after that in reverse order that scanf takes them, then we call to scanf.

At this point whatever we wrote, if it was a number, will be located at [ebp - 4], and this value is compared with 0x149a, if they are equal we jump to the “you win” screen.

So that solves this crackme, all we need to type for a password is, in decimal 5274.

./crackme0x01
IOLI Crackme Level 0x01
Password: 5274
Password OK :)

Crackme0x02

Ok, let’s hope they get a bit harder from here on, and this should go a lot faster without needing to explain the stack, etc.

A quick run shows that ltrace is not going to help us this time, so back into radare we go. I’ve renamed some of the local variable so it will be clearer what is happening after the scanf.

call sym.imp.scanf
mov dword [ebp - a], 0x5a
mov dword [ebp - b], 0x1ec
mov edx, dword [ebp - b]
lea eax, [ebp - a]
add dword [eax], edx
mov eax, dword [ebp - a]
imul eax, dword [ebp - a]
mov dword [ebp - b], eax
mov eax, dword [ebp - number_entered]
cmp eax, dword [ebp - b]
jne 0x8048461

So we have 3 local variables, number_entered, a, and b. Let’s translate this to C psuedocode and check what the condition for winning is.

scanf("%d", &number_entered);
a = 0x5a;
b = 0x1ec;
edx = b;
eax = &a;
*a = *eax + edx
eax = a;
eax *= a;
b = eax;
eax = number_entered;
if(eax == b) goto win

After doing that arithmetic and register shuffling, it turns out b = 338724, which is the password.

Crackme0x03

In crackme0x03 we have all the same stuff as in the previous, but now we’re calling a function test(number_entered, b) which will test the password.

mov eax, dword [ebp - b]
mov dword [esp + 4], eax   ; pass b
mov eax, dword [ebp - number_entered]
mov dword [esp], eax       ; pass number_entered
call sym.test

So let’s have a look at this test function.

╒ (fcn) sym.test 42
│           ; arg int number_entered @ ebp+0x8
│           ; arg int b            @ ebp+0xc
│           0x0804846e      55             push ebp
│           0x0804846f      89e5           mov ebp, esp
│           0x08048471      83ec08         sub esp, 8
│           0x08048474      8b4508         mov eax, dword [ebp+number_entered]
│           0x08048477      3b450c         cmp eax, dword [ebp+b]
│       ┌─< 0x0804847a      740e           je 0x804848a
│       │   0x0804847c      c70424ec8504.  mov dword [esp], str.Lqydolg_Sdvvzrug_
│       │   0x08048483      e88cffffff     call sym.shift
│      ┌──< 0x08048488      eb0c           jmp 0x8048496
│      │└─> 0x0804848a      c70424fe8504.  mov dword [esp], str.Sdvvzrug_RN______
│      │    0x08048491      e87effffff     call sym.shift
│      └──> 0x08048496      c9             leave
╘           0x08048497      c3             ret

Ok so what’s going on in here. First thing I see is 8 bytes allocated on the stack and then never used (for some reason), and I see our two arguments coming in at epb + 0x8 and ebp + 0xc which I’ve renamed to match the calling code.

We then compare number_entered and b for equality, which chooses one or another string containing giberish to be passed to the function shift(). In either case the return value of test() is not checked, and we return from the test() function.

Let’s check the actual contents of these gibberish strings out in radare with the ps command to print strings.

ps @ str.Lqydolg_Sdvvzrug_
Lqydolg#Sdvvzrug$
ps @ str.Sdvvzrug_RN______
Sdvvzrug#RN$$$#=,

Let’s remember that, and have a look at the shift() function which I’m going to guess is going to decrypt these strings somehow.

│           ; arg int string       @ ebp+0x8
│           ; arg int arg_13h      @ ebp+0x13
│           ; var int decrypted    @ ebp-0x78
│           ; var int counter      @ ebp-0x7c
│           ; CALL XREF from 0x08048491 (sym.shift)
│           ; CALL XREF from 0x08048483 (sym.shift)
│           0x08048414      55             push ebp
│           0x08048415      89e5           mov ebp, esp
│           0x08048417      81ec98000000   sub esp, 0x98
│           0x0804841d      c74584000000.  mov dword [ebp - counter], 0
│       ┌─> 0x08048424      8b4508         mov eax, dword [ebp+string]
│       │   0x08048427      890424         mov dword [esp], eax
│       │   0x0804842a      e811ffffff     call sym.imp.strlen
│       │   0x0804842f      394584         cmp dword [ebp - counter], eax
│      ┌──< 0x08048432      731c           jae 0x8048450
│      ││   0x08048434      8d4588         lea eax, [ebp - decrypted]
│      ││   0x08048437      89c2           mov edx, eax
│      ││   0x08048439      035584         add edx, dword [ebp - counter]
│      ││   0x0804843c      8b4584         mov eax, dword [ebp - counter]
│      ││   0x0804843f      034508         add eax, dword [ebp+string]
│      ││   0x08048442      0fb600         movzx eax, byte [eax]
│      ││   0x08048445      2c03           sub al, 3
│      ││   0x08048447      8802           mov byte [edx], al
│      ││   0x08048449      8d4584         lea eax, [ebp - counter]
│      ││   0x0804844c      ff00           inc dword [eax]
│      │└─< 0x0804844e      ebd4           jmp 0x8048424
│      └──> 0x08048450      8d4588         lea eax, [ebp - decrypted]
│           0x08048453      034584         add eax, dword [ebp - counter]
│           0x08048456      c60000         mov byte [eax], 0
│           0x08048459      8d4588         lea eax, [ebp - decrypted]
│           0x0804845c      89442404       mov dword [esp + 4], eax
│           0x08048460      c70424e88504.  mov dword [esp], 0x80485e8  ; "%s."
│           0x08048467      e8e4feffff     call sym.imp.printf
│           0x0804846c      c9             leave
╘           0x0804846d      c3             ret

I’ve renamed the local stack variables to be what I believe their purpose is. This is a bigger function than we’ve seen so far, this loop pattern we see here is what a for loop looks like in assembly. You can see how counter is initialized to 0, then a condition is checked which either processes the body and increments counter, or it jumps out of the loop.

I guess is going to iterate over the string that was passed in, and probably “decrypt” it. Maybe it will help if we translate this back into C.

void shift(char *string){
  char decrypted[120];
  int counter;
  for(i = 0; i < strlen(string); ++i){
    decrypted[i] = string[i] - 3;
  }
  decrypted[counter] = '\0';
  printf("%s.", decrypted);
}

All it did was subtract 3 from the ascii code of the gibberish string, which is why we say it is “decrypting” in quotes :) Remember we still have not run this binary yet, so let’s just write up a quick Ruby script to see what the giberish decodes to.

"Lqydolg#Sdvvzrug$".split('').map do |c|
  (c.ord - 3).chr
end.join
 => "Invalid Password!"
"Sdvvzrug#RN$$$#=,".split('').map do |c|
  (c.ord - 3).chr}.join
end
=> "Password OK!!! :)"

Ok so we’ve learned that if the password you enter is equal to the variable calculated to b, we will select and decode the “Password Ok” string, and win, that was pretty obvious from before even looking at this function, but we got to make sure we got to the bottom of each function. So if I remember right, b was 338724, the same password as last time, real sneaky guys.

./crackme0x03
IOLI Crackme Level 0x02
Password: 338724
Password OK :)