You may have noticed that I like to program many things in Ruby. I really do like many aspects of this language, and it’s usually the main language that I use at work as well. Lately I’ve been getting the feeling that a programmer can get judged and immediately pigeon holed based on using a language a lot at work. For example if you use Ruby you must be “only” a web programmer, if you use a garbage collected language, you mustn’t be able to manage memory, or handle programming in languages like C or C++, let alone assembly.
So I like to take some of my free time and play hacker wargames like smashthestack.org, or reverse engineer crackmes, so I decided to describe my process for solving those types of challanges here.
“Crackmes” are essentially a compiled binary without source code, which asks you for a password or key, you enter the wrong password you lose, you enter the right one and you win and move onto the next. You need to reverse engineer the binary in order to discover the password, and these range from simple to tedious and difficult as the creator of the crackme piles on layers of obfuscation, misdirection, encryption, and anti-debugging techniques to stop you, not to mention you need a good handle on assembly and keeping track of what is in various memory locations during runtime.
We start out simple, and so I start with my simplest techniques.
wants a password:
1 2 3 4
So first we think that perhaps the password is simply in the binary, and we can
find it by running the program
strings on it to list every string in the program.
I think it’s safe to say that will work on this one, but I have a more direct
approach that I usually take, and that is run the program through
outputs all calls to library functions that the program makes, so here we’re looking
for something like a call to
1 2 3 4 5 6 7
Simple, we found the call to
strcmp the password is
g00dJ0B!. You see how we can
see each call, with parameters and return value. This was all that was needed to
beat this challenge.
Let’s move onto the next one, with the same approach
1 2 3 4 5 6 7
So the difference here is the password is being compared with wcscmp, which has
int wcscmp (const wchar_t* wcs1, const wchar_t* wcs2); It compares
wide strings where each char is 16-bit.
On Linux most of the time the code and data sections get loaded around the address
0x08040000, and local variables are stored in the stack, which begins at the
top of memory and grows upwards from
The arguments to
0x804a040, which I’m guessing is in the initialized
data section, and is the secret password, and
ltrace shows is
the location on the stack that it wrote our input to.
The trick to this challenge, is that if we were simply to run the
on the binary, it would have worked in the previous challenge, but not this one
because each character in a
wchar_t string is 32-bits long with the most significant
bits zeroed out, and terminated by a a 32-bit 0x0000 value.
The quickest way I know to find out the bytes at
0x804a040 is to load the
binary into my debugger
radare2 and just print it out.
1 2 3 4 5 6
The password is
Ok, let’s see what’s new in the next one, and hopefully get to use
some more. First let’s see what happens with
1 2 3 4 5 6 7 8 9
So usually I just write in
blah for a password, but we can see here that
0 in response to my string.
scanf returns the number of things
it has parsed according to its format string. So I’m guessing its format string
"%s" this time as it was before. Maybe it was looking for a number.
Let’s load it into r2, dissassemble the main function and find out.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
So, woo, radare is the best. It has added helpful identifiers for us rather
then make us look up a bunch of addresses. For example
str.Invalid_Password__n is an
identifier standing in for the address of the string `“Invalid Password\n”.
The assembly listing is easy to follow thanks to the analysis
radare2 has done,
and the symbols it has added for us, it even graphically shows branches and loops.
The first thing to know when you look at the assembly listing of a function created by a compiler, is that there is going to be 3 main sections: the function prologue, the body, and the function epilogue. If you already know this, you will probably want to skip down a little ways.
The Function Prologue
Like I mentioned before, the stack is an area of memory which in Linux begins at
0xffffffff, and grows upwards in memory, and the CPU register
to the top of the stack. To get this analogy you have to think of memory as
a something like a container that holds plates. The bottom of the container is
0xffffff, the top of it is
0x000000. When you place plates onto the stack
over and over, the pile grows upwards. That is pushing things onto the stack.
To remove things from the stack, you “pop” them off of it. You can’t pop a plate out of the middle or the bottom, you can only pop a plate off the top. It is a LIFO stack, last in is first out.
An x86 CPU has
pop instructions for this, and guess what, compilers
don’t use them very much, prefering to instead just do arithmetic
ebp in order to get values from the stack or manipulate its
In a running program, we use the stack as a “function call stack”, which is a sequence of “stack frames”, one per function called.
So let’s explain how that’s laid out so we can get back to the crackme. Take this example nested function calls:
1 2 3 4 5 6 7 8 9 10 11
1 2 3 4 5 6 7 8 9 10
The calling convention used here is called
cdecl, to call a function we
put its arguments on the stack in reverse order, and use the
call automatically pushes the address of the next instruction after it
onto the stack, so we will know where to return to after the function call ends.
Once inside the function, space is made to hold the local variables, usually
by subtracting the local variables size in bytes from esp, raising the stack
up higher, sometimes this size is aligned on a 16 byte boundry by using the
and instruction. This is why there is junk data in uninitialized local variables,
they just hold whatever garbage happened to be on the stack.
main() like any other function has a prologue.
1 2 3 4 5 6 7 8 9 10
So we save the old base pointer (base pointer, frame pointer, same thing) then start subtracting from esp to increase the space for local variables.
It makes space twice, using two different methods for aligning it to a 16-byte boundry. You could ask, why doesn’t it just do this all in one subraction. I have no idea why it does it this way, but I do know that when you compile with optimizations off, the compiler often does redundant things. You might also ask why it reserves so much stack space for the one 32-bit local variable that actually exists in this main function.
Your guess is as good as mine. In the assembly listing we can see that
the only local variable used is called
local_4h, which is the value at
ebp - 0x4, the
int sized local variable that scanf parses our input
The Function Epilogue
1 2 3
leave instruction is the same as
mov esp, ebp
pop ebp. This
effectively undoes all the stack resizing the prologue did, and then
restores the base pointer to what it was before
main was called. Yep,
there is another hidden function that calls
There is a register called
eip that holds the address of the current
instruction, but you can’t directly assign a value to it except to use
a branching instruction or
leave is executed the next
thing on the stack is the return address of the caller,
ret pops that
address into the
eip register and execution continues from there.
A function’s return value is always stored in the
eax register, so
mov eax, 0 is the same as
return 0 at the end of
Back to scanf
1 2 3 4 5 6
The first instruction,
lea stands for load effective address, and is
good at doing pointer arithmetic, calculating the address of elements
inside arrays, and things like that.
[ebp - 4] is the address of the
int sized value on the stack we want
scanf to write to.
local_4h is just a symbol radare uses to remind us what is in a memory
location, and it allows us to rename it if we want, it just equals
0x804854c is the address of the constant string “%d” located in the
data section. So this code is just moving the value
the top of the stack, and the address of our local variable 4 bytes after that
in reverse order that
scanf takes them, then we
At this point whatever we wrote, if it was a number, will be located at
[ebp - 4], and this value is compared with
0x149a, if they are equal
we jump to the “you win” screen.
So that solves this crackme, all we need to type for a password is, in
1 2 3 4
Ok, let’s hope they get a bit harder from here on, and this should go a lot faster without needing to explain the stack, etc.
A quick run shows that
ltrace is not going to help us this time, so
back into radare we go. I’ve renamed some of the local variable so it will
be clearer what is happening after the
1 2 3 4 5 6 7 8 9 10 11 12
So we have 3 local variables,
b. Let’s translate
this to C psuedocode and check what the condition for winning is.
1 2 3 4 5 6 7 8 9 10 11
After doing that arithmetic and register shuffling, it turns out
b = 338724, which
is the password.
crackme0x03 we have all the same stuff as in the previous, but now we’re calling
test(number_entered, b) which will test the password.
1 2 3 4 5
So let’s have a look at this
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Ok so what’s going on in here. First thing I see is 8 bytes allocated on the stack and
then never used (for some reason), and I see our two arguments coming in at
epb + 0x8
ebp + 0xc which I’ve renamed to match the calling code.
We then compare
b for equality, which chooses one or another
string containing giberish to be passed to the function
shift(). In either case
the return value of
test() is not checked, and we return from the
Let’s check the actual contents of these gibberish strings out in radare with the
command to print strings.
1 2 3 4
Let’s remember that, and have a look at the
shift() function which I’m going to
guess is going to decrypt these strings somehow.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
I’ve renamed the local stack variables to be what I believe their purpose is.
This is a bigger function than we’ve seen so far, this loop pattern we see
here is what a
for loop looks like in assembly. You can see how
is initialized to
0, then a condition is checked which either processes
the body and increments
counter, or it jumps out of the loop.
I guess is going to iterate over the string that was passed in, and probably “decrypt” it. Maybe it will help if we translate this back into C.
1 2 3 4 5 6 7 8 9
All it did was subtract
3 from the ascii code of the gibberish string, which
is why we say it is “decrypting” in quotes :) Remember we still have not run
this binary yet, so let’s just write up a quick Ruby script to see what the
giberish decodes to.
1 2 3 4 5 6 7 8
Ok so we’ve learned that if the password you enter is equal to the variable
b, we will select and decode the “Password Ok” string, and win,
that was pretty obvious from before even looking at this function, but we got
to make sure we got to the bottom of each function. So if I remember right,
338724, the same password as last time, real sneaky guys.
1 2 3 4