We’re given this C source code:
[code listing omitted]
If we can provide the correct number in *argv[1] and pass this conditional, we'll execute a /bin/sh shell from this suid binary and win. Continue on to see how it was solved.
Normally the main function in a C program will look something like int main(int argc, char **argv), where argc is the number of arguments provided, including the program name as argument 0, and char **argv, or equivalently char *argv[], is an array of pointers to character arrays (C strings) representing each argument.
The program's environment variables, char **envp (the third argument to main, which has been left out in this case), and the commandline arguments are loaded into the beginning of the stack area when the program loads.
Let’s say we run this program like this ./level02 one two three four
, and have a look at how that area of the stack looks in
radare2.
[output omitted]
This is the x86_64 calling convention of putting function arguments in order in the registers rdi, rsi, rdx, rcx, r8, r9, which means main was called essentially like this: main(rdi, rsi), or in this case literally main(5, 0x7ffe26db3678).
Somewhat confusingly, this C program types argv differently; this will be important later. So the four arguments we gave, plus the program's name, gives us 5 here, and the stack address 0x7ffe26db3678 is a pointer to a pointer to those argument strings, so let's look at that now.
[output omitted]
We can see here the following arguments, ./level02, one, two, three, and four, each terminated by a 0x00 null byte ending the string. After that we have a similar situation for the environment variables, LS_COLORS=... shown here, which we don't care about.
Remembering that char **argv
and char *argv[]
mean essentially the same thing in C, let’s find out what happens when that
is changed to long **argv
.
In the normal course of things char *argv[]
holds an array of char *
which are C strings as shown above. In this program
we have long **argv
which essentially means a pointer to an array of pointers to long
. Confusing?
Basically, this is going to force a string entered on the commandline to be interpreted as a 64-bit signed integer type called a long: the bytes of the string become a 64-bit number, which we'll subsequently do some math on. Computers don't care about data types, and in C we can take any pointer to some bytes and say: consider the following bytes as this type.
Let’s have some fun, and convert an 8 character string into a 64-bit integer, a long
as this program is doing.
[code listing omitted]
Giving the output:
[output omitted]
We can see that, because x86_64 is little endian, A = 0x41, B = 0x42, and so on, now appear backwards in memory, and together they represent a fairly large 64-bit number.
Since we now understand how a commandline string can represent a number, let’s find that number so we can pass the following condition:
[code listing omitted]
This should be as simple as taking this equation and rearranging it in terms of x:
[equation omitted]
This works out to x = 12, and it's wrong. This is because we're mixing regular algebra with integer division, and disregarding that integers on the computer have a finite range due to bit depth, wrapping around when they exceed the range of the bits used.
Now let's say we have, for example, 8-bit unsigned integers, and multiply 99 * 5 = 495. The maximum value of an 8-bit unsigned integer is 2^8 - 1 = 255, so this is too high. Counting 0, we have a total of 256 distinct values.
In order to find out how this expression will actually work out on the CPU we can say instead 99 * 5 mod 2^8 = 239
, which
is the answer you would get for 8-bits after it wraps around. We’ll use that next for our 64-bit values.
I’m going to use Wolfram Alpha to solve for x
, because I don’t know offhand how to do algebra involving modulus. First
we’ll convert the large 64-bit hex into decimal. I very much like hex, but Wolfram Alpha doesn’t. Boo to that.
[code listing omitted]
Then entering this equation into Wolfram to solve for x
:
[equation omitted]
This will give a valid solution for any n
, because of the cyclic nature of integer overflows, so let’s just pick n = 1
.
[output omitted]
We now have one of an infinite number of solutions, but we have a problem: this number is too large to fit into 64 bits. How can we tell how many bits a number needs to be represented? The base-2 logarithm of a number will tell us exactly how many bits are needed.
[code listing omitted]
Looks like we should have just picked n = 0
and saved ourselves some time. I figured as much, but we got to show off
the log2
thing, so whatever. I’m interested in knowing if we can solve this with values of n < 0
, but not that interested
at the moment :)
Next we need to check that 8319100071223652201
is the number we’re looking for by multiplying it out, this looks like
it’s going to be a HUGE number, so we’ll again need to use modulus to wrap it back within 64-bits.
[code listing omitted]
Our number is 8319100071223652201
, and in hexadecimal that is 0x7373617034366f69
. This looks suspiciously like each byte
is an ASCII character value to me, forming an 8 character string, so let’s convert that in radare.
[code listing omitted]
Looks like we have found the commandline argument that will get us our suid shell, io64pass
so let’s try it out.
[output omitted]
It looks like a CrackMe, or capture the flag exercise. The x86 assembly is clearly a virtual machine, and I assumed the block of text on the right would be a binary that runs on that virtual machine. I call the machine, for lack of a better name, Dan32, because as I later found out, it is a 32-bit virtual machine, and originates from Denmark.
The block of text on the right is base64 encoded, which is easy enough to convert back into a binary file, but since it is an image, we can't directly get at that block in a text format without doing some kind of optical character recognition. We can guess it is base64 encoded by the characters used, and really, after you've seen a lot of base64, you can usually spot it pretty easily.
I tried a few online OCR services, which did not work, and since I had invested almost no time into this, I was ready to say the hell with it. I was not about to type all that base64 text into my text editor by hand.
I did end up solving this puzzle and creating tools to reverse engineer it, what follows is a detailed writeup, read on for more.
Note: If you are using a blocker such as Privacy Badger, like I do, I’ve noticed the terminal movie playback embeds from asciinema.org may be blocked by default. If you wish to see those in this post, you’ll have to toggle that domain to allow in your plugin, though you don’t need to accept cookies from that domain for it to work.
Staring at it a bit longer, we notice certain characters in the base64 side are bolded, and if we go through and write down each bolded character, it spells out some nonsense: MzJoYWNrZXI1NTd6amt6aS5vbmlvbgo.
I thought, since we are looking at a massive bunch of base64, that maybe this was also base64 encoded. We can use a tool called rax2
which
is a part of Radare2 in order to decode it like this:
[code listing omitted]
It's a vanity .onion address on the Tor network. The site, which unfortunately is not online anymore, had downloads for both the assembly listing and the base64, saving us from needing to worry about how to get those characters into our computer by hand.
My approach to these sorts of files, which might or might not be malicious, is usually to use hexdump or a hex editor program to look at them before going any further. After doing this to the base64 file, I noticed that it is full of ANSI terminal escape sequences, ones for positioning text at (x,y) coordinates, setting bolding, etc. This is because if you were to cat the file to your terminal, it would reproduce the formatting seen in the image, question mark and all, which is pretty cute; the sequences are actually required to put the text in the right order to be decoded.
Before catting this to my terminal, I wrote a script to check each of the ANSI escape sequences to make sure they were only positional and style commands, and nothing weird or malicious. They turned out alright, so I printed the file to my terminal and copy-pasted the text into a file. Then I wrote another script to remove the end-of-line hyphens, join it all together, and base64 decode it, resulting in a binary file that I named disk.img.
You can find the complete source code for all of the Radare2 plugins I wrote to solve this on my github.
The provided x86 assembly for the virtual machine is bare bones, but it tells us everything we need to know to run this binary.
The label OP_TABLE points to an enumeration of each opcode the VM supports, in order, so we know the numeric value of each op.
Some more information we learn from the given asm:
- %define REG(r) [REGS + r * 4]: registers are 32 bits wide
- %define PTR(p) [MEM + p]: it requires some read/write memory space
- lea esi, [DISK + esi]: it requires some read/write space to act as a disk
- mov eax, [OP_TABLE + eax * 4]: every opcode is 4 bytes wide
- cmov is the only way to do conditionals

Even after learning all that information, it's incomplete; some of the opcodes are not given implementations, such as write, in, div, the various sized load.x and store.x, and nor, to name a few. So we'll need to look at what's given, and implement those ourselves.
Here's where a major part of my reverse engineering philosophy comes in: as a rule, I don't like to run random binaries given to me, especially in malware/crackme situations. If I take the VM's assembly listing, complete the missing implementations, and run the mystery binary disk.img, I have literally no idea what it is capable of at this point. The worst case scenario is that the binary knows about a flaw in the given virtual machine, and exploits it for a VM escape onto my host system and starts doing shit.
I'm heavy on the static analysis side, but at this point I don't have any debugger or analysis tools that even understand this made-up computer architecture. What I want to do is use Radare2 to reverse engineer the binary, so I'm going to need to teach Radare2 about this file format and computer architecture, invent a textual assembly language, and so on. And that's the real fun of this challenge for me, honestly, so that's what I did. Radare2 allows you to write plugins to extend it, so it can understand any CPU, real or imagined, and simulate its running through ESIL (Evaluable Strings Intermediate Language).
The first Radare2 plugin to write, is the asm plugin. This plugin takes the 32-bit machine level opcodes and fills in a structure with information about that opcode, its arguments, and it provides a textual representation for viewing a disassembly listing.
In order to do this, we’ll write a plugin in C. The asm plugin’s main function has the following prototype
[code listing omitted]
The parameters to disassemble are:
- RAsm *a: the current assembler context
- RAsmOp *op: the structure we need to fill in
- ut8 *buf: the opcode bytes we are disassembling
- ut64 len: the length of buf

The important fields of RAsmOp to fill in here are buf_asm, which holds the textual representation of the disassembled opcode, and size, the size of the opcode.
Looking at the provided x86 assembly code, we can see how to dismantle a 32-bit opcode into its constituent parts, remember all opcodes are 4 bytes long or 32-bits.
[code listing omitted]
Becomes
[code listing omitted]
Next, for convenience, we make a lookup table that maps 0 to 63 to the corresponding register name. I happen to know from
the future, that r62
is the stack pointer, and r63
is the instruction pointer, but I didn’t know this at the time.
It makes reading the disassembly a lot easier though once we know this.
[code listing omitted]
Since in the disassembly output we’re going to be referencing things by register name a lot, I grab the textual names for each argument as well.
[code listing omitted]
What follows in the disassemble function is a switch statement on op_index, where we just need to fill in the op size and the textual representation of the opcode itself. I'll show a few of those here; you can see the full source of these plugins here
[code listing omitted]
So for example the nor
instruction, which wasn’t provided in the image, just uses snprintf
to write out our
human readable disassembly, and sets the op->size = 4
. This ends up producing something like nor r21, r57, r57
.
Quickly taking a look at another example, movi
is the move immediate value instruction, and looks like this:
[code listing omitted]
Notice op->size = 4 for all instructions; setting op->size = -1 indicates an invalid operation. The above movi instruction actually encodes an immediate value directly into the opcode itself. This is the only instruction which does this; all other instructions must move values into a register to operate on them. Again, this is a straight translation from the given x86 asm.
Other instructions had to be put together just following the pattern that was set out for us. For example, div, mul, nor
all work the same as the given mul
opcode. All said, it is not a lot of work to get a fully functioning disassembler going in Radare2.
Here is the last part of the plugin, where we hook our code up by setting callbacks, and some information:
[code listing omitted]
And here’s the result, a nice looking assembly readout that we can use to start reversing the binary.
With the above plugin we can now see human readable disassembly of the binary, but Radare doesn’t have enough information about this architecture yet to allow us to step through the program and simulate it like you would in a debugger. And you can’t yet perform static analysis like you would get with IDA. Radare supports about one zillion architectures already, but since this CPU was probably invented just for this challenge, we’ll have to add support ourselves.
Radare’s answer to this is ESIL, (Evaluable Strings Intermediate Language), providing a register profile for the CPU, and using those to create an analysis plugin. An analysis plugin expects us to implement a function like this, to set the register profile.
[code listing omitted]
Here we specify all 64 general purpose registers in the machine, and also give aliases to registers that have special meaning.
The format is gpr <registername> .<size in bits> <offset>
.
With this we can create a register file containing registers of various sizes, which can overlap. For example in x86, we can specify
register gpr ax .16 0
, but also specify the high and low bytes as gpr ah .8 8
and gpr al .8 0
.
Dan32 doesn’t have overlapping registers, or high and low register access by name, so we don’t need to do this.
Some register aliases are A0, A1, A2, which are for arguments that are passed to functions via register, which is pretty common in this binary. LR is the link register, which like on an ARM CPU holds the return address of a function, PC, is the instruction pointer, and SP is the stack pointer, so I’ve filled those in after having gotten some experience with the binary’s two calling conventions.
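Pieced together, the profile looks roughly like this. The register numbers behind A0 through A2 and LR are the ones I assigned after studying the calling conventions; treat the exact layout as this writeup's reconstruction, not a published spec.

```
# aliases for registers with special meaning
=PC    r63
=SP    r62
=LR    r59
=A0    r03
=A1    r04
=A2    r05
# 64 general purpose 32-bit registers at 4-byte offsets
gpr    r00    .32    0
gpr    r01    .32    4
gpr    r02    .32    8
# ... r03 through r61 continue the same pattern ...
gpr    r62    .32    248
gpr    r63    .32    252
```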
The next task for the analysis plugin is to create ESIL for each and every instruction supported by the CPU. There are not many instructions so this didn’t take very long.
The plugin must implement an analysis function with the following prototype, which looks extremely similar to the asm plugin function:
[code listing omitted]
Here, we’re asked to fill in more information about the opcode in the given RAnalOp *op
parameter, it looks something like this:
[code listing omitted]
All of these are pretty important for proper analysis, but the most important, so that we can simulate this binary inside
radare2, without running it on the untrusted VM we were given, is the ESIL. Here is an example of ESIL for movi
, the
move immediate value instruction:
[code listing omitted]
ESIL is a Turing-complete stack machine, so it is able to represent the instructions of any CPU; it is almost like a ridiculous sort of microcode. A more complicated instruction, cmov, the conditional move instruction, looks like this:
[code listing omitted]
So after each instruction is codified by type and given an ESIL representation, we’re done. If you are interested in how ESIL works, here’s the docs. I’ve written some pretty crazy ESIL for the disk sector read/write code, and stack machines are not my favourite, but they work :) Here is some of the longest ESIL I wrote for one opcode. It reads 512 bytes from a numbered disk sector, into a given memory address.
[code listing omitted]
Ok, so now we’re all set to get a disassembly view of this binary, we’ll just load it up in Radare2, hit play to see how it goes.
“Wrong Endianness”. Now, there are a few things going on here, so let’s look at the first instruction: movi r00, 0x78200
.
I don't want to get too bogged down in the details, but I know from the future that register r00 is like the zero register on a MIPS system: it always contains the value 0, so here writing 0x78200 to that register is effectively a no-op, and we'll see why that's done in the next part.
Next up we have movi eip, 0x14. There are no jump instructions in this opcode set; unlike x86, you can write to the instruction pointer register to get a jump. Interestingly, the jump to 0x14 lands somewhere the linear disassembly sweep didn't expect, which is why the disassembler isn't able to interpret a few instructions after that at first.
When we get to the address 0x14, we end up at a series of instructions that loads immediates, and then uses out to print them to the display.
[code listing omitted]
A little bit of radare knowledge: the immediate values were displayed as hex to begin with, so I wrote a little radare expression to hint that those immediates are actually string or char values, using the ahi command, which stands for "analyse hint immediate".
Radare2 is terse as hell, and you get very used to it, and probably, maybe, start loving it. The expression below basically creates
a range from the current address, denoted as $$
, to $$ + 17 * 8
with a step of 8 bytes. The @@=
functions as an iterator, which runs
the command ahi s
on each address in the range, telling radare the immediate values are character values.
[code listing omitted]
Anyway, the real problem is we’re interpreting the binary as little endian, when it’s actually big endian. So we can just go back to our plugin and fix that pretty simply in the disassemble function by reversing the bytes, and setting the endian properly.
[code listing omitted]
Back to the entrypoint of our binary, what was once a no-op, when read backwards, jumps us past all the “Wrong Endian” stuff, and begins displaying the binary properly so we can reverse it.
The opcode 87e00180
when read in big endian jumps us with movi eip, 0xc
, and another jump movi eip, 0xa8
, bringing us
finally to some actual code.
[code listing omitted]
This is the first in a series of tricks and tests that the binary performs on the virtual machine itself to make sure it is implemented properly. With the use of a bin plugin for dan32, which I'm not going to bore you with here but which is available on GitHub with the rest of the code, I've tried to set up a memory layout that would be familiar for someone like myself who works with ELF or PE files. Here is that layout.
[output omitted]
Remember the DISK address that was mentioned in the x86 assembly VM, which is meant to represent a readable, writable disk area from which the program is loaded. This area stores disk.img in the .diskrom section at address 0x200000. I probably shouldn't have called it a diskrom, since you can write to it, but I didn't know it was going to be written to at the time, so it's too late now. I actually believed it was going to be something like a game cartridge rom at first, but oh well.
The code is executed from the .text
section, with entrypoint 0x0
, and we have a .bss
section which contains some initialized
data that is used in the program. The read
and write
instructions are used to copy data from the disk into memory by 0x200
byte
sectors.
So the trick here is that at 0xb4 we are writing from r00, which always contains zero, to address 0xbe, which alters an instruction. Then, if your read instruction works properly, this is immediately corrected by reloading the entire first sector from .diskrom back into memory, undoing the damage. If your read instruction is not working, you will be greeted by the text "Disk read error!" and the program will halt.
Notice how the analysis plugin is working, showing beautiful ascii arrows that point to the destinations of our jumps. When
the zero is written to address 0xbe
, it modifies the instruction, and we see the control flow is taking us directly towards
“Disk read error” and a halt. The read
immediately fixes this and the control flow updates.
Next we move on to 0x14c
, which is an area of the binary that sets up the stack pointer, and reads the rest
of the program from .diskrom
one sector at a time.
I guess here we can get our first look at how dan32 goes about things.
[code listing omitted]
Here are some things to take note of right off the bat:
- nor(a, a) flips all the bits of a
- nor(a, a) is equivalent to -(a + 1) in 2's complement
- r00 is a hardware zero register
- r57 is used as a temporary register

There are a few patterns using NOR throughout this binary. Above, we want to save the instruction pointer to r61, and then subtract 4 from it. This is done many times in this binary like this:
[code listing omitted]
This is a roundabout way of just saying r61 = eip - 4
, but that’s what we’re dealing with here :)
The next test the binary performs on the virtual machine, is to test the div
instruction. Since
this instruction was not provided in the x86 assembly code, it is to ensure we’ve got the argument
order right, and we’re not allowing division by zero. If we’ve done it wrong, we’re sent off to
some code that prints “ALU Malfunction (DIV)” and halts the program.
By the way, these symbols such as fcn.alu_malfunction_div, fcn.main
, and so on, were added by me
while reversing the binary to make it more clear what is going on.
[code listing omitted]
That’s also our first look at conditionals in dan32. There are no compare instructions, there is
no zero flag, and no conditional jumps like jne
, as you find in other instruction sets.
An interesting side note, I guess, is that there is no real compare instruction in x86 either; the cmp instruction on that processor actually subtracts the two values and discards the result, keeping only the flags. When they are equal, since a - a = 0, the zero flag is set, which is what instructions like jne are conditional on.
Now that the Radare plugins are working, let's load the binary, simulate it a bit, and have a look around. This is loading a project file where I've already reversed the entire binary, but it gives a good idea of how it's working.
Here’s where we start getting some idea of what this binary is up to, and finally get to see some proper
functions such as print()
, scan()
, memcmp()
and things like that implemented.
[code listing omitted]
First up, radare has identified a string “Password: “, and helpfully renamed its address as the symbol
str.password
for us. Here we can see one of the two calling conventions in action. This one is a
lot like fastcall, where we load the first few arguments of a function into registers r03, r04, r05
,
and end up with our return value in r01
.
Remember I identified r59 as the link register; it holds our return address. So here, the calling convention is to calculate the return address, eip + 8, two instructions away, and store it into r59; we then load the address of the print function fcn.print into a temp register, and jump there.
Throughout the binary, r57
is always used as a temporary register. There are others such as r20, r21
which are always used as counters or array indices. In fact, this assembly code is so consistent in the
way it does things and which registers it uses, that I wonder if it was emitted by a machine, or written by
hand by someone who is just awesome.
Now that we know our arguments, our return address, and where we’re going, that about fully describes this calling convention. There is also a stack based calling convention like you would find on x86 32-bit, which I may write about later.
So don’t worry I’m not going to bore you to death by literally explaining every function, but this print one is a fairly simple example to start with.
[code listing omitted]
This function is simple, but normally to reverse a difficult function, I will slowly replace elements of the disassembly with C, until I have a C function. In this case, we’d have a for loop:
[code listing omitted]
I’ve kept from mentioning the actual purpose of this program for way too much of this article. If written in C, the main function would just about look like the following code. This was figured out by reversing each function in turn, and I got a happy surprise at the end, we’re going to be dealing with encryption.
[code listing omitted]
So what’s going on here, is we’re going to do an in-place decryption of the DISK
section if we’ve
entered the right passphrase.
We’re able to figure all this out, without running this binary at all, through static analysis and a bit of ESIL simulation. I guess the question is how did I know which function was doing the decryption, what actual encryption algorithm was being used, and how I’m going to figure out the passphrase without even running the binary.
The answer to the first question is that I knew I would probably need to find XOR somewhere in this program, which would XOR the ciphertext with the key stream. Since we have no XOR instruction, I knew it would need to be built from a group of NORs, which I spotted pretty easily, and that pinpointed the main decryption routine.
[code listing omitted]
The answer to how I knew which encryption algorithm it was is funnier: I didn't know which one it was. I had stepped through the key scheduling function a few times, after it prints "Initializing Encryption", and thought that it was basically key stretching the passphrase. It was only later, when I was randomly reading through a writeup on some malware which used RC4 encryption, that I realized I was looking at the same RC4 key scheduling algorithm.
The main encrypted part of the binary was identified by the address that was being passed to the
decrypt()
function, which was 0xc00
. I also previously noticed this while running the binary
through the entropy function of the program binwalk
. Here is the output from binwalk -E
[output omitted]
One of the weaknesses of some encryption schemes, is in how it checks if the passphrase is valid before decryption. Say for example you enter a passphrase on a zip file or something, and the unzip program just blindly decrypts the file without knowing if the password is valid. It’s going to produce total garbage if the passphrase is wrong, and the program won’t have any way of letting you know that you’ve entered the wrong password.
So a common, bad, way to verify the password first, is to have some known ciphertext, plaintext pair that is encrypted using the passphrase right in the binary. You enter the passphrase, it decrypts this small ciphertext, compares it to the known plaintext, and if it’s correct, it says “yay” and moves on to decrypting the rest of the file. If it’s wrong, it says “boo”, and doesn’t decrypt the file into garbage.
This is what’s happening in our dan32 binary. The known ciphertext, plaintext pair is included, meaning we just have to crack that.
Here we can see that before decrypting the DISK
section, it tries to decrypt
a small 56 byte buffer, and then compares that to a valid string that is included in the program.
[code listing omitted]
Here we can output the short encrypted buffer, and its valid decryption. If the passphrase we give
doesn’t decrypt this short buffer correctly, the program will halt. Here I use the px
Radare command
to do a hexdump of the test ciphertext, and another ps
command to print the plaintext string.
[output omitted]
That means in order to crack this passphrase, I only need to figure out how to successfully decrypt this 56 byte buffer, which is something I can do entirely outside of this binary. I decided to reverse the key scheduling and decryption routine, and rewrite it in C so that I could brute force the password quickly outside of this environment, and outside of Radare.
[code listing omitted]
[code listing omitted]
And finally the decrypt routine.
[code listing omitted]
I made this into a complete program, that takes a passphrase as argument, and then wrapped it in
a short Ruby script that repeatedly tries passwords from a list I have, until the decrypted result matches.
This only took about 3 minutes; the passphrase ended up being agent. This was lucky, because it could have been a lot harder if it wasn't a simple word; I would have needed to use hashcat or something a bit more sophisticated. There are also flaws with RC4 itself, which directly relate to the problems with WEP, but I didn't need to go that route.
I can’t really call writing this code a waste of time, since I ended up needing to do it in order to
actually identify the algorithm, but there are far easier ways to decrypt RC4 that I could have used,
for example, Radare2 comes with a program called rahash2
which can, among about a zillion other things,
be used to decrypt RC4.
[output omitted]
At this point I am thinking I will just dump the high entropy section of the binary from 0xc00
onwards
out to a separate file, and decrypt it with rahash2
and be done with it, but when I try this, I end
up with unintelligible garbage, that isn’t proper dan32 code, and isn’t anything else I can
recognize.
The DISK section is divided into 512-byte sectors, and it turns out they are not decrypted in the order they appear in the file. Since this is a stream cipher, the position in the pseudorandomly generated keystream matters, and that is why I'm getting garbage out. I decided then to just simulate the decryption process within Radare using ESIL; since I put so much work into properly defining each opcode in ESIL, it simulates the VM perfectly.
The only problem is, that I have not implemented the in
and out
opcodes for doing IO, so I would be
running the program blind, and be unable to enter the passphrase or see printed output.
An easy way to avoid writing the in
opcode, is for me to simulate the program up until it is about to
ask me for a passphrase, stop there, and just write the passphrase into memory at the right address, and
skip over the scan()
function entirely and continue afterwards, so that’s what I’ve decided to do.
For the out opcode, there is a less hacky solution. I can use ESIL to simulate an interrupt, and attach that interrupt to an external program that will receive the character value to be printed. I wrote a short Ruby script which accepts an argument and prints it to the standard output, and inside Radare2 I simulate the binary like this:
[code listing omitted]
This takes a while to complete, so I just went off and did something else. Another thing, not shown here is that the binary often calls functions that just do nothing but waste enormous amounts of time counting, which have no effect on the output. I patched these calls out of the binary so I wouldn’t have to wait 2000 years for it to finish.
Once all is said and done, the binary has completely rewritten the DISK
section into yet another binary, and we’re
given this message:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
|
I dump the DISK
segment into an actual file, and reload that in Radare, and sure enough it has decrypted itself into
a webserver written in dan32. I reversed this new binary for a while, and found it contained:
I used binwalk to extract these from the binary and looked through the contents. I got the flag, so I’m calling this one done. Good experience overall, 10/10 would crack again. I’m so proficient at reading dan32 assembly now that it’s a shame I’ll probably never have any use for it again; it’s a pretty nice VM.
So I like to take some of my free time and play hacker wargames like smashthestack.org, or reverse engineer crackmes, so I decided to describe my process for solving those types of challenges here.
“Crackmes” are essentially compiled binaries without source code that ask you for a password or key: enter the wrong password and you lose; enter the right one and you win and move on to the next. You need to reverse engineer the binary in order to discover the password, and these range from simple to tedious and difficult as the creator of the crackme piles on layers of obfuscation, misdirection, encryption, and anti-debugging techniques to stop you. Not to mention you need a good handle on assembly, and on keeping track of what is in various memory locations at runtime.
We start out simple, and so I start with my simplest techniques. crackme0x00a
wants a password:
1 2 3 4 |
|
So first we think that perhaps the password is simply in the binary, and we can
find it by running the strings program on it to list every string in the program.
I think it’s safe to say that would work on this one, but I have a more direct
approach that I usually take, and that is to run the program through ltrace.
ltrace outputs all calls to library functions that the program makes, so here
we’re looking for something like a call to strcmp.
1 2 3 4 5 6 7 |
|
Simple: we found the call to strcmp, and the password is g00dJ0B!. You can see
how we see each call, with parameters and return value. This was all that was
needed to beat this challenge.
Let’s move on to the next one, with the same approach
1 2 3 4 5 6 7 |
|
So the difference here is the password is being compared with wcscmp, which has
the prototype int wcscmp (const wchar_t* wcs1, const wchar_t* wcs2);. It compares
wide strings, where on Linux each character is 32 bits.
On Linux most of the time the code and data sections get loaded around the address
0x08048000, and local variables are stored in the stack, which begins at the
top of memory and grows upwards from 0xffffffff.
The arguments to wcscmp are 0x804a040, which I’m guessing is in the initialized
data section and is the secret password, and 0xff82eb4c, which ltrace shows is
the location on the stack that it wrote our input to.
The trick to this challenge is that simply running the strings program
on the binary would have worked in the previous challenge, but not this one,
because each character in a wchar_t string is 32 bits long with the most significant
bits zeroed out, and the string is terminated by a 32-bit 0x00000000 value.
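A quick Ruby illustration of why a byte-oriented strings scan misses wide strings (using UTF-32LE as a stand-in for the 32-bit wchar_t layout on Linux):

```ruby
# Each character of a 32-bit wide string occupies 4 bytes, with the three
# most significant bytes zeroed for ASCII characters, so a tool scanning
# for runs of printable bytes never sees a long printable run.
wide = "w0wgreat".encode("UTF-32LE")

wide.bytesize        # 32 bytes for 8 characters
wide.bytes.first(8)  # 'w' (119), 0, 0, 0, '0' (48), 0, 0, 0
```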
The quickest way I know to find out the bytes at 0x804a040 is to load the
binary into my debugger radare2 and just print them out.
1 2 3 4 5 6 |
|
The password is w0wgreat
.
Ok, let’s see what’s new in the next one, and hopefully get to use radare2
some more. First let’s see what happens with ltrace
.
1 2 3 4 5 6 7 8 9 |
|
So usually I just write in blah
for a password, but we can see here that scanf
has returned 0
in response to my string. scanf
returns the number of things
it has parsed according to its format string. So I’m guessing its format string
was not "%s"
this time as it was before. Maybe it was looking for a number.
Let’s load it into r2, disassemble the main function and find out.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
|
So, woo, radare is the best. It has added helpful identifiers for us rather
than make us look up a bunch of addresses. For example str.Invalid_Password__n is an
identifier standing in for the address of the string “Invalid Password\n”.
The assembly listing is easy to follow thanks to the analysis radare2 has done
and the symbols it has added for us; it even graphically shows branches and loops.
The first thing to know when you look at the assembly listing of a function created by a compiler is that there are going to be three main sections: the function prologue, the body, and the function epilogue. If you already know this, you will probably want to skip down a little ways.
Like I mentioned before, the stack is an area of memory which in Linux begins at
0xffffffff
, and grows upwards in memory, and the CPU register esp
points
to the top of the stack. To get this analogy you have to think of memory as
something like a container that holds plates. The bottom of the container is
0xffffffff, the top of it is 0x00000000. When you place plates onto the stack
over and over, the pile grows upwards. That is pushing things onto the stack.
To remove things from the stack, you “pop” them off of it. You can’t pop a plate out of the middle or the bottom, you can only pop a plate off the top. It is a LIFO stack, last in is first out.
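The plate analogy in code, with a Ruby array standing in for the stack:

```ruby
stack = []
stack.push("plate 1")   # like the x86 push instruction
stack.push("plate 2")
top = stack.pop         # pop always removes the most recently pushed plate
```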
An x86 CPU has push and pop instructions for this, and guess what, compilers
don’t use them very much, preferring instead to just do arithmetic
on esp or ebp in order to get values from the stack or manipulate its
size.
In a running program, we use the stack as a “function call stack”, which is a sequence of “stack frames”, one per function called.
So let’s explain how that’s laid out so we can get back to the crackme. Take this example of nested function calls:
1 2 3 4 5 6 7 8 9 10 11 |
|
1 2 3 4 5 6 7 8 9 10 |
|
The calling convention used here is called cdecl: to call a function we
put its arguments on the stack in reverse order, and use the call instruction.
call automatically pushes the address of the next instruction after it
onto the stack, so we will know where to return to after the function call ends.
Once inside the function, space is made to hold the local variables, usually
by subtracting the local variables’ size in bytes from esp, raising the stack
up higher; sometimes this size is aligned on a 16 byte boundary by using the
and instruction. This is why there is junk data in uninitialized local variables:
they just hold whatever garbage happened to be on the stack.
main()
like any other function has a prologue.
1 2 3 4 5 6 7 8 9 10 |
|
So we save the old base pointer (base pointer, frame pointer, same thing) then start subtracting from esp to increase the space for local variables.
It makes space twice, using two different methods for aligning it to a 16-byte boundary. You could ask, why doesn’t it just do this all in one subtraction. I have no idea why it does it this way, but I do know that when you compile with optimizations off, the compiler often does redundant things. You might also ask why it reserves so much stack space for the one 32-bit local variable that actually exists in this main function.
Your guess is as good as mine. In the assembly listing we can see that
the only local variable used is called local_4h
, which is the value at
ebp - 0x4
, the int
sized local variable that scanf parses our input
into.
1 2 3 |
|
The leave instruction is the same as mov esp, ebp followed by pop ebp. This
effectively undoes all the stack resizing the prologue did, and then
restores the base pointer to what it was before main was called. Yep,
there is another hidden function that calls main().
There is a register called eip that holds the address of the current
instruction, but you can’t directly assign a value to it except by using
a branching instruction or ret. After leave is executed, the next
thing on the stack is the return address of the caller; ret pops that
address into the eip register and execution continues from there.
A function’s return value is always stored in the eax
register, so
mov eax, 0
is the same as return 0
at the end of main()
.
1 2 3 4 5 6 |
|
The first instruction, lea
stands for load effective address, and is
good at doing pointer arithmetic, calculating the address of elements
inside arrays, and things like that. [ebp - 4]
is the address of the
local int
sized value on the stack we want scanf
to write to.
local_4h
is just a symbol radare uses to remind us what is in a memory
location, and it allows us to rename it if we want, it just equals 4
in
this case.
0x804854c is the address of the constant string “%d” located in the
data section. So this code is just moving the value 0x804854c to
the top of the stack, and the address of our local variable 4 bytes after that,
in the reverse order that scanf takes them, and then we call scanf.
At this point whatever we wrote, if it was a number, will be located at
[ebp - 4]
, and this value is compared with 0x149a
, if they are equal
we jump to the “you win” screen.
So that solves this crackme; all we need to type for a password is, in decimal, 5274.
1 2 3 4 |
|
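As a quick check on that conversion:

```ruby
# scanf with "%d" parses decimal input, so we type the decimal form of the
# immediate operand from the listing:
password = 0x149a
# 0x149a = 1*4096 + 4*256 + 9*16 + 10 = 5274
```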
Ok, let’s hope they get a bit harder from here on, and this should go a lot faster without needing to explain the stack, etc.
A quick run shows that ltrace
is not going to help us this time, so
back into radare we go. I’ve renamed some of the local variables so it will
be clearer what is happening after the scanf.
1 2 3 4 5 6 7 8 9 10 11 12 |
|
So we have 3 local variables, number_entered
, a
, and b
. Let’s translate
this to C pseudocode and check what the condition for winning is.
1 2 3 4 5 6 7 8 9 10 11 |
|
After doing that arithmetic and register shuffling, it turns out b = 338724
, which
is the password.
In crackme0x03
we have all the same stuff as in the previous, but now we’re calling
a function test(number_entered, b)
which will test the password.
1 2 3 4 5 |
|
So let’s have a look at this test
function.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
|
Ok so what’s going on in here. First thing I see is 8 bytes allocated on the stack and
then never used (for some reason), and I see our two arguments coming in at ebp + 0x8
and ebp + 0xc, which I’ve renamed to match the calling code.
We then compare number_entered and b for equality, which chooses one or the other
string of gibberish to be passed to the function shift(). In either case
the return value of test() is not checked, and we return from the test() function.
Let’s check the actual contents of these gibberish strings out in radare with the ps
command to print strings.
1 2 3 4 |
|
Let’s remember that, and have a look at the shift()
function which I’m going to
guess is going to decrypt these strings somehow.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 |
|
I’ve renamed the local stack variables to be what I believe their purpose is.
This is a bigger function than we’ve seen so far, this loop pattern we see
here is what a for
loop looks like in assembly. You can see how counter
is initialized to 0
, then a condition is checked which either processes
the body and increments counter
, or it jumps out of the loop.
I guess it is going to iterate over the string that was passed in, and probably “decrypt” it. Maybe it will help if we translate this back into C.
1 2 3 4 5 6 7 8 9 |
|
All it did was subtract 3 from the ASCII code of each character of the gibberish string, which
is why we say it is “decrypting” in quotes :) Remember, we still have not run
this binary yet, so let’s just write up a quick Ruby script to see what the
gibberish decodes to.
1 2 3 4 5 6 7 8 |
|
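The script itself didn’t survive in this export, but a minimal sketch of the subtract-3 decode might look like this. Note the ciphertext below is hypothetical, built by adding 3 to each byte of a plausible message; it is not the actual string from the binary:

```ruby
# Undo the "encryption": subtract 3 from each byte of the string.
def shift(str)
  str.bytes.map { |b| (b - 3).chr }.join
end

ciphertext = "Sdvvzrug#RN$#=,"   # hypothetical: plaintext bytes + 3
plaintext  = shift(ciphertext)   # => "Password OK! :)"
```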
Ok so we’ve learned that if the password you enter is equal to the calculated
variable b, we will select and decode the “Password Ok” string, and win.
That was pretty obvious even before looking at this function, but we have
to make sure we get to the bottom of each function. So if I remember right,
b was 338724, the same password as last time, real sneaky guys.
1 2 3 4 |
|
Here is a neat example of using shaders in WebGL; read on to see the GLSL shader code.
This is an example of Ray Marching using Distance Fields. The map
function given
here is for a sphere, it accepts a point p
and returns the distance p
is from
the surface of a sphere with radius 0.25
units.
By first manipulating p
by calling fract(p)
on it (basically mod 1), we get
multiple equally spaced spheres. length()
performs the 3D pythagorean theorem
in order to see how far away p
is.
The trace()
function receives the origin o
and the ray r
, and casts the ray
outwards towards the object in the map()
function by multiplying it by t
.
This makes the trace() function search for the surface of an object iteratively,
here over a maximum of 32 iterations. Each iteration it advances the ray by half the
remaining distance returned by the distance function, which closes in safely until it finds
the shape’s boundary.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 |
|
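To make the trace loop concrete outside of GLSL, here is a Ruby sketch of the same idea for a single sphere (dropping the fract() repetition, so there is just one sphere at the origin):

```ruby
# Distance from point p (an [x, y, z] array) to the surface of a
# sphere of radius 0.25 centered at the origin.
def map(p)
  Math.sqrt(p.sum { |c| c * c }) - 0.25
end

# March from origin o along unit ray r, stepping half the remaining
# distance each iteration, just like the shader's trace() does.
def trace(o, r, iterations = 32)
  t = 0.0
  iterations.times do
    point = o.each_index.map { |i| o[i] + r[i] * t }
    t += map(point) * 0.5
  end
  t
end

t = trace([0.0, 0.0, -1.0], [0.0, 0.0, 1.0])  # converges on the surface at t = 0.75
```

Starting one unit in front of the sphere, each step halves the remaining 0.75 units, so after 32 iterations t has converged to the surface.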
The screen here is really a quad the same size as the viewport, so each fragment has a uv coordinate on that quad.
Next we expand the 2D uv coordinate into a 3D vector called r for ray, normalize it to unit length,
and establish an origin. Together these make a ray that is cast through that point on the screen.
I adjust the z-coordinate by multiplying it by time, which makes us seem to zoom through the scene.
The next thing I do is create a 2D rotation matrix, which I multiply by the ray’s xy
vector, the
angle of rotation theta
is also adjusted over time.
Now I use trace to calculate the distance a ray travels into the scene before hitting something. The
value fog uses an inverse square equation to make more distant points seem darker than nearby points.
Then fog is used to compose an RGB color, mostly purple, but also modified by time to slowly change
the color of the spheres. Finally the fragment color is returned.
1 2 3 4 5 6 7 |
|
Not much happening in the vertex shader, we apply the model and projection matrices, most importantly
we pass the time
and resolution
uniforms we receive from javascript along to the fragment shader.
The NES was programmed in 6502 assembly language, and lucky for me, it actually has a really straightforward instruction set; but still, last month I was not familiar enough with it to make anything. Now, usually when I don’t understand something, I write a program that somehow involves whatever topic I’m learning. You can’t write a program like an assembler without understanding how the target processor works, so I wrote my own assembler for the NES called n65.
This is the assembler I will be using in this article, and to help me compose music on the NES.
You can easily install n65 through RubyGems.org:
gem install n65
Before we can get the NES to make any sounds, there are a few things to know about writing
programs for it. The 6502 is an 8-bit processor, because most of its internal work registers
are 8 bits wide. The general purpose registers you have direct control over are named
A
, X
, and Y
. There are a few other 8 bit registers, such as the status register
and the stack pointer. The PC
, or program counter though, is 16 bits wide, which means
the processor can address memory from $0000
to $ffff
, a total of 65536 addressable bytes.
A NES does not have 64KB of actual RAM: the mainboard comes with 2KB of work RAM attached to the CPU, and 2KB of video RAM attached to the Picture Processing Unit, or PPU, which is actually a separate processor entirely. Both processors run independently of one another, but they can communicate through memory mapped registers.
The Audio Processing Unit, or APU, is also very much like a separate processor, although it is on the 2A03 chip, and is one of the things that makes a 2A03 different than other 6502s. You also communicate with the APU by writing values to memory mapped registers.
Image from www.jfbillingsley.com
There is a tiny, tiny amount of work RAM: $800 bytes, or 2KB. If you’ve never worked on anything
with such a small amount of RAM, well, that’s actually the challenge here; that and having
no operating system is what makes it fun in my opinion :)
You don’t have to do everything yourself however, the PPU is a hardware tile and sprite engine, and the APU is a synthesizer. When the machine starts up, it is ready to display graphics and play sounds all by itself, all you need to do, and the majority of what your code is about, is moving the right data into the right places, and the hardware does the rest. This is typical of older generation consoles, and even newer ones up to at least the Nintendo DS.
Before we make our first beep, let’s set up a project skeleton that does the initialization that all NES games need to do. This involves setting the hardware into a known state, setting up the stack pointer, and clearing the work RAM.
When we produce a binary from our assembly code listing, we are writing code located
in the cartridge ROM area, shown above starting at $8000
. The ROM binary we produce
would be written onto an actual ROM chip inside a NES cartridge, or more likely we’re
going to run this on an emulator, in which case n65
is setup to produce iNES formatted
“roms”.
The iNES file format is simply a 32 byte header, followed by at least one 16KB PROG ROM, and zero or more 8KB CHAR ROM sections. For now we can forget about CHAR ROM, let’s just create one PROG ROM section. These sections are either PROG for code or CHAR for graphics data, and are numbered in banks.
One last thing: the cartridge program ROM section is shown to be mapped from $8000 - $ffff,
which is 32KB total, not 16KB. If we specify 1 PROG ROM section in the header, that section
will be mirrored twice to cover that area of memory. This is important because we must
remember to write an interrupt vector table at $fffa. This is why we will assemble our
PROG section bank 0 at $C000: since we are only using one bank of 16KB, it tends to
make more sense as far as addressing is concerned.
The interrupt vector table is three 16-bit addresses that point to code for handling VBlank, Reset, and IRQ respectively. Every ROM must specify these, because the Reset vector points to the beginning of our code; think of it like specifying your main() function in a C program.
Ok here is an NES project skeleton that just initializes the hardware:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 |
|
So, let’s assemble this file using n65, and load it into an emulator. This means running n65 in your terminal.
1 2 3 4 5 6 7 8 9 10 |
|
I develop on OSX, though n65 works under Linux, and probably Windows if you have Ruby installed.
If you named your assembly file program_skeleton.asm
, by default you’re going to get a NES rom
file with the name program_skeleton.nes
. The output filename can be changed to whatever you like
with the -o
flag.
For an emulator, I am a fan of FCEUX because of its debugging and memory viewing capabilities, but the native version of FCEUX for OSX does not seem to include those features.
So, for debugging on OSX, I’ve taken to running FCEUX under Wine, which works well enough for that purpose. With FCEUX under Wine I can use the debugger, and if you run this ROM and open the memory hex editor you should see the first bit of RAM initialized like this:
So, why did we initialize $200 - $2ff
to $fe
?
That 256 byte page of memory is typically used as “shadow OAM”. OAM is Object Attribute Memory, and resides in the PPU’s video RAM, but rather than update it directly, most people keep a copy of it here in CPU RAM, and transfer this copy to the PPU once per frame.
Remember how I said the NES has a hardware tile and sprite engine built into the PPU that just starts running on its own provided you put data in the right spots?
OAM holds the x and y coordinates (among other attributes) of the 64 possible
hardware sprites on the NES; each sprite’s OAM entry is 4 bytes, and 64 * 4 = 256. Shoving $fe into
that area moves all the sprites offscreen, because we could otherwise see a bunch of
garbage sprites if we turned on the PPU and started copying it over, so shadow OAM is typically
initialized with this value.
That’s enough about sprites and graphics though, since for now we are just interested in getting a simple sound out of the APU. In fact this program skeleton has turned off the PPU, and VBlank for now, and so rendering is totally disabled. We will turn it on again later when we want a stable timer for our music.
The next easiest step to keep us moving forward is to get the NES to make a beep using its APU. I’ve found a ton of good info on NESDev.
So here is a rundown of the parts of the APU:
$4000 - $4003 Pulse 1
$4004 - $4007 Pulse 2
$4008 - $400B Triangle
$400C - $400F Noise
$4010 - $4013 DMC
$4015 Channel Enable
We’re going to want to use one of the Pulse oscillators to make a beep, so let’s use Pulse 1.
We want to use the channel enable register to turn that oscillator on, then write the correct values into
the area $4000 - $4003 to make a concert A 440hz note come out with the right duty cycle.
Here is what the four bytes that control Pulse 1 do, notice that many things are packed into one 8-bit byte. Here is what each bit does:
nes.apu.pulse1.control
$4000 DDLC VVVV
D : Duty cycle of the pulse wave 00 = 12.5% 01 = 25% 10 = 50% 11 = 75%
L : Length Counter Halt
C : Constant Volume
V : 4-bit volume
nes.apu.pulse1.ramp_control
$4001 EPPP NSSS
E : Enabled flag
P : Sweep Divider Period
N : Negate flag, inverts the sweep envelope
S : Shift count
nes.apu.pulse1.ft
$4002 TTTT TTTT
T : Low 8 bits of the timer that controls the frequency
nes.apu.pulse1.ct
$4003 LLLL LTTT
L : Length counter, if Length Counter Halt is 0, timer for note length
T : High 3 bits of timer that controls frequency
APU register $4001 does pitch sweeps, so let’s ignore it for now, and set up enough registers
to get a beep at our desired frequency of 440hz. I’ve listed symbolic names for each of the
registers as well; you get these symbolic names from the assembler directive .inc <nes.sym> in
the above code listing.
A pulse wave is a type of function generator that is either on or off, and the duty cycle parameter tells the APU what percentage of the time the wave is on or high.
Setting this affects the timbre of the sound; let’s just set it to 50% for now. This is a 2-bit
parameter, and that corresponds to %10 in binary. The percent sign prefix on a number means
binary, whereas the dollar sign means hexadecimal, if you were wondering.
For volume, parameter V, we are allowed 4 bits of resolution; let’s play it as loud as possible,
%1111. We should also set bit C to %1, to allow constant volume throughout the whole time the note is
played, and set L to %0, because we want to use the Length Counter, not halt it.
In the end, when we put all these parameters together into nes.apu.pulse1.control
register, we
get the final value of %10011111
, or $9f
in hex. I find it a lot clearer to write out registers
with composite values in binary rather than hexadecimal though.
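As a sanity check on that bit packing, here is how the $9f value falls out, sketched in Ruby (the helper name is mine, purely for illustration):

```ruby
# DDLC VVVV: duty in bits 6-7, length-counter halt in bit 5,
# constant-volume flag in bit 4, volume in the low 4 bits.
def pulse_control(duty, halt, constant, volume)
  (duty << 6) | (halt << 5) | (constant << 4) | volume
end

value = pulse_control(0b10, 0, 1, 0b1111)  # 50% duty, constant, full volume
format("$%02x", value)                     # "$9f"
```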
This register is for setting the frequency of the generated sound. An oscillator generates a waveform that resets itself periodically, and repeats a certain number of times per second. We want the pulse wave to reset at 440 times per second so we will get the note concert A, below middle C.
Internal to the APU is a counter which counts at a specific rate related to the speed of the 2A03
processor; when it reaches the value we set in T, it resets the period of the waveform, giving
us the right frequency. But it turns out an 8-bit register cannot count high enough
to give us the low notes we expect to be able to produce in music.
Frequency in hertz has an inverse relationship to time in seconds:
That means the lower the note in hertz, the longer the timer has to count, which means the counter
has to be at least 11 bits, so the value T spans all 8 bits of nes.apu.pulse1.ft, and the
three extra high bits end up in nes.apu.pulse1.ct.
There is a formula for determining this 11-bit counter number T
from frequency in hertz:
Where CPU = 1789773.0
, which is the speed of an NTSC Nintendo’s processor in hertz. This value
is different on a PAL NES, in that case CPU = 1662607.0
.
Here is a Ruby function for determining the value of T
:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
|
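The original listing did not survive in this export, but based on the formula above, a minimal version of that function could look like:

```ruby
CPU_NTSC = 1789773.0  # NTSC 2A03 clock speed in hertz

# The APU pulse frequency is f = CPU / (16 * (T + 1)),
# so solving for the timer value: T = CPU / (16 * f) - 1.
def apu_timer(freq_hz)
  (CPU_NTSC / (16 * freq_hz) - 1).round
end

apu_timer(440)  # => 253
apu_timer(220)  # => 507
```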
Hey, lucky us, the value 253 actually does fit within 8 bits. But that’s cutting it pretty close;
much lower in frequency and that number would have been higher than 255, and we would have needed
to put extra bits into nes.apu.pulse1.ct.
This gives me an idea, let’s allow both frequencies to be played. Let’s create a NES ROM that plays
a 220hz
note if the B button is pushed, and a 440hz
note if the A button is pushed.
So the 11-bit value of T
for each frequency is then:
440hz = %000 11111101
220hz = %001 11111011
Like we just mentioned, nes.apu.pulse1.ct contains 5 bits of length counter, to control the duration
of our note, and the 3 high bits of T, the frequency counter.
For now let’s set the note length counter to its highest value, %11111
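Putting the pieces together, splitting the 11-bit T across the two registers with the length counter at %11111 works out like this (a Ruby sketch; the helper name is mine):

```ruby
# Split an 11-bit timer value across the two pulse registers:
#   ft gets the low 8 bits, ct packs LLLL LTTT (length counter, high 3 bits).
def pulse_timer_bytes(t, length = 0b11111)
  ft = t & 0xff                  # -> nes.apu.pulse1.ft
  ct = (length << 3) | (t >> 8)  # -> nes.apu.pulse1.ct
  [ft, ct]
end

pulse_timer_bytes(253)  # 440hz: ft = %11111101, ct = %11111000
pulse_timer_bytes(507)  # 220hz: ft = %11111011, ct = %11111001
```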
Since we just decided to allow button B to make one note, and button A to make another, we need to learn how to read the NES’s controller buttons. The best way to do this, is to create a subroutine which reads each of the buttons we care about, and stores their states into the work RAM area. This way, we can detect when the button is first pressed, and not just held down.
Controller 1, like everything else, is a memory mapped device, and it is mapped to address $4016
,
or, if we’re using my symbolic names, simply nes.controller1
.
To tell the controller we want to read its button states, we have to “strobe” the controller, which
just means to write a $01 and then a $00 to it in succession. Then bit 0 of each of the next eight bytes
that we read out of that location reveals the state of one button, in the following order:
Here are some modifications and additions. First we add an initialize subroutine to our reset which enables both pulse1 and pulse2, and turns the PPU back on so that we get VBlank interrupts again.
The VBlank is the portion of time when the screen is in-between drawing, and is a handy place you
can put code you would like run every frame, at 60 frames per second. We use this to call our
read_input
subroutine repeatedly.
The read_input
subroutine strobes controller1 and reads the state of the A
and B
buttons,
and stores their state into controller_state
as defined in RAM at address $0000
in the zero
page. The zero page is the first 256 bytes of RAM, which can be quickly accessed using only
an 8-bit address; when you see a zp suffix on an instruction, you are telling the assembler
it can use the quick zero page addressing mode.
We store the state of the A
and B
buttons in RAM so that we can tell when they are first
changing state, so that we only play our notes once per button press.
Then we’ve added two subroutines which load the values discussed above into the APU to get
our notes to play. nes.apu.pulse1
and nes.apu.pulse2
both operate in the exact same way,
which is why we turned them both on, each button triggers one of the two pulse generators,
which means we can hear both sounds at once, overlapping if needed.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 |
|
So that’s it for our first attempts to get the NES to make sounds. I have actually been working on a MIDI to NES converter, which converts a MIDI file into a byte stream that can be quickly written to the APU registers at the correct tempo; it is almost complete. That will be the topic of my next post.
You can find n65, my NES assembler here.
And the source code for this NES sound example here
It turns out that matrix multiplication can be used to perform any linear mathematical operation, and a whole lot of interesting things are linear. Geometrically speaking, scaling, rotation, and skewing are linear operations.
First let’s say we want to model multiplication of two complex numbers by matrices. First we need some complex numbers to multiply, and I happen to like $(3 + 4i) \cdot i$.
So that is pretty easy it is just polynomial multiplication, so we distribute $i$ onto both terms.
Yep, multiplying by $i$ is a rotation $90^{\circ}$ counter clockwise. So anyways the thing I think is cool, is how it can be represented as a matrix multiply instead of looking like a polynomial multiply.
The identity matrix is the matrix that is basically a no-op when you multiply by it. Why does it act like that, and why is it shaped the way it is? Say you have this matrix multiply:
If $c = 1$ then there is your identity operation. But did you ever think:
What are these rows and columns in a matrix really all about?
Say you view that $2x2$ matrix as two unit length column vectors sitting side by side.
The first column $\begin{bmatrix} 1 & 0 \end{bmatrix}^{T}$ and the second column $\begin{bmatrix} 0 & 1 \end{bmatrix}^{T}$ are exactly the basis vectors which define and span $\mathbb{R}^{2}$. Otherwise known as either the x-y or real-imaginary axis.
So we saw that multiplication by the identity matrix performs no operation at all, because there is just no change to the basis vectors for that space $\mathbb{R}^{2}$. We also saw if we perform a simple change of basis where we scale by $c$, it just scales everything by $c$. The diagonal numbers don’t have to be the same as each other either, if they were different you would get a skewing operation instead of a scaling operation.
Great, so what do the other numbers on the opposite diagonal that have always been $0$ up to this point do? Those numbers let you perform rotations.
If we take the two column matrices we are using as our basis, and rotate them counter clockwise by $90^{\circ}$, which should be easy because they are so simple, we should get the new basis we’re looking for.
You can see how the first column, instead of being a vector pointing horizontally, is now pointing vertically, and how the second column, which used to, by chance be pointing vertically is now pointing horizontally but in the negative direction, each vector is pointing $90^{\circ}$ counter clockwise to where it used to be pointing.
So what that means is, since we’ve rotated each component $90^{\circ}$, any vector we multiply by $i$ will also rotate in the same way.
This should totally be true of a matrix that we decide to name $i$, that is $i^2 = -1$, or well it should equal the matrix version of $-1$.
Further proof that this makes any sense:
Just as the old “graph paper” example predicted. So, I am pretty happy with my explanation of this phenomenon. I have to admit I am just getting used to this Mathjax Latex formatting stuff.
Normally I am happy with a code example, so here is a Ruby example of the same thing:
1 2 3 4 5 6 7 8 9 10 11 |
|
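The Ruby listing above did not survive this export; a minimal sketch of the same idea, using the stdlib Matrix class, shows the encoding at work:

```ruby
require "matrix"

# Encode the complex number a + bi as the 2x2 matrix [[a, -b], [b, a]].
def complex_matrix(a, b)
  Matrix[[a, -b], [b, a]]
end

i = complex_matrix(0, 1)
i * i                      # the matrix version of -1
i * complex_matrix(3, 4)   # (3 + 4i) * i = -4 + 3i
```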
We don’t always want to rotate by $90^{\circ}$, but there is an equation that will let us create a matrix for any arbitrary rotation by $\omega$ radians. And that happens to look like this:
Let’s write a Ruby method to create rotation matrices for us, just by passing an angle in radians.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 |
|
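Since the listing is missing from this export, a sketch of such a method could be:

```ruby
require "matrix"

# Counter clockwise rotation by omega radians:
# [[cos w, -sin w], [sin w, cos w]]
def rotation(omega)
  Matrix[[Math.cos(omega), -Math.sin(omega)],
         [Math.sin(omega),  Math.cos(omega)]]
end

rotation(Math::PI / 2) * Vector[1, 0]  # rotates the x basis vector to (0, 1)
```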
Now that we’ve established that all that stuff about complex numbers and matrices works, we sort of have a formula for representing a complex number as a matrix.
So what happens if we use some other sort of pattern besides that? What if we use some other dimension besides the imaginary dimension? What if that dimension was infinitesimal? Most people learn calculus by learning about limits first, but for some reason I didn’t; I learned on my own, and the first calculus textbook that made sense to me taught derivatives using infinitesimals. It’s really similar in some ways, but revolves around the number $\varepsilon$
Derivitives calculate the slope at one point on a curve, when it takes two actual points to calculate a slope, “Rise over Run” style. Usually you see a formula for derivitive using a variable $h$ or $\Delta x$, and take the limit as it goes to 0
…but in this “style” of calculus, instead of that, you use $\varepsilon$ where $\varepsilon^{2} = 0$. In math, they say that if there is some positive integer $n$ where $x^n = 0$ then you would call $x$ a nilpotent number
So, example:
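Taking $f(x) = x^2$ and evaluating it at $x + \varepsilon$:

$$f(x + \varepsilon) = (x + \varepsilon)^2 = x^2 + 2x\varepsilon + \varepsilon^2 = x^2 + 2x\varepsilon$$

The $\varepsilon^2$ term vanishes because we defined $\varepsilon^2 = 0$, leaving the derivative $2x$ as the coefficient of $\varepsilon$.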
And that all worked out correctly, because $2x$ is totally the derivative of $x^{2}$. The key to that working out was that we defined $\varepsilon^{2} = 0$, which is something that totally reminds me of defining $i^{2} = -1$.
Like a complex number has the form $a + bi$, a dual number has the form $a + b\varepsilon$. The imaginary number $i$ has magic powers, in that it can magically do rotations that you would normally have to use trigonometry for, but a dual number has the magic power that it can automatically calculate the derivative of a function.
All you need to do to simultaneously calculate the value of a function and its derivative is pass the function a value of $x + \varepsilon$; that’s whatever $x$ happens to be, plus $1\varepsilon$.
Simple example again.
The result of the function is another dual number, the real part of which is $x^{2}$ and the dual part of which is $2x$; these are the value and derivative, $f(x)$ and $f'(x)$ respectively.
So just like we can encode a complex number into a 2x2 matrix, we can also encode a dual number in a similar way.
So if we multiply $\varepsilon \cdot \varepsilon$, we should get 0.
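The original (much longer) listing is gone; here is a compact sketch of one common matrix encoding of a dual number $a + b\varepsilon$, analogous to the complex-number encoding above:

```ruby
require 'matrix'

# Encode the dual number a + b*epsilon as an upper-triangular matrix.
def dual(a, b)
  Matrix[[a, b],
         [0, a]]
end

epsilon = dual(0, 1)
p epsilon * epsilon == Matrix.zero(2)  # => true, epsilon squared is 0

# f(x) = x^2 evaluated at 3 + epsilon:
x = dual(3, 1)
p x * x  # encodes 9 + 6*epsilon: value 9, derivative 6
```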
I dunno, I thought that was pretty cool. The more I figure out about matrices, the more I understand how they can be used to implement any linear operator. The reason I even care about matrices or linear operators at all (or math, really) is that they work together well with my obsession with digital audio filters. Because a digital filter is usually a linear operation, yep, matrices can be used to calculate filters :) Hopefully I will get more into describing how filters work in future blog posts.
Anyway, while researching for that project I found out that someone has actually created a perfectly viable synthesizer based around a 20MHz ATMega chip (as in the Arduino) called the Shruthi-1. I thought this was a perfect opportunity to go through the steps of actually building a proper synth of the sort I would someday like to design, so I ordered the Shruthi-1 4-Pole Mission kit and decided to put it together from scratch.
Shruthies are only really offered as kits you put together, they’re about $180 Canadian, and come from France. If you want a real, working, and fun synth for not a lot of money and can use a soldering iron, it’s probably the way to go.
Olivier Gillet, the designer of the Shruthi, is pretty inspirational to me, but how did he design an awesome synth around a little 20MHz ATMega chip? The first thing to know is that the Shruthi-1 4-Pole Mission is a hybrid digital-analog monosynth. The synth is divided into two PCBs: the top PCB is the digital control board, which contains the ATMega chip, LCD, knobs, and buttons. It runs software responsible for handling MIDI, generating the digital wavetable oscillators, envelopes, and LFOs, and reading the knobs. There are 20 pins connecting it to the bottom PCB, which primarily contains the analog filter but also has the power regulator, MIDI and audio inputs and outputs.
Having an analog filter is a common reason for a synth to be monophonic, i.e. only one voice or note at a time. This is because in order to be polyphonic, an analog synth would need separate oscillator, envelope, and filter circuitry for each voice that could sound at once, making each voice much more expensive. There are paraphonic synthesizers which can assign notes to more than one oscillator and sound them simultaneously, but they use only one envelope and one filter. The Shruthi-1 does something similar with its duophony mode. Since it has two oscillators it allows you to play two notes at once, the first assigned to Osc1 and the second to Osc2; however, they are both enveloped and filtered together.
Shruthi-1 has two 8-bit digital oscillators, and either a square or triangle sub oscillator, or a click generator to create nice transients during a note’s attack. The main and sub oscillators have a vibrato setting, the main oscs are also able to be detuned separately, and have a PWM control.
Looking at the code, the Shruthi-1 generates its 8-bit oscillators in a few different ways, I can see code for rendering pure wavetables with interpolation, some code where it is waveshaping a static wavetable, some for interpolating two wavetables together, and some which outright generates a signal in real-time.
The sub osc or click/noise generator is sent straight to a digital mixer for summation, while the two main oscs enter a modulator first. The modulator is where a significant amount of the synth’s character is introduced. The default algorithm in the modulator is a simple balance control between the two oscillators, but there are many other algorithms such as:
Next in the signal chain, the output received in the digital mixer is converted to 1-bit audio at 10MHz. This means the audio is represented by a pulse-width-modulated square wave. A PWM waveform is literally only ever on or off, hence the 1-bit depth, but it can be used to represent any fractional voltage between ground and whatever your high voltage is. For example, if you attach an LED to a PWM output and the square wave is at 100% duty cycle, it will shine the brightest it can, but if you alternate between turning it on and off for exactly half the time, a 50% duty cycle, it will shine half as bright. It is really turning on and off very fast, but that is imperceptible to us most of the time. Duty cycles from 0% to 100% can be created to represent the internal 8-bit digital values from 0-255, and in PWM form the audio can be sent electrically to the filter board for processing.
The filter board on my Shruthi-1 is a 4-pole ladder filter, which is essentially four 1-pole filters connected in series. The signal is delayed in phase by 90° by each 1-pole stage, usually caused by the reactance of a capacitor in a simple RC filter, and I assume something similar must be going on here. After going through all four stages, the phase has gone around to 360°, and so is back where it started. Variably mixing the output back into the first pole’s input causes the sick resonance ladder filters are known for, because the phase lines up again and reinforces frequencies in a loop at the filter’s cutoff frequency.
Other synths using ladder filters are Moogs and the TB-303, though they are constructed differently. The Shruthi also lets you take the output at 1, 2, or 3 poles, which affects the slope of the stop-band. You can also use the audio input on the back of the synth to apply the filter to whatever external sounds you want, provided you hold the envelope open.
The kit was actually pretty easy to assemble, and I spent a few hours per day on it and had it done over a weekend. Since I don’t have one of those fancy stands with clips to hold the boards for me, it was important to solder all the components in order of height, which meant doing all the resistors first, etc. That way when you flip it over to solder the joint it will be held in place by resting on your work surface.
I chose to do the filter board first, as it seemed like the hardest to do, but it was quite easy. The digital board was a little tricky, with its LCD screen probably being the hardest part of the whole project. I found the case kind of sucky: mine looked like it was cut out of the end of a huge piece of plastic, had ugly green print on some of it (luckily on the bottom), and was missing a screw hole on the top face.
The 20 pins which connect the top and bottom boards are sort of haphazardly connected, because they don’t reach so well, but they are firmly together, even if not perfect, and I haven’t had a problem with it.
So far, I’ve found that the Shruthi-1 makes one crazy analog sounding bassline, so here is an example of the track I wrote after I finished the build. :) I used the Shruthi-1 for bass, and a few other parts here and there.
In pure functional programming, all functions are supposed to be referentially transparent, meaning that each time you call a function with the same arguments, it gives you the exact same result. When functions are referentially transparent, you have far fewer worries about whether they will always work correctly.
A mathematical function is never going to give you a different answer no matter how many times you give it the same argument. The reason for that is pretty much that it cannot get any values from anywhere other than what you passed it, so it can never be any different.
In an imperative programming language you could write a sin(x) function which was completely evil and called time(), getting a value from somewhere besides the x parameter. If the time in seconds was even, it would add 1 to the result it returns, and if not it wouldn’t.
This example is just plain evil, especially if every time you happen to test the sin() function it happens to be an odd time in seconds, until one important day a million astronauts burn to death in the depths of space because it was run on an even second. Silly example, but that is the nature of many bugs in the imperative programming world.
All of these problems involve IO. If you say no functions can do any input or output to the OS, then the problem is solved, except you can also never interact with the program in any way.
The answer is to let some functions do IO, but do it inside a container called the IO Monad, from which you aren’t supposed to be able to escape. The reason you aren’t able to escape is that the data constructor for IO is hidden from use, by hiding it in the IO module. This means the type signature for every function which does IO will be something like main :: IO ().
Any function which calls another function that does IO, getLine :: IO String for example, must also return something wrapped in IO. It can’t deconstruct the return value from getLine into just a String using the IO data constructor and return that. It can pass the pure string to a pure function though, by using bind.
Here is an example of doing IO to get a number to pass to the pure function sin.
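The code block here didn’t survive formatting; based on the names valueStr and getSin2 used in the discussion below, it was presumably something like this sketch:

```haskell
getSin2 :: IO ()
getSin2 = do
  valueStr <- getLine
  let result = show (sin (read valueStr :: Double))
  putStrLn result
```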
This looks like imperative code, telling you which order to do things in and sharing the results of subsequent function calls. getLine returns the type IO String; remember this is like Container String from part 1. The function putStrLn always returns IO (), read “IO unit”, and since it is the last thing, that is what is returned from the entire function, as you would expect.
In reality, it is converted to this:
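The desugared form, again a sketch consistent with the description that follows:

```haskell
getSin2 :: IO ()
getSin2 =
  getLine >>= (\valueStr ->
    let result = show (sin (read valueStr :: Double))
    in  putStrLn result)
```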
This is really one long expression, and not a recipe as it looks like in do notation. Since Haskell is lazy, it probably does not do any computation until it reaches the putStrLn function, which I think is strict (it evaluates its argument as soon as it sees it).
When putStrLn evaluates its argument it finds valueStr and finds that it doesn’t have the value worked out yet. It sees that it comes passed in through the lambda, and that forces it to call getLine, and the user enters their text. Then it evaluates the let expression to find the result, has a complete string, prints it out, and putStrLn returns IO () from the lambda; according to the definition of bind, IO () is also returned from the bind expression, and then from the function getSin2 itself.
The Haskell list type is also an instance of the Monad typeclass. There are four functions defined for a monad, but you only need to implement two of them: >>= (pronounced bind) and return.
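At the time this was written, ghci’s :info Monad reported roughly this interface (the pre-Applicative-Monad Prelude):

```haskell
class Monad m where
  (>>=)  :: m a -> (a -> m b) -> m b
  (>>)   :: m a -> m b -> m b
  return :: a -> m a
  fail   :: String -> m a
```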
The easiest function is return. It just wraps a value in the monad container, and it is exactly the same thing as pure from the Applicative class. The bind function takes a monad holding a value of type a, and a function which can change an a into the same type of monad holding something of type b.
Before making MyList an instance of Monad, it might be easier to see what happens if we make something simpler, like Container from part 1, an instance first.
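A sketch of what that instance likely looked like (with older GHCs a bare Monad instance like this compiled; today you would also need Functor and Applicative instances):

```haskell
data Container a = Container a deriving Show

instance Monad Container where
  return x            = Container x
  (Container x) >>= f = f x

-- ghci> Container 3 >>= (\x -> return (x + 2))
-- Container 5
```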
The Container on the left hand side of bind is a monad holding a 3; the function on the right hand side of bind always accepts one argument. In this case the implementation I wrote for bind just passes the inner value of Container to the function. The function we passed just adds 2 to the value and rewraps it using return. This implementation is the least exciting thing that a monad can do, because Container is now the identity monad. The identity monad performs just simple function application, and doesn’t employ any computational strategy. Making MyList into a monad is significantly more amazing.
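The instance described below would be along these lines:

```haskell
instance Monad MyList where
  return x          = Cons x Empty
  Empty       >>= _ = Empty
  (Cons x xs) >>= f = mappend (f x) (xs >>= f)
```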
The above says that binding any function to an empty list just returns an empty list. In the case that we have list items, the head and tail of the list are pulled apart, the bound function is applied to the head of the list, and recursion is used to bind the function to the remaining tail of the list. These results are appended together using mappend from the Monoid typeclass, resulting in one list at the end.
Since the bound function f must return a wrapped value, each item it returns is a list with one item in it, such as (Cons 1 Empty) or [1], where the item inside has been modified by the function. Then it appends all these lists into one list.
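For the builtin list, binding a function that wraps each result looks like this in ghci:

```haskell
[1, 2, 3] >>= \x -> [x, x * 10]
-- [1,10,2,20,3,30]
```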
This looks a lot like fmap, but the difference is that the function you apply has to return an already-wrapped type, and so you can chain these together in ever-deeper nested lambdas, an ever-growing enclosed scope. This means that each new lambda closure brings its argument within scope of all the rest.
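Chaining binds over three lists, with each lambda bringing its argument into scope for the rest:

```haskell
[1, 2] >>= \x ->
  [10, 20] >>= \y ->
    [100, 200] >>= \z ->
      return (x + y + z)
-- [111,211,121,221,112,212,122,222]
```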
Because of the recursion in bind, this is basically 3 nested loops, giving you every combination of the 3 lists. The same thing will work on MyList, except it will look ugly for lack of pretty printing.
This is the basis for list comprehensions in languages like Erlang, Python, and Haskell itself. Haskell provides two types of syntactic sugar for this: do-expressions for monads in general, and list comprehensions for the list monad specifically.
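A list comprehension doing the same sort of thing:

```haskell
[ x + y | x <- [1, 2], y <- [10, 20] ]
-- [11,21,12,22]
```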
In an imperative programming language you can write a list of things to do in a function like a recipe. When you make variables in a function scope they are available to everything within that scope. Writing Haskell is not like writing a todo list of what to do, in what order, carrying state from one todo item to another. It is more like declaring what something is in one expression, and it doesn’t really let you say what order anything should be evaluated in; Haskell evaluates things in the order that it needs to know their values.
A monad can let you simulate sequence by nesting bind functions, because it will cause evaluation in the order that you nest the bound functions. It evaluates in a specific order because nesting makes each closure rely on the value of the previous outer closure.
It will also allow you to build scope that each function you bind can share, due to the nested lambda expressions.
The people who created Haskell made something called a do-expression that is syntactic sugar for monadic binding, and it sort of makes your code appear to be an imperative program with sequence and imperative-style scope.
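A do-expression over the list monad, equivalent to the nested binds (the name combos is my own):

```haskell
combos :: [(Int, Int)]
combos = do
  x <- [1, 2]
  y <- [10, 20]
  return (x, y)
-- [(1,10),(1,20),(2,10),(2,20)]
```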
A Haskell list is also an Applicative Functor. If we want to make MyList one too, we can look at the interface for the Applicative class and implement the right functions.
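The core of the Applicative interface, as :info shows it:

```haskell
class Functor f => Applicative f where
  pure  :: a -> f a
  (<*>) :: f (a -> b) -> f a -> f b
```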
So it looks as though we need to implement the functions pure and <*> in order to be an instance of Applicative. This also says that whatever is Applicative has the prerequisite of also being a Functor. The function pure must take any type a and wrap it in the container MyList; this is also called lifting a into the Functor. Implementing pure is easy enough, because it just wraps the value in a one-element list using our data constructor Cons.
The function <*> has the type signature (<*>) :: f (a -> b) -> f a -> f b and represents function application for types that are wrapped in our data structure f, where the function is also wrapped in the same data structure f.
A function that represents applying functions is a sort of weird concept, but I can think of three functions offhand that accomplish this in different ways. The three are $, <$>, and <*>, so let’s look at their type signatures.
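Their type signatures, lined up for comparison:

```haskell
($)   ::                    (a -> b) ->   a ->   b
(<$>) :: Functor f =>       (a -> b) -> f a -> f b
(<*>) :: Applicative f => f (a -> b) -> f a -> f b
```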
The $ function takes a function (a -> b) and applies it to an a, not surprisingly giving you a b. This is regular function application, like odd 3 returning True. Here I show different ways to use $: implicitly, explicitly, and infix.
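A condensed sketch of those three styles in ghci:

```haskell
odd 3         -- True, plain application
($) odd 3     -- True, $ used explicitly as a function
odd $ 3       -- True, $ used infix
sin $ 1 + 2   -- same as sin (1 + 2)
```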
People generally use $ when they want to change the precedence of an expression without using a lot of parentheses.
Look at the type signature of <$>, and the one for fmap.
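The two signatures are identical:

```haskell
(<$>) :: Functor f => (a -> b) -> f a -> f b
fmap  :: Functor f => (a -> b) -> f a -> f b
```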
The function <$> is fmap. It applies a function to the Functor f a, producing f b. The angle brackets around the $ indicate that this is application inside a container. This is called lifting the normal function (a -> b) into the Functor. What if the function itself is wrapped inside a Functor container?
The answer is that the function <*> is used instead. Why wrap a function in a container to apply it to some value in the same type of container? Why not just not have anything in containers at all? Remember that a list is a container, and check this out:
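For example, a list of functions applied to a list of values:

```haskell
[(+ 1), (* 10)] <*> [1, 2, 3]
-- [2,3,4,10,20,30]
```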
Since lists are applicative functors, you may apply a list of functions to a list of values and have it do the obvious thing: apply everything to everything, and then mappend the results into a single flat list. This just happens to be how <*> is implemented for a Haskell list, because it is really the only way that makes sense to implement it.
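A MyList implementation along the lines described below:

```haskell
instance Applicative MyList where
  pure x             = Cons x Empty
  Empty       <*> _  = Empty
  (Cons f fs) <*> xs = mappend (fmap f xs) (fs <*> xs)
```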
The definition of <*> is just building on fmap and mappend, which is neat and shows how all of these things are related. For a list, <*> just maps each function over each item in the list like a nested loop, and appends the results together into one list.
It turns out that a cons list can be more than just a Functor; it can also be a Monoid. A Monoid is an object with a single associative binary operation and an identity element. This means that things like addition and multiplication form monoids.
The identity element for addition is the number $0$, because $x + 0 = x$. The identity element is the one that, when combined with any other element by the binary operation, does not change that other element: you can add $0$ to any number and you just get the same number back. The identity element for multiplication is $1$, because $x \cdot 1 = x$ for every number.
The binary operation should be one which can combine two of the objects, and for a list that happens to be appending them using the function ++.
[1, 2, 3] ++ [4, 5, 6] == [1, 2, 3, 4, 5, 6]
Easy enough, so that means the identity element, the element you can combine with a list to get the same list back, is []: the empty list. [1, 2] ++ [] == [1, 2]
The ghci command :info shows that to be an instance of Monoid you must implement the function mempty, which returns the identity element, and mappend. Typeclasses can sometimes have default implementations for some functions; here mconcat has a default implementation written in terms of mappend, so implementing the simpler mappend is enough to get mconcat for free.
Looking at the type signatures below, we can see mappend :: a -> a -> a, where in our case a will be the type MyList. This means mappend receives two lists and returns a third in which they are combined. For addition this would have meant receiving two numbers to combine, but for a list it just means sticking them together end to end.
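The lost listing presumably showed the class and the instance; a sketch (on modern GHC you would define a Semigroup instance first):

```haskell
-- :info Monoid showed roughly:
class Monoid a where
  mempty  :: a
  mappend :: a -> a -> a
  mconcat :: [a] -> a

-- and an instance for MyList might be:
instance Monoid (MyList a) where
  mempty                 = Empty
  mappend Empty       ys = ys
  mappend (Cons x xs) ys = Cons x (mappend xs ys)
```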
Having MyList be an instance of Monoid makes it easier to write the implementation for the Applicative typeclass, because mappend is used in its implementation.
How can we modify or transform a value or values that are contained in a data structure such as Container? Let’s say we have a Container holding the integer 4, and we want to add 1 to it. The problem is that a Container doesn’t have addition defined for it, and really it shouldn’t, considering that any possible type could be stored inside it, many of which have no meaningful way to respond to addition.
Let’s look at a similar situation in C++:
The extreme generality of this C++ class means that it would be a mistake to define operator+ on it, as any number of types T also cannot meaningfully respond to addition. Now we find ourselves in a similar situation in Haskell:
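The Haskell equivalent, sketched:

```haskell
data Container a = Container a deriving Show

-- ghci> Container 4 + 1
-- type error: there is no Num instance for Container,
-- so addition is not defined on it
```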
Haskell has an elegant solution to this problem called a Functor. Ghci is able to print the full interface for any typeclass with the :info command, so let’s see what interface is required for a data type to become an instance of the Functor typeclass:
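The heart of ghci’s :info Functor output is:

```haskell
class Functor f where
  fmap :: (a -> b) -> f a -> f b
```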
This says there is a class named Functor, and its instances will be referred to as f. Instances must define a function called fmap. The first parameter of fmap is (a -> b); this means the first parameter is a function which accepts something of type a and returns something of a possibly different type b. The second parameter f a is a functor (such as Container) wrapping a type a. The last thing in a type signature is the return value: it returns f b, which means it returns the same sort of datatype (such as Container), wrapping something of type b.
That can be somewhat difficult to follow at first, but what it is essentially asking you to implement is a way to take the data out of the data structure, apply a transformation function to it, and then put it back in the data structure. Let’s make Container an instance of Functor.
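The instance discussed below would read:

```haskell
data Container a = Container a deriving Show

instance Functor Container where
  fmap f (Container contents) = Container (f contents)

-- ghci> fmap (\x -> x + 1) (Container 4)
-- Container 5
```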
In the function fmap above, on the left hand side of the declaration, we are using Haskell’s pattern matching feature to deconstruct the container and remove the value from it. A call to fmap f (Container contents) binds the given function, say (\x -> x + 1), to f, and in this case the integer 4 to contents.
This is because during pattern matching, the data constructor function Container acts to deconstruct the data type into its components. Later we’ll see data types that contain more than one value, and see that pattern matching can be used to access any number of data members. Haskell is all about composite data structures and wrapping and unwrapping the components inside them to do work.
On the right hand side of fmap, the Container data constructor is again used to wrap up this value, but it is first transformed by the function f. We get the effect of being able to send any function inside the container to be applied to its inner value.
You might be wondering, why wrap this integer in a data structure at all if it just makes it annoying to work on it? The answer is that unless you want to isolate a value or restrict operations that can be performed on it, you probably wouldn’t want to do this if it only held one value. This type of isolation is used in Haskell to separate functions that work with IO and side effects from pure functional code. The IO Monad hides its data constructor so that you cannot create anything of type IO in ‘pure’ code, and you can’t deconstruct an IO and get its values out. This causes you to always need to work with ‘impure’ IO stuff by sending functions into or declaring them inside the IO container, and also serves as a marker for impurity.
The Maybe Monad is another type that can still do something interesting while wrapping only one value, but let’s see an example of a functor working on multiple values first, a list!
The native list type in Haskell is a linked list, and it is also a functor. Let’s reimplement it from scratch so that we can see how it works. In functional languages a linked list is often called a cons list. It is a recursive data structure formed by cons cells, in which each cell contains two elements: the first is called the head or car, and is one value in the list; the second is called the tail or cdr (pronounced “kooder”), and is another cons cell, which in turn holds one value and another cons cell, and so on.
The terms cons, car, and cdr come from Lisp, and they are the three main functions used in that language to work with lists. Cons constructs a list, car returns the head of the list, and cdr returns the tail of the list. In Haskell, the car and cdr functions for lists are actually just named head and tail, but I am using the Lisp-named versions here to avoid confusion with the head and tail already defined in the ghci Prelude’s namespace.
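Based on the record-syntax description below, the definition was presumably along these lines:

```haskell
data MyList a = Empty
              | Cons { car :: a
                     , cdr :: MyList a
                     } deriving Show
```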
Above we define a type MyList to hold a parameterized “variable” type a. This time you can see that there are two data constructors, and that they don’t have to have the same name as the type itself, as we chose in the definition of Container. Empty is the data constructor for constructing an empty list, and Cons is a data constructor of two arguments. This could have been written data MyList a = Empty | Cons a (MyList a) deriving (Show), but what we’ve used here is called record syntax. You can see the first argument to Cons is something of type a, and the second is of type MyList a; record syntax gives names to each argument, and also provides accessor functions by the same names to get at each data member. This is a recursive data structure because MyList itself is used on the right hand side of the definition. Let’s play with this for a moment, and construct some lists.
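A sketch of what that ghci session may have demonstrated:

```haskell
-- ghci> let xs = Cons 1 (Cons 2 (Cons 3 Empty))
-- ghci> car xs
-- 1
-- ghci> cdr xs
-- Cons {car = 2, cdr = Cons {car = 3, cdr = Empty}}
```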
A similar C++ class might look like this:
Haskell itself has a builtin list data type with some syntactic sugar. The Haskell equivalent of the Cons data constructor is :, the Empty data constructor is [], while car and cdr are head and tail. Haskell lists also have a pretty show function defined for them.
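The builtin equivalents, sketched in ghci:

```haskell
1 : 2 : 3 : []    -- [1,2,3]
head [1, 2, 3]    -- 1
tail [1, 2, 3]    -- [2,3]
```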
So MyList is now a perfectly functioning linked list, but one thing it is missing over the builtin list type is that it is not yet an instance of Functor. We can fix that. Before, with Container, the idea behind fmap was to unwrap a value, apply a given function to it, and wrap it back up again. Here the only difference is that there are two data members in a cons cell: the value, and another cons cell. The recursive nature of a cons list gives us a clue that fmap will also be recursive. We’re going to apply the given function to the value in the cell, and then call fmap again on the cons cell containing the rest of the list, so that each of its values will be transformed as well.
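The two-equation instance described below:

```haskell
instance Functor MyList where
  fmap _ Empty       = Empty
  fmap f (Cons x xs) = Cons (f x) (fmap f xs)
```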
Here fmap is defined twice. Again we can see data constructors on the left hand side being used to deconstruct our list for pattern matching. If we try to fmap an empty list, the first declaration is matched and chosen, the function argument itself is thrown away, and an Empty list is constructed and returned. Fmapping an empty list is an empty list.
If a cons cell is matched, the data constructor Cons is used to deconstruct the cell and bind its two values to x and xs (xs as in the plural of x). This separates the head of the list from the rest of it; you perform an operation on the head and send the rest for processing by calling the function recursively. This is a common pattern you’ll see when writing recursive functions. The base case is encountering the Empty cell at the end of the list, which stops the recursion. The function only knows how to handle one element at a time, and relegates the rest of the work to itself during future calls, and so on until it meets the base case and stops.
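Fmapping over a whole MyList:

```haskell
fmap (+ 1) (Cons 1 (Cons 2 (Cons 3 Empty)))
-- every value incremented: a MyList holding 2, 3, and 4
```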
You’ve probably seen this before in other non-functional languages, since many parts of the functional paradigm are being adopted in imperative languages all the time. Ruby and Python both define map on their list types. Ruby passes a code block into its map method, and that looks like this:
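A minimal stand-in for the lost Ruby example:

```ruby
# map passes each element through the block and collects the results
p [1, 2, 3].map { |x| x + 1 }  # => [2, 3, 4]
```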
In a language like C, you can think of data structures as containers which hold one or more objects.
The int_container_t type is now something which holds one int. To make the same sort of data structure in Haskell you would write:
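The one-liner being described is:

```haskell
data IntContainer = IntContainer Int deriving Show
```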
Ignoring for a moment the suffix deriving Show, this looks a lot like a function declaration, because that is essentially what it is. The identifier IntContainer appears on both sides, but the two are actually in two different namespaces. It is not necessary for these two identifiers to have the same name, but people often do this by convention. The left hand side of this declaration names a new type IntContainer, and the right hand side defines a data constructor for this type: essentially a function named IntContainer which takes one Int as argument, which it uses to create an instance of this type.
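A ghci session sketch matching the readings given below:

```haskell
-- ghci> let i = IntContainer 4
-- ghci> :t IntContainer
-- IntContainer :: Int -> IntContainer
-- ghci> :t i
-- i :: IntContainer
```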
If I pass the data constructor a 4, it returns something which is of type IntContainer. The ghci command :t can be used to get type information about just about anything in Haskell. IntContainer :: Int -> IntContainer is read “IntContainer is a function which accepts an Int and returns an IntContainer”. i :: IntContainer is read “i is of type IntContainer”.
In C++, we have a template system which allows us to create classes and methods where the types involved are variable. If we wanted to create a Container class in C++ which could hold not only an int but any type, we could write a class like this:
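Such a class template might be sketched as:

```cpp
// A class template: T is a placeholder filled in at instantiation,
// so Container<int>, Container<float>, and so on all work.
template <typename T>
class Container {
public:
    explicit Container(T v) : value(v) {}
    T value;
};
```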
Haskell has an algebraic type system which enables us to do what C++ does with its template system, but is far easier to use and debug. Much of the simplicity comes from the lack of pointers in Haskell, and the fact that compound types and functions are treated the same as primitives without needing to put much thought into copy constructors and memory management.
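The lost ghci transcript presumably demonstrated something like this:

```haskell
data Container t = Container t deriving Show

-- ghci> Container 4
-- Container 4
-- ghci> Container "hello"
-- Container "hello"
-- ghci> Container [1, 2, 3]
-- Container [1,2,3]
-- ghci> let x = Container (\x -> x + 1)
-- ghci> :t x
-- x :: Container (Integer -> Integer)
-- ghci> x
-- error: no Show instance for Integer -> Integer
```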
This time, the left hand side of the type declaration is parameterized with a variable type t, whereas in IntContainer it was not. A type variable like t is a placeholder for a type, so on the right hand side t can be replaced by any type; for example, when you pass the data constructor a String, it is as if you had declared data Container = Container String deriving Show. So far this is very much like C++ templates.
In the above example it is shown that this works with Int, Float, String, or even [Int] (a list of Ints). It will even accept another function, as shown when I pass it the lambda (\x -> x + 1), because a function is a typed value just like anything else. This creates a value with the type signature x :: Container (Integer -> Integer), meaning “x is a container holding a function which accepts an Integer and returns an Integer”. Everything works fine until I type x and ghci’s read-eval-print loop tries to print the function (\x -> x + 1).
The other data types like Int and String have properly defined Show functions already, but a lambda does not have any meaningful way to display itself. I have told Container to derive its show function where possible, but that was not possible when it was passed a lambda function, therefore we get an error.
A class in Haskell is not quite the same as in object oriented programming languages. To define a class you essentially define only an interface, and it is not necessary to provide any implementation. Any type can be a member of a class if it declares that it supports the class’s interface.
Let’s define a funny example class:
1 2 3 4 5 6 7 8 9 10 11 12 |
|
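A sketch of what that class and instance could look like; the class name PikachuShow and the ignored argument come from the prose, while the exact wording of the returned string is my guess:

```haskell
-- The class declares only an interface: a single function, pikaShow.
class PikachuShow a where
  pikaShow :: a -> String

data Container t = Container t deriving Show

-- Container supports PikachuShow no matter what type it holds,
-- because pikaShow ignores its argument entirely (the _).
instance PikachuShow (Container t) where
  pikaShow _ = "Pika pika!"
```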
Now Container is an instance of Show and PikachuShow.
Container
derives its show
function from its data members such as String
or Integer
which are
instances of Show
. (Integer -> Integer) is not an instance of Show, so Container cannot find a show implementation for it. Container is an instance of PikachuShow and has therefore implemented the pikaShow function.
The pikaShow function ignores its argument because of the _ in the function argument list; this pattern is also seen in languages such as Erlang, and to some extent in Ruby. The argument, if you had bound it to a variable, would be the Container it was called with.
The most common tuning method used in modern times is called Equal Temperament. In this tuning, each pair of adjacent notes is related by the ratio of the 12th root of 2, or $2^{1/12} \approx 1.059463$.
Starting from A 440hz and calculating each of the 12 notes of the chromatic scale up to the next A looks like this:
1 2 3 4 5 6 7 |
|
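A minimal Ruby sketch of that calculation (the variable names are mine), stepping up by the 12th-root-of-2 ratio from A 440hz:

```ruby
# Each successive semitone is the previous frequency times 2**(1.0/12).
a4 = 440.0
semitone = 2 ** (1.0 / 12)   # ~1.059463
notes = (0..12).map { |n| a4 * semitone ** n }
notes.each { |f| puts f.round(3) }
# The final A should be exactly 880.0, but a tiny floating point
# error creeps in from repeatedly multiplying an irrational ratio.
```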
This tuning has the benefit of being able to switch musical keys without retuning your instrument, and allowing different types of instruments to play together. This is the way a guitar, MIDI synthesizer, or piano is usually tuned. The problems with it are that essentially every note frequency besides the Octave is slightly wrong, and when calculated this way even that is a bit wrong due to floating point error and cumulative error of multiplying irrational numbers.
The benefits usually outweigh its problems, but for the purposes of this article I will be using…
The ratios in Just Intonation are not all equal, but are based on the harmonic series. You find each ratio by multiplying the root note or tonic by increasing whole numbers, and then dividing by a denominator that brings the frequency back into the octave’s range. Here is a data structure we can use to look up the ratios and names for each degree in the chromatic scale.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
|
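A sketch of such a lookup structure, assuming the standard 5-limit just ratios (the entries are my reconstruction; the 45/32 Diminished Fifth matches the ratio discussed below):

```ruby
# Ratios for each degree of the chromatic scale in Just Intonation,
# kept as exact Rationals rather than floats.
JUST_RATIOS = {
  "Unison"           => Rational(1, 1),
  "Minor Second"     => Rational(16, 15),
  "Major Second"     => Rational(9, 8),
  "Minor Third"      => Rational(6, 5),
  "Major Third"      => Rational(5, 4),
  "Perfect Fourth"   => Rational(4, 3),
  "Diminished Fifth" => Rational(45, 32),
  "Perfect Fifth"    => Rational(3, 2),
  "Minor Sixth"      => Rational(8, 5),
  "Major Sixth"      => Rational(5, 3),
  "Minor Seventh"    => Rational(9, 5),
  "Major Seventh"    => Rational(15, 8),
  "Octave"           => Rational(2, 1),
}

# Frequencies derived from A 440 stay rational, e.g. 440 * 16/15 == 1408/3.
JUST_RATIOS.each { |name, r| puts "#{name}: #{440 * r}" }
```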
This is already producing rational frequencies, and they are numbers I can work with mathematically in this example. So, what is going on when an A Major chord A C♯ E sounds consonant and pleasing to people, and what causes a dissonant sound?
A broad answer is that the smaller the whole numbers involved in the ratio, the more pleasing (or even boring) two notes will sound in relation to one another. You can see for yourself that a relatively dissonant interval like the Diminished Fifth, also called the Tritone or Devil’s note, has much larger whole numbers in its numerator and denominator at 45:32.
This interval is so interesting sounding, hanging out on the verge of consonance and dissonance, that it is also called the Blue Note, and it plays a large part in the sound of Blues, Jazz, Rock, and Metal.
While I was working through the math described on the excellent DSP website A Trip on the Complex Plane, I started playing with dot products. It was mentioned there that a pure sine wave at frequency $f$ is orthogonal to a sine wave one octave higher at $2f$, and I began to wonder what a dot product reveals about the orthogonality of other intervals besides the octave.
One specific example of orthogonality that is easy to understand is on a 2D plane. Visually you can see it as two points that are rotated $90^{\circ}$, or equivalently $\pi/2$ radians, from one another. Like these points at $(0, 0.5)$ and $(0.5, 0)$.
Here you can see the two points plotted on the plane.
It’s easy to see in this image, but you can figure out if any two points are orthogonal using a dot product. This is because the dot product of two vectors equals the product of their magnitudes and the cosine of the angle between them. The cosine of $90^{\circ}$, or from now on using radians, $\pi/2$, is $0$, therefore if the dot product of two nonzero vectors is $0$ they are orthogonal.
We can extend Ruby’s Array class to add some methods for working with orthogonality and dot products. There is already a Vector class in Ruby which does this, but it will be easier to show what’s happening by adding methods to Array and using it as a vector.
The formula for the dot product is to multiply the two vectors element-wise, producing a third vector, and then sum all the elements of that resulting vector.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
|
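Based on that description, and the self.dot_product(other).abs < Epsilon check quoted further down, a sketch of the Array extension might be:

```ruby
# Tolerance for comparing floating point results against zero.
Epsilon = 1e-6

class Array
  # Multiply element-wise, then sum the products.
  def dot_product(other)
    zip(other).map { |a, b| a * b }.sum
  end

  # Orthogonal when the dot product is (numerically) zero.
  def is_orthogonal?(other)
    self.dot_product(other).abs < Epsilon
  end
end

puts [0, 0.5].dot_product([0.5, 0])     # 0.0
puts [0, 0.5].is_orthogonal?([0.5, 0])  # true
```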
So it’s great to be able to calculate if 2D points are orthogonal and all, but what about audio and musical
notes? How can we prove that an A 440hz
note is orthogonal to A 880hz
, and what about the other notes in
the chromatic scale?
A pure note with no overtones being played is just air pressure oscillating back and forth at a specific number
of times per second, or frequency. That can be modeled using a sinusoidal function like sine or cosine. Generating
a cosine wave to represent A 440hz
can be done with the equation $\cos(2\pi \cdot 440 \cdot t)$ where $t$ is time in seconds.
In the computer we can represent this digitally by sampling the values that come out of the above equation at regular time intervals. We can specify a ratio, a number of cycles to generate, and a rate at which to sample the cosine function with the following code. This effectively makes our digital samples the same as an N-dimensional vector, similar to the two-dimensional vectors shown above.
1 2 3 4 5 6 7 |
|
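A sketch of that sampling code; the function name generate_tone and its signature are assumptions of mine, but the behavior follows the prose: a frequency ratio, a cycle count, a sample rate, and the final sample dropped:

```ruby
# Sample cos(2*pi*ratio*t) at sample_rate samples per second, for the
# given number of cycles of the 1hz tonic. The exclusive range
# 0...samples drops the last sample, which belongs to the next period.
def generate_tone(ratio, cycles, sample_rate)
  samples = (cycles * sample_rate).to_i
  (0...samples).map do |i|
    Math.cos(2 * Math::PI * ratio * i / sample_rate.to_f)
  end
end
```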
Instead of using 440hz
as the tonic, this code is just using 1hz
to simplify things. It is also dropping
the last sample because that sample actually belongs to the beginning of the next period of the cosine.
Now we should be able to prove what we already knew: two cosine waves that are in phase but an octave apart (double the frequency) are orthogonal to each other.
1 2 3 4 5 6 7 8 9 10 |
|
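A self-contained sketch of that check (the helper names are my own):

```ruby
Epsilon = 1e-6

class Array
  def dot_product(other)
    zip(other).map { |a, b| a * b }.sum
  end

  def is_orthogonal?(other)
    dot_product(other).abs < Epsilon
  end
end

def generate_tone(ratio, cycles, sample_rate)
  samples = (cycles * sample_rate).to_i
  (0...samples).map { |i| Math.cos(2 * Math::PI * ratio * i / sample_rate.to_f) }
end

one_hz = generate_tone(1, 1, 5)  # 1 cycle of 1hz at 5 samples/sec
two_hz = generate_tone(2, 1, 5)  # the octave above, same duration
puts one_hz.dot_product(two_hz)     # ~0.0, up to float error
puts one_hz.is_orthogonal?(two_hz)  # true
```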
Above I am generating 1 cycle of a 1hz
and a 2hz
cosine wave at a sample rate of 5 samples per second. These results
can sometimes be hard to see due to floating point error, but we can unwind the dot product method and show it working
manually.
1 2 3 4 5 6 7 8 9 10 11 12 13 |
|
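Unwinding it manually might look like the following sketch (helper names are mine); the element-wise products cancel exactly:

```ruby
def generate_tone(ratio, cycles, sample_rate)
  samples = (cycles * sample_rate).to_i
  (0...samples).map { |i| Math.cos(2 * Math::PI * ratio * i / sample_rate.to_f) }
end

a = generate_tone(1, 1, 5)
b = generate_tone(2, 1, 5)

products = a.zip(b).map { |x, y| x * y }
# products is approximately [1.0, -0.25, -0.25, -0.25, -0.25]:
# cos(72 deg) * cos(144 deg) works out to exactly -1/4, so the
# four -0.25 terms cancel the leading 1.0.
puts products.inspect
puts products.sum   # ~0.0, within floating point error
```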
The above floating point error is the reason for the self.dot_product(other).abs < Epsilon check in the method is_orthogonal?. Epsilon is just set to some very small number to deal with floating point comparisons.
Using all this, we should be able to answer the original question: are the other intervals in the chromatic scale all orthogonal to each other? The answer I found was no, not after only one cycle of the cosine waves. But if you keep them running together for longer periods of time and more cycles, there is eventually a time when the two waveforms match up in period, and the full waveforms up until that point are completely orthogonal to each other.
1 2 3 4 5 6 7 8 9 10 11 12 |
|
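A self-contained sketch of such a search (the function name, sample rate, and cycle cutoff are my own choices):

```ruby
Epsilon = 1e-6

class Array
  def dot_product(other)
    zip(other).map { |a, b| a * b }.sum
  end

  def is_orthogonal?(other)
    dot_product(other).abs < Epsilon
  end
end

def generate_tone(ratio, cycles, sample_rate)
  samples = (cycles * sample_rate).to_i
  (0...samples).map { |i| Math.cos(2 * Math::PI * ratio * i / sample_rate.to_f) }
end

# Add whole cycles of the 1hz tonic until the two sampled waveforms
# become orthogonal, and report how many cycles that took.
def cycles_until_orthogonal(ratio, sample_rate = 100, max_cycles = 64)
  (1..max_cycles).each do |cycles|
    tonic = generate_tone(1, cycles, sample_rate)
    other = generate_tone(ratio, cycles, sample_rate)
    return cycles if tonic.is_orthogonal?(other)
  end
  nil
end

puts cycles_until_orthogonal(Rational(2, 1))  # octave: 1 cycle
puts cycles_until_orthogonal(Rational(3, 2))  # perfect fifth: 2 cycles
```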
Below is a chart of the results sorted by how many cycles it takes for each interval to sync up and become orthogonal, which they all do after relatively few cycles.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
|
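One pattern worth noting (this sketch is my reconstruction, not the original chart): two waves in the ratio p/q realign after q cycles of the tonic, so sorting the just ratios by denominator gives the same kind of ranking:

```ruby
# Standard 5-limit just ratios (my reconstruction). The denominator
# of each ratio predicts how many tonic cycles pass before the two
# waveforms sync up.
JUST_RATIOS = {
  "Octave"           => Rational(2, 1),
  "Perfect Fifth"    => Rational(3, 2),
  "Perfect Fourth"   => Rational(4, 3),
  "Major Sixth"      => Rational(5, 3),
  "Major Third"      => Rational(5, 4),
  "Minor Third"      => Rational(6, 5),
  "Minor Sixth"      => Rational(8, 5),
  "Minor Seventh"    => Rational(9, 5),
  "Major Second"     => Rational(9, 8),
  "Major Seventh"    => Rational(15, 8),
  "Minor Second"     => Rational(16, 15),
  "Diminished Fifth" => Rational(45, 32),
}

JUST_RATIOS.sort_by { |_, r| r.denominator }.each do |name, r|
  puts format("%-16s %-6s %2d cycles", name, r, r.denominator)
end
```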
The first things I notice about the results are:

- The most consonant intervals, like the Octave, Perfect Fifth, and Perfect Fourth, sync up and become orthogonal after the fewest cycles.
- The Diminished Fifth, with its 45:32 ratio, takes far longer than any other interval to become orthogonal.
What I can take away from this experiment is that the human brain is basing its perception of consonance and dissonance on how long two frequencies or notes played together take to match up in period and become orthogonal. This probably gives those intervals a feeling both of balance and of temporary dissonance which resolves after a short period of time. Two sounds played at once which never resolve to orthogonality are considered noisy or out of tune.