Brainfuck is an esoteric programming language created by Urban Muller. The motive was to implement the smallest possible compiler for it. The language is Turing Complete, meaning that it's capable to do any computation other languages do.
It's a perfect language to demonstrate interpreters and compilers to novice programmers, without getting into too much details about lexers and parsers. Hereby, i will present my own implementation of a simple interpreter and a x86 compiler which generates Win32 PE Executables.
I will show the interpret
and compile
routines in the related sections, but you can download the complete source code here.
I'd recommend you to quickly review the commands before we go any further.
Interpreters are fundamentally different than compilers. Compilers generate code for another machine to execute. Interpreters however, have to implement all the functionality of the programming language. If the language is Turing Complete, the interpreter has to be Turing Complete, but a Turing Incomplete compiler can compile a Turing complete language.
Let's see the interpret
routine.
void interpret() { char *pc = mem; /* code 0-5k */ char *ptr = mem + 5000; /* data 5-9k */ char **sp = (char **)(mem + 9000); /* stack 9-10k */ while (*pc) { switch (*pc++) { case '>': ptr++; break; case '<': ptr--; break; case '+': ++*ptr; break; case '-': --*ptr; break; case '.': putchar(*ptr); break; case ',': *ptr = getchar(); break; case '[': if (*ptr == 0) { int c, depth = 0; while ((c = *pc++)) { if (c == '[') { depth++; } else if (c == ']') { if (depth-- == 0) { break; } } } } else { *sp++ = pc - 1; } break; case ']': pc = *--sp; break; } } }
We divided 10 Kb of memory layout into three sections. First 5 Kb is the code section, contains the actual program to be executed. The following 4 Kb is the data section that interpreter operates on during the execution. The last 1 Kb is used by the interpreter as a stack. I will explain it in a moment.
The first six commands are fairly straightforward. Conditional command '['
indicates that if the value at data pointer is zero, meaning the condition is false, jump the matching ']'
, otherwise continue the execution (analogous to while loop).
If the condition is true we push the address of '['
to the stack. Thus, there is always the address of latest loop in the stack. Whenever we encounter a ']'
, we pop the address of '['
and jump it by setting the program counter (pc)
to this address.
Notice that we always make the condition check at '['
code, instead of both '['
and ']'
codes as the language definition says. This avoids duplicate code, but in return, even if the condition was false, we jump to the start of loop for checking it, and pass through the loop block again unnecessarily.
We pass through the loop body by using depth
variable, which keeps the number of nested loops. For '['
, we increment depth
and for ']'
we decrement it. When the depth
is zero, it means we've found the matching ']'
.
I used the interpreter code as a base for the compiler. But some knowledge of x86 assembly language is required to understand it. Although it's not a mandatory part of the compiler, some knowledge of Win32 PE structure would be helpful to understand program as a whole.
Let's see the compile routine first.
void compile() { int pc = 0; char *codp = mem; int *sp = stack; uint32_t data = 0x400000 + 0x1000 + nimport; /* mov ebp, .data */ code[pc++] = 0xBD; *(uint32_t *)(code + pc) = data; pc += 4; while (*codp) { switch (*codp++) { case '>': code[pc++] = 0x45; /* inc ebp */ break; case '<': code[pc++] = 0x4D; /* dec ebp */ break; case '+': memcpy(code + pc, "\xFE\x45\x00", 3); /* inc byte[ebp] */ pc += 3; break; case '-': memcpy(code + pc, "\xFE\x4D\x00", 3); /* dec byte[ebp] */ pc += 3; break; case '.': memcpy(code + pc, "\x0F\xB6\x45\x00", 4); /* movzx eax, byte[ebp] */ pc += 4; memcpy(code + pc, "\x50", 1); /* push eax */ pc++; memcpy(code + pc, "\xFF\x15", 2); /* call putchar */ pc += 2; *(uint32_t *)(code + pc) = imp_putchar; pc += 4; memcpy(code + pc, "\x83\xC4\x04", 3); /* add esp, 4 */ pc += 3; break; case ',': memcpy(code + pc, "\xFF\x15", 2); /* call getchar */ pc += 2; *(uint32_t *)(code + pc) = imp_getchar; pc += 4; memcpy(code + pc, "\x88\x45\x00", 3); /* mov byte[ebp], al */ pc += 3; break; case '[': *sp++ = pc; memcpy(code + pc, "\x0F\xB6\x45\x00", 4); /* movzx eax, byte[ebp]*/ pc += 4; memcpy(code + pc, "\x85\xC0", 2); /* test eax, eax */ pc += 2; memcpy(code + pc, "\x0F\x84\x00\x00\x00\x00", 6); /* jz endwhile */ pc += 6; break; case ']': { int bpc = *--sp; *(uint32_t *)(code + bpc + 8) = pc - bpc - 7; code[pc] = 0xE9; /* jmp 'location of [ code' */ *(uint32_t *)(code + pc + 1) = (uint32_t)(bpc - pc - 5); pc += 5; } break; } } memcpy(code + pc, "\x6A\x00", 2); /* push 0 */ pc += 2; memcpy(code + pc, "\xFF\x15", 2); /* call exit */ pc += 2; *(uint32_t *)(code + pc) = imp_exit; pc += 4; ncode = pc; }
The code
variable is the buffer we write x86 machine code as we compile the program. The pc
variable is the program counter, which holds the current position at the code buffer. For instance, the line, code[pc++] = 0x45;
means, write 0x45 machine code (inc ebp)
to the code buffer, and increment the position one byte.
While the executable runs, it uses ebp
register as the data pointer, hence the first thing we do is setting it to the address of data section. Since it's the first section in the executable, address of data section is static and 0x401000
. In the beginning, we skip nimport
bytes from the data section, it's the import table which resides at the beginning of data section.
There is not much to say about compilation process, comments show the machine code we generate for every command. We use a stack to keep program counter value when a loop starts, just like in the interpreter. At the end of the loop, we jump to that address. We use E9 xx xx xx xx
jump code regardless of how far we will jump for the sake of simplicity.
Notice that in the machine code for jz endwhile
, we write 4 bytes of zeroes as the jump distance, because we don't know when the loop will end at this moment. When the loop ends however, we know the exact distance because we pop the start address of the loop. We use this start address to overwrite that zeroes with the real distance and to jump the beginning of the loop again.
There are three functions we generate function call code for, getchar
, putchar
and exit
. Addresses of these functions are at imp_getchar
, imp_putchar
, imp_exit
and their values are known before the compilation starts, there is a nice hack here i'll point out in a moment. Here's the code that creates import table.
/* * Since the necessary functions are predefined, we can construct * import table before compiling program. * Import table resides at the beginning of first section, * so addresses are static and don't depend on code sections size. */ void create_imports() { /* * import desc. = 20 bytes. end marker = 20 bytes. * thunk array = 16 bytes (3 imports + end marker) * no org. first thunk. we can't be bound, big deal! */ memcpy(import + 56, "msvcrt.dll\0", 11); memcpy(import + 67, "\0\0exit\0", 7); memcpy(import + 74, "\0\0getchar\0", 10); memcpy(import + 84, "\0\0putchar\0", 10); nimport = 94; /* name rva */ *(uint32_t *)(import + 12) = 0x1000 + 56; /* first thunk */ *(uint32_t *)(import + 16) = 0x1000 + 40; /* thunk array */ *(uint32_t *)(import + 40) = 0x1000 + 67; *(uint32_t *)(import + 44) = 0x1000 + 74; *(uint32_t *)(import + 48) = 0x1000 + 84; /* this addresses will be used during compilation */ imp_exit = 0x400000 + 0x1000 + 40; imp_getchar = 0x400000 + 0x1000 + 44; imp_putchar = 0x400000 + 0x1000 + 48; }
If the code section was the first section like in many executables, address of data section would depend on the code size. We set data section as the first section, and put the import table at the beginning of it. This enables us to have the addresses of imported functions, before the compilation starts. Therefore, we create the final machine code in a single pass. Note that if we didn't have a constant number of external function calls, we wouldn't be able to create import table before compilation.
Rest of the compiler code is for generating a Win32 PE executable, which is not the main point of the compiler. You shouldn't worry about it.
There are some very nice brainfuck programs written by other programmers. Here is my favorite, a self-interpreter written by daniel b cristofani.
>>>+[[-]>>[-]++>+>+++++++[<++++>>++<-]++>>+>+>+++++[>++>++++++<<-]+>>>,<++[[>[ ->>]<[>>]<<-]<[<]<+>>[>]>[<+>-[[<+>-]>]<[[[-]<]++<-[<+++++++++>[<->-]>>]>>]]<< ]<]<[[<]>[[>]>>[>>]+[<<]<[<]<+>>-]>[>]+[->>]<<<<[[<<]<[<]+<<[+>+<<-[>-->+<<-[> +<[>>+<<-]]]>[<+>-]<]++>>-->[>]>>[>>]]<<[>>+<[[<]<]>[[<<]<[<]+[-<+>>-[<<+>++>- [<->[<<+>>-]]]<[>+<-]>]>[>]>]>[>>]>>]<<[>>+>>+>>]<<[->>>>>>>>]<<[>.>>>>>>>]<<[ >->>>>>]<<[>,>>>]<<[>+>]<<[+<<]<]
To use it, copy and paste a brainfuck program, put a ! and enter the input.
This file contains the original implementation with some example programs. I particularly like the prime number program. Esolang.org is a very good reference for brainfuck and many other esoteric languages, there are also links to other references, you will find everything you need there.