PL/0 Compiler Written in Go

PL/0 Compiler Written in Go

Dogan Kurt
[email protected]
August 2017
  1. Introduction
  2. The Parser
  3. The Code Generator
  4. Win32 Exe Generator
  5. Examples
  6. More About PL/0

Introduction

PL/0 is a nice little programming language created by the grand master—Niklaus Wirth. It's very small and easy to write a compiler for, which was the main purpose.

To learn Go better, i wanted to write a fun and mildly challenging project in it. Here is a PL/0 compiler. It's written from scratch without any parser generator, assembler, linker and so on. It produces x86 Windows executable in a single pass. Even better, it's only ~700 lines of code.

I strictly followed the EBNF syntax found in Wikipedia and other locations. You can download the source package with some examples here. If you don't have a Go compiler, you can download the binary package. I will explain some of the implementation details, so let's begin.


The Parser

The parser is a recursive descent and it's the largest part of the compiler. I used to parse expressions with shunting yard algorithm which is an Operator-precedence parser, and use recursive descent parsers only for parsing blocks and statements. This time, i used recursive descent parser to parse even the smallest constructs.

Here you can find a simplified version of parser that parses only expressions. rdp.go translates infix notation to reverse polish notation while rdp_eval.go evaluates the expression.

As we parse the source, we also generate the code. Another approach is to create an Abstract syntax tree and generate code from it. I used the former approach because i was reading Wirth's Compiler Construction book and i followed it as much as i can. It's one of the best reads on the subject.


The Code Generator

Code generation is usually the most difficult part for me. Sheer assembly knowledge is surely not enough. However i found a very easy method in Wirth's book. The method emulates a stack machine with general purpose registers. There is a global pointer that points to next available register. Some constructs allocate a new register and increase the pointer while others free one and decrease the pointer. Let's see two examples.

stack -> eax, ebx, ecx, edx, esi, edi, ebp

a := 3
	mov	eax, 3    ; increase pointer
	mov	[a], eax  ; decrease pointer

a := 3 + 5 * 4
	mov	eax, 3		; increase pointer
	mov	ebx, 5		; increase pointer
	mov	ecx, 4		; increase pointer
	imul	ebx, ecx	; decrease pointer
	add	eax, ebx	; decrease pointer
	mov	[a], eax	; decrease pointer

Notice that when an expression finishes, the stack pointer returns back to where it was. In practice, stack depth is usually two, rarely three. We have seven general purpose registers. We don't use esp because we need it for local variables and procedure calls.

I would like to talk about branches a little bit. When we compile an expression, we know everything we need in advance. When we compile a branch statement however, like an if or while, we don't know how long should we jump. That's because we didn't compile the statements body yet. Therefore, we actually need another pass to write jump distances after the compilation ends. This is usually called fixup and we do exactly that.


Win32 Exe Generator

This part can be considered as a linker. We create a minimal PE header, we fill necessary information and we create an import table with three functions, printf, scanf, and exit. Finally we dump these to a file along with our compiled code and it becomes a windows executable.

This part has nothing to do with the compilers and it's even boring. But i don't like to have dependencies to other tools, and since it's a rather simple thing to implement, i do it myself.

There is only one section in our exe. It's readable, writable and executable. The section begins with the import table. This is unusual but feasible because we only have three external functions so the size of import table is known prior to compilation. As a result, we know and use the addresses of printf, scanf, and exit as we compile.

Global variables follow the import table. Local variables are created in stack on the fly, but global variables must be permanent. Finally our compiled code comes after the global variables. Code and data comes later because their size can change. Import table is fixed size.


Examples

There are five examples in the package. I wrote four of them.

fact.pas  : prints factorials of [1..10]
fib.pas	  : prints first ten fibonacci numbers
prime.pas : prints prime numbers up to 100
sqrt.pas  : reads a number and prints it's square root

Other example is written by Wirth. For the square root, i used Newton's method. I multiply the input by 4 and divide the result by 2. This is necessary for square roots of small numbers because we don't have floating point numbers. Since we don't have a mod operator, i used arg / i * i = arg for control instead of arg % i = 0 in the prime numbers example. I also used a funny break method.


More About PL/0

Here i replicate the EBNF syntax of the language;

program = block "." .

block = [ "const" ident "=" number {"," ident "=" number} ";"]
        [ "var" ident {"," ident} ";"]
        { "procedure" ident ";" block ";" } statement .

statement = [ ident ":=" expression | "call" ident
              | "?" ident | "!" expression
              | "begin" statement {";" statement } "end"
              | "if" condition "then" statement
              | "while" condition "do" statement ].

condition = "odd" expression |
            expression ("="|"#"|"<"|"<="|">"|">=") expression .

expression = [ "+"|"-"] term { ("+"|"-") term}.

term = factor {("*"|"/") factor}.

factor = ident | number | "(" expression ")".

It's a very simple syntax and the EBNF—another invention of Wirth's, makes it a lot more legible. As you can see, "?" and "!" statements are used for input/output in the syntax. I changed them with read/write statements, which is less cryptic.