Writing a C Compiler
This is the webpage for the book Writing a C Compiler. It includes links to the book’s companion code, errata, updates, and other resources.
If something isn’t working, or if you have a question, suggestion or correction, please email me!
Table of contents:
- Companion Code
- Outside Resources
- Updates
- Errata
- Tips on Extra Credit Features
- Setup Tips
- hello_debugger.s (for Appendix A)
- Acknowledgements
Companion Code
This book has two companion repos:
- The test suite. Use this to test your your implementation at each stage of the project.
- The reference implementation, NQCC2. You don’t need to refer to this, but it’s there if you want to look at it.
Outside Resources
You’ll need these to implement the extra credit features, or otherwise extend your compiler beyond what the book covers:
- The C Standard specifies how C programs behave, in theory. This book uses C17, the latest revision at the time of writing. Purchase the official standard, or use the similar, freely available draft.
- The System V x86-64 ABI documents calling conventions and similar details for Unix-like x86-64 systems. Find the latest version here.
- The Intel 64 Software Developer’s Manual documents the x86-64 instruction set. Download Volume 2 of the official manual or browse an unofficial online version.
- Compiler Explorer is an awesome website where you can put in source code and see how a bunch of different compilers translate it to assembly.
Updates
If changes to GCC, Clang, or other tooling affect this project, I’ll cover them here. (Updates to GCC/Clang behaviors that the book discusses in passing but don’t affect the project itself are covered in the Errata section below.)
None yet!
Errata
-
Chapter 6, p. 121. In the first sentence of the Parsing Conditional Expressions subsection, the
? :
operator is backwards. This sentence should read:The conditional
? :
operator is a ternary operator… -
Chapter 9, p. 180. On the last line of Listing 9-21, the call to
typecheck_block
is missing thesymbols
argument. This line should be:typecheck_block(decl.body, symbols)
Thanks to Ian for catching this!
-
Chapter 13, p. 335. The assembly code given in Table 13-3 to convert from
double
tounsigned long
includes an unnecessary instruction. As the last step in the conversion, this code copies the immediate9223372036854775808
into a register, then adds it to the destination:Mov(Quadword, Imm(9223372036854775808), Reg(<R>)) Binary(Add, Quadword, Reg(<R>), dst)
This step takes two instructions because the source operand of
add
can’t be bigger thanINT_MAX
. However, it’s better to generate a single invalidadd
instruction at this stage and let the instruction fix-up pass rewrite it. That is, instead of those two instructions, we can just generate:Binary(Add, Quadword, Imm(9223372036854775808), dst)
This is better because it uses one of our scratch registers (
R10
) instead of a non-scratch register, which will reduce register pressure once we implement register allocation in Chapter 20.Thanks to Yoshi Sono for catching this!
-
Chapter 14, p. 351. The “Null Pointers and Type Conversions” section on page 351 presents three examples of illegal implicit type versions and says:
GCC warns about the implicit conversions in the previous three code snippets, but it still compiles them.
GCC 14.1 and later treats all of these examples as errors by default. Hooray!
-
Chapter 20, p. 665-666. The
coalesce
function, discussed on pages 665-666, needs to handle an additional edge case if you completed Part II. You should not coalesce two pseudoregisters of different sizes (e.g. don’t coalesce aLongword
into aQuadword
or vice versa). If we coalesce a larger pseudoregister into a smaller one, and we end up spilling the coalesced register, we won’t allocate enough stack space for it, resulting in a potential miscompilation. Normally, the Briggs test should prevent the coalesced register from spilling, but the “ugly detail about the George test” discussed on page 661 means this isn’t guaranteed. Besides, as that discussion on page 661 points out, the Briggs and George tests are performance heuristics and we shouldn’t rely on them for correctness.Alternatively, you could coalesce pseudoregisters of different sizes but make sure to coalesce the smaller one into the larger one; however, that’s more fragile, so I don’t recommend it.
Tips on Extra Credit Features
Here are a few pointers about how to implement the extra credit features so you won’t have to refactor them later in the book.
-
Chapter 3: Bitwise operators. This feature requires one-byte register aliases (e.g.
%cl
instead of%ecx
). Use the same assembly AST node to represent one-byte and four-byte register aliases; for example, you should useCX
to represent both%ecx
and%cl
. In the code emission pass, decide which alias to emit based on context. (For now, one-byte register aliases only appear as the first operand of bitwise shift instructions.) Thanks to Tobin Yehle for suggesting this pointer. -
Chapter 5: Compound assignment and increment/decrement operators. In compound assignment expressions like
lval += rval
,lval
must be evaluated only once. Don’t rewrite it aslval = lval + rval
; this will lead to incorrect behavior for expressions likearray[f()] += x
in Part II. The same goes for prefix and postfix expressions; don’t rewrite++op
asop = op + 1
. -
Chapter 11: Bitwise operations with
shl
andshr
. Theshl
andshr
assembly instructions perform unsigned bit-shifting operations. If you’re implementing bitwise operators for extra credit, you’ll need these instructions starting in Chapter 11, but the main text of the book introduces them in later chapters. You may want to give them different names in the assembly AST than the book does.Each bit shift instruction (including
shl
,shr
,sal
, andsar
) has two forms. The two-operand form, which you presumably learned about in Chapter 3, takes a shift count as its source operand. The one-operand form shifts its destination operand by one bit.The book introduces the one-operand form of
shr
asShr
in Chapter 13, and the two-operand forms of both instructions asShl
andShrTwoOp
in Chapter 18. You only need the two-operand forms to implement the bitwise extra credit feature.
Setup Tips
Please email me if you run into setup problems that aren’t addressed here.
Validating Your Setup
The test script includes a --check-setup
option to validate that your system meets the project requirements:
$ git clone https://github.com/nlsandler/writing-a-c-compiler-tests.git
$ cd writing-a-c-compiler-tests
$ ./test_compiler --check-setup
All system requirements met!
Installing Python
You’ll need Python 3.8 or later to run the test suite. To see which Python version (if any) is already installed, run:
$ python3 --version
If you don’t already have a recent-enough Python installed, follow the platform-specific installation instructions below.
macOS
Install the latest version of Python with Homebrew:
- Install Homebrew if needed:
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
- Install Python:
brew install python3
If you don’t want to use Homebrew, you can use the official Python installer instead:
- Go to https://www.python.org/downloads/
- Click the download button.
- Open the downloaded package and follow the prompts.
Linux
Use pyenv to install a recent version version of Python:
- Install pyenv:
$ curl https://pyenv.run | bash
- Follow these instructions to set up your shell environment.
- Install the recommended Python build dependencies for your operating system.
- Install a recent Python version, e.g.
$ pyenv install 3.12
- Activate this Python version whenever you’re in the
writing-a-c-compiler-tests
directory:$ cd /path/to/writing-a-c-compiler-tests/ $ pyenv local 3.12
Now when you run the ./test_compiler
script from inside this directory, it will be invoked with this version of Python.
hello_debugger.s (for Appendix A)
Download the example program for the debugging tutorial in Appendix A.
Acknowledgements
After I wrote the acknowledgements, I got help with the book from even more people. Maybe I can add them to the acknowledgements page in the next printing, but until then I’ll thank them here.
Thanks to everyone who tried out the test suite: Tobin Yehle, John V. Tan, Yoshi Sono, Norberto Ortigoza, Adrien Lamarque, Blaise Watson, Rory Sawyer, Moritz Neeb, Raunak Ramakrishnan, Alan Lin, Sam van Gool, Sara Venkatesh, Marcin Morawski, Ori Bernstein, Dibri Nsofor, Andrew Nichols, Vedang Manerikar, Janice Shiu, John Murphy, Anvay Grover, Geoffrey Gilmore, and Maja Kądziołka. Their feedback made the test suite better and more usable.
Thanks to Karen Sandler for helping me check that the alt text for the figures was clear and accurate.