This is the webpage for the book Writing a C Compiler. It includes links to the book’s companion code, errata, updates, and other resources.

If something isn’t working, or if you have a question, suggestion or correction, please email me!

Table of contents:

Companion Code

This book has two companion repos:

  • The test suite. Use this to test your your implementation at each stage of the project.
  • The reference implementation, NQCC2. You don’t need to refer to this, but it’s there if you want to look at it.

Outside Resources

You’ll need these to implement the extra credit features, or otherwise extend your compiler beyond what the book covers:

  • The C Standard specifies how C programs behave, in theory. This book uses C17, the latest revision at the time of writing. Purchase the official standard, or use the similar, freely available draft.
  • The System V x86-64 ABI documents calling conventions and similar details for Unix-like x86-64 systems. Find the latest version here.
  • The Intel 64 Software Developer’s Manual documents the x86-64 instruction set. Download Volume 2 of the official manual or browse an unofficial online version.
  • Compiler Explorer is an awesome website where you can put in source code and see how a bunch of different compilers translate it to assembly.

Updates

If changes to GCC, Clang, or other tooling affect this project, I’ll cover them here. (Updates to GCC/Clang behaviors that the book discusses in passing but don’t affect the project itself are covered in the Errata section below.)

None yet!

Errata

  • Chapter 6, p. 121. In the first sentence of the Parsing Conditional Expressions subsection, the ? : operator is backwards. This sentence should read:

    The conditional ? : operator is a ternary operator…

  • Chapter 9, p. 180. On the last line of Listing 9-21, the call to typecheck_block is missing the symbols argument. This line should be:

           typecheck_block(decl.body, symbols)
    

    Thanks to Ian for catching this!

  • Chapter 13, p. 335. The assembly code given in Table 13-3 to convert from double to unsigned long includes an unnecessary instruction. As the last step in the conversion, this code copies the immediate 9223372036854775808 into a register, then adds it to the destination:

    Mov(Quadword, Imm(9223372036854775808), Reg(<R>))
    Binary(Add, Quadword, Reg(<R>), dst)
    

    This step takes two instructions because the source operand of add can’t be bigger than INT_MAX. However, it’s better to generate a single invalid add instruction at this stage and let the instruction fix-up pass rewrite it. That is, instead of those two instructions, we can just generate:

    Binary(Add, Quadword, Imm(9223372036854775808), dst)
    

    This is better because it uses one of our scratch registers (R10) instead of a non-scratch register, which will reduce register pressure once we implement register allocation in Chapter 20.

    Thanks to Yoshi Sono for catching this!

  • Chapter 14, p. 351. The “Null Pointers and Type Conversions” section on page 351 presents three examples of illegal implicit type versions and says:

    GCC warns about the implicit conversions in the previous three code snippets, but it still compiles them.

    GCC 14.1 and later treats all of these examples as errors by default. Hooray!

  • Chapter 20, p. 665-666. The coalesce function, discussed on pages 665-666, needs to handle an additional edge case if you completed Part II. You should not coalesce two pseudoregisters of different sizes (e.g. don’t coalesce a Longword into a Quadword or vice versa). If we coalesce a larger pseudoregister into a smaller one, and we end up spilling the coalesced register, we won’t allocate enough stack space for it, resulting in a potential miscompilation. Normally, the Briggs test should prevent the coalesced register from spilling, but the “ugly detail about the George test” discussed on page 661 means this isn’t guaranteed. Besides, as that discussion on page 661 points out, the Briggs and George tests are performance heuristics and we shouldn’t rely on them for correctness.

    Alternatively, you could coalesce pseudoregisters of different sizes but make sure to coalesce the smaller one into the larger one; however, that’s more fragile, so I don’t recommend it.

Tips on Extra Credit Features

Here are a few pointers about how to implement the extra credit features so you won’t have to refactor them later in the book.

  • Chapter 3: Bitwise operators. This feature requires one-byte register aliases (e.g. %cl instead of %ecx). Use the same assembly AST node to represent one-byte and four-byte register aliases; for example, you should use CX to represent both %ecx and %cl. In the code emission pass, decide which alias to emit based on context. (For now, one-byte register aliases only appear as the first operand of bitwise shift instructions.) Thanks to Tobin Yehle for suggesting this pointer.

  • Chapter 5: Compound assignment and increment/decrement operators. In compound assignment expressions like lval += rval, lval must be evaluated only once. Don’t rewrite it as lval = lval + rval; this will lead to incorrect behavior for expressions like array[f()] += x in Part II. The same goes for prefix and postfix expressions; don’t rewrite ++op as op = op + 1.

  • Chapter 11: Bitwise operations with shl and shr. The shl and shr assembly instructions perform unsigned bit-shifting operations. If you’re implementing bitwise operators for extra credit, you’ll need these instructions starting in Chapter 11, but the main text of the book introduces them in later chapters. You may want to give them different names in the assembly AST than the book does.

    Each bit shift instruction (including shl, shr, sal, and sar) has two forms. The two-operand form, which you presumably learned about in Chapter 3, takes a shift count as its source operand. The one-operand form shifts its destination operand by one bit.

    The book introduces the one-operand form of shr as Shr in Chapter 13, and the two-operand forms of both instructions as Shl and ShrTwoOp in Chapter 18. You only need the two-operand forms to implement the bitwise extra credit feature.

Setup Tips

Please email me if you run into setup problems that aren’t addressed here.

Validating Your Setup

The test script includes a --check-setup option to validate that your system meets the project requirements:

$ git clone https://github.com/nlsandler/writing-a-c-compiler-tests.git
$ cd writing-a-c-compiler-tests
$ ./test_compiler --check-setup
All system requirements met!

Installing Python

You’ll need Python 3.8 or later to run the test suite. To see which Python version (if any) is already installed, run:

$ python3 --version

If you don’t already have a recent-enough Python installed, follow the platform-specific installation instructions below.

macOS

Install the latest version of Python with Homebrew:

  1. Install Homebrew if needed:
    $ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
    
  2. Install Python:
    brew install python3
    

If you don’t want to use Homebrew, you can use the official Python installer instead:

  1. Go to https://www.python.org/downloads/
  2. Click the download button.
  3. Open the downloaded package and follow the prompts.

Linux

Use pyenv to install a recent version version of Python:

  1. Install pyenv:
    $ curl https://pyenv.run | bash
    
  2. Follow these instructions to set up your shell environment.
  3. Install the recommended Python build dependencies for your operating system.
  4. Install a recent Python version, e.g.
    $ pyenv install 3.12
    
  5. Activate this Python version whenever you’re in the writing-a-c-compiler-tests directory:
    $ cd /path/to/writing-a-c-compiler-tests/
    $ pyenv local 3.12
    

Now when you run the ./test_compiler script from inside this directory, it will be invoked with this version of Python.

hello_debugger.s (for Appendix A)

Download the example program for the debugging tutorial in Appendix A.

Acknowledgements

After I wrote the acknowledgements, I got help with the book from even more people. Maybe I can add them to the acknowledgements page in the next printing, but until then I’ll thank them here.

Thanks to everyone who tried out the test suite: Tobin Yehle, John V. Tan, Yoshi Sono, Norberto Ortigoza, Adrien Lamarque, Blaise Watson, Rory Sawyer, Moritz Neeb, Raunak Ramakrishnan, Alan Lin, Sam van Gool, Sara Venkatesh, Marcin Morawski, Ori Bernstein, Dibri Nsofor, Andrew Nichols, Vedang Manerikar, Janice Shiu, John Murphy, Anvay Grover, Geoffrey Gilmore, and Maja Kądziołka. Their feedback made the test suite better and more usable.

Thanks to Karen Sandler for helping me check that the alt text for the figures was clear and accurate.