C Compiler, Part 8: Loops
This is the eighth post in a series. Read part 1 here.
In this post we’re going to add loops! Now we’ll finally be able to compile FizzBuzz…except we won’t, because we can’t call printf yet. Still, it’s progress!
If you’ve been following along, note that there was a mistake in the last post. Make sure you read the “Deallocating Variables” section and update your compiler to pass the new stage 7 tests before you start on stage 8.
As usual, accompanying tests are here.
Part 8: Loops
In this post we’re implementing what the C11 standard calls iteration statements; if you want to refer to the standard itself, they’re in section 6.8.5. There are a few different iteration statements:
for
loops
First, some terminology. I’m going to call the three parts of a for
loop header the initial clause, controlling expression, and post-expression, as in:
for (int i = 0; // initial clause
i < 10; // controlling expression
i = i + 1 // post-expression
) {
// do something
}
for
loops come in two flavors: one where the initial statement is a variable declaration, and one where it’s just an expression.
Flavor #1:
for (int i = 0; i < 10; i = i + 1) {
// do something
}
Flavor #2:
int i;
for (i = 0; i < 10; i = i + 1) {
//do something
}
One interesting thing about for
loops is that any of the expressions in the loop header can be empty:
for (;;) {
//do something
}
But if the controlling expression is empty, the compiler needs to replace it with a constant nonzero expression1. So the example above is equivalent to:
for (;1;) {
//do something
}
while
and do
Loops
There’s not a whole lot to say about these.
while (i < 10) {
i = i + 1;
}
do {
i = i + 1;
} while (i < 10); // <- the semicolon is required!
break
and continue
break
and continue
aren’t loops, but they always appear inside loops, so it makes sense to add them now2. The C11 standard calls them “jump statements” and defines them in section 6.8.6.
A break
statement inside a loop causes execution to jump to the end of the loop:
while (1) {
break; // go to end of loop
}
// break statement will go here
A continue
statement causes execution to jump to the end of the loop body – immediately before the post expression in a for loop.
for (int i = 0; i < 10; i = i + 1) {
if (i % 2)
continue;
// do something
//continue statement will jump here
}
In the example above, the loop will execute ten times, but only “do something” for odd values of i.
Null statements
Sort of like you can have null expressions in a for
loop, you can also have null statements3:
int a = 0;
; // does nothing
return a;
Null statements don’t really have anything to do with loops, but they share a common feature with the expressions in a for loop: they’re both defined in terms of optional expressions in the standard. Since we need to support optional expressions in for loops, it’s pretty easy to add support for null expressions too.
As usual, we’ll update the lexing, parsing, and code generation passes, in order.
Lexing
We’re adding five (!) keywords in this post: for
, do
, while
, break
, and continue
.
Here’s all our tokens so far:
{
}
(
)
;
int
return
- Identifier
[a-zA-Z]\w*
- Integer literal
[0-9]+
-
~
!
+
*
/
&&
||
==
!=
<
<=
>
>=
=
if
else
:
?
for
while
do
break
continue
☑ Task:
You know the drill here.
Parsing
We’re adding six kinds of statements: do
loops, while
loops, the two different kinds of for
loop, break
and continue
.
We’re also changing the Exp
statement; its argument is now optional, so we can use it to represent null statements.
Now we can construct a null statement in the AST like this:
null_exp = Exp(None)
The initial expression and post-expression in a for
loop are also optional.
Here’s the updated definition of statements in the AST, with new and changed parts bolded:
statement = Return(exp) | Exp(exp option) | Conditional(exp, statement, statement option) // exp is controlling condition // first statement is 'if' block // second statement is optional 'else' block | Compound(block_item list) | For(exp option, exp, exp option, statement) // initial expression, condition, post-expression, body | ForDecl(declaration, exp, exp option, statement) // initial declaration, condition, post-expression, body | While(expression, statement) // condition, body | Do(statement, expression) // body, condition | Break | Continue
Note that our AST lets break
and continue
statements appear outside of loops, even though that’s illegal; we’ll catch that error during code generation, not parsing.
The trickiest part of the grammar here is dealing with optional expressions. I dealt with this by defining an <exp-option>
symbol:
<exp-option> ::= <exp> | ""
Once we’ve added that, updating the grammar for statements is pretty easy:
<statement> ::= "return" <exp> ";" | <exp-option> ";" | "if" "(" <exp> ")" <statement> [ "else" <statement> ] | "{" { <block-item> } "} | "for" "(" <exp-option> ";" <exp-option> ";" <exp-option> ")" <statement> | "for" "(" <declaration> <exp-option> ";" <exp-option> ")" <statement> | "while" "(" <exp> ")" <statement> | "do" <statement> "while" "(" <exp> ")" ";" | "break" ";" | "continue" ";"
If you’re wondering why there’s a semicolon after the initial <exp-option>
in the first for
rule, but not after the initial <declaration>
in the second one, it’s because the rule for <declaration>
also includes a semicolon.
Parsing <exp-option>
isn’t entirely straightforward, because the empty string is not actually a token. I dealt with this by looking ahead to see if the next token was a close paren (after a post-expression) or a semicolon (after a statement, post-expression or controlling condition). If it was, the expression was empty; if not, not. I think this approach violates some formalisms about context-free grammars and LL parsers: in order to parse an <exp-option>
symbol, you may have to look at a token that comes after that symbol.
This isn’t actually a problem, but if it bothers you, you can refactor the grammar to avoid it4:
<exp-option-semicolon> ::= <exp> ";" | ";"
<exp-option-close-paren> ::= <exp> ")" | ")"
<statement> ::= ...
| <exp-option-semicolon> // null statement
| "for" "(" <declaration> <exp-option-semicolon> <exp-option-close-paren> <statement>
...
Note that there’s a discrepancy here between the grammar and the AST definition; the grammar allows controlling expressions in for
loops to be empty, but the AST doesn’t. That’s because, as I mentioned earlier, an empty controlling expression needs to be replaced with a nonzero constant. So our approach to parsing controlling expressions in for
loops will look something like this:
match parse_optional_exp(controlling_expression) with
| Some e -> e
| None -> Const(1) // construct a constant nonzero expression
You could do this during the code generation stage instead of the parsing stage, if you wanted.
☑ Task:
Update parsing to succeed on all valid stage 1-8 examples, and fail on all invalid stage 8 examples whose names start with syntax_err
.
Code Generation
Null Statements
Don’t emit any assembly for null statements. Easy!
while
loops
Given a while
loop like this:
while (expression)
statement
we can describe its control flow like this:
- Evaluate
expression
. - If it’s false, jump to step 5.
- Execute
statement
. - Jump to step 1.
- Finish.
I won’t show you the exact assembly you need to generate here; by now you know enough to figure it out yourself.
The main thing is labeling steps 1 and 5, so when we need a jump instruction we have somewhere to jump to.
It’s worth noting that the loop body is a new scope, and you need to reset your current_scope
set accordingly.
do
Loops
These are basically the same as while
loops; just evaluate the expression after the statement.
for
loops
Given a for
loop like this:
for (init; condition; post-expression)
statement
we can break it down in the same way as while
loops above:
- Evaluate
init
. - Evaluate
condition
. - If it’s false, jump to step 7.
- Execute
statement
. - Execute
post-expression
. - Jump to step 2.
- Finish.
The init and post-expression might be empty, in which case we just don’t emit any assembly for steps 1 and 5. Note that a for
loop, including the header, is a block with its own scope, and the body of the for
loop is also a block. That means you can have code like this:
int i = 100; // scope 1
for (int i = 0; i < 10; i = i + 1) { // scope 2 - variable i shadows previous i
int i; //scope 3 - this variable i shadows BOTH previous i's
}
The main gotcha here is that you need to pop the variable declared in init
off the stack
when you exit the block, just like you needed to handle deallocating other variables in the last post.
break
and continue
We can implement each of these with a single jmp
instruction – the trick is just figuring out where to jump to. A break statement “terminates execution of the smallest enclosing switch
or iteration statement,” so we want to jump to the point right after the loop5. We already have an “end of loop” label, which we jump to when the controlling condition is false; we just need to pass that label around along with the variable map, stack index and current scope.
We also need to pass another label for continue
to refer to. continue
“causes a jump to the loop-continuation portion of the smallest
enclosing iteration statement; that is, to the end of the loop body”6 – that’s step 4 in the while
loop or step 5 in the for
loop above.
Unlike the stack index, variable map and so forth, the jump and continue labels can be null, if you’re not inside a loop. Hitting a break
or continue
statement when these labels are null should, of course, cause an error.
At this point, I was passing enough arguments around that I defined a Context
type and wrapped it all up in that. You may want to do something similar, but you don’t have to.
Up Next
In the next post we’re going to implement a pretty fundamental concept: function calls. I don’t know about you but I am VERY EXCITED for function calls. See you then!
If you have any questions, corrections, or other feedback, you can email me or open an issue.
1 See section 6.8.5.3 of the C11 standard.↩
2
break
can also appear in switch
statements, but we haven’t added those yet.↩
3 C11 standard, section 6.8.3. ↩
4 Thank you to Ian for catching an error in the refactored grammar in an earlier version of this post.↩
5 C11 standard, section 6.8.6.3.↩
6 C11 standard, section 6.8.6.2.↩