We generate code which makes use of seven dedicated registers:
A closure is heap allocated and looks much like a vector:
The length is currently not used, but if we wanted to add garbage collection we would need the length (and perhaps some other information) at the start of the closure.----------------------------------------------------------- | length | code pointer | free value0 | free value1 | ... | ----------------------------------------------------------- ^ \--- cp
A frame is stack allocated, and looks like:
The saved closure pointer points to the closure of the previous call (the closure of the frame's continuation).|-----------------------| | ... | |-----------------------| | inline argument | |-----------------------| | | | ... | |-----------------------| | bound value1 | |-----------------------| | bound value0 | |-----------------------| | return code pointer | <---- fp |-----------------------| | saved closure pointer | |-----------------------|
All of our data are represented by 32-bit words, with the lower three bits as a kind of type-tag. While this would normally only allow us eight types, we cheat a little bit: Booleans, empty-lists and characters can be represented in (much) less than 32 bits, so we steal a few of their data bits for an ``extended'' type tag.
Numbers:
--------------------------------------
| 29-bit 2's complement integer 000 |
--------------------------------------
Booleans:
------------------- -------------------
#t: | ... 1 00000 001 | #f: | ... 0 00000 001 |
------------------- -------------------
Empty lists:
-----------------
| ... 00001 001 |
-----------------
Characters:
---------------------------------------
| ... 8-bit character data 00010 001 |
---------------------------------------
Pairs, strings, symbols, vectors and closures maintain a 3-bit type
tag, but devote the rest of their 32 bits to an address into the heap
where the actual value is stored:
Pairs: --------------- ------------- | address 010 | --> | car | cdr | -----\--------- / ------------- ----------- Strings: --------------- ------------------------------------------------- | address 011 | --> | length | string data (may span many words)... | -----\--------- / ------------------------------------------------- ----------- Symbols: --------------- -------------------------- | address 100 | --> | symbol name (a string) | -----\--------- / -------------------------- ----------- Vectors: --------------- | address 101 | -----|--------- v ----------------------------------------------------------- | length | (v-ref 0) | (v-ref 1) | ... | (v-ref length-1) | ----------------------------------------------------------- Closures: --------------- | address 110 | -----|--------- v ----------------------------------------------------------------------- | length | code pointer | (free 0) | (free 1) | ... | (free length-1) | -----------------------------------------------------------------------
The code generator works somewhat like the passes we've shown so far, in that it is syntax directed: We will generate code by doing a top-down traversal of the code-generation form expression. We will need no environment to pass, but we will need some other parameters.
A bare skeleton of the code generator is shown here:
(define cg
(lambda (exp fs dd cd nextlab)
(record-case exp
[bound (n name) ... ]
[free (n name) ... ]
[quote (obj) ... ]
[begin (a b) ... ]
[if (t c a) ... ]
[build-closure (code . fvars) ... ]
[else
(if (symbol? (car exp))
...
(if (eq? cd 'return)
...
...))])))
There are five arguments to this procedure, which doesn't look like
such a good thing *smile*. The arguments are fairly easy to
understand, however:
(register-name number)
-- in which case the value of the expression should be
stored in memory number bytes offset from the
address stored in register-name, or effect -- in which case the
value of the expression may be thrown away completely:
nobody needs it. (label label) -- in
which case control should transfer to the first label if
the current expression evaluates to a true value (at
runtime), and to the second label if the current
expression evaluates to a false value, or return -- in which case we
should do a procedure-call return after the expression
executes. These two parameters are what makes our code generator
``destination-driven''. Since there are three kinds of data
destinations and three kinds of control destinations, it can be useful
to see how they interact. In the following table, OK
signifies that any context may have that data-control destination,
-- indicates that none can, BEGIN
indicates that the context only happens when evaluating the first
part of a begin expression, and IF
indicates that the context only happens when evaluating the test part
of an if expression.
cd
'return label (label label)
------------------------------
'effect | -- BEGIN IF
dd register | OK OK --
(register offset) | -- OK --
Transfer of control may involve:
return. Generating code for free and bound
expressions involves moving the data from the proper location (either
on the stack or in the current closure) to the destination given as
dd, and then transfering control to the destination
given as cd, as shown above.
Generating code for quote expressions simply involves
storing the encoding of whatever whatever quoted literal we have into
the data destination, and transfering control as above.
We could do one optimization: If we notice that our control destination is a pair of labels, we can do the test at compile-time: if the quoted object is false, emit an unconditional jump to the ``false'' label, otherwise emit an unconditional jump to the ``true'' label.
Generating code for begin expressions simply involves
generating code for the first part of the begin with a
dd of effect, and then generating code
for the second part of the begin.
Generating code for if expressions is somewhat
tricky: We want to first generate code for the test portion, giving it
a dd of effect and a cd of
two newly generated labels. We follow that by generating code for the
then portion of the if, headed by the first
of the two newly-generated labels, and for the else
portion of the if, headed by the second of the two
newly-generated labels. The transfer of control (and the testing)
necessitated by the if will be handled automagically, as
above.
Surprisingly enough, generating code for
build-closure expresssions is easy. We simply allocate
enough space on the heap to hold the new closure, store the values of
all the free variables into the closure structure, and store the code
pointer at the beginning of the new closure. Then we move the closure
to the data destination, and transfer control as necessary.
Storing the code pointer involves generating a new label and storing the address of that new label. Then, we make a promise to generate code for the body of the procedure, headed by that new label.
The first thing we need to do for any application is to
generate code to evaluate its arguments. In our code generator, this
means calling cg recursively on each argument. We call
it with dd as the current top of stack
(fp+fs), and with fs one word
greater than the current frame-size.
Once the code for all the arguments are generated, we do the proverbial ``something special'', depending on which primitive we're generating.
The tricky part of generating code for primitives is the way
primities interact when dd is effect.
For example, if code for
is generated with dd as(cons '1 '2)
effect and
cd as a single label, we need not generate any code
at all! Or, if cd is two labels, then we need only
jump to the ``true'' label, since cons only returns true
values.
Yet even with a dd of effect, we
would have to generate code for
nonsensical as it is. We don't need to generate the code which actually does the consing, but we will have to generate code for the(cons (vector-set! (free 1 x) '0 '4) '3)
vector-set!.
Like primitive applications, the first thing necessary to generate code for a non-tail application is to generate code to evaluate the arguments. This time, however, we leave two words empty at the top of the stack, just under the locations for the newly generated arguments. In these two words we store, first, the current closure, and second, the return pointer: the address of a newly generated label (which we will call return, for now).
After generating code for the arguments, we generate code for the operator, with dd as the closure pointer, cp. This is safe because we've already saved the old closure pointer. Once this code is emitted, we emit code to increment the frame pointer by the old frame-size (thus making the frame pointer point to the base of the new frame), and jump to the location pointed to by the new cp's code pointer.
Since this is a non-tail call, we will eventually have to return. So, we emit the label return, emit code to decrement the frame pointer, emit code to restore the closure pointer, and finally emit code to move the contents of the accumulator to the data destination and transfer control to the control destination.
Again, as for all applications, the first thing we need to do is to generate code to evaluate the arguments. Then, as before, we generate code to evaluate the operator with dd as the closure pointer. Now, however, we want to re-use the current frame rather than creating a new one. So we generate code to move all the arguments from their locations at the top of the frame to locations at the bottom of the frame, where the values of the bound variables reside. Once these are moved, we generate code to jump to the new cp's code pointer, and we're done: no return code is necessary.
We provide a Scheme file to make it easier for students to implement the code generator. It contains symbolic definitions of the various type tags, a procedure to encode simple data as 32-bit numbers, and a procedure used to build the instruction stream:
cg-help.ss
Here is the somewhat-completed code-generator:
cg.ss