Code Generation

[ Architecture | Generating Code | Support Code ]

Architecture

Registers

We generate code which makes use of seven dedicated registers:

Memory Layout

A closure is heap allocated and looks much like a vector:

-----------------------------------------------------------
| length | code pointer | free value0 | free value1 | ... |
-----------------------------------------------------------
   ^
   \--- cp
The length is currently not used, but if we wanted to add garbage collection we would need the length (and perhaps some other information) at the start of the closure.

A frame is stack allocated, and looks like:

|-----------------------|
| ...                   |
|-----------------------|
| inline argument       |
|-----------------------|
|                       |
| ...                   |
|-----------------------|
| bound value1          |
|-----------------------|
| bound value0          |
|-----------------------|
| return code pointer   | <---- fp
|-----------------------|
| saved closure pointer |
|-----------------------|
The saved closure pointer points to the closure of the previous call (the closure of the frame's continuation).

Data Formats

All of our data are represented by 32-bit words, with the lower three bits as a kind of type-tag. While this would normally only allow us eight types, we cheat a little bit: Booleans, empty-lists and characters can be represented in (much) less than 32 bits, so we steal a few of their data bits for an ``extended'' type tag.

Numbers:                                                                 
--------------------------------------                                   
| 29-bit 2's complement integer  000 |                                   
--------------------------------------                                   

Booleans:                                                                
      -------------------       -------------------                        
  #t: | ... 1 00000 001 |   #f: | ... 0 00000 001 |                        
      -------------------       -------------------                        

Empty lists:
-----------------                                                         
| ... 00001 001 |                                                         
-----------------                                                         

Characters:                                                              
---------------------------------------                                   
| ... 8-bit character data  00010 001 |                                   
---------------------------------------                                   
Pairs, strings, symbols, vectors and closures maintain a 3-bit type tag, but devote the rest of their 32 bits to an address into the heap where the actual value is stored:
Pairs: --------------- ------------- | address 010 | --> | car | cdr | -----\--------- / ------------- ----------- Strings: --------------- ------------------------------------------------- | address 011 | --> | length | string data (may span many words)... | -----\--------- / ------------------------------------------------- ----------- Symbols: --------------- -------------------------- | address 100 | --> | symbol name (a string) | -----\--------- / -------------------------- ----------- Vectors: --------------- | address 101 | -----|--------- v ----------------------------------------------------------- | length | (v-ref 0) | (v-ref 1) | ... | (v-ref length-1) | ----------------------------------------------------------- Closures: --------------- | address 110 | -----|--------- v ----------------------------------------------------------------------- | length | code pointer | (free 0) | (free 1) | ... | (free length-1) | -----------------------------------------------------------------------

Generating Code

The code generator works somewhat like the passes we've shown so far, in that it is syntax directed: We will generate code by doing a top-down traversal of the code-generation form expression. We will need no environment to pass, but we will need some other parameters.

A bare skeleton of the code generator is shown here:

(define cg
  (lambda (exp fs dd cd nextlab)
    (record-case exp
      [bound (n name) ... ]
      [free (n name) ... ]
      [quote (obj) ... ]
      [begin (a b) ... ]
      [if (t c a) ... ]
      [build-closure (code . fvars) ... ]
      [else
        (if (symbol? (car exp))
            ...
            (if (eq? cd 'return)
                ...
                ...))])))
There are five arguments to this procedure, which doesn't look like such a good thing *smile*. The arguments are fairly easy to understand, however:

The Control and Data Destinations

These two parameters are what makes our code generator ``destination-driven''. Since there are three kinds of data destinations and three kinds of control destinations, it can be useful to see how they interact. In the following table, OK signifies that any context may have that data-control destination, -- indicates that none can, BEGIN indicates that the context only happens when evaluating the first part of a begin expression, and IF indicates that the context only happens when evaluating the test part of an if expression.

                                     cd

                         'return  label  (label label)
                       ------------------------------
              'effect  | --       BEGIN  IF
 dd          register  | OK       OK     --
    (register offset)  | --       OK     --

Transfering control

Transfer of control may involve:

Expression types

Free and Bound expressions

Generating code for free and bound expressions involves moving the data from the proper location (either on the stack or in the current closure) to the destination given as dd, and then transfering control to the destination given as cd, as shown above.

Quote expressions

Generating code for quote expressions simply involves storing the encoding of whatever whatever quoted literal we have into the data destination, and transfering control as above.

We could do one optimization: If we notice that our control destination is a pair of labels, we can do the test at compile-time: if the quoted object is false, emit an unconditional jump to the ``false'' label, otherwise emit an unconditional jump to the ``true'' label.

Begin expressions

Generating code for begin expressions simply involves generating code for the first part of the begin with a dd of effect, and then generating code for the second part of the begin.

If expressions

Generating code for if expressions is somewhat tricky: We want to first generate code for the test portion, giving it a dd of effect and a cd of two newly generated labels. We follow that by generating code for the then portion of the if, headed by the first of the two newly-generated labels, and for the else portion of the if, headed by the second of the two newly-generated labels. The transfer of control (and the testing) necessitated by the if will be handled automagically, as above.

Build-closure expressions

Surprisingly enough, generating code for build-closure expresssions is easy. We simply allocate enough space on the heap to hold the new closure, store the values of all the free variables into the closure structure, and store the code pointer at the beginning of the new closure. Then we move the closure to the data destination, and transfer control as necessary.

Storing the code pointer involves generating a new label and storing the address of that new label. Then, we make a promise to generate code for the body of the procedure, headed by that new label.

Primitive Calls

The first thing we need to do for any application is to generate code to evaluate its arguments. In our code generator, this means calling cg recursively on each argument. We call it with dd as the current top of stack (fp+fs), and with fs one word greater than the current frame-size.

Once the code for all the arguments are generated, we do the proverbial ``something special'', depending on which primitive we're generating.

The tricky part of generating code for primitives is the way primities interact when dd is effect. For example, if code for

(cons '1 '2)
is generated with dd as effect and cd as a single label, we need not generate any code at all! Or, if cd is two labels, then we need only jump to the ``true'' label, since cons only returns true values.

Yet even with a dd of effect, we would have to generate code for

(cons (vector-set! (free 1 x) '0 '4) '3)
nonsensical as it is. We don't need to generate the code which actually does the consing, but we will have to generate code for the vector-set!.

Non-Tail Calls

Like primitive applications, the first thing necessary to generate code for a non-tail application is to generate code to evaluate the arguments. This time, however, we leave two words empty at the top of the stack, just under the locations for the newly generated arguments. In these two words we store, first, the current closure, and second, the return pointer: the address of a newly generated label (which we will call return, for now).

After generating code for the arguments, we generate code for the operator, with dd as the closure pointer, cp. This is safe because we've already saved the old closure pointer. Once this code is emitted, we emit code to increment the frame pointer by the old frame-size (thus making the frame pointer point to the base of the new frame), and jump to the location pointed to by the new cp's code pointer.

Since this is a non-tail call, we will eventually have to return. So, we emit the label return, emit code to decrement the frame pointer, emit code to restore the closure pointer, and finally emit code to move the contents of the accumulator to the data destination and transfer control to the control destination.

Tail Calls

Again, as for all applications, the first thing we need to do is to generate code to evaluate the arguments. Then, as before, we generate code to evaluate the operator with dd as the closure pointer. Now, however, we want to re-use the current frame rather than creating a new one. So we generate code to move all the arguments from their locations at the top of the frame to locations at the bottom of the frame, where the values of the bound variables reside. Once these are moved, we generate code to jump to the new cp's code pointer, and we're done: no return code is necessary.

Support Code

We provide a Scheme file to make it easier for students to implement the code generator. It contains symbolic definitions of the various type tags, a procedure to encode simple data as 32-bit numbers, and a procedure used to build the instruction stream:

cg-help.ss

Here is the somewhat-completed code-generator:

cg.ss


ehilsdal@cs.indiana.edu