[This is part 2 of a 9-part series of articles on macro expansion algorithms in Scheme. Part 3 should be posted next week.] ------------------------------------------------------------------------------ 2. Kohlbecker's ``Hygienic'' Expansion Algorithm Saves the Day We can look at our capturing problem as a problem of confusion between variables and variable names. In the first example, we have the macro definition (push a l) ==> (set! l (cons a l)) where the name `cons' is a free variable referring to the standard `cons' procedure, and the expression (let ((cons '())) (push "ghengis" cons) (push "khubla" cons) cons) where the name `cons' within the body of the `let' refers to a bound variable. These are two different variables: at the places where they are defined, they "mean" different things. The evaluator only looks at the name `cons', however, and when the naive expander "blindly" substitutes the transformed macro into the expression without preserving any context information, the evaluator can't tell the difference between the two different `cons' variables. We want our macros to be "referentially transparent": it should be possible to look at a variable reference and determine from the surrounding text what it refers to. If the macro expander inserts code, the names in that code should have the meaning that was lexically apparent at the place where the macro transformation was defined. A macro system that correctly preserves the meanings of variables is said to "maintain hygiene". Sometimes we do want to have non-hygienic macros. Occasionally it is useful to let a macro capture a particular set of identifiers on purpose. This happens relatively infrequently, however. For example, none of the "derived expression types" in section 7.3 of the Scheme report [9] need to capture any variables. It might stand to reason that macro transformers should be hygienic by default, and perhaps require a bit of extra effort to capture variables. The naive macro expander has this backwards: the default behavior easily violates hygiene, and requires a lot of extra effort to maintain it. 2.1. The Kohlbecker Algorithm In [3], and in his Ph.D. thesis [4], Eugene Kohlbecker introduced the first hygienic macro expansion algorithm. His algorithm solves the variable capturing problem by doing an alpha-conversion: systematically renaming all bound variables in a way that respects variables, instead of just variable names. This is easier said than done. The Kohlbecker algorithm does it with a four-phase process: (define kohlbecker-expand (lambda (form) (unmark (alpha-convert (expand-marked (mark form (make-new-mark))))))) 1. The first pass replaces all variable names in the source expression with "marked identifiers". These are objects, distinguishable from symbols, which are composed of the original name of the variable and a "mark". (Marks are referred to as "time stamps" in [3], and as "colors" in [7].) Every variable receives the same mark in this pass. (define mark (lambda (s stamp-value) (cond ((or (identifier? s) (macro-tag? s) (reserved-word? s)) s) ((symbol? s) (make-identifier s stamp-value)) ((pair? s) (map (lambda (z) (mark z stamp-value)) s)) (else s)))) (define reserved-word? (lambda (x) (memq x '(quote lambda set! if)))) 2. The second pass does a macro-expansion using basically the same algorithm as the naive expander, with one difference: after a macro transformer procedure is called, the new variables it introduced are replaced with marked identifiers. The variables introduced by different macro calls are given different marks. (define expand-marked (lambda (mform) (if (not (pair? mform)) mform (case (car mform) ((quote) mform) ((lambda) `(lambda ,(cadr mform) ,@(map expand-marked (cddr mform)))) ((set!) `(set! ,(cadr mform) ,(expand-marked (caddr mform)))) ((if) `(if ,@(map expand-marked (cdr mform)))) (else (if (macro-tag? (car mform)) (expand-marked (mark ((macro-transformer (car mform)) mform) (make-new-mark))) (map expand-marked mform))))))) 3. The third pass assigns unique names to all bound variables. It does this by walking through the expression, assigning names to all variables bound by lambda-expressions, and then substituting the new names for the corresponding identifier objects within the body. We unmark `quote' forms to prevent interference with the substitution machinery. (define alpha-convert (lambda (mform) (if (not (pair? mform)) mform (case (car mform) ((lambda) (alpha-convert-lambda (cadr mform) (map alpha-convert (cddr mform)))) ((quote) (unmark mform)) ((set!) `(set! ,(cadr mform) ,(alpha-convert (caddr mform)))) ((if) `(if ,@(map alpha-convert (cdr mform)))) (else (map alpha-convert mform)))))) (define alpha-convert-lambda (lambda (varlist body) (let ((substitutions (make-substitution-info varlist))) `(lambda ,(perform-substitutions substitutions varlist) ,@(perform-substitutions substitutions body))))) (define make-substitution-info (lambda (varlist) (map (lambda (id) (cons id (identifier->unique-symbol id))) varlist))) (define perform-substitutions (lambda (substitutions s) (cond ((identifier? s) (let ((entry (assoc-id s substitutions))) (if entry (cdr entry) s))) ((not (pair? s)) s) (else (cons (perform-substitutions substitutions (car s)) (perform-substitutions substitutions (cdr s))))))) 4. The fourth pass changes all remaining identifier objects back into symbols. Since we took care of bound variables and quoted symbols in the third pass, this leaves only the free variables for the fourth and final pass. (define unmark (lambda (s) (cond ((identifier? s) (identifier->symbol s)) ((pair? s) (map unmark s)) (else s)))) It doesn't matter much how we represent the marks that we apply to variables, as long as it is easy to generate unique marks and to compare them efficiently for equality. We use integers here. (define make-new-mark (let ((v 0)) (lambda () (begin (set! v (+ v 1)) v)))) (define mark=? =) Here we define a new data structure to represent identifiers. (define make-identifier (lambda (name mark) (vector 'IDENTIFIER name mark))) (define identifier->symbol (lambda (x) (vector-ref x 1))) (define identifier-mark (lambda (x) (vector-ref x 2))) (define identifier? (lambda (x) (and (vector? x) (= (vector-length x) 3) (eq? (vector-ref x 0) 'IDENTIFIER)))) (define identifier=? (lambda (x y) (and (identifier? x) (identifier? y) (eq? (identifier->symbol x) (identifier->symbol y)) (mark=? (identifier-mark x) (identifier-mark y))))) (define assoc-id (lambda (id s) (cond ((null? s) #f) ((identifier=? id (caar s)) (car s)) (else (assoc-id id (cdr s)))))) (define identifier->unique-symbol (lambda (x) (gensym))) Obviously, giving every bound variable a new, unique name can prevent capturing. In order to determine which references to variables get which new names, it is necessary to keep track of the original contexts where the variables were referenced. Kohlbecker's algorithm does this implicitly, assigning different marks to variables in different contexts. 2.2. An Example Let's look at how the Kohlbecker expander handles one of our examples from section 1.3. To make the output easier to understand, let's redefine the `identifier->unique-symbol' procedure: (define identifier->unique-symbol (lambda (x) (string->symbol (string-append (symbol->string (identifier->symbol x)) ":" (number->string (identifier-mark x)))))) (Of course, the symbols it generates aren't guaranteed to be unique.) We'll look at the following expression step-by-step: (let ((temp 37.0)) (or (foo temp) temp)) The first pass takes this expression and changes all symbols to identifier objects, giving them the mark `1': (let ((#(identifier temp 1) 37.0)) (or (#(identifier foo 1) #(identifier temp 1)) #(identifier temp 1))) Then we begin the second pass. When the expander encounters the `let' macro call, it calls the corresponding macro transformer, yielding: ((lambda (#(identifier temp 1)) (or (#(identifier foo 1) #(identifier temp 1)) #(identifier temp 1))) 37.0) It then tries to mark all newly introduced variables with the mark `2'. The result is the same, since there are no new variables here yet. Continuing to expand recursively, the expander encounters the `or' macro call and invokes the macro transformer procedure on (or (#(identifier foo 1) #(identifier temp 1)) #(identifier temp 1))) to give: ((lambda (temp) (if temp temp (or #(identifier temp 1)))) (#(identifier foo 1) #(identifier temp 1))) Marking this result with `3' gives: ((lambda (#(identifier temp 3)) (if #(identifier temp 3) #(identifier temp 3) (or #(identifier temp 1)))) (#(identifier foo 1) #(identifier temp 1))) The expander handles the inner `or' in a similar way. Eventually, the second pass completes with the result: ((lambda (#(identifier temp 1)) ((lambda (#(identifier temp 3)) (if #(identifier temp 3) #(identifier temp 3) #(identifier temp 1))) (#(identifier foo 1) #(identifier temp 1)))) 37.0) The third pass assigns names to the bound variables and substitutes them into the expression, giving: ((lambda (temp:1) ((lambda (temp:3) (if temp:3 temp:3 temp:1)) (#(identifier foo 1) temp:1))) 37.0) The fourth pass unmarks the free variables, in this case just `foo': ((lambda (temp:1) ((lambda (temp:3) (if temp:3 temp:3 temp:1)) (foo temp:1))) 37.0) This is our final answer. Our two `temp' variables from different contexts have been given different names. We seem to have fixed the problem, right? Well, yes and no, as we'll see later in section 4. 2.3. Capturing Variables Before moving on, let's take a look at how we can go about capturing variables on purpose with this algorithm. As an example, we'll define a `catch' macro that captures the name `throw' within it's body. The definition of `catch' for the naive expander would be: (macro-transformer-set! 'catch (lambda (form) `(call-with-current-continuation (lambda (throw) ,(cadr form))))) Our Kohlbecker expander will rename the `throw' binding, preventing the capture of the `throw' name and preserving hygiene. How do we fool the expander into letting us capture variables? The key is in the first pass, which marks all identifiers in the original expression with the same mark. If a macro is to capture an identifier, it has to insert an identifier with the same context information as the identifiers in the original expression. The following changes to the expander will allow this: (define *top-mark* #f) (define kohlbecker-expand (lambda (form) (set! *top-mark* (make-new-mark)) (unmark (alpha-convert (expand-marked (mark form *top-mark*)))))) (define capture (lambda (var) (make-identifier var *top-mark*))) The `capture' procedure returns an identifier object having the same top-level mark that the identifiers in the original expression had. Our revised `catch' macro definition uses it to insert the name `throw' in the lambda variables list: (macro-transformer-set! 'catch (lambda (form) `(call-with-current-continuation (lambda (,(capture 'throw)) ,(cadr form))))) Using this method, macro writers can break the hygiene rule for special-purpose macros, but still benefit from safe macros being the default.