Introduction to Scheme++ for C311

Scheme++ is a mostly-static object-oriented system designed as an extension of the Scheme programming language and patterned loosely after C++. The features of Scheme++ used in the C311 course are described here along with some examples and an introduction to some of the basic terminology of object-oriented programming. This document does not attempt to provide adequate motivation of object-oriented programming concepts and many features of Scheme++ are not described here. A preliminary complete description of Scheme++ is available, but is not necessary for the purposes of C311. Some of the statements made here are not true of Scheme++ in general, but only of the subset used in this course.

On copper Scheme++ is provided automatically by running ~c311/bin/scheme. The Scheme++ implementation source code for Chez Scheme uses define-syntax, but this need not concern the user.

Terminology

We require some common object-oriented programming terminology, plus a few terms and remarks specific to Scheme++.

An instance is an object that contains a set of instance variables that record its state and a set of methods that alone can access the instance variables. An instance is created using a class, known as the class of the instance. Classes are also used to derive (help create) other classes via inheritance. Those classes used to derive a class are known as its base classes.

If C is the class of an instance, then the instance is an instance of C. An instance of a class is also an instance of the base classes of the class.

An abstract base class is used only to derive other classes. Other classes are used only to create instances, while some classes are used for both purposes.

A class is said to be defined via multiple inheritance if it has more than one base class, and via single inheritance otherwise. Every class inherits from at least one class, except for the top class, which has no instance variables.

The inheritance relation must define a directed acyclic graph, called the class hierarchy, rooted by the top class.

Each instance has its own instance variable values, but all instances of a given class contain the same set of methods and instance variable names. Thus methods and instance variable names are thought of as belonging to the class of an instance rather than the instance itself.

A method introduced in a class belongs to the class. A method belonging to a base class of a class, C, also belongs to C unless C introduces a new method of the same name. If more than one base class defines a method of the same name, references to this name must be qualified to remove ambiguity. The techniques for doing this are not covered here, as they are not needed in this course.

Protection

Scheme++ has a rather complex set of protection mechanisms, similar to those in C++, but only two protection declarations are necessary for our purposes.

Environment example

To introduce the primary features of Scheme++, we dissect the definition of an object-oriented implementation of an environment abstract data type.
(define <environment>
  (class ()
    (public apply extend)
    (methods
      (define extend 
        (method (vars vals) 
          (<extended-environment> this vars vals)))
      (apply))))
The class form returns a new class. By convention, we use matching angle brackets to begin and end a class name.

The keyword class is immediately followed by a formal parameter specification. The formal parameters are associated with arguments when the class is initialized, as in a procedure application. Since the environment class is always initialized with no arguments, its formals specification is the empty list.

The formals specification of a class form is followed by various parenthesized clauses, each introduced by a keyword. Though there is a bit more flexibility in the order of clauses, it is suggested that they always appear in the order indicated in our examples.

The environment class contains a public clause and a methods clause. The apply and extend methods are declared to be public, for we want to be able to apply or extend environments in any context.

The method form's syntax is the same as a lambda expression except for the keyword. It returns a new method when evaluated. The first argument in every method invocation is always an instance that is used to select the method. This instance is always bound in the method's body to the variable this. In the environment class a method named extend is defined that when invoked takes two additional arguments named vars and vals. We shall see how this method is used presently.

The environment class is an abstract base class. It is intended it be inherited only by classes that always define a method named apply. It should be possible to extend or apply any environment. To make access to this method more efficient and easy, the environment class declares an apply method binding without a corresponding method-valued expression.

An empty-environment class is defined next, along with an empty-environment instance.

(define <empty-environment>
  (class ()
    (base <environment>)
    (methods
      (define apply 
        (method (v) 
          (error '<empty-environment> "Variable ~s is not bound" v))))))

(define empty-environment (<empty-environment>))
The base clause declares that this class inherits from the environment class. Since this class does not redefine the extend method, it is inherited from the environment class. An apply method is defined that takes a value and generates an error message.

Classes are represented as procedures that return instances of themselves when invoked. Any arguments passed to the procedure are used to initialize the newly created instance. The empty environment class does not require any initialization arguments, so its formals specification is also the empty list.

Non-empty environments are represented as instances of the extended-environment class, which also inherits from the environment class.

(define <extended-environment>
  (class (env vars vals)
    (base <environment>)
    (inst-vars env vars vals)
    (methods
      (define apply
	(method (sym)
	  (let ((x (memq sym vars)))
	    (if x
	      (list-ref vals (- (length vars) (length x)))
	      (apply env sym))))))))
Instances of this class are created by the call in the body of the extend method of the environment class. The formal parameter specification of the extended environment class declares that the initialization parameters passed in the instance creation call are named env, vars, and vals. Since these names are also listed in the inst-vars clause, the values of these initialization arguments are stored in the newly created instance as instance variable values.

Methods are invoked via method calls. These are calls in which the procedure is a generic and the first argument is an instance. The generic selects the method to be called from the instance and invokes the method with the instance and any other arguments of the method call.

Generics may be either local or universal. A universal generic may be invoked with any instance, but may only be used to invoke public methods. A local generic for a method named M created in class C may only be invoked with an instance of a class that introduces a method binding for M and is on a path from C to the top class. Local generics are more efficient than universal generics.

Universal generics must be created using the generic form. For example, we next define a universal generic for the extend method name and use it to invoke the extend method of our empty environment instance.

(define extend (generic extend))

(define abc (extend empty-environment '(a b c) '(1 2 3)))

Next we define an apply procedure that acts like a universal generic on instances and otherwise acts like the standard apply procedure (available in Chez Scheme as #%apply).

(define apply
  (lambda args
    (if (procedure? (car args)) 
      (#%apply #%apply args)
      (#%apply (generic apply) args))))

(apply abc 'b) ==> 2
The call of apply above yields 2 by invoking the apply method of the extended environment class. This invocation binds the symbol b to the method variable sym. The memq call then determines if this symbol is on the list stored in the instance variable vars. It is, so the value 2 in the corresponding position of the vals list is returned. Notice that the instance variables are referenced just like ordinary lexical variables.

If the symbol is not on the list of variables, this apply method executes the method call in the last line of the extended environment class definition. The procedure of this call is a local generic for the apply method that is associated with the extended environment class. Such local generics are created for each method name bound in the methods clause of a class.

We conclude this example with a new version of the extended environment class that incorporates two improvements. First, the length of the vars list is computed at instance creation time and stored in an additional instance variable called length-vars. Second, an after-inst-var-init clause is used to check after the instance variables are initialized to be sure the variable and value lists are of the same length and issue an error message otherwise.

(define <extended-environment>
  (class (vars vals env)
    (base <environment>)
    (inst-vars vars vals env 
      (define length-vars (length vars)))
      (if (not (eq? length-vars (length vals)))
	(error '<extended-environment>
	  "vars ~s and vals ~s lists not the same length"
	  vars
	  vals)))	    
    (methods
      (define apply
	(method (v)
	  (let ((x (memq v vars)))
	    (if x
	      (list-ref vals (- length-vars (length x)))
	      (apply env v))))))))

Lambda calculus example

Instances may be used to represent abstract syntax. We illustrate this by first defining abstract syntax for the lambda calculus, along with a parser. Initially, the only operation supported by the abstract syntax is unparsing.

Abstract syntax requires a distinct type of node for each non-terminal production. For each type of node we define a class with an instance variable for each non-terminal in the right-hand side of its production.

Each of these classes has its own unparse method, so we define an abstract base class from which they all inherit. In anticipation of another set of abstract syntax classes, the base class introduces the public method rename, as well as unparse.

(define <syntax>
  (class ()
    (public unparse rename)
    (methods
      (define unparse)
      (define rename))))

(define <variable>
  (class (var)
    (base <syntax>)
    (inst-vars var)
    (methods
      (define unparse 
        (method () var)))))

(define <lambda>
  (class (formal body)
    (base <syntax>)
    (inst-vars formal body)
    (methods
      (define unparse 
        (method () 
          (list 'lambda (list formal) (unparse body)))))))

(define <call>
  (class (rator rand)
    (base <syntax>)
    (inst-vars rator rand)
    (methods
      (define unparse 
        (method () 
          (list (unparse rator) (unparse rand)))))))
The instance variables are declared to be inheritable so that the language can be extended by classes that operate on the same instance variables visible to the unparse methods.

The parse procedure takes a Scheme datum representing a lambda calculus expression (as a subset of Scheme syntax) and returns the representation of the expression in abstract syntax. Viewing each abstract syntax class as a node type the abstract syntax may be viewed as a tree in which the instance variables of each instance contain the branches to subtrees.

(define parse
  (lambda (exp-datum)
    (form-case exp-datum
      (variable var
	(<variable> var))
      (lambda (formals body)
	(<lambda> (car formals) (parse body)))
      (call (rator rand)
	(<call> (parse rator) (parse rand))))))
Next we extend our abstract syntax by redefining the abstract syntax classes with three new classes that inherit from their respective old classes and add a rename method.

(define <variable>
  (class (var)
    (base <variable>)
    (base-inst-vars var)
    (base-init var)
    (methods
      (define rename
	(method (old new)
	  (if (eq? old var)
	    (<variable> new)
	    this))))))

(define <lambda>
  (class (formal body)
    (base <lambda>)
    (base-inst-vars formal body)
    (base-init formal body)
    (methods
      (define rename
	(method (old new)
	  (if (eq? old formal)
	    this
	    (<lambda> formal (rename body old new))))))))

(define <call>
  (class (rator rand)
    (base <call>)
    (base-inst-vars rator rand)
    (base-init rator rand)
    (methods	    
      (define rename
	(method (old new)
	  (<call>
	    (rename rator old new)
	    (rename rand old new)))))))
The base-inst-vars clauses allow the indicated instance variables of the base classes to be referenced directly in the new classes. The base-init clauses are necessary to pass the arguments on to the initialize method of the base classes so that the base class instance variables are initialized.

chaynes@indiana.edu