CIS 461/561 (Introduction to Compilers)

Instructor: Amr Sabry

General Information

Announcements
Tiger

Homeworks and Grades

Homework 1
Homework 2
Homework 3
Homework 4
Final Project


Getting Help

Ask for help when you need it!! Send email, at any time. I'll also hold the following office hours (in DES 313):

Course Organization

The course is organized around writing a compiler for a simple Algol-like language. The compiler project will be divided in 6-8 homeworks. I encourage you to work on the project in teams of two people, especially after the first two or three assignments. You are also encouraged to discuss the programming assignments with your colleagues. However, do not, under any circumstances, copy somebody else's code.

I will provide working versions of early homeworks in case you need them to test the later ones. It is important that you keep going even if you miss an early homework. The due dates for the assignments are firm: there will be no extensions. Your grade will be calculated as follows:

  1. in-class midterm exam: 30%
  2. quizzes: 20%
  3. programming assignments and final project: 50%
These weights are subject to minor variations.

References

A. Appel, Modern Compiler Implementation in Java. (Errata.)

Some book about Java. Pick your favorite.

The Java Core API

The JLex documentation

The CUP documentation

Notes on the SPARC assembly language

Sample Tiger Programs

The "dragon book" is the classical reference about lexical analysis and parsing: Compilers---Principles, Techniques, and Tools, A. Aho and R. Sethi and J. Ullman, Addison-Wesley, Reading, Mass., 1985.


What is this Course About?

A industrial strength compiler is a major effort (in the order of 100K lines of code and 5 man/years). Few of you will write such compilers. Most of you, however, will write compilers that process "little languages." For example, you may be writing scripts that read an input from an HTML form, parse it, check it for consistency, and generate HTML code as output.

This course will present the general principles, algorithms, and techniques that are helpful for (big and small) compilers. In addition, the course should provide you with:

  1. a concrete understanding of programming language concepts that complements the more mathematical understanding you get from a course in semantics; this should ultimately make you a better programmer,
  2. understanding the cost of programming language constructs; understanding what can be implemented reasonably and what can be expected to be optimized by a typical compiler, etc,
  3. writing a large program using an advanced programming language; managing the complexity of having multiple programmers; use modules and interfaces effectively,
  4. a great case study for applying the software engineering concepts you were introduced to in other classes. This may be as close as you will get to "real" programming in a course.
To get a concrete idea of what's involved, I'll trace a simple source program through the various stages of the compiler. Each stage roughly corresponds to one homework. Here is a simple program in the language we are compiling:
let var x : int := 2     /* silly comment */
    var y : int := 7
in x+y  
end
The first stage of compilation, lexical analysis, is to group the characters in significant units (called tokens). In that stage, we also eliminate white space and comments. The output of that stage is a list of tokens:
[ Let,
  Var,
  Id "x", 
  Colon,
  Id "int",
  Assign,
  Intliteral 2,
  Var,
  Id "y",
  Colon,
  Id "int",
  Assign,
  Intliteral 7,
  In,
  Id "x",
  Plus,
  Id "y",
  End
] 
The next phase of compilation, parsing, is to group the tokens in meaningful "sentences." The output of this phase is an abstract syntax tree that prints as follows:
LetExp { decs = [ VarDec { init = IntExp 2,
                           typ = SOME (-,15),
                           var = { escape = ref true,
                                   name = -
                                 }
                           pos = 7,
                         },
                  VarDec { init = IntExp 7,
                           pos = 28,
                           typ = SOME (-,36),
                           var = { escape = ref true,
                                   name = -
                                 }
                         }
                ],
         body = SeqExp [ (OpExp { left = VarExp (SimpleVar (-,48)),
                                  oper = PlusOp,
                                  pos = 48,
                                  right = VarExp (SimpleVar (-,50))
                                 },
                          48)
                       ],

         pos=3
       }
The heart of the compiler is this next phase, which performs checks on the syntax tree, and then generates an intermediate (machine independent) representation. The most important aspect of this phase is that it eliminates variable names and replaces them by either register names, or offsets in the stack frame.
MOVE(TEMP T101,
     ESEQ(SEQ(MOVE(MEM[4](BINOP(PLUS,TEMP T100,CONST -4)),
                   CONST 2),
              MOVE(MEM[4]( BINOP(PLUS,TEMP T100,CONST -8)),
                   CONST 7)),
              BINOP(PLUS,
                    MEM[4](BINOP(PLUS,TEMP T100,CONST -4)),
                    MEM[4](BINOP(PLUS,TEMP T100,CONST -8)))))
The next phase is rather simple. It flattens the intermediate representation and does some straightforward optimizations.
LABEL L19
MOVE(MEM[4](BINOP(PLUS,TEMP T100,CONST -4)),
     CONST 2)
MOVE(MEM[4](BINOP(PLUS,TEMP T100,CONST -8)),
     CONST 7)
MOVE(TEMP T101, 
     BINOP(PLUS,
           MEM[4](BINOP(PLUS,TEMP T100,CONST -4)),
	   MEM[4](BINOP(PLUS,TEMP T100,CONST -8))))
JUMP(NAME L18)
LABEL L18
The final stage of our compiler is to convert the above intermediate representation to a machine-readable format. I've chosen the SPARC assembly language but we may consider other choices.
.global toymain
.section ".text"
toymain:
save %sp, -136,%sp
L19:
mov 2 , %g2
st %g2, [%fp-16]
ld [%fp-16] , %g3
st %g3, [%fp+-4]
mov 7 , %g2
st %g2, [%fp-24]
ld [%fp-24] , %g3
st %g3, [%fp+-8]
ld [%fp+-4], %g2
st %g2, [%fp-32]
ld [%fp+-8], %g2
st %g2, [%fp-36]
ld [%fp-32], %g2
ld [%fp-36], %g3
add %g2, %g3, %g2
st %g2, [%fp-28]
ld [%fp-28], %i0
ba L18
nop
L18:
jmp %i7+8
restore
That's it!


Lecture Topics

Date Topic Exam/Quiz/Assignment Due
9/28 Introduction
9/30 Introduction
10/2 Regular Expressions (Ch. 2)
10/5 JLex (Ch. 2; JLex manual)
10/7 Finite Automata (Ch. 2)
10/9 Finite Automata (Ch. 2)
10/12 Context-Free Grammars (Ch. 3) Lexer
10/14 CUP (Ch. 3; CUP manual)
10/16 Top-Down Parsers (Ch. 3)
10/19 Bottom-Up Parsers (Ch. 3)
10/21 Bottom-Up Parsers (Ch. 3)
10/23 Abstract Syntax (Ch. 4) Parser
10/26 OO Design Patterns
10/28 Review
10/30 Midterm Abstract Syntax / Midterm
11/2 Symbol Tables (Ch. 5.1-5.2)
11/4 Typechecking (Ch. 5.3-5.6)
11/6 Typechecking (Ch. 5.3-5.6)
11/9 Typechecking (Ch. 5.3-5.6)
11/11 Stack frames (Ch. 6) Quiz 1
11/13 Static Links (Ch. 6) Typechecking
11/16 Intermediate Trees (Ch 7.1)
11/18 Expressions to Trees (Ch. 7.2)
11/20 Declarations to Trees (Ch. 7.3)
11/23 Canonical Trees; Traces (Ch. 8) Quiz 2
11/25 SPARC Assembly
11/27 Thanksgiving
11/30 Instruction Selection (Ch. 9)
12/2 Register Allocation
12/4 Conclusion
12/7-12/9 Final Projects Due

Page visited times since September 14, 1998.

sabry@cs.uoregon.edu