Sage++ is an attempt to provide an object-oriented toolkit for building program transformation systems for the Fortran 77, Fortran 90, C and C++ languages. Sage++ is intended to be used by researchers interested in building parallelizing compilers, performance analysis tools, and source code optimizers. It is designed as an open C++ class library that provides the user with a set of parsers, a structured parse tree, a symbol and type table, and access to programmer annotations embedded in the source text. The heart of the system is a set of functions that allow the tool builder complete freedom in restructuring the parse tree and a mechanism (called unparsing) for generating new source code from the restructured internal form.
The library is organized as a class hierarchy that provides access to the parse tree, symbol table and type table for each file in an application project. There are five basic families of classes in the library: Projects and Files, Statements, Expressions, Symbols, and Types.
Sage++ is based on an older system called Sigma (faust) which was, in turn, based on the Blaze compiler designed by Piyush Mehrotra in 1984. In its original form, Sigma was a tool kit for program restructuring that was accessed through the EMACS text editor (sigmacs). The primary advantage of EMACS is that it has a powerful built-in programming system, ELISP, that allows it to be highly customized. Tool builders were able to call Sigma operations from ELISP code and interactively restructure programs.
Because many potential users were interested in building "stand-alone" tools (not linked to EMACS), we designed a C function interface to the Sigma system, called Sigma II, which provided a high-level view of the "data base" consisting of the parse trees of the source code and the associated data dependence information. Because the underlying data structures generated by the parsers are very complex and awkward to use, Sigma II provided a level of abstraction that allowed the tool builder to work in terms of source program units like statements and expressions rather than the "low-level" bit fields and linked lists of the internal structures. At about the same time, a group at the University of Rennes and IRISA in France designed a Sigma "ToolBox" that provided access to more powerful transformations and user annotations in the source code. Although the ToolBox was more powerful, it also required more knowledge of the underlying parser data structures.
The design of Sage++ is based on the IRISA ToolBox, but it provides an additional level of abstraction similar to, but more flexible than, the Sigma II interface. One important difference between Sage++ and Sigma is the treatment of data dependencies and control flow information.
The Sigma system had a built-in control flow and data dependence analysis package. While this package had many advanced features, such as full symbolic analysis and rudimentary interprocedural capabilities, it was also limited in scope and hard to use. More importantly, it was embedded at the lowest level of the software and written in terms of the parser data structures. Consequently, it was nearly impossible for users to modify when they wished to experiment with more recent advances in data dependence analysis theory.
In Sage++, we have decided to add the control flow structures and data dependence analysis primitives on top of the user-level class library. In this way, they can be easily modified or extended by the tool user. This aspect of Sage++ is not yet complete.
In this section we provide an overview of the Sage++ library. There are five basic families of classes in the library: Projects and Files, which correspond to source files in a multi-source application project; Statements, which correspond to the basic source statements in Fortran90, C and C++; Expressions, which are contained within statements; Symbols, which are the basic user-defined identifiers; and Types, which are associated with each identifier and expression. In addition, the SgAttribute class allows users to add their own information to Sage++ objects. Attributes can be attached to SgStatement, SgExpression, SgSymbol, and SgType objects. To find out more about attributes, see the Attributes section.
In Sage++, program parsing is separated from program analysis and restructuring, so the work is divided into two phases. Application projects in Fortran77, Fortran90, C and C++ are first parsed, one file at a time, to produce a machine independent binary internal format called a .dep file. For example, given an application with source files Main.f, Subs.f, c++funs.C and cfuns.c, one invokes the Fortran parser cfp or the C parser pc++ to generate the corresponding .dep files. Finally, the user builds a project file, MyProject.proj, which lists each of the .dep files, one per line. In this example, the .proj file is
Main.dep
Subs.dep
c++funs.dep
cfuns.dep
The source language type is encoded within the .dep file.
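The naming convention in this example can be sketched in a few lines of C++. This is only an illustration of how the contents of a .proj file relate to the source file names; the helper names depName and projectFileText are hypothetical and are not part of Sage++, and the sketch assumes that each .dep file takes the name of its source file with the extension replaced, as in the example above. The .dep files themselves are still produced by the parsers.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not a Sage++ function): map a source file name to
// the matching .dep name by replacing the text after the last dot,
// e.g. "Main.f" becomes "Main.dep". A name with no dot gets ".dep" appended.
// Assumes plain file names without directory components.
std::string depName(const std::string& source) {
    const std::string::size_type dot = source.rfind('.');
    const std::string stem =
        (dot == std::string::npos) ? source : source.substr(0, dot);
    return stem + ".dep";
}

// Hypothetical helper: build the text of a project file,
// listing one .dep file name per line.
std::string projectFileText(const std::vector<std::string>& sources) {
    std::string text;
    for (const std::string& s : sources) {
        text += depName(s) + "\n";
    }
    return text;
}
```

Calling projectFileText with the four source names from the example yields the four lines shown for MyProject.proj, which could then be written to disk with an ordinary output stream.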
It should be noted that the .dep file is a complete translation of the source including comments, and the original source, up to the line numbers of statements, can be regenerated. Note that pc++ passes the files through a standard preprocessor before actually parsing them and the comments are discarded by the preprocessor. However, pC++2dep does not include the preprocessing step, and thus comments are retained (but no preprocessing is done).
The purpose of the project file is to make interprocedural analysis possible across the files of an application.
Sage++ has proven to be a powerful tool for our compiler prototyping experiments, but it still has a number of important limitations. The most important of these is that it is not easy for users to add language extensions to Fortran or C to the system. In principle this is not difficult. To add a new statement to the language, one must extend the parser, which is based on GNU Bison, the GNU version of YACC. A new node type must be added to the internal form and a corresponding subclass added to the Sage++ hierarchy. The unparser module, which is table driven, must be extended to recognize this new node. While we have done this several times (we have added some of the PCF extensions to Fortran and extended C++ to define our pC++ language), it is not an easy task because it requires a complete understanding of the internal parser structures.
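To make the last step more concrete, the following is a minimal sketch of what "table driven" unparsing can mean in general. It is not the Sage++ unparser: the node kinds, the Node structure, the template syntax (%t for the node's own text, %0 to %9 for a child) and the PARALLEL DO entry are all assumptions made up for illustration. The point is only that a node kind with no table entry cannot be printed until the table is extended.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical node kinds; these are not the Sage++ internal node types.
enum class NodeKind { Assign, Add, Name, ParallelDo };

// Toy parse tree node for this sketch only.
struct Node {
    NodeKind kind;
    std::string text;            // used by leaves such as Name
    std::vector<Node> children;  // operands or sub-statements
};

// The table maps each known node kind to an output template.
using UnparseTable = std::map<NodeKind, std::string>;

UnparseTable defaultTable() {
    return {
        {NodeKind::Assign, "%0 = %1"},
        {NodeKind::Add, "%0 + %1"},
        {NodeKind::Name, "%t"},
    };
}

// Expand the template for a node, recursing into its children.
// Kinds missing from the table produce a placeholder string.
std::string unparse(const Node& n, const UnparseTable& table) {
    const auto entry = table.find(n.kind);
    if (entry == table.end()) {
        return "<unknown node>";
    }
    const std::string& fmt = entry->second;
    std::string out;
    for (std::size_t i = 0; i < fmt.size(); ++i) {
        if (fmt[i] == '%' && i + 1 < fmt.size()) {
            const char c = fmt[++i];
            if (c == 't') {
                out += n.text;
            } else if (c >= '0' && c <= '9' &&
                       static_cast<std::size_t>(c - '0') < n.children.size()) {
                out += unparse(n.children[c - '0'], table);
            }
        } else {
            out += fmt[i];
        }
    }
    return out;
}
```

In this sketch, a tree for the assignment of b + c to a unparses as "a = b + c", while a ParallelDo node yields the placeholder until an entry such as table[NodeKind::ParallelDo] = "PARALLEL DO %0" is added. In Sage++ itself the table entry is only one of the required changes, alongside the parser, the internal node type and the class hierarchy.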