PROTRAN
An Intelligent, Rule Based Program Transformation System

Abdulrahman A. Mirza

Genesis International, Inc.
2401 W. Hassell Road, Suite 1560
Hoffman Estates, IL 60195
AbdulMirza@aol.com


Research in this paper was conducted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science while at the Illinois Institute of Technology.

Abstract

This paper introduces an intelligent, rule-based tool set for the re-engineering of programs, called the Program transformation and Normalization (PROTRAN) system. One of the main objectives of the development of the PROTRAN system has been to facilitate the process of program recognition, a necessary step in automated fault localization. The idiosyncratic code written by many programmers makes program recognition a very difficult process. The PROTRAN system, built using an object oriented paradigm, eliminates many of these programming idiosyncrasies, by the recognition and transformation of certain programming fragments into an internal normalized code format. These programming fragments are defined as data objects and are normalized based on an augmented set of programming style and discourse rules developed especially for the PROTRAN system. A working prototype exists that shows the correctness and feasibility of this system.

1.0 Introduction

Program recognition is a relatively new area of research that makes use of artificial intelligence and expert system technology. It is a form of program analysis that identifies the reason behind given programs rather than their form or behavior. Typical uses of program recognition include automatic documentation support, where a program would produce its own documentation, and, intelligent query support, where facts concerning program design are provided to the user. In either case, the issue is understanding: once a program recognition system understands a program, it can enter facts concerning that program into a knowledge base to support many other activities [OURSTON89].

Automated fault localization is another new area of research that benefits from program recognition. Knowledge about a given program can be collected, analyzed, and stored in knowledge bases which can then be used to identify that program. Once a program has been recognized, different analysis techniques may be used to localize faults in that program.

Program recognition, while simplifying the task of automated fault localization, is in itself a difficult process. What makes this process a difficult one is that many programmers write code in a non-structured, idiosyncratic manner. Rich and Waters talk about the variations in programming styles that lead to idiosyncratic code. These include: syntactic variation, non-contiguousness, implementation variation, overlapping implementations, and unplan-like/unrecognizable code [RICH&WATERS90].

In this paper, I introduce a Program Transformation and Normalization (PROTRAN) system that is used to facilitate the process of program recognition [MIRZA95]. PROTRAN is one of the important knowledge sources of BUG-DOCTOR, an automated assistant for software fault localization [BURNSTEIN, et al. 95]. BUG-DOCTOR is an intelligent, programming language independent system. It is based on a comprehensive conceptual model for software fault localization, developed with roots in cognitive science and software psychology [BURNSTEIN&MANNINA92].

BUG-DOCTOR's system architecture is based on a black-board framework [BURNSTEIN et al. 91]. The system consists of domain knowledge sources, a blackboard data structure, and a blackboard control unit. The domain knowledge sources are split into builders and analyzers. An initial set of builder knowledge sources normalizes the faulty code. This normalized code is then partitioned into candidate chunks based on data and control flow characteristics, and then matched against a correct code representation [BURNSTEIN et al. 93]. The correct code representation is stored in a programming concepts library as a collection of knowledge objects [TUBAISHAT94]. These represent stereotypical code patterns from several problem domains. BUG-DOCTOR's analyzer knowledge sources carry-out the knowledge based matching processes that enable the system to recognize the code and to localize faults in that code.

2.0 PROTRAN

The Program Transformation and Normalization system represents one of the builder knowledge sources of BUG-DOCTOR. One of the main goals of the PROTRAN system is to facilitate the process of program recognition. In order to accomplish this task, PROTRAN performs a number of transformations producing a normalized internal code format that is free of some of the programming idiosyncrasies mentioned above. The internal code format is represented as a number of knowledge objects, each representing a certain type of simple data structure.

The processes used to produce the normalized internal code representation must take into account two main factors. First, the normalized code representation must maintain the same semantics of the original code, changing the code semantics may result in new errors being created or possibly fixing a bug. Second, links between the original and transformed code must be maintained in order to communicate with the system user in terms of the original code.

The PROTRAN system transformation and normalization processes are based on an augmented set of programming style and discourse rules developed especially for the system. Some of these rules were borrowed from [SOLOWAY&EHRLICH84]:

  1. Do not place more than one control statement per line.
  2. Limit conditional structures to IF statements.
  3. Limit iterational structures to WHILE loops.
  4. Sequential assignment statements that are not interchangeable should be annotated.
  5. Sub-expressions within an assignment statement that are not interchangeable should be annotated.
  6. Temporary variables should be marked.
  7. Eliminate redundant sub-expressions.
  8. Eliminate loop invariants.
  9. All variables should be defined before use.
  10. Do not use one single variable to perform multiple tasks.
  11. Do not use "goto" statements unless absolutely necessary.

The current PROTRAN system implements rules one through six. The implementation of these rules eliminates many programming idiosyncrasies and results in a normalized code format. Rules seven and eight are implemented by optimizing compilers [AHO, et al. 86]. These and the remainder of the rules will eventually be incorporated into PROTRAN.

2.1 System Architecture

The Program Transformation and Normalization system architecture (Figure 1) consists of several tools. Some are knowledge sources that perform certain processes on the code, while others are knowledge bases that store generated data, as well as the PROTRAN set of programming style and discourse rules. In this section, I briefly discuss the role that each tools unit plays in the overall architecture of the PROTRAN system:

The Analysis Tools Unit (ATU)

This tools unit performs a set of analyses on the buggy code. It collects specific code characteristics, and creates data objects, where appropriate, representing different code fragments.

The Program Database Unit (PDU)

This knowledge base stores important code characteristics. This includes information generated by lexical, data flow, and control flow analysis tools external to the PROTRAN system, as well as data objects generated by the Analysis Tools Unit.

The Style Analysis Unit (SAU)

This knowledge source retrieves results stored in the PDU and determines which code components saved as data objects require normalization. This determination is based on the programming style and discourse rules of the system.

The Rule Knowledge Base Unit (RKBU)

This knowledge base contains the PROTRAN system's programming style and discourse rules.

The Normalization Tools Unit (NTU)

This tools unit handles the transformation of the buggy code into a normalized code representation, based on the rules of programming style and discourse.

The Normalized Program Database Unit (NPDU)

This knowledge base stores the normalized data objects having been stripped off as many code idiosyncrasies as possible.

The Link Database Unit (LDU)

This knowledge base maintains links mapping the transformed buggy code data objects into the original program code.
PROTRAN System Architecture Diagram
Figure 1: PROTRAN System Architecture

2.2 System Design

The PROTRAN system uses an object oriented design [BOOCH91] to perform its transformation and normalization processes. The main classes in the class hierarchy of the system are the following:

Each occurrence of an assignment statement, or a programming structure represented by the above classes, results in an object of that type being instantiated. Based on the rules of programming style described earlier in this paper, each object instantiated will be treated differently:

2.3 Transformation Example

In this section, a simple program code is presented which is then transformed based on the rules of programming style of the PROTRAN system.

for (k = 0; k <= x; k=k+1)
  {
      a = k + 1;
      b = k * 2;
      c = b / 2 + y * 4;
  }

For this example, four objects are instantiated. One is a for object, the other three are assignment objects. Typically, the inner most objects are normalized before outer objects. This avoids performing normalization on an already normalized code. For this example, assignment statement objects will be normalized first, followed by the for object normalization.

The logical representation of the normalized code for this example is shown below. (t) indicates a temporary variable, (o) indicates a non-interchangeable assignment statement, while (s) indicates a non-interchangeable sub-expression:


Initial Normalization:

a = k + 1; b = k * 2; (o), (s) = k * 2 c = b / 2 + y * 4; (o), (t) = b, (s) = b/2, y*4

for Object Normalization:

k = 0; while ( k <= x) { a = k + 1; b = k * 2; (o), (s) = k * 2 c = b / 2 + y * 4; (o), (t) = b, (s) = b/2, y*4 k = k + 1; }

3.0 Related Work

Few researchers have tackled the problem of program recognition. A program analysis-based debugging system called Laura [ADAM&LAURENT80] takes as input a correct implementation of the code and compares it against the buggy code. It translates both programs into an internal canonical representation as a flow graph, where nodes represent program operations, and arcs represent the control flow. The next step taken is to perform a set of simple standardizing transformations on both graphs to create more normalized representations. A matching process is then initiated to locate corresponding code fragments. If certain elements did not match, an error detection process is started, which accepts minor differences as possible errors in the buggy code. This approach, however, does not take into consideration the possibility that the correct code representation and the buggy code might have been coded using different algorithms. Other related research can be found in [HARTMAN91], [HARANDI&NING88], and [LUKEY80].

4.0 Conclusion

PROTRAN is a knowledge based system that performs program transforming normalizations that facilitate the processes of program recognition and automated software fault localization. It is a rule-based, object-oriented system that eliminates many of the code idiosyncrasies associated with most programs. This improves the chance of finding the appropriate correct code implementation in the programming concepts library of BUG-DOCTOR against which the program can be matched in order to localize faults. PROTRAN's object-oriented design makes it an easily maintainable system, where new rules of programming style, as well as new classes representing more complex data structures, may be added without altering much of the original system design nor implementation.

5.0 References

Adam, A., and Laurent, J., 1980. Laura, a System to Debug Student Programs. Artificial Intelligence, Nov. 1980, Vol. 15, No. (1,2), 75-122.

Aho, A., Sethi, R., and Ullman, J., 1986. Compilers: Principles, Techniques, and Tools. Addison-Wesley Publishing Co., Reading, MA.

Booch, G., 1991. Object Oriented Design With Applications. The Benjamin/Cummings Publishing Co., Inc., Redwood City, CA.

Burnstein, I., Tubaishat, A., Mirza, A., Roberson, K., and Sanchez, A., 1995. Knowledge Engineering for Automated Software Fault Localization, Midwest AI and Cognitive Science Soc. Conf. '95, Carbondale, IL., April 1995, 22-26.

Burnstein, I., Tubaishat, A., and Mirza, A., 1993. Chunking Concepts for Model and Tool Building in the Software Fault Localization Domain. Technical Report, Dept. of CS, IIT, Chicago, IL.

Burnstein, I., and Mannina, S., 1992. Developing a Conceptual Model for the Software fault Localization Process. Proc. Fourth Midwest AI and Cognitive Science Soc. Conf., Utica, IL. 21-25.

Burnstein, I., Chang, C., and Tseng, C., 1991. Modeling the Fault Localization Process with A Blackboard-Based Fault Localization Tool. Proc. Third Midwest AI and Cognitive Science Soc. Conf., IL., 112-116.

Hartman, J., 1991. Understanding Natural Programs Using Proper Decomposition. IEEE 13th International Conf. on Software Engineering, May 1991, Austin, TX, 62-73.

Harandi, M., and Ning, J., 1988. PAT: A Knowledge Based Program Analysis Tool. Proc. of the Conf. on Software Maintenance, Scottsdale, AZ, Oct. 1988, Vol 7, No. 1, 312-318.

Lukey, F., 1980. Understanding and Debugging Programs. International Journal Man-Machine Studies, Vol. 12, No. 2, 189-202.

Mirza, A., 1995. Design of a Program Re-Engineering Tool Set based on Rules of Style and Discourse. Ph.D. Dissertation, Dept. of CS, Illinois Institute of Technology, May 1995.

Ourston, D. 1989. Program Recognition. IEEE Expert, Winter 1989, Vol. 4,No. 4, 36-49.

Rich C., and Waters, R., 1990. The Programmer's Apprentice. Addison Wesley Publishing Co., Reading, MA.

Soloway, E., and Ehrlich, K., 1984. Emperical Studies of Programming Knowledge. IEEE Transactions on Software Engineering, Sept. 1984, Vol. SE-10, No 5, 595-609.

Tubaishat, A., 1994. A Design Centered Approach to Engineering a Programming Library for Automated Software Fault Localization. Ph.D. Thesis, Dept. of CS, IIT, Chicago, IL., Dec. 1994.