Database Extensions for Structured Documents

The goal of the current research is to develop a complete database system for structured documents, with data definition, manipulation, and querying capabilities similar to those in the relational world, while keeping the system completely within the SGML domain. Only structured documents tagged with SGML have been considered, in which detailed and complete information about the document structure is included by means of the Document Type Definition (DTD). This research attempts to design query languages adapted to structured documents, data structures and access methods that can implement these queries efficiently, and visual interfaces that enable users to specify these queries intuitively. To show the applicability of this research, experiments are being performed on a wide range of documents.

A prototype system incorporating most of the backend engine and the frontend query processing interface is currently being implemented. This system builds indices on top of the original document and lets the user specify a query interactively by navigating the structure of the document using a sample document template. In addition to a graphical query interface, the system is being designed to support queries written in a version of SQL (Structured Query Language) adapted for SGML documents. Beyond the "Select-Project-Join" types of queries in SQL, the query language will have extensions that let the user specify queries specific to structured documents.
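
As a rough illustration of what indexing on top of a document and querying by structure can look like, the following Python sketch builds a path index over a small tagged document and runs a selection against it. It is not the prototype's implementation: the sample document, the path notation, and the names build_path_index and select_text are all hypothetical, and the sketch assumes well-formed markup so that the standard-library XML parser can stand in for a real SGML parser.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document; assumed well-formed so an XML parser can read it.
SAMPLE = (
    "<poem><title>Dawn</title>"
    "<stanza><line>First light</line><line>on the hill</line></stanza>"
    "<stanza><line>Birds wake</line></stanza></poem>"
)


def build_path_index(root):
    """Map each element path (e.g. 'poem/stanza/line') to its elements,
    in document order. The original document tree is left untouched."""
    index = defaultdict(list)

    def visit(elem, prefix):
        path = f"{prefix}/{elem.tag}" if prefix else elem.tag
        index[path].append(elem)
        for child in elem:
            visit(child, path)

    visit(root, "")
    return index


def select_text(index, path, contains=None):
    """Return the text of every element at `path`, optionally keeping only
    those whose text contains the given substring (a simple selection)."""
    results = []
    for elem in index.get(path, []):
        text = "".join(elem.itertext()).strip()
        if contains is None or contains in text:
            results.append(text)
    return results


root = ET.fromstring(SAMPLE)
index = build_path_index(root)

# Roughly: SELECT line FROM poem/stanza
lines = select_text(index, "poem/stanza/line")

# Roughly: SELECT line FROM poem/stanza WHERE line CONTAINS 'light'
matches = select_text(index, "poem/stanza/line", contains="light")
```

The SQL-like comments are only an analogy for the kind of selection involved; they do not reflect the syntax of the adapted query language described above.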

Associated Faculty:
Dirk Van Gucht, Ed Robertson
Affiliated Projects:
LETRS (Library Electronic Text Resource Services): This project owes a lot to LETRS, a joint venture of the Indiana University Main Library and the University Computing Services. LETRS provides the primary research platform and a rich set of data.
The research is currently sponsored by a GAANN (Graduate Assistance in Areas of National Need) fellowship from Indiana University and the U.S. Department of Education.

