DocBase is the successor to SGMLQuery. It includes all the features of SGMLQuery and adds SQL support.

DocBase - A Document Database System

DocBase started as a research project in the Department of Computer Science at Indiana University, Bloomington. This research was part of my dissertation, under the guidance of Prof. Dirk Van Gucht, Prof. Edward Robertson, Prof. Andrew Dillon and Prof. David Leake.

In its current implementation, DocBase acts primarily as a query processing system for structured documents. It currently supports SGML (Standard Generalized Markup Language, ISO 8879) and XML with DTDs. Support for DTD-less XML is planned for a future release.

Development team

DocBase never really had a big development team. After I started the initial implementation in January 1996, I had a few students work on parts of the code and make very strong contributions. I am very thankful to these students for their help with the project.

Development history

  1. Fall 1995: First conception of the system, as a means of giving the Bloomington community access to the Chadwyck-Healey English Poetry Database.
  2. December 1995: Implementation of the form interface for the poetry database access, and the backend processing system
  3. January 1996: Start of implementation of the Java query interface as an alternative to form-based querying.
  4. May 1996: First version of SGMLQuery finished, and usability analysis performed on the system.
  5. December 1996: Formalization of QBT based on the SGMLQuery idea, and generalization of the interface into a visual query language
  6. December 1996: Formalization of the query languages and processing ideas for DocBase, based on the processing engine for SGMLQuery
  7. August 1997: Initial implementation of the DocBase engine complete, testing and performance measures done
  8. December 1997: Dissertation completed, the thesis on DocBase published
  9. March 1998 - now: Ongoing work on improving the implementation, developing a public release with support for a free storage manager and indexing system.

Publications related to DocBase

Currently, there is one completed system demonstration available with DocBase, for the Chadwyck-Healey English Poetry Database. Because of copyright restrictions, the actual poems are not available for access outside Indiana University. To access the demonstration, please refer to the QBT page.

Source code of DocBase and QBT is currently not publicly available. However, after the initial release is completed (projected date: August 1999), we will make part or all of the system available for download under the GNU General Public License. If you would like to be notified when a release is available, please contact me at

