Standardizing the Querying Process with SGML: The SQL DTD

Arijit Sengupta

Abstract

One of the most exciting applications of SGML to emerge in recent years is its use in document databases. The structural information embedded in SGML documents makes it possible to query them and extract information automatically; however, this querying process has not been standardized. As a result, different SGML database implementations use their own query language syntax, which makes migration from one system to another difficult. In the relational database domain, however, the query language SQL (Structured Query Language) has been a standard for over ten years and is used in most relational database systems. Although originally designed for relational databases, SQL is quite powerful for specifying complex queries in a relatively easy-to-understand syntax. With a small set of extensions to take advantage of the hierarchical structure of SGML, SQL can be easily adapted for use with SGML document databases. The powerful "generalized" nature of SGML makes it easy to implement SQL as an SGML DTD (Document Type Definition), so that queries can be expressed as document instances of the SQL DTD. Current SGML authors and users can write queries expressed in this DTD without learning a different language or using a separate editor. Moreover, because SGML is portable, these queries can be used in any SGML database system and, if necessary, converted to regular SQL for use in a relational or Object-Relational/Object-Oriented database system. Databases that support the SQL DTD can also store the queries without any extra effort and subsequently query them to infer optimization parameters. This paper presents a representative DTD for the SQL query language, with extensions for use with hierarchically structured documents.
It also compares this language with other languages that have been proposed and implemented, including SDQL (Standard Document Query Language), the query language of the DSSSL standard. The paper explains the advantages of using this language as a query language in document database systems and the need to standardize the querying process in document databases. Finally, it discusses some implementation issues and complexity measures.
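
To make the idea of a query expressed as a document instance more concrete, the following is a minimal sketch and not the DTD presented in this paper. The element names (`sqlquery`, `select`, `column`, `from`, `source`, `where`, `condition`) are hypothetical, chosen only for illustration, and the instance is assumed to be written in a well-formed, XML-compatible subset of SGML so that Python's standard-library parser can read it. The sketch shows how such an instance might be converted to a regular SQL string.

```python
import xml.etree.ElementTree as ET

# Hypothetical query instance. The element names are illustrative assumptions,
# not the element types defined by the paper's SQL DTD.
QUERY_INSTANCE = """
<sqlquery>
  <select><column>title</column><column>author</column></select>
  <from><source>book</source></from>
  <where><condition>year &gt; 1990</condition></where>
</sqlquery>
"""


def to_sql(instance: str) -> str:
    """Convert the hypothetical query markup into a plain SQL string."""
    root = ET.fromstring(instance.strip())
    # Collect the text of each clause's child elements in document order.
    columns = [c.text.strip() for c in root.findall("./select/column")]
    sources = [s.text.strip() for s in root.findall("./from/source")]
    conditions = [w.text.strip() for w in root.findall("./where/condition")]
    sql = f"SELECT {', '.join(columns)} FROM {', '.join(sources)}"
    if conditions:
        # Multiple conditions are joined with AND in this simplified sketch.
        sql += " WHERE " + " AND ".join(conditions)
    return sql


if __name__ == "__main__":
    print(to_sql(QUERY_INSTANCE))
```

Because the query is itself a structured document, the same instance could in principle be stored and queried like any other document in the database, which is the property the abstract relies on for inferring optimization parameters.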