Datum grammar acceptors

The grammar form, obtained by loading the file grammar.ss, contains a context-free grammar and returns an acceptor for the grammar. The grammar is expressed in "parenthesized, prefix, extended BNF". The returned acceptor is a predicate that takes a datum and returns a boolean indicating if the datum can be generated by the given grammar. Expressing any Scheme-like syntax is straightforward, and a hook for introducing non-terminals defined using arbitrary Scheme code allows any grammar that defines a subset of Scheme's datum syntax to be handled.

An extended-BNF grammar for the grammar form follows:

<grammar expression> --> ( grammar <start> <production>* ) <start> --> <variable> <production> --> ( <variable> <element>* ) <variable> --> <symbol> <element> --> <terminal> | <non-terminal> <terminal> --> '<datum> <non-terminal> --> <variable> | ( alt <element>* ) | ( seq <element>* ) | ( lst <element>* ) | ( star <element> ) | ( plus <element> ) | ( opt <element> ) | ( dot <element> <element> ) | ( predicate <expression> ) | ( cfa <expression> ) | ( report-if-bad <datum> <non-terminal> ) <expression> --> any Scheme expression

The <start> variable must be one of the production variables, and indicates the starting non-terminal of the grammar.

alt indicates alternation,
seq sequencing,
lst descent into and sequencing through a sub-list,
star zero or more (Kleene's star),
plus one or more,
opt optional (zero or one), and
dot a dotted pair, where the first element of dot recognizes a list of the car elements of an improper list, while the second element recognizes the trailing cdr element.

In predicate, the expression shall eveluate to a predicate that is passed the car of the list being accepted and returns a boolean indicating whether it should be accepted.

In cfa, the expression shall evaluate to a procedure that takes the list being accepted and returns #f if it is to be rejected, or a tail of the list if the corresponding prefix of the list is accepted.

If the non-terminal of

(report-if-bad datum non-terminal)
fails on input, a message of the form "Bad datum: input" is printed with a maximum print depth of two. Also, from the time when the first such message is generated until the grammar expression returns, alts fail immediately. This keeps other alt elements from being tried when it is known that the acceptor fails. Control does return through dynamically enclosing report-if-bad elements, allowing the location of the original error to be traced.

An acceptor for the grammar form may be obtained with:

(grammar grammar-expression
  (datum (predicate (lambda (x) #t)))
  (expression datum)
  (grammar-expression
    (report-if-bad 'grammar-expression
      (lst 'grammar start (plus production))))
  (start variable)
  (variable (predicate symbol?))
  (production (report-if-bad 'production (lst variable (star element))))
  (element (alt terminal non-terminal))
  (terminal (lst 'quote datum))
  (non-terminal
    (report-if-bad 'non-terminal
      (alt variable
        (lst 'alt (star element))
        (lst 'seq (star element))
        (lst 'lst (star element))
        (lst 'star element)
        (lst 'plus element)
        (lst 'opt element)
        (lst 'dot element element)
        (lst 'predicate expression)
        (lst 'cfa expression)
        (lst 'report-if-bad datum non-terminal)))))

For a more substantial example, study the acceptor for the context-free grammar of Scheme transliterated from the R4RS formal syntax. Of particular interest are the way procedure calls are defined to obtain suitable Bad expression messages, and the way dot is used in defining <formals>.

Chris Haynes / chaynes@indiana.edu