For this workshop, we will be writing both a scanner and a parser directly in Scheme rather than using scanner/parser-generators such as lex or yacc to do the ``dirty'' work.
If we are to write our own scanner, we need to deal with a sticky issue: input. A scanner should take as its input a ``stream'' of characters. This could be simulated by passing in a string, but we will instead use Scheme's input port mechanism.
We will use the following procedures for our input:
(read-char input-port) --> character-or-eof
    read-char produces the next character. If no character is available, read-char produces the end-of-file object.

(peek-char input-port) --> character-or-eof
    peek-char produces the same character that read-char does, but does not ``consume'' it, so the next call to peek-char or read-char will return the same character.

(eof-object? character-or-eof) --> boolean

(open-input-file string) --> input-port

(open-input-string string) --> input-port
    read-char on the port will return the characters of string, in order. When the characters in string are exhausted, invocations of read-char and peek-char on the port will return the end-of-file object.

    > (define ip (open-input-string "hi"))
    > (peek-char ip)
    #\h
    > (peek-char ip)
    #\h
    > (read-char ip)
    #\h
    > (peek-char ip)
    #\i
    > (read-char ip)
    #\i
    > (peek-char ip)
    #!eof
    > (eof-object? (peek-char ip))
    #t
    > (read-char ip)
    #!eof
    > (read-char ip)
    #!eof
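The difference between peek-char and read-char matters most when a scanner has to decide where a token ends. The following is a minimal sketch, not part of the workshop code: read-digits is a hypothetical helper that peeks at each character to decide whether it belongs to the current run of digits, and only then consumes it.

```scheme
;; Hypothetical helper (not from the workshop): collect a run of digit
;; characters from an input port and return them as a string.
(define (read-digits ip)
  (let loop ((chars '()))
    (let ((c (peek-char ip)))            ; look ahead without consuming
      (if (and (not (eof-object? c)) (char-numeric? c))
          (begin
            (read-char ip)               ; consume the digit we just peeked at
            (loop (cons c chars)))
          ;; c is eof or a non-digit: leave it on the port for the caller
          (list->string (reverse chars))))))

(define digits-port (open-input-string "123abc"))
(read-digits digits-port)   ; "123"
(peek-char digits-port)     ; #\a
```

Because the first non-digit is only peeked at, it is still available to whatever procedure scans the next token.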
As an example, we will implement a scanner and parser for lists of numbers. The scanner accepts a character stream (an input port) according to the grammar, and returns token records:
(lparen) -- a left-parenthesis token

(rparen) -- a right-parenthesis token

(datum number) -- a datum token. Note that number should be a normal, everyday Scheme number (which will display in decimal form, though that's Scheme's doing, not yours).

(eof) -- an end-of-file token (returned when there is nothing to scan).

Scanning and parsing Scheme is a bit of an extension of the list of numbers scanner and parser. The scanner is quite a bit more complicated: it accepts strings based on a slightly simplified Scheme grammar. The parser for Scheme is almost as easy as that for lists of numbers, however. It is based on Scheme's datum grammar, and should return a Scheme datum.
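Returning to the simpler list-of-numbers case, here is one way the scanner and parser might fit together. This is a sketch under several assumptions rather than the workshop's reference solution: tokens are represented as plain lists such as (lparen) and (datum 42) instead of real records, numbers are unsigned decimal integers only, the input is a single flat list, and the names skip-whitespace, scan-number, scan-token and parse-number-list are all hypothetical.

```scheme
;; Tokens are plain lists here (an assumption; the workshop's token
;; records may be defined differently).

;; Consume any whitespace at the front of the port.
(define (skip-whitespace ip)
  (let ((c (peek-char ip)))
    (if (and (not (eof-object? c)) (char-whitespace? c))
        (begin (read-char ip) (skip-whitespace ip)))))

;; Read a run of digits and convert it to a Scheme number.
;; Assumes the next character on the port is a digit.
(define (scan-number ip)
  (let loop ((chars '()))
    (let ((c (peek-char ip)))
      (if (and (not (eof-object? c)) (char-numeric? c))
          (begin (read-char ip) (loop (cons c chars)))
          (string->number (list->string (reverse chars)))))))

;; Produce the next token from the port.
(define (scan-token ip)
  (skip-whitespace ip)
  (let ((c (peek-char ip)))
    (cond ((eof-object? c) '(eof))
          ((char=? c #\() (read-char ip) '(lparen))
          ((char=? c #\)) (read-char ip) '(rparen))
          ((char-numeric? c) (list 'datum (scan-number ip)))
          (else (error "scan-token: unexpected character" c)))))

;; Parse "( number* )" into an ordinary Scheme list of numbers.
(define (parse-number-list ip)
  (if (not (eq? (car (scan-token ip)) 'lparen))
      (error "parse-number-list: expected a left parenthesis"))
  (let loop ((numbers '()))
    (let ((tok (scan-token ip)))
      (case (car tok)
        ((datum) (loop (cons (cadr tok) numbers)))
        ((rparen) (reverse numbers))
        (else (error "parse-number-list: unexpected token" tok))))))

(parse-number-list (open-input-string "(1 22 333)"))   ; (1 22 333)
```

The parser never touches characters directly; it only asks the scanner for the next token. That separation is what should let the same parsing approach carry over to full Scheme data once the scanner recognizes more kinds of tokens.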