number --> sign digit+
Handle all other Scheme syntax that could affect read. Refer to the relevant parts of R4RS, especially section 2, subsections 7.1.1 and 7.1.2, and the description of the read procedure.
For input use only the Scheme procedures read-char and peek-char. Symbols may be created using the Scheme procedure string->symbol. (A compiler's reader generally does the work of this procedure, but we will deal with it later.)
Use a DFA as the basis of a table-driven scanner that returns tokens that are consumed by a recursive-descent parser that returns the next datum.
Represent target ports (those used by your read procedure, as opposed to the Scheme read procedure) as Chez Scheme structures defined by
(define-structure (tport hport pos))
where the hport field contains a "host" Scheme input port and the pos field contains a count of the number of characters your read procedure has read from the port.
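One way to keep the pos field accurate is to route all input through a pair of small wrappers around read-char and peek-char. The following is a minimal sketch, not a required design: the names tport-read-char and tport-peek-char are hypothetical, and the choice not to count end-of-file as a character is an assumption.

```scheme
;; Chez Scheme: define-structure creates make-tport, tport-hport,
;; tport-pos, set-tport-pos!, and related procedures.
(define-structure (tport hport pos))

;; Hypothetical helper: consume one character and count it.
;; Assumption: end-of-file is not counted as a character.
(define tport-read-char
  (lambda (tp)
    (let ([c (read-char (tport-hport tp))])
      (unless (eof-object? c)
        (set-tport-pos! tp (+ (tport-pos tp) 1)))
      c)))

;; Hypothetical helper: look ahead without consuming or counting.
(define tport-peek-char
  (lambda (tp)
    (peek-char (tport-hport tp))))

;; Usage on a string port.
(define tp (make-tport (open-input-string "x)") 0))
(tport-read-char tp)   ; => #\x
(tport-pos tp)         ; => 1
(tport-peek-char tp)   ; => #\)
(tport-pos tp)         ; => 1
```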
Signal syntax errors by invoking the Chez error procedure with a call of the form
(error 'read "description at character ~s" n)
where description is as appropriate to the nature of the error as is reasonable within the context of your reader design and n is the input port position number.
Submit your project by an email message to sakumar@copper with subject code1, as with the command
mail -s "code1" sakumar@copper <file_name
In addition, submit to Sanjeev, in class or in his mailbox, a hand-drawn transition diagram for your scanner.
Your code need not contain documentation that repeats information included with this assignment. Any differences in approach should be clearly described using comments in the code.
Make-scanner takes a table describing the lexical structure of the language and returns a driver that takes a target input port and returns the next token from that port. The table is a two-dimensional table (a vector of vectors). Each row corresponds to a DFA state, and each column corresponds to a character in the ASCII character set. An entry in the table for row S and column C is a triple of the form
<buffer?, next-state, action>
describing what to do when character C is read from the input stream while in state S. Buffer? is a boolean that says whether the scanner should buffer the input character. Next-state is the number of the state to which the scanner should proceed or one of the two symbols final or final-preserve. The action is a token-maker, a procedure that is applied to the buffer (a list of characters, last read first) when the scanner enters a final state.
There are two kinds of final states corresponding to the next state specifiers final and final-preserve. Normally, the character on the input stream is taken off of the input stream when the scanner moves to the next state. This behavior is no different when next-state is final. However, the character is not taken off of the input stream when next-state is final-preserve. Final should be used when the current character is or helps form the current token (as with one-character tokens, such as a parenthesis). Final-preserve should be used when the current character is not part of the current token, as in the right parenthesis that terminates the symbol x on the input stream "x)".
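The difference between final and final-preserve can be seen in a small driver loop. The following is only a sketch under stated assumptions, not the make-scanner supplied with the assignment: the name sketch-make-scanner, the representation of each entry as a three-element list, the use of column 128 for end-of-file, and the hand-built demo-table are all hypothetical.

```scheme
(define-structure (tport hport pos))

;; Assumption: columns 0-127 are ASCII codes and column 128 is end-of-file.
(define char->column
  (lambda (c)
    (if (eof-object? c) 128 (char->integer c))))

;; Assumption: each table entry is a list (buffer? next-state token-maker).
(define sketch-make-scanner
  (lambda (table)
    (lambda (tp)
      (let loop ([state 0] [buf '()])
        (let* ([hp (tport-hport tp)]
               [c (peek-char hp)]
               [entry (vector-ref (vector-ref table state) (char->column c))]
               [next (cadr entry)]
               [buf (if (car entry) (cons c buf) buf)])
          ;; final-preserve leaves the character on the port;
          ;; every other transition consumes it.
          (unless (eq? next 'final-preserve)
            (read-char hp)
            (unless (eof-object? c)
              (set-tport-pos! tp (+ (tport-pos tp) 1))))
          (if (number? next)
              (loop next buf)
              ((caddr entry) buf)))))))

;; The buffer arrives last read first, so reverse it before building a symbol.
(define symbol-maker
  (lambda (buf) (string->symbol (list->string (reverse buf)))))

;; Two states: 0 = start, 1 = inside a symbol made of lowercase letters.
(define demo-table
  (let ([start (make-vector 129 (list #t 'final (lambda (buf) (car buf))))]
        [in-symbol (make-vector 129 (list #f 'final-preserve symbol-maker))])
    (do ([i (char->integer #\a) (+ i 1)])
        ((> i (char->integer #\z)))
      (vector-set! start i (list #t 1 #f))
      (vector-set! in-symbol i (list #t 1 #f)))
    (vector start in-symbol)))

(define demo-scan (sketch-make-scanner demo-table))
(define tp (make-tport (open-input-string "x)") 0))
(demo-scan tp)   ; => x
(demo-scan tp)   ; => #\)
(tport-pos tp)   ; => 2
```

On the input "x)" the right parenthesis ends the symbol through final-preserve, so it is still on the port for the second call, where a final transition consumes it as a one-character token.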
The first state in the table is the start state. The declare-table syntactic extension converts a description of a DFA into code that builds a table suitable for the driver above. Its form is
(declare-table state-descriptor ...)
where each state-descriptor describes one state of the DFA. State descriptors are of the form
(state-name transition-specifier ...)
where state-name is the (symbolic) name of the state. Each transition-specifier takes one of the forms:
(charspec buffer? next-state)
(charspec buffer? final token-maker)
(charspec buffer? final-preserve token-maker)
where buffer?, next-state, and token-maker are as described for the driver above. A charspec is a list of characters and character ranges. A character is specified using the usual Scheme syntax for characters, such as #\a. A range is a character, followed by a hyphen, followed by another character, for example #\0 - #\9. A range matches any character in the range (inclusive), while a character matches only itself. Since there is no character that denotes end-of-file, the symbol eof is used. It may appear as a character in the list, but not as a range bound.
The keyword else may be used in place of charspec in the last transition specifier for a state. The else transition is used for any character not otherwise accounted for by the other charspecs. If no else clause is present, any characters not otherwise accounted for are given a default final transition with a token maker that signals an unexpected character error.
Charspecs should not overlap within a state descriptor. For example, one charspec should not contain the range #\a - #\z if another charspec contains the single character #\q.
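As an illustration of the forms above, here is a sketch of a table for a toy language of parentheses and lowercase symbols. It is not part of a solution and runs only with the course-supplied declare-table and make-scanner loaded. Several details are assumptions: that next-state is written as a state name, that declare-table yields a value that can be passed directly to make-scanner, and the token values lparen, rparen, and eof, which are hypothetical.

```scheme
;; Assumes the course-supplied declare-table and make-scanner are loaded.
(define toy-table
  (declare-table
    (start                                   ; first state = start state
      ((#\space #\newline) #f start)         ; skip whitespace, do not buffer
      ((#\() #t final (lambda (buf) 'lparen))
      ((#\)) #t final (lambda (buf) 'rparen))
      ((#\a - #\z) #t in-symbol)             ; range: buffer and continue
      ((eof) #f final (lambda (buf) 'eof)))  ; no else: others are errors
    (in-symbol
      ((#\a - #\z #\0 - #\9) #t in-symbol)
      ;; Any other character ends the symbol and stays on the input.
      ;; The buffer is last read first, so reverse it.
      (else #f final-preserve
        (lambda (buf)
          (string->symbol (list->string (reverse buf))))))))

(define toy-scan (make-scanner toy-table))
```

No two charspecs within a state overlap here: in the start state the whitespace characters, the two parentheses, the range #\a - #\z, and eof are disjoint.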
Submit your test cases by two email messages to sakumar@copper, with subjects accept1 and reject1, including cases that the reader should accept and reject, respectively.
Shortly after the test case submission deadline, all submitted test cases will be accessible in the file test-cases.ss (in /u/chaynes/c431, as usual). With your read procedure and the file test1.ss loaded, execute (test1) to test your procedure against a nominally correct solution that test1.ss loads as the procedure std-read.