SIOD: Scheme in One Defun

SIOD is a small-footprint implementation of the Scheme programming language that is provided with some database, unix programming and cgi scripting extensions.
This document is © 1996 by George J. Carrette, All Rights Reserved.
The most recent source code release is usually available in the directory ftp://ftp.std.com/pub/gjc/ as siod_tar.gz. Windows 95 binaries win95bin.zip. You can reach the the author at gjc@world.std.com.
$Id: siod.html,v 1.17 1996/07/15 14:35:12 gjc Exp $

Table of Contents

  • Apology, discussion and motivation.
  • Building from Sources.
  • Release Notes.
  • What is Scheme?
  • Reference Section for built-in procedures.
  • Reference Section for extension-provided procedures.
  • Command interfaces and some scripts provided.
  • Some scheme coded library modules.
  • Garbage Collection.
  • Porting, to tiny machines.
  • Writing extensions in the C programming language.
  • LIBSIOD use as an extension language for C programs.
  • References.
  • Apology, discussion and motivation

    The genesis of this interpreter is a question posed by students of a LISP and Artificial Intelligence Programming Techniques course that I had developed and was teaching at Boston University in the late 1980's:
    How can we possibly hope to make use any of this stuff in work we are doing in other courses or in our jobs?

    The problem being that both the commercial and non-commercial lisp offerings at the time seemed to want to take over your entire programming environment if you wanted to use lisp at all. A completely satisfactory solution to this problem will not be fully available again until a lisp system is written that uses the same compiler back-end, debugging and runtime support as all the other languages on a popular machine. But I digress.

    Knowing that the use of lisp didn't have to be so intrusive, but having very little practical evidence at hand to actually prove the fact, I decided one day while I was keeping the laboratory section room open for course EK201 to sit down and implement a demonstration made up of a simple cons, arithmetic, read, eval, and print in straightforward C programming style, where the garbage collector was stop-and-copy and could only run in the context of the toplevel loop. Defining and executing fib resulted in a code-coverage of over 95% of the lines of the C program. Hence, SIOD, Scheme in One Day. Borrowing the name of a previous Scheme interpreter I had done to test the Lispmachine microcode compiler, Scheme in One Defun.

    The motivation behind SIOD remains a small footprint, in every sense of the word, at runtime, at compile time, and in cognitive attention required to understand how the system works enough to be able to extend it as well as the author would have done the work himself.

    About eight years have passed since that initial release. It has been possible to add a feature or two without contributing to the cause of software bloat, with the code segment of the libsiod shared library remaining under 75K bytes on a prototypical comparison machine like a VAX. Furthermore, as the richness of the C runtime library available on most systems has improved over time, SIOD remains a useful kind of glue to have in a software engineers toolbox.

    Please forgive the lack of full compliance with IEEE or R4RS standards. Perhaps one of these days.

    Building from Sources

    See the README file that comes with the distribution. The following systems are known to be supported using supplied build files.
    MakeOS
    Digital Equipment Corporation DIGITAL UNIX (OSF/1)
    Linux Linux
    Hewlett-Packard Company HP-UX
    Sun Microsystems Solaris
    Silicon Graphics IRIX
    Digital Equipment Corporation OpenVMS
    Microsoft Windows 95

    Release Notes

    What is Scheme?

    Scheme is a programming language, a dialect of the Lisp (List Processing) family of languages, generally utilizing a syntax based on parenthetical expressions delimited by whitespace, although alternative syntax capabilities are sometimes available. An implementation such as SIOD usually runs by default in an immediate execution mode, where programs are parsed as they are entered, then executed with the results being printed. Hence this set of examples with input syntax suitable for either Microsoft QBASIC or SIOD:

    QBASIC.EXE input SIOD input SIOD Result
    PRINT (1 + 2) * (3 + 4)
    (* (+ 1 2) (+ 3 4))
    21
    PRINT "HELLO-" + "BUDDY"
    (string-append "HELLO-" "BUDDY")
    "HELLO-BUDDY"
    FUNCTION f(x)
    IF x < 2 THEN
    f = x
    ELSE
    f = f(x - 1) + f(x - 2)
    END IF
    END FUNCTION
    (define (f x)
      (if (< x 2)
          x
        (+ (f (- x 1)) (f (- x 2)))))
    #<CLOSURE>
    PRINT f(20)
    (f 20)
    6765

    Reference Section for built-in procedures

    Note that the arguments to built-in procedures are always optional and default to (). Many of these procedures call a C library function of the same or similar name, in the obvious way. Therefore you can refer to the unix manual page for more detailed information about function behavior. Such procedures are indicated with a bold U.

    (%%%memref address)

    This is a lowlevel routine which should not be invoked in normal code. References a byte of memory at address. Used mostly to cause a core dump for debugging purposes by referencing address 0 or -1.

    (%%closure env code)

    This is a lowlevel routine which should not be invoked in normal code. If code is a cons of the form (arlist . body) then env is a list of frames, and the application of the closure will invoke the interpreter. Otherwise code should be of type tc_subr_X and the application of the closure will pass the env as the first argument to the C procedure implementing the subr.

    (%%closure-code closure)

    This is a lowlevel routine which should not be invoked in normal code. Returns the code passed to %%closure.

    (%%closure-env closure)

    This is a lowlevel routine which should not be invoked in normal code. Returns the env passed to %%closure.

    (%%stack-limit amount silent)

    If amount is non-null it sets the runtime stack check pointer to allow for that number of bytes. If silent is non-null the resulting (or current) stack size is returned, otherwise a message is printed.

    (* x1 x2 ...)

    Returns the product of all its arguments, or 1 if no arguments.

    *after-gc*

    A variable, the value is an express evaluated after the gc has done its work. For example:
    (set! *after-gc* '(if (< (gc-info 4) 5000) (allocate-heap)))

    *args*

    A variable, bound to the list of arguments passed to the main program siod.

    (*catch tag body ...)

    A special form. Tag is evaluated and kept in a special location while all the forms in the body are evaluated. Normally returns the value of the last form except if a *throw is encountered within the dynamic scope of the evaluations. Errors may be caught by using a tag of 'errorobj.

    *env*

    A variable, bound to the list of environment values passed to the main program siod.

    *eval-history-ptr*

    A variable, default (), but if set to a list (possibly circular) then each call to eval will cause the car of the list to receive a pointer to the form being evaluated, and then the variable will be set to the cdr of the list. Useful for writing a retrospective trace debugging capability.

    *pi*

    A variable, value 3.1416.

    *plists*

    A variable, internal to the implementation of get and putprop.

    (*throw tag value)

    Locates an active *catch for which the tag is identical and then forces the *catch form to return the value.

    *traced*

    A variable, value is a list of procedures that have been traced.

    (+ x1 x2 ...)

    Returns the sum of its arguments.

    (- x1 x2 ...)

    With one argument returns the negation, returns the difference of the first argument and the sum of the rest.

    (/ x1 x2 ...)

    With one argument returns the inverse, otherwise returns the quotiont of the first argument and the product of the rest.

    (< x y)

    Returns true if x is numerically less than y.

    (<= x y)

    Returns true if x is numerically less than or equal to y.

    (= x y)

    Returns true if x is numerically equal to y.

    (> x y)

    Returns true if x is numerically greater than y.

    (>= x y)

    Returns true if x is numerically greater than or equal to y.

    (F_GETLK fd ltype whence start len)

    The fd may be an integer or file. The function fcntl (U) is called on the file descriptor and an appropriate struct flock constructed from the ltype, whence, start and len arguments, and the lock operation F_GETLK. The ltype may be F_RDLCK,F_UNLCK, or F_WRLCK. Whence may be SEEK_CUR, SEEK_END or SEEK_SET.

    (F_SETLK fd ltype whence start len)

    Same as F_GETLCK but with lock operation F_SETLK. U.

    F_SETLKW fd ltype whence start len)

    Same as F_GETLCK but with lock operation F_SETLKW. U. For a good example see the command script cp-build.

    (abs x)

    Returns the absolute numerical value of x.

    (access-problem? filename method)

    Invokes the access function (U) on the filename and flags created from the method string which should contain one or more of the characters "rwxf" returning non-null if there is a problem with accessing the file in that way. For example:
    (if (access-problem? "x.y" "r") (error "can't read x.y"))

    (acos x)

    Returns the inverse cosine of x.

    (alarm seconds flag)

    Invokes the alarm function (U). The handling of which will causes an error to be signaled in so many seconds. But if flag is false then the error will not be signaled if the alarm took place inside a system call or other critical code section.

    (allocate-heap)

    Attempts to allocate (call the C library malloc procedure) to obtain an additional heap. The size of the heap and the maximum number of heaps are determined at startup time. Returns non-null if successful.

    (and form1 form2 form3 ...)

    A special form which causes the evaluation of its subforms in order, from left to right, continuing if and only if the subform returns a non-null value.

    (append l1 l2 l3 l4 ...)

    Returns a list which the result of appending all of its arguments. Example:
    (append '(a b) '(c d)) => (a b c d)

    (apply function arglist)

    Applies the function to the argument list arglist.

    (apropos substring)

    Returns a list of all symbols containing the given substring.

    (aref array index)

    Returns the element of the array at the given index.

    (array->hexstr string)

    Takes a string or byte array and returns a string in representing the values of the elements in hex.

    (aset array index value)

    Stores the value at the given index in the array.

    (ash value bits)

    Arithmetic shift of value a given number of bits to the left (positive) or right (negative).

    (asin x)

    Returns the inverse sin of x.

    (ass key alist function)

    Returns the first element of the alist such that the function applied to car of the element and the key returns a non-null value. For example:
    (define (assq x alist) (ass x alist eq?))

    (assoc key alist)

    Same as (ass key alist equal?).

    (assq key alist)

    Same as (ass key alist eq?).

    (assv key alist)

    Same as (ass key alist eql?).

    (atan x)

    Returns the inverse tagent of x.

    (atan2 x y)

    Returns the inverse tagent of x/y.

    (base64decode x)

    Given a string X in base64 representation returns a string with bytes computed using the base64 decoding algorithm. See rfc1521.txt.

    (base64encode x)

    Returns a string computed using the base64 encoding algorithm.

    (begin form1 form2 ...)

    A special form which evaluates each of its subforms one after another, returning the value of the last subform.

    (benchmark-eval nloops exp env)

    A zero-overhead way of evaluating the exp n times.

    (benchmark-funcall1 nloops f arg1)

    A zero-overhead way of calling the function f n times on arg1.

    (benchmark-funcall2 nloops f arg1 arg2)

    A zero-overhead way of calling the function f n times on arg1 and arg2.

    (bit-and x y)

    Returns the bitwise logical "and" (C language & operator) of numerical arguments x and y.

    (bit-not x)

    Returns the bitwise logical complement (C language ~ operator) of numerical argument x.

    (bit-or x y)

    Returns the bitwise logical "or" (C language | operator) of numerical arguments x and y.

    (bit-xor x y)

    Returns the bitwise logical "xor" (C language ^ operator) of numerical arguments x and y.

    (butlast x)

    Returns a new list which has all the elements of the argument x except for the last element.

    (bytes-append x1 x2 ...)

    Returns a new byte array by appending its arguments which may be strings or byte arrays.

    (caaar x)

    Same as (car (car (car x))).

    (caadr x)

    Same as (car (car (cdr x))).

    (caar x)

    Same as (car (car x)).

    (cadar x)

    Same as (car (cdr (car x))).

    (caddr x)

    Same as (car (cdr (cdr x))).

    (cadr x)

    Same as (car (cdr x)).

    (car x)

    If x is the result of (cons a b) then (car x) is the same as a.

    (cdaar x)

    Same as (cdr (car (car x))).

    (cdadr x)

    Same as (cdr (car (cdr x))).

    (cdar x)

    Same as (cdr (car x)).

    (cddar x)

    Same as (cdr (cdr (car x))).

    (cdddr x)

    Same as (cdr (cdr (cdr x))).

    (cddr x)

    Same as (cdr (cdr x)).

    (cdr x)

    If x is the result of (cons a b) then (cdr x) is the same as b.

    (chdir path)

    Changes default directory to path. U.

    (chmod path mode)

    Changes the file mode of path. U. For example, to add execute access permission to the file f:
    (chmod f
           (encode-file-mode (append '(XUSR XGRP XOTH)
                                      (cdr (assq 'mode (stat f))))))

    (chown path uid gid)

    Changes file ownership. U.

    (closedir stream)

    Closes a directory stream. U.

    (cond clause1 clause2 ...)

    A special form where each clause is processed until the predicate expression of the clause evaluates true. Then each subform in the predicate is evaluated with the value of the last one becoming the value of the cond form:
    (predicate-expression form1 form2 ...)

    (cons x y)

    Allocates a list object with x as the car and y as the cdr. For example:
    (cons 1 (cons 2 (cons 3 ())))
    evaluates to
    (1 2 3)

    (cons-array dimension kind)

    Allocates an array (currently limited to one dimension). The kind may be string, byte, double, or lisp (default).

    (copy-list x)

    The toplevel cons objects of x are copied, returning a new list.

    (cos x)

    Returns the cosine where x is in units of radians.

    (cpu-usage-limits soft-limit hard-limit)

    Invokes getrlimit if the arguments are null or otherwise setrlimit. U.

    (crypt key salt)

    A form of string hash. U.

    (current-resource-usage kind)

    Kind is the symbol SELF or CHILDREN, calls getrusage, U.

    (datlength data ctype)

    Returns the dimension of the data as if viewed as an array by the datref function.

    (datref data ctype index)

    References the data as if it were an array of C data type ctype, at the given index. The ctype may be CTYPE_CHAR, CTYPE_DOUBLE, CTYPE_FLOAT, CTYPE_LONG, CTYPE_SHORT, CTYPE_UCHAR, CTYPE_ULONG, or CTYPE_USHORT. The data may be a string or byte array.

    (decode-file-mode x)

    Returns a list of symbols given a numerical file mode.

    (define subform1 subform2)

    A special form used to assign a value to a variable in one of two ways:
    (define variable value)
    or to create a procedure
    (define (procedure-name arg1 arg2 ...)
      form1
      form2
      ...)

    (delete-file path)

    Deletes the file specified by path.

    (delq element list)

    Deletes the elements of the list which are eq to its first argument. Possibly modifying the list using the set-cdr! operation.

    (encode-file-mode list)

    Takes a list of file mode symbols and returns the numerical value. SUID, SGID, RUSR, WUSR, XUSR, RGRP, WGRP, XGRP, ROTH, WOTH, XOTH.

    (encode-open-flags list)

    Takes a list of open (U) flag symbols and returns a numerical value. NONBLOCK, APPEND, RDONLY, WRONLY, RDWR, CREAT, TRUNC, EXCL.

    (endpwent)

    See U.

    (env-lookup indentifier environment)

    Returns an object such that the car is the location where the value of identifier is stored.

    (eof-val)

    Returns the object returned by read upon encountering and end of file condition.

    (eq? x y)

    Returns true if x and y are the same object.

    (equal? x y)

    Returns true if x and y are equal objects.

    (eqv? x y)

    Returns true if x and y are the same object or numerically equal.

    errobj

    This variable is assigned to the offending object when the error procedure has been invoked. Useful mainly during interactive debugging.

    (error message object)

    Prints the error message then aborts the current execution by invoking *throw using the symbol errobj as the tag and the cons of the message and the object as the value. Equivalent to:
    (define (error message object)
      (if (> (verbose 0))
          (writes nil "ERROR: " message "\n"))
      (set! errobj object)
      (*throw 'errobj (cons message object)))

    (eval expression environment)

    Evaluates the expression in the context of the environment. This is not a special form. For example:
    (eval (read-from-string "(+ 1 2)"))
    evaluates to 3.

    (exec path args env)

    Calls execv or execve U.

    (exit status)

    Calls exit U.

    (exp x)

    Computes the exponential function of x.

    (fast-load path noeval-flag)

    Loads a file of binary format expressions, if noeval-flag is true returns a list of the forms instead of evaluating them.

    (fast-print object state)

    Outputs a fast (binary) format representation of object, where the state is a list of (file hash-array index).

    (fast-read state)

    Inputs a form which had been output in fast (binary) format.

    (fast-save filename forms nohash-flag comment-string)

    Creates a file by using fast-print to output each of the forms. A true value for the nohash-flag will cause symbol names to be output each time they are encountered. Otherwise a more optimal index representation is used. The comment-string is used as the first line of data in the file.

    (fchmod filedes mode)

    The filedes may be an number or an open file object. U.

    (fclose stream)

    Closes the open file stream. U.

    (fflush stream)

    See U.

    (file-times path)

    Returns a list of the st_ctime and the st_mtime returned by the stat function. U.

    (first x)

    Returns the first element (car) of the list x.

    (fmod x y)

    Floating point mod. U.

    (fnmatch pattern string flags)

    Returns true if the string matches the pattern. U.

    (fopen path mode)

    Opens the file and returns a file stream. U.

    (fork)

    Create a child process. Returning a numerical pid in the parent, () in the child, or call error if the child cannot be created. U.

    (fread size-or-buffer stream)

    Returns a new string of size bytes by calling fread U. Or uses the buffer (a string or a byte array) instead and returns the number of bytes read. Returns () on end-of-file.

    (fseek file offset direction)

    The direction is SEEK_CUR, SEEK_END or SEEK_SET. U.

    (fstat stream)

    Calls fstat U and returns an alist with elements dev, ino, mode, nlink, uid, gid, rdev, size, atime, mtime, ctime, blksize, blocks, flags, and gen.

    (ftell stream)

    Calls ftell U to return the current offset into a file.

    (fwrite data stream)

    Write the data, a string or byte-array to the stream. Or data can also be a list of a string or byte-array and a numerical length.

    (gc)

    Invokes the garbage collector.

    (gc-info item)

    ItemValue
    0true if copying gc, false if mark and sweek
    1number of active heaps
    2maximum number of heaps
    3number of objects per heap
    4amount of consing of objects before next gc

    (gc-status [flag])

    If flag is not specified prints information about the gc. Otherwise flag can be used to turn on or off gc messages or turn on or off the gc itself when in stop and copy mode.

    (get object key)

    Returns the key property of the object.

    (getc stream)

    Reads a character from the stream, returns () for end of file. U.

    (getcwd)

    Returns the current working directory. U.

    (getenv name)

    Returns the value of the environment variable named, or (). U.

    (getgid)

    Returns the group id of the process. U.

    (getgrgid gid)

    Returns a list of members of the specified numerical group. U.

    (getpass prompt)

    Prompts the user and reads a line with echoing turned off. U.

    (getpgrp)

    Returns the process group ID of the calling process. U.

    (getpid)

    Returns the process ID of the calling process. U.

    (getppid)

    Returns the parent process ID of the calling process. U.

    (getpwent)

    Returns an alist representing the next item in the /etc/passwd file. U.

    (getpwnam username)

    Returns the /etc/passwd file entry for the given username. U.

    (getpwuid)

    Returns the /etc/passwd file entry fo the given user id. U.

    (gets stream)

    Reads a line from the stream, () on end-of-file.

    (getuid)

    Returns the uid of the current process. U.

    (gmtime value)

    Decodes the value into an alist. The value defaults to the current time. U.

    (hexstr->bytes str)

    Decodes the hex representation into a byte array.

    (href table key)

    The hash table is a one dimensional array of association lists.
    (define (href table key)
      (cdr (assoc key
    	      (aref table (sxhash key (length table))))))

    (hset table key value)

    Stores the value into the hash table slot indicated by key.

    (html-encode str)

    If str contains any special html characters (<>&) a new string is returned with these replaced by their cooresponding representations &lt; &gt; &amp;.

    (if predicate-form true-form false-form)

    A special form that evaluates the true-form or the false-form depending on the result of evaluating the predicate form.

    (intern str)

    Looks up a string in the symbol table or enters a new symbol.

    (kill pid sig)

    Calls the kill function U. With sig defaulting to SIGKILL.

    (lambda (arg1 arg2 ...) form1 form2 ...)

    Returns an applicable procedure object (CLOSURE) with the given argument list and expression subforms. For example:
    (mapcar (lambda (x) (* x x)) '(1 2 3))
    evaluates to:
    (1 4 9)
    Also used by the define special form.

    (larg-default list index default-value)

    Reference the list according to index, but skipping over strings that begin with a colon or a dash. If the list is not long enough it returns the default-value instead. Most useful when used with the *args* variable inside a main program.

    (last list)

    Returns the last cons in a list.

    (last-c-error)

    Returns the value of the C library strerror(errno) U interned as a symbol.

    (lchown path owner group)

    Changes the ownership of a symbolic link U.

    (length object)

    Returns the length of an object which may be a string (acts like strlen) or a list, or an array.

    (let (binding1 binding2 ...) form1 form2 ...)

    A special form where each binding is a (variable value) pair. It works by computing the values, establishing the bindings, and then evaluating the forms, returning the value of the last one. For example the following evaluates to 30:
    (let ((x 10)
          (y 20))
      (+ x y))

    (let* (binding1 binding2 ...) form1 form2 ...)

    A special form where each binding is a (variable value) pair. It works by sequentially computing each value and then establishing a binding. For example the following evaluates to 30:
    (let* ((x 10)
          (y (+ x 10)))
      (+ x y))

    (letrec (binding1 binding2 ...) form1 form2 ...)

    Useful when the value forms of the bindings are lambda expressions with which you desire to program mutually recursive calls.

    (link existing-file entry-to-create)

    Creates a hard link U.

    (list item1 item2 ...)

    Conses up its arguments into a list.

    (lkey-default list index default-value)

    Returns the substring on the right hand size of the equal sign of the first element of the list of the form index=value, or the default-value if none are found. Useful when processing the *args* value inside a main program.

    (load fname noeval-flag search-flag)

    If search-flag is true it looks for fname in the current directory and then in the SIOD_LIB directory. The forms from the file are parsed according to the "parser:xxx" directive at the begining of the file (default "parser:read"). If the neval-flag is true then a list of the forms is returned otherwise the forms are evaluated.

    (load-so fname init_fcn)

    Loads the dynamic library specified by fname, invoking the init_fcn if specified (default init_fname).

    (localtime value)

    Returns an alist representing the value as a localtime. U. Value defaults to the current time.

    (log x)

    Computes the natural logarithm of x.

    (lref-default list index default-fcn)

    Returns the index element of the list or the result of calling the default-fcn if the list is not long enough.

    (lstat path)

    Returns the stat information of a logical link. U.

    (make-list length element)

    Creates a list of the given length filled with the element specified.

    (mapcar fcn list1 list2 ...)

    Returns a list which is the result of applying the fcn to the elements of each of the lists specified.

    (max x1 x2 ...)

    Returns the maximum of x1, x2, etc.

    (md5-final state)

    Returns a byte array computed from the state, derived from the RSA Data Security, Inc. MD5 Message-Digest Algorithm. See rfc1321.txt. Example:
    (define (md5 str)
      (let ((s (md5-init)))
        (md5-update s str)
        (array->hexstr (md5-final s))))

    (md5-init)

    Returns an md5 algoritm state as a byte array.

    (md5-update state string length)

    Performs the update step of the md5 algorithm using data from the string up to length, or length can be an open file object, in which case the data from the file is used to perform the update.

    (member key list)

    Returns the portion of the list where the car is equal to the key, or () if none found.

    (memq key list)

    Returns the portion of the list where the car is eq to the key, or () if none found.

    (memv key list)

    Returns the portion of the list where the car is eqv? to the key, or () if none found.

    (min x1 x2 ...)

    Returns the numerical minimum of its arguments.

    (mkdatref ctype ind)

    Creates a closure functionally equivalent to (lambda (x) (datref x ctype ind)).

    (mkdir path mode)

    Creates a directory with the specified numerical access mode. U.

    (mktime alist)

    Returns the numerical time value cooresponding to the alist in the same format as returned by the localtime function. U.

    (nconc l1 l2)

    Makes the cdr of the last cons of l1 point to l2.

    (nice increment)

    Changes the priority of the current process by the increment. U.

    nil

    Do not change the value of this variable which is bound to the empty list.

    (not x)

    Returns the reverse truth sense of x.

    (nreverse list)

    Destructive reversal of the elements of a list using set-cdr!.

    (nth index list)

    Reference the list using index, with the first element being index 0.

    (null? x)

    Returns true of x is the empty list.

    (number->string x base width precision)

    Formats the number according to the base, which may be 8, 10, 16 or the symbol e or f. The width and precision are both optional.

    (number? x)

    Returns true of x is a number.

    (opendir path)

    Returns a directory stream. U.

    (or form1 form2 ...)

    A special form which causes the evaluation of its subforms in order, from left to right until a form evaluates to a non-null value.

    (pair? x)

    Returns true if x is a pair (created by cons).

    (parse-number str)

    Convers a string to a number.

    (pclose stream)

    Used to close a stream opened using popen. Makes sure the associated process is killed. U.

    (popen command type)

    Executes the command in a child process and opens a stream connected to the process standard output if type is r, or the standard input if type is w. U.

    (pow x y)

    Computes the result of x raised to the y power.

    (prin1 object stream)

    Outputs the standard readable representation of the object to the stream, which defaults to the standard output.

    (print object stream)

    Same as prin1 followed by output of a newline.

    (print-to-string object string no-trunc-flag)

    Puts the readable representation of the object into the string, starting at the first character unless the no-trunc-flag is true, in which case the representation starts at the current length of the string.

    (prog1 form1 form2 form3 ...)

    A special form which evaluates all its subforms but returns the value of the first one. A useful shorthand to employ instead of using a let.

    (putc char stream)

    Outputs the character to the stream. U.

    (putenv setting)

    With a setting is of the form "key=value" makes a new environment binding available to the getenv function of the current and subsequent child processes, or updates an old one. U.

    (putprop object value key)

    Not implemented.

    (puts string stream)

    Outputs the string to the stream. U.

    (qsort list predicate-fcn access-fcn)

    Implements the recursive quicksort algorithm on elements of the list compared by using the predicate-fcn on the results of invoking the access-fcn.
    ExampleResult
    (qsort '(3 1 5 4 2) <)(1 2 3 4 5)
    (qsort '((3 a) (2 b)) < car)((2 b) (3 a))

    (quit)

    Cause the read-eval-print loop to return, usually resulting in an exit from the main program of siod, but may not when other C programs are utilizing the libsiod functionality.

    (quote x)

    A special form that returns x without evaluating it. Commonly written in abbreviated format as 'x.

    (rand modulus)

    Computes a random number from 0 to modulus-1. Uses C library rand.

    (random modulus)

    Computes a random number from 0 to modulus-1. Uses C library random.

    (read stream)

    Inputs characters from the stream returns the parsed standard expression, or (eof-val).

    (read-from-string string)

    Performs a read operation on the characters in the string.

    (readdir directory-stream)

    Returns the name of the next entry in the directory stream or () of none left.

    (readline stream)

    Reads a line of characters from the stream, returning () on end of file. The terminating newline is not included in the string, which is usually more convenient. For example, this procedure for loading a tab-delimited spreadsheet file:
    (define (load-spread-sheet filename)
      (if (>= (verbose) 2)
          (writes nil ";; loading spread sheet " filename "\n"))
      (let ((result nil)
    	(line nil)
    	(f (and (not (equal? filename "-")) (fopen filename "r"))))
        (while (set! line (readline f))
          (set! result (cons (strbreakup line "\t") result)))
        (and f (fclose f))
        (nreverse result)))

    (readlink path)

    Returns the contents of the symbolic link at path. U.

    (realtime)

    Returns a double precision floating point value representation of the current realtime number of seconds. Usually precise to about a thousandth of a second.

    (rename from-path to-path)

    Renames a directory or file within a file system. U.

    (require path)

    Computes a variable name by concatenating "*" + path + "-loaded*" and then calling (load path nil t) if and only if the variable is not bound to true. After the file is loaded the variable is set to true. This is the correct way of making sure a file is only loaded once.

    (require-so path)

    Computes a variable name by concatenating "init_" + path, and calling (load-so path) if and only if the variable is not bound to true. After the shared library has been loaded the variable is set to true. The correct way of making sure a shared library is only loaded once is:
    (require-so (so-ext 'name))

    (rest x)

    Returns the rest of the list x, in other words the cdr.

    (reverse x)

    Returns a new list which has elements in the reverse order of the list x.

    (rld-pathnames)

    Returns a list of the pathnames which represent shared libraries that have been loaded by the the current process.

    (rmdir path)

    Removes the directory entry specified by path. U.

    (runtime)

    Returns a list of containing the current cpu usage in seconds and the subset amount of cpu time that was spent performing garbage collection during the currently extant read-eval-print loop cycle.

    (save-forms filename forms how)

    Prints the forms to the file, where how can be "w" (default) or "a" to append to the file.

    (sdatref spec data)

    Used as the %%closure-code by mkdatref.

    (set! variable value)

    A special form that evaluates the value subform to get a value, and then assigns the variable to the value.

    (set-car! cons-cell value)

    Changes the car of the cons-cell object to point to the value.

    (set-cdr! cons-cell value)

    Changes the cdr of the cons-cell object to point to the value.

    (set-eval-history length circular-flag)

    Creates a list of the specified length and establishes bindings for *eval-history-ptr* and *eval-history*. The list is circular if the flag is specified true. Try the following:
    (define (fib x) (if (< x 2) x (+ (fib (- x 1)) (fib (- x 2)))))
    (set-eval-history 200)
    (fib 10)
    (mapcar (lambda (x) (if (pair? x) (car x) x)) *eval-history*)

    (set-symbol-value! symbol value env)

    Finds the location of the value cell for the specified symbol in the environment env and sets the value.

    (setprop obj key value)

    Not implemented.

    (setpwent)

    Resets the pointer into the /etc/passwd file. U.

    (sin x)

    Computes the sine function of the angle x in radians.

    (siod-lib)

    Return the setting of the siod library directory.

    (sleep n)

    Sleep for n seconds, where n may be fractional on some systems.

    (so-ext path)

    Append the path with the file extension for shared libraries.

    (sqrt x)

    Compute the square root of x.

    (srand seed)

    Reset the algorithm seed for the rand function. U.

    (srandom seed)

    Reset the algorithm seed for the random function. U.

    (stat path)

    Return an alist describing file status information, or () if the path cannot be accessed, (last-c-error) may be used to return the reason.

    (strbreakup string sep)

    Return a list of the portions of string indicated by the separator.
    (strbreakup "x=y&z=3" "&") => ("x=y" "z=3")

    (strcat str1 str2)

    Copies the string str2 into str1 starting at the current active end of str1, which is determined by the location of a 0 byte, calling error if there is not enough room left in str1. U.

    (strcmp str1 str2)

    Returns 0 if str1 and str2 are equal, or -1 if str1 is alphabetically less than str2 or 1 otherwise. U.

    (strcpy str1 str2)

    Copies str1 into str1 or calling error if there is not enough room. U.

    (strcspn str indicators)

    Returns the location of the first character in str which is found in the indicators set, returns the length of the string if none found. U.

    (strftime format-string alist)

    Uses the format-string to compute a string using broken-up time/data information from the alist (defaults to the current time), for example:
    (strftime "%B" '((mon . 3))) => "April"
    U.

    (string->number str radix)

    Converts the string to a number assuming the specified radix.

    (string-append str1 str2 str3 ...)

    Returns a new string which contains the concatenation of all its string arguments.

    (string-dimension str)

    Returns the maximum possible length of a string array.

    (string-downcase str)

    Return a new string converting all the characters of str to lowercase.

    (string-length str)

    Returns the active string length of str.

    (string-lessp str1 str2)

    Return true if str1 is alphabetically less than str2.

    (string-search key str)

    Locate the index of the key in the specified string. Returns () if not found.

    (string-trim str)

    Return a new string made by trimming whitespace from the left and right of the specified string.

    (string-trim-left str)

    Like string-trim but only the left hand side.

    (string-trim-right str)

    Like string-trim but only the right hand side.

    (string-upcase str)

    Returns a new string with all the lowercase characters converted to uppercase.

    (string? x)

    Returns true if x is a string.

    (strptime str format alist)

    Parses str according to format and merges the values with the alist.
    (cdr (assq 'mon (strptime "March" "%B"))) => 2
    U.

    (strspn str indicators)

    Returns the location of the first character in str which is not found in the indicators set, returns the length of the str if none found. For example:
    (define (string-trim-left x)
      (substring x (strspn x " \t")))
    U.

    (subset pred-fcn list)

    Return the subset of the list such that the elements satisify the pred-fcn. For example:
    (subset number? '(1 b 2 c)) => (1 2)

    (substring str start end)

    Returns a new string made up of the part of str begining at start and terminating at end. In other words, the new string has a length of end - start.

    (substring-equal? str str2 start end)

    An efficient way to determine if the substring of str2 specified by start and end is equal to str1.

    (swrite stream table form)

    This is the same as the write-smart-html procedure described in ftp://ftp.std.com/pub/gjc/www95-paper.html.

    (sxhash data modulus)

    Computes a recursive hash of the data with respect to the specified modulus.

    (symbol-bound? symbol env)

    Returns true of the symbol is bound in the environment.

    (symbol-value symbol env)

    Returns the value of the symbol in the environment.

    (symbol? x)

    Returns true if x is a symbol.

    (symbolconc arg1 arg2 ...)

    Slightly more efficient than calling intern on the result of using string-append on the arguments. This procedure actually predates the availability of the string data type in SIOD.

    (symlink contents-path link-path)

    Creates a directory entry link-path pointing to the contents-path. U.

    (system arg1 arg2 ...)

    Appends the string arguments to form a command to be executed by the operating system. U.

    t

    Please do not change the global value of this variable, bound to a true value.

    (tan x)

    Computes the tagent of the angle x specified in radians.

    (the-environment)

    A special form which returns the interpreter environment structure for the current lexical scope.

    (trace fcn1 fcn2 ...)

    Traces the specified interpreted procedures by modifying the closure objects.

    (trunc x)

    Returns the integer portion of x.

    (typeof x)

    Returns a symbol describing the type of the object x, or the integer type code.

    (unbreakupstr list sep)

    The reverse of strbreakup. The following example saves a list of lists as a tab delimited spreadsheet:
    (define (save-spread-sheet filename data)
      (if (>= (verbose) 2)
          (writes nil ";; saving spread sheet " filename "\n"))
      (let ((result data)
    	(f (and (not (equal? filename "-")) (fopen filename "w"))))
        (while result
          (writes f (unbreakupstr (car result) "\t") "\n")
          (set! result (cdr result)))
        (and f (fclose f))))

    (ungetc char stream)

    Puts the char back into the stream for the next call to getc.

    (unix-ctime x)

    Converts the integer time x into a string. U

    (unix-time)

    Returns the current number of seconds since 1-JAN-1970 GMT. U

    (unix-time->strtime x)

    Returns a string of the form "YYYYMMDDHHmmSSdd" which is useful in some contexts. This predates the availability of the strftime procedure.

    (unlink path)

    Deletes the specified entry from the directory stucture. U

    (untrace fcn1 fcn2 ...)

    Untraces the specified procedures.

    (url-decode str)

    Performs the url decode operation on the str.

    (url-encode str)

    Locates characters in the str which should not appear in a url, and returns a new string where they have been converted to the %NN hex representation. Spaces are converted to "+" signs.

    (utime path modification-time access-time)

    Sets the file modification and access times. U

    (verbose arg)

    Sets the verbosity level of SIOD to the specified level or returns the current level if not specified.
    Verbose LevelEffect on System
    0No messages.
    1Error messages only.
    2Startup messages, prompts, and evaluation timing.
    3File loading and saving messages.
    4 (default)Garbage collection messages.
    5display of data loaded from files and fetched from databases.

    (wait pid options)

    Waits on a child process by calling the waitpid function, where options may be a list containing (WCONTINUED WNOWAIT WNOHANG WUNTRACED). Returns a list of the process pid and final exit status. The fork-test.scm and http-stress.scm modules provide example usage. U

    (while pred-form form1 form2 ...)

    If pred-form evaluates true it will evaluate all the other forms and then loop.

    (writes stream data1 data2 data3 ...)

    Outputs the data arguments to the stream without quoting strings or special characters.

    Reference Section for extension-provided procedures

    Extension: acct

    Interface to unix accounting records.

    (decode_acct string)

    Built-in subroutine of 1 argument. Takes a string or byte array of length SIZEOF_ACCT and returns an alist. You would get this data by using fread on an appropriate accounting file. Currently an OSF/1 specific implementation.

    (decode_tacct string)

    built-in subroutine of 1 argument. Takes a string of SIZEOF_TACCT. Currently an OSF/1 specific implementation.

    UTMP_FILE

    variable, value data type tc_string

    WTMP_FILE

    variable, value data type tc_string

    (endutent)

    built-in subroutine of 0 arguments. U

    (getutent)

    built-in subroutine of 0 arguments. U

    (setutent)

    built-in subroutine of 0 arguments. U

    (utmpname path)

    built-in subroutine of 1 argument. U

    Extension: gd

    Interface to libgd.a, for producing GIF data. See http://siva.cshl.org/gd/gd.html. The scheme functions all follow the C functions in name and call interface. U. The piechart.scm module is an example usage.

    gdBrushed

    variable, value data type tc_flonum

    gdFont.h

    built-in subroutine of 1 argument. Returns height of a font.

    gdFont.w

    built-in subroutine of 1 argument. Returns width of a font.

    gdFontGiant

    variable, contains a font.

    gdFontLarge

    variable, contains a font.

    gdFontMediumBold

    variable, contains a font.

    gdFontSmall

    variable, contains a font.

    gdFontTiny

    variable, contains a font.

    gdImageArc

    built-in subroutine 0 or more arguments.

    gdImageChar

    built-in subroutine 0 or more arguments.

    gdImageCharUp

    built-in subroutine 0 or more arguments.

    gdImageColorAllocate

    built-in subroutine 0 or more arguments.

    gdImageColorClosest

    built-in subroutine 0 or more arguments.

    gdImageColorExact

    built-in subroutine 0 or more arguments.

    gdImageColorTransparent

    built-in subroutine of 2 arguments.

    gdImageCreate

    built-in subroutine of 2 arguments.

    gdImageCreateFromGif

    built-in subroutine of 1 argument.

    gdImageCreateFromXbm

    built-in subroutine of 1 argument.

    gdImageFill

    built-in subroutine 0 or more arguments.

    gdImageFillToBorder

    built-in subroutine 0 or more arguments.

    gdImageFilledPolygon

    built-in subroutine of 3 arguments.

    gdImageFilledRectangle

    built-in subroutine 0 or more arguments.

    gdImageGif

    built-in subroutine of 2 arguments.

    gdImageGifmem

    built-in subroutine of 2 arguments.

    gdImageInterlace

    built-in subroutine of 2 arguments.

    gdImageLine

    built-in subroutine 0 or more arguments.

    gdImagePolygon

    built-in subroutine of 3 arguments.

    gdImageRectangle

    built-in subroutine 0 or more arguments.

    gdImageSetPixel

    built-in subroutine 0 or more arguments.

    gdImageString

    built-in subroutine 0 or more arguments.

    gdImageStringUp

    built-in subroutine 0 or more arguments.

    gdPoint

    built-in subroutine 0 or more arguments.

    gdPoint.x

    built-in subroutine of 3 arguments.

    gdPoint.y

    built-in subroutine of 3 arguments.

    gdStyled

    variable, value data type tc_flonum

    gdStyledBrushed

    variable, value data type tc_flonum

    gdTiled

    variable, value data type tc_flonum

    gdTransparent

    variable, value data type tc_flonum

    Extension: ndbm

    These routines take string or byte array arguments, and return byte arrays. The lisp functions substring and datref can be useful to extract information from a byte array. U.

    DBLKSIZ

    variable, value data type tc_flonum

    DBM_INSERT

    variable, value data type tc_flonum

    DBM_REPLACE

    variable, value data type tc_flonum

    PBLKSIZ

    variable, value data type tc_flonum

    (dbm_close handle)

    Closes the files associated with the database handle.

    (dbm_delete handle key)

    Deletes the key from the index of the database.

    (dbm_dirfno handle)

    Returns a file descriptor for useful for passing to the F_SETLKW function.

    (dbm_error handle)

    built-in subroutine of 1 argument.

    (dbm_fetch handle key)

    built-in subroutine of 2 arguments.

    (dbm_firstkey handle)

    Returns the first key in the index.

    (dbm_nextkey handle)

    Returns the next key in the index.

    (dbm_open path open-flags mode)

    Returns a handle to the database, opening and creating the .dir and .pag files if needed. Use encode-open-flags to compute the open-flags argument and encode-file-mode to compute the mode argument. The file access mode argument is used only if the file needs to be created for the first time.

    (dbm_pagfno handle)

    Returns a file descriptor for useful for passing to the F_SETLKW function.

    (dbm_rdonly handle)

    Returns true if the database is read only.

    (dbm_store handle key data flag)

    Stores the data index by key into the database. The flag argument may be DBM_INSERT or DBM_REPLACE.

    Extension: parser_pratt

    Alternative syntax interface. Just the tokenizer is included here. The parser is provided by parser_pratt.scm and pratt.scm.

    (pratt_read_token buffer token-table stream)

    built-in subroutine of 3 arguments.

    Example, hello.scm

    #!/usr/local/bin/siod -v01,-m2 -*-mode:text;parser:pratt-*-
    main() := 
    {writes(nil,"Hello Scheme World.\n");
     fflush(nil);
     writes(nil,"fib(20) = ",fib(20),"\n");
    }
    $
    
    fib(x) := if x < 2 then x else fib(x-1) + fib(x-2)
    $

    Extension: regex

    Regular expression interface. U. See smtp.scm for example usage.

    (regcomp pattern flags)

    Returns a handle to a regular expression compiled from the pattern and flags: REG_EXTENDED, REG_ICASE, REG_NEWLINE, REG_NOSUB.

    (regerror code handle)

    Returns a string describing the error code argumented by error information from the compiled regular expression handle.

    (regexec handle string flags)

    If the string matches then a list of (start . end) pairs are returned indicating the substrings involved in the match against the pattern of the compiled regular expression. Otherwise an error code is returned: REG_BADBR, REG_BADPAT, REG_BADRPT, REG_EBRACE, REG_EBRACK, REG_ECHAR, REG_ECOLLATE, REG_ECTYPE, REG_EESCAPE, REG_EPAREN, REG_ERANGE, REG_ESPACE, REG_ESUBREG, REG_NOMATCH, REG_NOTBOL, REG_NOTEOL.

    Extension: sql_oracle

    Uses the OCI, Oracle Call Interface. See the sql_oracle.scm module for examples.

    Extension: sql_rdb

    Uses the RDB SQL SERVICES, sqlsrv_xxxx routines. See the sql_rdb.scm module for examples.

    Extension: sql_sybase

    Uses the Sybase Client ct_xxxx routines. See the sql_sybase.scm module for examples. All these procedures may print error and warning messages depending on the current verbose level, and also assign the variable *sybase-message* with similar information.

    (sybase-close [handle])

    Closes the connection to the database handle which defaults to the value of the variable *sybase*.

    (sybase-execute [handle] string cmd-type key1 arg1 key2 arg2 ...)

    Passes the string to the SQL server to be processed. The cmd-type is one of the sybase command types, such as CS_RPC_CMD or CS_LANG_CMD. The optional keys are procedure argument names. Pass "" as the key for un-named positional arguments. The result of sybase-execute is an list where the first element represents the status and the rest is an association list with CS_ROW_RESULT, CS_STATUS_RESULT, CS_PARAM_RESULT, and CS_MSG_RESULT elements. Rows are returned as lists of arrays. The handle defaults to the value of the variable *sybase*.

    (sybase-open key1 arg1 key2 arg2 ...)

    Opens a connection to the database specified by the current settings of the environment variables SYBASE and DSQUERY. The possible argument keys are username, password, and appname. A handle to the database connection structures is returned, and also set as the value of the variable *sybase* if it is currently ().

    (sybase-status [handle])

    Returns an list with information about the connection status, servername, and hostname. Handle defaults to the setting of *sybase*.

    Extension: ss

    Provides a simplified, stream-oriented interface to socket functionality. U. Example usages in smtp.scm and http-server.scm. This doesn't provide all the functionality and options that a straight socket interface would provide, but it has proven to be highly convenient for most client and server applications.

    (get-protocol-name number)

    Calls getprotobynumber, returns a list of names.

    (get-service-name port-number)

    Calls getservbyport. Returns a list of the protocol and the service names.

    (gethostbyaddr x)

    Calls gethostbyaddr, operating on the numerical address x.

    (gethostbyname name)

    Calls gethostbyname returning a list where the first element is the canonical host name and the rest is an association list containing aliases, addr_list and addrtype. For example to obtain a list of numerical address for the host:
    (mapcar inet_addr
    	(cdr (assq 'addr_list
    		   (gethostbyname (gethostname)))))

    (gethostname)

    Returns the configured name of the host.

    (inet_addr str)

    Converts a "x.x.x.x" dotted notation string or a byte array into a number.

    (s-accept stream)

    Returns a new stream based on accepting a connection from the socket assocated with the specified stream which must be in a listen state.

    (s-close stream)

    Closes the socket assocated with the stream.

    (s-drain stream)

    Reads all input from the socket associated with the stream.

    (s-force-output stream)

    Makes sure all buffered output is sent to the socket associated with the stream.

    (s-getc stream)

    Reads one character from the stream. () on eof.

    (s-gets stream)

    Reads a line from the stream.

    (s-open address port listen-flag)

    When listen-flag is () this procedure establishes a connection to the remote address and port specified. If listen-flag is true it creates a socket bound to the local address and port and initiates. The address may be a host name, a number, or a dot-string notation. The port may be the name of a protocol or a number.

    (s-putc char stream)

    Outputs the char to the socket stream.

    (s-puts string stream)

    Outputs the string to the socket stream.

    (s-read size-or-buffer stream)

    Reads the specified number of characters from the stream, returning a string or () on end of file, or attempts to fill the buffer with characters from the stream, returning the actual number read.

    (s-read-sexp stream)

    Reads a standard symbolic lisp expression from the stream. If the SIOD I/O was better modularized this procedure would not be needed.

    Extension: tar

    This is helpful for the efficient implementation of utilities that need to deal with tar files, avoiding repeated use of the substring and datref functions.

    *tar-header-size*

    Bound to the size of a tar header: 512.

    (checksum-tar-header string)

    Computes the standard tar header checksum algorithm.

    (decode-tar-header string)

    Returns an association list containing entries for all the standard components of a tar header structure. This can be done using substring and string->number but this procedure is faster and more convenient.

    Command interfaces and some scripts provided

    These commands all have unix manual pages, which in the interest of brevity are not duplicated here.
    CommandPurpose
    siodthe interpreter
    csiodcommand linker for siod scripts
    cp-builda file copy command with versioning and audit trail
    ftp-cppassive-mode ftp copy
    ftp-putpassive-mode ftp put with rename
    http-getcommand-line http client
    snapshot-dircreate a snapshot of a directory hierarchy.
    snapshot-comparecompare hierarchy snapshots, with options.

    Some scheme coded library modules

    NamePurpose
    cgi-echo.scm example cgi script, echo the environment
    find-files.scm works like the unix find command, but provides lisp data.
    fork-test.scm example use of fork
    hello.scm an example command using infix syntax
    http-server.scm useful as a socket example
    http-stress.scm http client with stress features
    http.scm more http client examples
    ftp.scm support for file transfer protocol
    maze-support.scm cgi script example, provides a run-maze subroutine
    parser_pratt.scm interface to infix language parser
    pop3.scm A pop3 client
    pratt.scm infix language parser
    selfdoc.scm create a table of built-in procedures
    siod.scm mostly obsolete collection of utility subroutines
    smtp.scm smtp client subroutines
    sql_oracle.scm utilities for oracle database client
    sql_rdb.scm utilities for rdb database client
    sql_sybase.scm utilities for sybase database client
    piechart.scm a CGI script that returns a piechart as a GIF

    Garbage Collection

    When SIOD was first released in April of 1988, it had a stop-and-copy garbage collector which could only run at the toplevel of the read-eval-print loop because it had no certain way of determine which objects on the stack were truly pointers; unless of course the runtime system was modified to make use of information the compiler provided to the debugger, in a way that a real lisp compiler would be required to provide. About that same time Jim O'Dell related to me the debugging heuristic he had used in his port of Franz Lisp to the Macintosh, by scanning the stack for items which appeared to be pointers but which had not been properly found by the mark phase of the gc. I suggested at the time that he might as well get rid of the explicit book-keeping code spread throughout Franz Lisp and stick with the heuristic. But since he had already found all the bugs in the book-keeping in the runtime and compiler output there was little benefit and great risk to doing it.

    Then, by the time SIOD had been out for a year there had been enough complaints about the lack of fully available GC that I was motivated to utilize a stack heuristic, since I had no intention of maintaining any explicit book-keeping code in the source. The published arguments in favor of the conservative approach described by Hans Boehm of Xerox Parc then reduced this design decision to a no-brainer, and his implementation suggested the use of setjmp as a sufficiently portable way for C code to get at the machine register set without introducing assembling language. To SIOD then I added only about 300 bytes of VAX instructions to the size of the runtime system.

    There are two storage management techniques which may be chosen at runtime by specifying the -g argument flag.

    If you get strange errors on a machine architecture not listed then you may be forced to use -g1 until you investigate and contact the author for advise.

    Stop and Copy

    As one can see from the source, garbage collection is really quite an easy thing. The procedure gc_relocate is about 25 lines of code, and scan_newspace is about 15.

    The real tricks in handling garbage collection are (in a copying gc):

    1. keeping track of locations containing objects
    2. parsing the heap (in the space scanning)

    The procedure gc_protect is called once (e.g. at startup) on each global location which will contain a lisp object.

    That leaves the stack. The beleive is that if we had chosen not to use the argument and return-value passing mechanism provided by the C-language implementation, (also known as the "machine stack" and "machine procedure calling mechanism) this lisp would be larger, slower, and rather more difficult to read and understand. Furthermore it would be considerably more painful to *add* functionality in the way of SUBR's to the implementation.

    Aside from writing a very machine and compiler specific assembling language routine for each C-language implementation, embodying assumptions about the placement choices for arguments and local values, etc, we are left with the following limitation:

    YOU CAN GC ONLY AT TOP-LEVEL

    However, this fits in perfectly with the programming style imposed in many user interface implementations including the MIT X-Window Toolkit. In the X Toolkit, a callback or work procedure is not supposed to spend much time implementing the action. Therefore it cannot have allocated much storage, and the callback trampoline mechanism can post a work procedure to call the garbage collector when needed.

    Our simple object format makes parsing the heap rather trivial. In more complex situations one ends up requiring object headers or markers of some kind to keep track of the actual storage lengths of objects and what components of objects are lisp pointers.

    Because of the usefulness of strings, they were added by default into SIOD 2.6. The implementation requires a hook that calls the C library memory free procedure when an object is in oldspace and never got relocated to newspace. Obviously this slows down the stop-and-sweep GC, and removes one of the usual advantages it has over mark-and-sweep.

    Mark and Sweep

    In a mark-and-sweep GC the objects are not relocated. Instead one only has to LOOK at objects which are referenced by the argument frames and local variables of the underlying (in this case C-coded) implementation procedures. If a pointer "LOOKS" like it is a valid lisp object (see the procedure mark_locations_array) then it may be marked, and the objects it points to may be marked, as being in-use storage which is not linked into the freelist in the gc_sweep phase.

    Another advantage of the mark_and_sweep storage management technique is that only one heap is required.

    The main disadvantages are:

    1. start-up cost to initially link freelist. (can be avoided by more general but slower NEWCELL code).
    2. does not COMPACT or LOCALIZE the use of storage. This is poor engineering practice in a virtual memory environment.
    3. the entire heap must be looked at, not just the parts with useful storage.

    In general, mark-and-sweep is slower in that it has to look at more memory locations for a given heap size, however the heap size can be smaller for a given problem being solved. More complex analysis is required when READ-ONLY, STATIC, storage spaces are used (which we do not support, currently). Additionally the most sophisticated stop-and-copy storage management techniques take into account considerations of object usage temporality.

    The technique assumes that all machine registers the GC needs to look at will be saved by a setjmp call into the save_regs_gc_mark data, and that every thing else is on the C runtime stack. Hence we have some assumptions that impact portability.

    Porting

    The most heavily ifdef'd module is slibu.c because it utilizes "unix" runtime functionality. You may want to start out by porting the "sample" main program along with the modules needed to be static linked with it.

    If your system or C runtime needs to poll for the interrupt signal mechanism to work, then define INTERRUPT_CHECK to be something useful.

    The STACK_LIMIT and STACK_CHECK macros may need to be conditionized. They currently assume stack growth downward in virtual address. The subr (%%stack-limit setting non-verbose) may be used to change the limits at runtime.

    The stack and register marking code used in the mark-and-sweep GC is unlikely to work on machines that do not keep the procedure call stack in main memory at all times. It is assumed that setjmp saves all registers into the jmp_buff data structure. If your target machine architecture is radically different, such as using linked procedure call frames of some kind, not organized as a stack, then it would be best if you could find vendor-supported routines for walking these frames, such as would be utilized by a debugger. The mark_locations procedure can then be invoked multiple times with the proper start and end addresses.

    If the stack is not always aligned (in LISP-PTR sense) then the gc_mark_and_sweep procedure will not work properly unless steps are taken to work around the problem.

    Example, assuming a byte addressed 32-bit pointer machine:

    
    stack_start_ptr: [LISP-PTR(4)] 
                     [LISP-PTR(4)]
                     [RANDOM(4)]
                     [RANDOM(2)]
                     [LISP-PTR(4)]
                     [LISP-PTR(4)]
                     [RANDOM(2)]
                     [LISP-PTR(4)]
                     [LISP-PTR(4)]
    stack_end:       [LISP-PTR(4)]
    

    As mark_locations goes from start to end it will get off proper alignment somewhere in the middle, and therefore the stack marking operation will not properly identify some valid lisp pointers.

    Fortunately there is an easy fix to this. A more aggressive use of our mark_locations procedure will suffice.

    For example, say that there might be 2-byte quantities inserted into the stack. Then use two calls to mark_locations, as as in THINK_C on the Macintosh:

    
    mark_locations(((char *)stack_start_ptr) + 0,((char *)&stack_end) + 0);
    mark_locations(((char *)stack_start_ptr) + 2,((char *)&stack_end) + 2);
    

    If we think there might be 1-byte quantities, then 4 calls are required:

    
    mark_locations(((char *)stack_start_ptr) + 0,((char *)&stack_end) + 0);
    mark_locations(((char *)stack_start_ptr) + 1,((char *)&stack_end) + 1);
    mark_locations(((char *)stack_start_ptr) + 2,((char *)&stack_end) + 2);
    mark_locations(((char *)stack_start_ptr) + 3,((char *)&stack_end) + 3);
    

    Porting to tiny machines

    SIOD has been run on 16 bit machines, such as MS-DOS tiny memory model and in palm tops.

    Some things to consider:

    On larger machines the structure member alignment requirements cause an array of struct obj to waste space. For example, on an ALPHA with 64 bit pointers each cell takes up 24 bytes, even though ideally only 18 would be required. Is it worth it to pay a price of having a 33% overhead in order to buy simplicity? The most common way of dealing with the alignment problem is to store the object type and gc_mark some place else. Usually this means making assumptions about memory addressing because it is important to be able to get the TYPE as quickly as possible.

    There are certainly other was to organize data, avoiding the use of the C programming struct support, and utilizing carefully contrived macro definitions instead.

    Writing extensions in the C programming language

    The file siod.h provides public declarations of C functions that can be called by extensions. Although some things are missing, having been put into the siodp.h file accidently, and some may have been left out altogether. Any function available to be called from Scheme interpreted programs can also be called from C programs, using the same arguments and with the same return value. Hence the claim for a small cognitive-footprint imposed on extension authors.

    There are three common reasons for wanting to write an extension to the system using the C programming language:

    1. For runtime efficiency.
    2. To take advantage of operating system, or other runtime library provided functionality.
    3. To play games with evaluator semantics.

    Some examples of the first class are the functions memq, and nth, study them. These extensions are straightforward, and easy to debug from the C language debugger, with the functions err0, pr, and prp being provided to call back into the lisp runtime system from the C debugger.

    Some examples from the second class are the ndbm and regex modules, and the support for commercial database client interfaces. In many cases it is convenient to define new scheme data types to encapsulate the complex state of an API. Study how to utilize allocate_user_tc, set_gc_hooks and set_print_hooks. Careful ordering of storage allocation and interrupt management are important. Also don't forget that most C programming API functions do not handle being longjump'd through very well, so beware of how you handle callbacks and SIGINT.

    Functions such as get_c_string, get_c_string_dim, get_c_long, get_c_double, and get_c_file are usually all you need to get at the data you require to get the job done. But beware of spoofing the garbage collector. For example, never do something equivalent to this:

    {LISP x;
     char *z;
     x = strcons(100,NULL);
     z = get_c_string(z);
     /* no further references to x, but z is used */
     }

    Because there are no further references to x, the C compiler might very well reuse the location on the stack in which x resided. If there is any other consing then the garbage collector will go off at some point in the future inside this function, and it will free the memory pointed to by z. A potential example of this sort of thing is the built-in procedure lexec. In theory a C compiler might store envp and gcsafe in the same memory location. But of course for other reasons it is impossible for that to cause problems unless get_c_string was extended to invoke the evaluator in some cases.

    If you want to play with evaluator semantics you need to study the leval function and perhaps the lapply function too. The tc_fsubr object is the conventional way to extend an evaluator, but the tc_msubr is more powerfull and allows for a modular tail recursion. The set_eval_hooks function allows for arbitrary evalution semantics when the first element of a form evaluates to a new datatype.

    User Type Extension

    If you use them then you must provide some information to the garbage collector. To do this you can supply 4 functions,
    1. a user_relocate, takes an object and returns a new copy.
    2. a user_scan, takes an object and calls relocate on its subparts.
    3. a user_mark, takes an object and calls gc_mark on its subparts or it may return one of these to avoid stack growth.
    4. a user_free, takes an object to hack before it gets onto the freelist.
    
    set_gc_hooks(type,
                 user_relocate_fcn,
                 user_scan_fcn,
                 user_mark_fcn,
                 user_free_fcn,
                 &kind_of_gc);
    

    The variable kind_of_gc should be a long. It will receive 0 for mark-and-sweep, 1 for stop-and-copy. Therefore set_gc_hooks should be called AFTER process_cla. You must specify a relocate function with stop-and-copy. The scan function may be NULL if your user types will not have lisp objects in them. Under mark-and-sweep the mark function is required but the free function may be NULL.

    You might also want to extend the printer. This is optional.

    
    set_print_hooks(type,fcn);
    
    The fcn receives the object which should be printed to its second argument, a struct gen_printio *, using procedures such as gput_st.

    LIBSIOD use as an extension language for C programs

    See the modules sample.c and siod.c for example main program usage. Once the storage system is initialized the most simple and useful interface into the interpreter is probably the repl_c_string function.

    References

  • Structure and Interpretation of Computer Programs, by Abelson and Sussman, MIT Press.
  • scheme repository at indiana.edu.
  • scheme repository at cmu.edu.