FILE		"Finally.txt"
IMPLEMENTS	Discussion of object finalization strategies & mechanics
AUTHOR		Ken Dickey
DATE		I forget
LAST UPDATE	1993 August 7

							Finally!

I had occasion to do some mixed language software and finding manual 
memory management tedious and error prone, implemented a simple 
finalization service.  As you probably know, object finalization is a way 
of automatically cleaning up after dumb storage allocators, closing i/o 
ports, and in general freeing resources when specific data objects are no 
longer used.  The purpose of this article is to share the strategy I used, 
contrast another finalization strategy, and discuss problems in the hopes 
that finalization services become more widespread.  As I very much enjoy 
Scheme, the examples will be in that language.   However, the 
techniques--and problems--discussed here are generally applicable to 
dynamic languages.

The interface I use is

	(register-for-finalization <object> <thunk>)

After the <object> has been reclaimed by the collector, the <thunk> is 
invoked.  The thunk is a procedure of no arguments and contains arbitrary 
code.  Note that the thunk cannot directly refer to the object or the 
object will never be collected!  If some state needs to be retained for 
clean up, one can arrange that both the object and the thunk share the 
state indirectly.  

The typical argument against this interface is that it forces an ÒextraÓ 
indirection to get to shared state. A procedure should be used instead of 
the thunk and that procedure should be passed the object to be finalized 
as an argument.  The reason I chose the above interface instead is that I 
could not assure myself of a safe implementation when there are several 
objects to be finalized which contain cross references to each other.  The 
question then becomes ÒWhen should an object be finalized?Ó.

The problem with giving an object which is a candidate for finalization to 
a procedure is that procedure could resurrect the object by binding it to 
a global variable.  The object is then no longer a candidate for 
finalization!  If that object refers to other objects which were 
candidates for finalization, they are no longer candidates!  But what if 
one or more of them were already finalized?  Rather than dealing with 
problems of Òpremature burialÓ, the safe strategy is to go ahead and 
collect objects and run the finalizations after objects really are dead.


Implementation

The implementation is straight forward.  When an object is registered, it 
and the thunk are added to a hidden list as a Òweak-pairÓ.  Now a 
weak-pair is like a regular pair (cons), except that the first slot is 
weak--meaning that it does not hold on to an object during a collection if 
there is no strong link to it.  If the object in the first (car) slot is 
collected, the value of that object is set to false (#f in Scheme).  The 
second (cdr) slot of the weak-pair is a typical, strong slot.  After each 
collection, the list is scanned and when the first slot is #f, the thunk 
in the second slot is invoked and the weak-pair is freed to be reclaimed 
on the next collection cycle.

Reality is of course somewhat more complex than this.  There is the 
possibility that a thunk may allocate storage and cause another 
collection.  To turn this into a non-problem, a queue is used.  When a 
thunk is to be invoked, it is added instead to this queue.  Then each 
thunk is pulled off the queue and invoked as a separate pass.  If a 
collection occurs, the new thunks are also added to the queue and the 
collection finishes, returning to the original loop which is picking 
thunks out of the queue.  A simple semaphore is used so that nested 
collections donÕt cause a problem--there is only one loop invoking thunks 
(but see Problems, below).

Another possible implementation of finalization is to register the objects 
in a special list and scan the list after GC, checking the appropriate car 
slots to see if a forward has occurred.  The advantages to  the strategy I 
chose are that [A] it generalizes easily to ephemeral/generational 
collectors as the weak chain can be traversed on a generation by 
generation basis, whereas the alternative would always force the entire 
finalization list to be scanned,  [B]  weak pairs have other uses, such as 
keeping track of all objects which share some property without having to 
retain the objects just because they have a property in common (e.g. to 
implement weak tables or populations), and [C] the service can be written 
as user code.


Weak pairs

So if you have a language with weak-pairs and a Òpost-gc-hookÓ, you can do 
object finalization  (see code following).  The next question is ÒWhat if 
I donÕt have these weak-pair things?Ó.  If you donÕt have access to your 
language implementation, go beat up the implementors (er, form a user 
group and make a request).  Otherwise, you can use some variant of the 
following strategy.

Jim Miller gave a very clear description of how to add weak-pairs to a 
stop-and-copy (semi-space) collector [1] which uses no extra space and 
time proportional to the number of weak pairs in the system.  Adaption to 
an ephemeral or mark-sweep collector is straight forward.  This is a 
paraphrase of his description.  I assume the reader is familiar with 
typical storage reclamation techniques [2].

When storage is reclaimed  (a garbage collection occurs) weak pairs are 
treated specially.  The pair is copied to newSpace, but the tag of the car 
is set to indicate a fixnum.  This means that the when it is scanned, the 
collector sees a fixnum and the object pointed to by the car is not 
forwarded.  The real story, however, takes place in oldSpace.  There is a 
pointer cell which is only used during gc.  After a weak pair has been 
copied, it is kept in a chain.  The pointer now points to the pair and the 
cdr slot of the pair points to what the pointer was pointing to previously 
(the car holds the forwarding pointer and broken heart tag).  The cdr 
retains the tag of the car slot.  [see figure]

>Before GC: -------------------------------------------VV
	
	---------------------
	| weak-datum | tag1 | (car)
	| datum2     | tag2 | (cdr)
	---------------------
	
>After Scan: ------------------------------------------VV
	
	**OldSpace**            |       **NewSpace**
	                        |
 WeakChain-head             |
     |                      |
     V                      |
	---------------------   |  	---------------------
	| forward    | BrHt |-----> | weak-datum |   0  | (0 is fixnum tag)
	| next-weak  | tag1 |   |   | datum2     | tag2 |
	---------------------   |   ---------------------
	    |                   |
	    V                   |
	   ...


After the typical collection is finished, the oldSpace chain of weak pairs 
is traversed and the forwarding pointer in the oldSpace pair is followed 
to the newSpace object and the newSpace car used to check the original 
oldSpace carÕs value.  If the car points to a forwarded object, then the 
slot lives and the forwarded location is stuffed in the car along with the 
original tag, otherwise the object is dead and the car slot is set to 
contain the value false.


Other Weak Strategies

Weak pairs can be implemented in a Mark Sweep collector by using an extra 
slot which is only used the chain the weaks together during a collection.  
After the collection, the chain is scanned to see whether the ÒpairÓ is on 
the free list or not and the appropriate action taken.

One can alternately allocate weak pairs in a special area and scan them 
separately.


Design Issues

Barry Hayes [3] gives four design issues for finalization: decoupling, 
promptness, locking, and cycles.

Decoupling has to do with separating the collection from invocation of 
finalization routines.  If the collector and finalization are not 
decoupled, then triggering a collection in a finalization routine can lead 
to system failure.  The strategy presented here deals with this via the 
finalization queue.  There are some runtime assumptions which must be 
satisfied, however  (See Problems, below).

Promptness has to do with how rapidly it is determined that an object is 
garbage and the associated finalizer is invoked.  Aside from cycle 
detection, reference counting collectors do well here and ephemeral 
collectors do poorly.   The strategy presented above does nothing special 
in this regard.

Locking has to do with potential parallelism between collection and 
finalization.  The mechanism described here has not been implemented on a 
multiprocessor and would minimally need to protect the queue from multiple 
access.

Cycles have to do with detection of cross references between objects to be 
finalized.  The strategy here is to be conservative in that no 
finalization takes place until the entire cycle is dead--i.e. to let the 
collector do the detection work here.

Dybvig, Bruggeman, and Eby [4] also give four design issues, three of 
which differ from Hayes.

When does finalization occur?  If finalization is triggered by the 
collector, then the finalization routine may have to queue the cleanup 
event so it happens at a non-critical time rather than when the collector 
just happens to run.  This is a slightly different statement of 
decoupling.  In the strategy above, one would keep a separate queue which 
the finalization code adds entries to and which gets checked at the 
appropriate time.

In what order are objects finalized?  Shared or otherwise related 
structures may have deallocation order which must be maintained.  This is 
a hard problem in the general case as it depends on the design of the side 
effects of unrelated systems and will probably grow as finalization 
services become more widely used.

Are the full range of services available to finalization routines?  Can a 
collection occur?  What about errors and non-local exits?  (See Problems).

Is the object being finalized available to the finalization routine?  
There are several cases where this is very helpful.  For example, if 
symbols do not have property slots then properties may be implemented 
using a hash table with properties associated with the symbol (or other) 
key.  When the key is collected it would be nice to remove the 
association. The table could be swept after each collection but it is more 
efficient to have the key around and just remove the dead associations.


Problems

As you have guessed by now, there are several cracks in the simple 
strategy I outlined above. For one thing, one has to be very careful not 
to allocate storage and thus trigger another collection while one is 
collecting!  The code in the appendix was written so that this is true on 
the system I typically use, but there are several runtime assumptions 
which do not hold in general.  The most obvious is that in systems which 
use generational collectors, the SET-CAR! operation allocates a reference 
when storing a pointer from an older to a newer object across a generation 
boundary. 

Non-local exits pose a particular problem, especially when one can use 
call/cc to reenter a dynamic context.  It is easy not to use explicit 
non-local exits, but somewhat harder to guarantee that no error condition 
is signaled.  If a finalization routine does not return to the 
finalization processing loop, the queue semaphore is never cleared and no 
more finalization takes place.  Many systems provide dynamic-wind or 
unwind-protect features, but writing the code to do the right thing in all 
cases is usually non-trivial.

The basic cycle assumed here is that [1] a collection occurs [2] the 
finalization loop is run which upon continuation resumes the original 
code.  But as a finalization routine is arbitrary code, what if it does a 
large computation and in essence never returns?  It would be good if the 
new thread of execution was concurrent with the original thread.  One can 
arrange to ping-pong with the original thread at each gc, but this still 
begs the question of data structure consistency, interaction with thread 
implementations, futures, et cetera.  Not returning is a programmer error 
as serious as an infinite loop.


Another Strategy

A strategy which deals with a number of the problems above is to just 
enqueue all finalization and let the user code do the cleanup explicitly.  
For example, LucidÕs Common Lisp [5] and Chez Scheme [4] take this 
strategy.  In more detail, objects are registered to be added to a queue 
when they are no longer referenced.  They may be multiply registered and 
placed on several queues.  Note that they are not collected but 
resurrected onto the queues. I.e. the objects themselves are enqueued and 
user code has to periodically check for and finalize them.  This has the 
advantage that the objects are themselves available and that non-local 
exits are no more of a problem than for other user code.


Problems, revisited

Unfortunately with this alternate strategy, the premature burial of linked 
structures again becomes a problem.  The implementation of this strategy 
is somewhat tricky in that multiple scans of objects ready to be enqueued 
may be required to determine which are really ready, as for example a 
queue of queued objects may be resurrected (see [4])!  


In conclusion

Not all cleanup problems are solved by object finalization.  In particular 
there may be constraints on the order in which certain resources must be 
freed.  On the other hand, this is usually (!) easy to accomplish with 
procedures and simple resource flags.  In addition, a foreign function 
interface may be required or the underlying system might need to know the 
data representation of certain objects--e.g. unscanned vectors or 
strings--to be able to cache resource descriptors for later cleanup.  
Finalization, however, does go a long way toward freeing people who have 
seen the light from the low level tedium which caused them to choose a 
dynamic language in the first place.


--------------------------------

[1] Jim Miller: ÒMultiScheme: a Parallel Processing System based on MIT 
SchemeÓ, MIT PhD thesis, 1987.

[2] e.g. see Paul Wilson: ÒUniprocessor Garbage Collection TechniquesÓ, 
Springer LNCS 637, Memory Management, 1992 for an overview and excellent 
bibliography.

[3] Barry Hayes: ÒFinalization in the Collector InterfaceÓ, Springer LNCS 
637, Memory Management, 1992.

[4] R. Kent Dybvig, Carl Bruggeman, David Eby: ÒGuardians in a 
Generation-Based Garbage CollectorÓ, ACM SIGPLAN Ô93 Conference on 
Programming Language Design and Implementation (PLDI), ACM SIGPLAN 
NOTICES, vol 28, num 6, June 1993.

[5] This was added to the documentation for the Lucid 4.1 release but 
apparently escaped being added to the index.


;------------------------------------------------------------------
; Appendix: Scheme code for register-for-finaliztion
;------------------------------------------------------------------
; FILE         "finalize.scm"
; IMPLEMENTS    Finalization service.
; AUTHOR        Ken Dickey
; DATE          1991 March 21
; LAST UPDATED  1993 March  5

; PURPOSE:  Clean up windows, dumb malloc'ed storage, etc.

; USAGE:  (register-for-finalization <object> <thunk>)

; SEMANTICS: After each storage reclamation cycle (a.k.a. GC),
; if the <object> has been collected, the <thunk> is invoked.

; IMPLEMENTATION: Uses weak-pairs.  When the <object> in the car of
; the weak-cons goes to #f, the <thunk> in the cdr is executed.
; [This code is somewhat harder to follow because it takes some
; care not to cons.  Assignment is useful after all! 8^].

; REQUIRES: weak-pairs and a post-gc-hook (or internals access).

; NOTA BENE: Not for concurrent implementations (queue not protected).
; OTHER ASSUMPTIONS: 
;*> (let loop ...) does not cons or is rewritten to use 
; a statically allocated variable.
;*> There is no possibility of a collection being triggered 
; between the test and the set of the q-semaphore.
;*> The finalization routine called does not take a non-local
; exit (see Problem discussion, above).
;*> In general, operations used do not allocate storage.
;   Note that SET-CAR! typically allocates storage in 
; generational collectors.

(define  REGISTER-FOR-FINALIZATION

  (let ( (CHAIN (cons #f '() )) ) ;; Registration list. 

    ;; FINALIZATION QUEUE.
    (define Q (cons #f #f))    ;; (cons q-start q-end)
    ; syntactic sugar tastes sweeter
    (define Q-SEMAPHORE  #f)   ;; lightweight semaphore for nested GC
    (define Q-TEMP       #f)   ;; temp so don't cons/let in q-pop 
    (define Q-START car)
    (define Q-END   cdr)
    (define SET-Q-START! set-car!)
    (define SET-Q-END!   set-cdr!)
    (define (Q-EMPTY?) (not (q-start q)))

    (define (Q-PUSH! thing)  ;; Invariant: thing is a cons
      (set-cdr! thing #f)
      (cond
        ((q-empty?)
         (set-q-start! q thing) 
         (set-q-end!   q thing)
         )
         (else
           (set-cdr! (q-end q) thing) ;; put at end (cdr) of 
           							  ;; ..previous cons
           (set-q-end! q thing)       ;; and remember it as the new end
         )
    ) )
    
    (define (Q-POP!)
      (if (q-empty?)
        #f 
        (begin
          (set! q-temp (q-start q)) 
          (set-q-start! q (cdr q-temp)) ; next
          (set-cdr! q-temp #f)          ; don't be a hanger on
          (if (q-empty?)
               (set-q-end! q #f))       ; queue now empty again
          (car q-temp))   ;; return the weak pointed to by the cons
    ) ) 

    (let ( (POINT (cons #f #f)) )  ;; used in post-gc-deamon, below.

      ( add-post-gc-deamon	;; Gambit: ##add-gc-finalize-job  
         (lambda ()
            ;; SCAN WEAKS LOOKING FOR "DEAD" ONES
            ;; (car point) points to previous         
            ;; (cdr point) points to current
            (set-car! point chain)
            (set-cdr! point (cdr chain))
            (let loop ()
              (cond 
               ((null? (cdr point)) #f)   ;; done
               ((not (weak-car (cadr point)))  ;; (false? (car weak-pair))
                (set-cdr! (car point) (cddr point)) ;; splice it out 
                (q-push!  (cdr point))              ;; and enqueue it
                ;; next!
                (set-cdr! point (cdar point))
                (loop) 
                ) 
               (else  ;; skip it
                ;; next! 
                (set-car! point (cdr  point))
                (set-cdr! point (cddr point))
                (loop))) 
              )
            ;; FINALIZE 'EM 
            (cond
              ((not q-semaphore)     ; guard against nested gc
               (set! q-semaphore #t) ; (assumes test & set atomic wrt gc)
               (let q-loop ( (weak (q-pop!)) )
                 (if weak
                   (begin 
                     ( (weak-cdr weak) ) ;; Finalize! -- must return here
                     (q-loop (q-pop!)))
                ) )
                (set! q-semaphore #f)
            ) )
            (set! q-temp #f) ;; free whatever frob for the collector 
    ) ) )
    
    ; value of register-for-finalization
    (lambda (<object> <thunk>)
      (set-cdr! chain (cons (weak-cons <object> <thunk>) (cdr chain)))
      'registered)
) )

;;                       --- E O F ---