FILE "Finally.txt" IMPLEMENTS Discussion of object finalization strategies & mechanics AUTHOR Ken Dickey DATE I forget LAST UPDATE 1993 August 7 Finally! I had occasion to do some mixed language software and finding manual memory management tedious and error prone, implemented a simple finalization service. As you probably know, object finalization is a way of automatically cleaning up after dumb storage allocators, closing i/o ports, and in general freeing resources when specific data objects are no longer used. The purpose of this article is to share the strategy I used, contrast another finalization strategy, and discuss problems in the hopes that finalization services become more widespread. As I very much enjoy Scheme, the examples will be in that language. However, the techniques--and problems--discussed here are generally applicable to dynamic languages. The interface I use is (register-for-finalization ) After the has been reclaimed by the collector, the is invoked. The thunk is a procedure of no arguments and contains arbitrary code. Note that the thunk cannot directly refer to the object or the object will never be collected! If some state needs to be retained for clean up, one can arrange that both the object and the thunk share the state indirectly. The typical argument against this interface is that it forces an ÒextraÓ indirection to get to shared state. A procedure should be used instead of the thunk and that procedure should be passed the object to be finalized as an argument. The reason I chose the above interface instead is that I could not assure myself of a safe implementation when there are several objects to be finalized which contain cross references to each other. The question then becomes ÒWhen should an object be finalized?Ó. The problem with giving an object which is a candidate for finalization to a procedure is that procedure could resurrect the object by binding it to a global variable. The object is then no longer a candidate for finalization! If that object refers to other objects which were candidates for finalization, they are no longer candidates! But what if one or more of them were already finalized? Rather than dealing with problems of Òpremature burialÓ, the safe strategy is to go ahead and collect objects and run the finalizations after objects really are dead. Implementation The implementation is straight forward. When an object is registered, it and the thunk are added to a hidden list as a Òweak-pairÓ. Now a weak-pair is like a regular pair (cons), except that the first slot is weak--meaning that it does not hold on to an object during a collection if there is no strong link to it. If the object in the first (car) slot is collected, the value of that object is set to false (#f in Scheme). The second (cdr) slot of the weak-pair is a typical, strong slot. After each collection, the list is scanned and when the first slot is #f, the thunk in the second slot is invoked and the weak-pair is freed to be reclaimed on the next collection cycle. Reality is of course somewhat more complex than this. There is the possibility that a thunk may allocate storage and cause another collection. To turn this into a non-problem, a queue is used. When a thunk is to be invoked, it is added instead to this queue. Then each thunk is pulled off the queue and invoked as a separate pass. If a collection occurs, the new thunks are also added to the queue and the collection finishes, returning to the original loop which is picking thunks out of the queue. A simple semaphore is used so that nested collections donÕt cause a problem--there is only one loop invoking thunks (but see Problems, below). Another possible implementation of finalization is to register the objects in a special list and scan the list after GC, checking the appropriate car slots to see if a forward has occurred. The advantages to the strategy I chose are that [A] it generalizes easily to ephemeral/generational collectors as the weak chain can be traversed on a generation by generation basis, whereas the alternative would always force the entire finalization list to be scanned, [B] weak pairs have other uses, such as keeping track of all objects which share some property without having to retain the objects just because they have a property in common (e.g. to implement weak tables or populations), and [C] the service can be written as user code. Weak pairs So if you have a language with weak-pairs and a Òpost-gc-hookÓ, you can do object finalization (see code following). The next question is ÒWhat if I donÕt have these weak-pair things?Ó. If you donÕt have access to your language implementation, go beat up the implementors (er, form a user group and make a request). Otherwise, you can use some variant of the following strategy. Jim Miller gave a very clear description of how to add weak-pairs to a stop-and-copy (semi-space) collector [1] which uses no extra space and time proportional to the number of weak pairs in the system. Adaption to an ephemeral or mark-sweep collector is straight forward. This is a paraphrase of his description. I assume the reader is familiar with typical storage reclamation techniques [2]. When storage is reclaimed (a garbage collection occurs) weak pairs are treated specially. The pair is copied to newSpace, but the tag of the car is set to indicate a fixnum. This means that the when it is scanned, the collector sees a fixnum and the object pointed to by the car is not forwarded. The real story, however, takes place in oldSpace. There is a pointer cell which is only used during gc. After a weak pair has been copied, it is kept in a chain. The pointer now points to the pair and the cdr slot of the pair points to what the pointer was pointing to previously (the car holds the forwarding pointer and broken heart tag). The cdr retains the tag of the car slot. [see figure] >Before GC: -------------------------------------------VV --------------------- | weak-datum | tag1 | (car) | datum2 | tag2 | (cdr) --------------------- >After Scan: ------------------------------------------VV **OldSpace** | **NewSpace** | WeakChain-head | | | V | --------------------- | --------------------- | forward | BrHt |-----> | weak-datum | 0 | (0 is fixnum tag) | next-weak | tag1 | | | datum2 | tag2 | --------------------- | --------------------- | | V | ... After the typical collection is finished, the oldSpace chain of weak pairs is traversed and the forwarding pointer in the oldSpace pair is followed to the newSpace object and the newSpace car used to check the original oldSpace carÕs value. If the car points to a forwarded object, then the slot lives and the forwarded location is stuffed in the car along with the original tag, otherwise the object is dead and the car slot is set to contain the value false. Other Weak Strategies Weak pairs can be implemented in a Mark Sweep collector by using an extra slot which is only used the chain the weaks together during a collection. After the collection, the chain is scanned to see whether the ÒpairÓ is on the free list or not and the appropriate action taken. One can alternately allocate weak pairs in a special area and scan them separately. Design Issues Barry Hayes [3] gives four design issues for finalization: decoupling, promptness, locking, and cycles. Decoupling has to do with separating the collection from invocation of finalization routines. If the collector and finalization are not decoupled, then triggering a collection in a finalization routine can lead to system failure. The strategy presented here deals with this via the finalization queue. There are some runtime assumptions which must be satisfied, however (See Problems, below). Promptness has to do with how rapidly it is determined that an object is garbage and the associated finalizer is invoked. Aside from cycle detection, reference counting collectors do well here and ephemeral collectors do poorly. The strategy presented above does nothing special in this regard. Locking has to do with potential parallelism between collection and finalization. The mechanism described here has not been implemented on a multiprocessor and would minimally need to protect the queue from multiple access. Cycles have to do with detection of cross references between objects to be finalized. The strategy here is to be conservative in that no finalization takes place until the entire cycle is dead--i.e. to let the collector do the detection work here. Dybvig, Bruggeman, and Eby [4] also give four design issues, three of which differ from Hayes. When does finalization occur? If finalization is triggered by the collector, then the finalization routine may have to queue the cleanup event so it happens at a non-critical time rather than when the collector just happens to run. This is a slightly different statement of decoupling. In the strategy above, one would keep a separate queue which the finalization code adds entries to and which gets checked at the appropriate time. In what order are objects finalized? Shared or otherwise related structures may have deallocation order which must be maintained. This is a hard problem in the general case as it depends on the design of the side effects of unrelated systems and will probably grow as finalization services become more widely used. Are the full range of services available to finalization routines? Can a collection occur? What about errors and non-local exits? (See Problems). Is the object being finalized available to the finalization routine? There are several cases where this is very helpful. For example, if symbols do not have property slots then properties may be implemented using a hash table with properties associated with the symbol (or other) key. When the key is collected it would be nice to remove the association. The table could be swept after each collection but it is more efficient to have the key around and just remove the dead associations. Problems As you have guessed by now, there are several cracks in the simple strategy I outlined above. For one thing, one has to be very careful not to allocate storage and thus trigger another collection while one is collecting! The code in the appendix was written so that this is true on the system I typically use, but there are several runtime assumptions which do not hold in general. The most obvious is that in systems which use generational collectors, the SET-CAR! operation allocates a reference when storing a pointer from an older to a newer object across a generation boundary. Non-local exits pose a particular problem, especially when one can use call/cc to reenter a dynamic context. It is easy not to use explicit non-local exits, but somewhat harder to guarantee that no error condition is signaled. If a finalization routine does not return to the finalization processing loop, the queue semaphore is never cleared and no more finalization takes place. Many systems provide dynamic-wind or unwind-protect features, but writing the code to do the right thing in all cases is usually non-trivial. The basic cycle assumed here is that [1] a collection occurs [2] the finalization loop is run which upon continuation resumes the original code. But as a finalization routine is arbitrary code, what if it does a large computation and in essence never returns? It would be good if the new thread of execution was concurrent with the original thread. One can arrange to ping-pong with the original thread at each gc, but this still begs the question of data structure consistency, interaction with thread implementations, futures, et cetera. Not returning is a programmer error as serious as an infinite loop. Another Strategy A strategy which deals with a number of the problems above is to just enqueue all finalization and let the user code do the cleanup explicitly. For example, LucidÕs Common Lisp [5] and Chez Scheme [4] take this strategy. In more detail, objects are registered to be added to a queue when they are no longer referenced. They may be multiply registered and placed on several queues. Note that they are not collected but resurrected onto the queues. I.e. the objects themselves are enqueued and user code has to periodically check for and finalize them. This has the advantage that the objects are themselves available and that non-local exits are no more of a problem than for other user code. Problems, revisited Unfortunately with this alternate strategy, the premature burial of linked structures again becomes a problem. The implementation of this strategy is somewhat tricky in that multiple scans of objects ready to be enqueued may be required to determine which are really ready, as for example a queue of queued objects may be resurrected (see [4])! In conclusion Not all cleanup problems are solved by object finalization. In particular there may be constraints on the order in which certain resources must be freed. On the other hand, this is usually (!) easy to accomplish with procedures and simple resource flags. In addition, a foreign function interface may be required or the underlying system might need to know the data representation of certain objects--e.g. unscanned vectors or strings--to be able to cache resource descriptors for later cleanup. Finalization, however, does go a long way toward freeing people who have seen the light from the low level tedium which caused them to choose a dynamic language in the first place. -------------------------------- [1] Jim Miller: ÒMultiScheme: a Parallel Processing System based on MIT SchemeÓ, MIT PhD thesis, 1987. [2] e.g. see Paul Wilson: ÒUniprocessor Garbage Collection TechniquesÓ, Springer LNCS 637, Memory Management, 1992 for an overview and excellent bibliography. [3] Barry Hayes: ÒFinalization in the Collector InterfaceÓ, Springer LNCS 637, Memory Management, 1992. [4] R. Kent Dybvig, Carl Bruggeman, David Eby: ÒGuardians in a Generation-Based Garbage CollectorÓ, ACM SIGPLAN Ô93 Conference on Programming Language Design and Implementation (PLDI), ACM SIGPLAN NOTICES, vol 28, num 6, June 1993. [5] This was added to the documentation for the Lucid 4.1 release but apparently escaped being added to the index. ;------------------------------------------------------------------ ; Appendix: Scheme code for register-for-finaliztion ;------------------------------------------------------------------ ; FILE "finalize.scm" ; IMPLEMENTS Finalization service. ; AUTHOR Ken Dickey ; DATE 1991 March 21 ; LAST UPDATED 1993 March 5 ; PURPOSE: Clean up windows, dumb malloc'ed storage, etc. ; USAGE: (register-for-finalization ) ; SEMANTICS: After each storage reclamation cycle (a.k.a. GC), ; if the has been collected, the is invoked. ; IMPLEMENTATION: Uses weak-pairs. When the in the car of ; the weak-cons goes to #f, the in the cdr is executed. ; [This code is somewhat harder to follow because it takes some ; care not to cons. Assignment is useful after all! 8^]. ; REQUIRES: weak-pairs and a post-gc-hook (or internals access). ; NOTA BENE: Not for concurrent implementations (queue not protected). ; OTHER ASSUMPTIONS: ;*> (let loop ...) does not cons or is rewritten to use ; a statically allocated variable. ;*> There is no possibility of a collection being triggered ; between the test and the set of the q-semaphore. ;*> The finalization routine called does not take a non-local ; exit (see Problem discussion, above). ;*> In general, operations used do not allocate storage. ; Note that SET-CAR! typically allocates storage in ; generational collectors. (define REGISTER-FOR-FINALIZATION (let ( (CHAIN (cons #f '() )) ) ;; Registration list. ;; FINALIZATION QUEUE. (define Q (cons #f #f)) ;; (cons q-start q-end) ; syntactic sugar tastes sweeter (define Q-SEMAPHORE #f) ;; lightweight semaphore for nested GC (define Q-TEMP #f) ;; temp so don't cons/let in q-pop (define Q-START car) (define Q-END cdr) (define SET-Q-START! set-car!) (define SET-Q-END! set-cdr!) (define (Q-EMPTY?) (not (q-start q))) (define (Q-PUSH! thing) ;; Invariant: thing is a cons (set-cdr! thing #f) (cond ((q-empty?) (set-q-start! q thing) (set-q-end! q thing) ) (else (set-cdr! (q-end q) thing) ;; put at end (cdr) of ;; ..previous cons (set-q-end! q thing) ;; and remember it as the new end ) ) ) (define (Q-POP!) (if (q-empty?) #f (begin (set! q-temp (q-start q)) (set-q-start! q (cdr q-temp)) ; next (set-cdr! q-temp #f) ; don't be a hanger on (if (q-empty?) (set-q-end! q #f)) ; queue now empty again (car q-temp)) ;; return the weak pointed to by the cons ) ) (let ( (POINT (cons #f #f)) ) ;; used in post-gc-deamon, below. ( add-post-gc-deamon ;; Gambit: ##add-gc-finalize-job (lambda () ;; SCAN WEAKS LOOKING FOR "DEAD" ONES ;; (car point) points to previous ;; (cdr point) points to current (set-car! point chain) (set-cdr! point (cdr chain)) (let loop () (cond ((null? (cdr point)) #f) ;; done ((not (weak-car (cadr point))) ;; (false? (car weak-pair)) (set-cdr! (car point) (cddr point)) ;; splice it out (q-push! (cdr point)) ;; and enqueue it ;; next! (set-cdr! point (cdar point)) (loop) ) (else ;; skip it ;; next! (set-car! point (cdr point)) (set-cdr! point (cddr point)) (loop))) ) ;; FINALIZE 'EM (cond ((not q-semaphore) ; guard against nested gc (set! q-semaphore #t) ; (assumes test & set atomic wrt gc) (let q-loop ( (weak (q-pop!)) ) (if weak (begin ( (weak-cdr weak) ) ;; Finalize! -- must return here (q-loop (q-pop!))) ) ) (set! q-semaphore #f) ) ) (set! q-temp #f) ;; free whatever frob for the collector ) ) ) ; value of register-for-finalization (lambda ( ) (set-cdr! chain (cons (weak-cons ) (cdr chain))) 'registered) ) ) ;; --- E O F ---