KnownSpace Symphony

architecture sketch

the application layer is again isolated from the kernel, as in Hydrogen. it is served proxy entities by the EntityPool and events, possibly with a ttl (time-to-live) value, by the EventPool. a database lives behind the EntityPool, and possibly behind the EventPool too. all entities are persisted by default. events are not proxied but may be persisted, depending on their ttl. any pool can register() with any other pool of its type, including remote ones, and as a consequence it becomes a server to the pool it registers with for the kinds of things it serves. this creates chains of servers, with the default EntityPool at the top of the entity pool hierarchy and the default EventPool at the top of the event pool hierarchy. thus, for any one session there could be a collection of entity pools, each representing a local or remote source of data: a mail server, a news server, an ftp server, a website full of mp3s, a remote database, another running knownspace session's entity pool, or whatever. when two sessions register with each other, the default pools of each session remain the roots of their own separate hierarchies of pools, each with knowledge of how to fetch things from each of the remote servers. application simpletons are completely unaware of all this proxying, networking, and chained serving, since they only ever see the local default EntityPool and the local default EventPool.
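here's a minimal java sketch of the chained serving just described, assuming a pool is little more than a map from ids to things plus a list of the pools that have registered with it. the names Pool, put(), register() and lookup() are hypothetical, not Hydrogen's API, and the sketch skips proxies, persistence, and networking entirely. it uses java generics so the same code can serve entities or events.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: a pool of things of type T (entities, events, ...) that can be
// chained with other pools of the same type. Ids are plain strings here.
class Pool<T> {
    private final Map<String, T> local = new HashMap<>();
    // Pools that have registered with this one and so now act as its servers.
    private final List<Pool<T>> servers = new ArrayList<>();

    void put(String id, T item) { local.put(id, item); }

    // server.register(client) makes 'server' a server to 'client':
    // from now on 'client' asks 'server' for anything it cannot find itself.
    void register(Pool<T> client) { client.servers.add(this); }

    Optional<T> lookup(String id) { return lookup(id, new HashSet<>()); }

    // Answer from the local store if possible, otherwise ask each server in turn.
    // The visited set stops the search from cycling when two pools have
    // registered with each other.
    private Optional<T> lookup(String id, Set<Pool<T>> visited) {
        if (!visited.add(this)) return Optional.empty();
        T item = local.get(id);
        if (item != null) return Optional.of(item);
        for (Pool<T> server : servers) {
            Optional<T> found = server.lookup(id, visited);
            if (found.isPresent()) return found;
        }
        return Optional.empty();
    }
}
```

with mail.register(root) and news.register(mail), a lookup on the root pool finds things held by the news pool without the caller knowing either pool exists, which is the transparency application simpletons rely on. the visited set is an addition of this sketch, there so that two sessions registered with each other don't bounce a failed lookup back and forth forever.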

to make a multi-user system possible we need entities that manage personal information: person entities. each user would have their own person entity storing information about them, like their phone numbers, picture, addresses, email addresses, personal websites, knownspace signon id, names, nicknames, userids, static ip numbers, sleeping habits, schedule, spouse's name, children's names, friends, personal likes and dislikes, bad habits, employers, and workplace address. the person entity could also know which emails that person sent the user, which webpages that person recommended to the user, which emails the user sent that person, which webpages the user recommended to that person, what that person's rating of a particular website is, what movies that person likes, and what other people are associated with that person. each person entity could also have a link to that person's own (remote) session whenever the person happens to be online in a symphony session. once we have persons inside the system the user can browse them, search them, organize and reorganize them, and navigate through them just like any other entity. for further expansion of this idea as sketched in the old knownspace, look here: http://www.cs.indiana.edu/~rawlins/website/architecture/entities.html
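a person entity needn't be a special class at all. here's a minimal sketch, with hypothetical names (Entity, set(), get(), link(), linked()) that aren't Hydrogen's API: each fact about a person is an attribute, and each relationship (emails sent, webpages recommended, friends) is a named link to another entity, so anything that can browse or search entities can do the same with persons.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal entity: named attributes plus named links to other entities.
// A person entity is just an ordinary entity with person-like attributes and links.
class Entity {
    private final Map<String, String> attributes = new HashMap<>();
    private final Map<String, List<Entity>> links = new HashMap<>();

    Entity set(String key, String value) {
        attributes.put(key, value);
        return this;
    }

    String get(String key) { return attributes.get(key); }

    // Add a link of the given kind, e.g. "friend" or "sent-email".
    Entity link(String relation, Entity target) {
        links.computeIfAbsent(relation, r -> new ArrayList<>()).add(target);
        return this;
    }

    // All entities linked by the given kind of link (empty if none).
    List<Entity> linked(String relation) {
        return links.getOrDefault(relation, List.of());
    }
}
```

adding a new kind of fact later (a schedule, a rating) means adding an attribute or a link; nothing that already uses person entities has to change.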

note that there would have to be yet another simpleton, a user registration simpleton, to manage the task of keeping all person entities up to date: person entity creation, editing, and so on. there would also be another simpleton that simply notes when any known user registers and then updates the 'last signon' information in the relevant person entity, and yet another to check all such person entities whenever new registration events enter the system and so tell its user when someone of interest has just signed on. there could also be a simpleton that lets its user browse all person entities, and so on. if there were a simpleton that could convert any entity to an xml intermediate form (something we'll need anyway) and another to attach xslt information to that, then the browsing simpleton could work for any arbitrary entity, not just person entities, which would be widely useful. the same goes for an 'edit entity' simpleton. by keeping the information in separate related chunks, relating them only with links to other entities, using only events to communicate between simpletons, throwing all entities into one entity pool and all events into one event pool, and using constraints to find them again, it's easy to add new functionality to symphony as the need for it arises. each new simpleton is small and easy to think about, and so easy to build. and anyone anywhere can build one and then share it with everyone. compare that with the usual horrible alternative of continually having to break down and rewrite one single huge, monolithic communication program. and of course, in normal software engineering, you'd have to think in advance of all the things you could possibly want to do with a person entity before you ever created a single person entity to do something as simple as a phone book!

in symphony you could create lots of simple person entities and do simple things with them; then, when you want more sophisticated data structures for persons, you just create more entities and link them to the originals! or you could write a simpleton that takes the old entities, creates new entities with the new information already linked in, and then deletes the old entities. of course, the cost of all this flexibility in computation and structure is increased cycles, memory, and disk space. fortunately, with the ever-improving price-performance of computer hardware, even hardcore geeks will eventually realize that their time is much more valuable than the computer's time. so in the long run this is the direction that all software engineering has to go.
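the sign-on simpleton described above might look something like this minimal sketch, in which simpletons talk only through the event pool. the Event record, the EventPool and SignonSimpleton classes, the 'registration' event kind, and the 'userid', 'time' and 'last signon' keys are all hypothetical; a constraint is modelled as a plain java Predicate, and records need java 16 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical event: a kind plus string-valued details.
record Event(String kind, Map<String, String> details) {}

// Hypothetical event pool: simpletons subscribe with a constraint (a predicate)
// and are called back for every published event that satisfies it.
class EventPool {
    private record Subscription(Predicate<Event> constraint, Consumer<Event> handler) {}

    private final List<Subscription> subscriptions = new ArrayList<>();

    void subscribe(Predicate<Event> constraint, Consumer<Event> handler) {
        subscriptions.add(new Subscription(constraint, handler));
    }

    void publish(Event e) {
        for (Subscription s : subscriptions) {
            if (s.constraint().test(e)) s.handler().accept(e);
        }
    }
}

// Hypothetical simpleton: notes when a known user registers and updates that
// person's 'last signon' attribute. It talks to nothing but the event pool.
class SignonSimpleton {
    private final Map<String, Map<String, String>> personsByUserId;

    SignonSimpleton(EventPool events, Map<String, Map<String, String>> personsByUserId) {
        this.personsByUserId = personsByUserId;
        events.subscribe(e -> e.kind().equals("registration"), this::onRegistration);
    }

    private void onRegistration(Event e) {
        Map<String, String> person = personsByUserId.get(e.details().get("userid"));
        if (person != null) person.put("last signon", e.details().get("time")); // unknown users are ignored
    }
}
```

nothing in SignonSimpleton knows who generates registration events, and nothing that generates them knows it exists, so it can be written, shared, and dropped into a session independently.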

so how might the communication channel between two symphony users work? suppose that bryan wants to send gordon a new simpleton. he calls up a communication simpleton inside cerulean and tells it whom to try to talk to and what to send. to find the right 'routing' information, the communication simpleton generates events that are noticed by the people simpleton, a simpleton that knows about person entities. the people simpleton running on bryan's machine then presents its user, bryan, with the possible targets for the requested communication, and bryan chooses one, gordon. the people simpleton then relays the details of how to address online communications (that is, communications going directly to the user's remote entity pool) to the communication simpleton by generating more events. the communication simpleton packages the information to send in a transportable form as an entity, addresses it to the right registered user using the information supplied by the people simpleton, and drops the whole package into the entity pool. the entity pool then works out which local copy of which remote entity pool the package must go to, based on the name registration information in the recipient's person entity, and sends it there. that local copy of the remote user's default entity pool then passes anything sent to it on to the true remote pool over the network.
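the routing step at the end of that walkthrough, where the entity pool picks the right local copy of a remote pool, might be sketched as follows. Parcel, RemotePoolProxy, DefaultEntityPool, and the 'signon id' attribute are hypothetical names, the proxy only records what it would have sent over the network, and records need java 16 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical transportable package entity addressed to a registered user.
record Parcel(String toUserId, String payload) {}

// Hypothetical stand-in for the local copy of a remote user's default entity pool.
// A real one would pass parcels on over the network; this one only records them.
class RemotePoolProxy {
    final List<Parcel> forwarded = new ArrayList<>();

    void forward(Parcel p) { forwarded.add(p); }
}

// Hypothetical default entity pool that routes outgoing parcels using the
// registration information stored in each recipient's person entity.
class DefaultEntityPool {
    private final Map<String, Map<String, String>> persons = new HashMap<>(); // userid -> person attributes
    private final Map<String, RemotePoolProxy> proxies = new HashMap<>();     // signon id -> remote pool proxy

    void addPerson(String userId, Map<String, String> attributes) { persons.put(userId, attributes); }

    void addProxy(String signonId, RemotePoolProxy proxy) { proxies.put(signonId, proxy); }

    // Returns true if the parcel was handed to a remote pool proxy. Unknown or
    // offline recipients just get false here; what should really happen to them
    // is one of the open problems listed under "remaining problems".
    boolean drop(Parcel p) {
        Map<String, String> person = persons.get(p.toUserId());
        if (person == null) return false;
        RemotePoolProxy proxy = proxies.get(person.get("signon id"));
        if (proxy == null) return false;
        proxy.forward(p);
        return true;
    }
}
```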

the remote pool on gordon's machine receives the message. meanwhile, another instance of the same communication simpleton that bryan used is running in gordon's session. it has registered itself with its local event pool, and so notices the event signalling that something has arrived in its default entity pool. it fetches the message from the entity pool, parses it (perhaps using yet other simpletons specialized in parsing messages wrapped in entities), and alerts gordon to its arrival. gordon then uses his communication simpleton to browse the message or compile the included simpleton. if he decides to compile it, the communication simpleton calls another simpleton, one that knows about the compiler in tools.jar in the jdk, which compiles the code presented to it, instantiates it, and inserts it into gordon's running session.
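the compile-and-instantiate step can be done with the jdk's javax.tools API, the standard interface to the compiler that lived in tools.jar on older jdks. this sketch assumes, hypothetically, that a shipped simpleton is a single public class in the default package that implements Runnable and has a no-argument constructor; CompilerSimpleton is a made-up name, and the code needs a jdk (not just a jre) and java 11 or later.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical compiler simpleton: writes received source to a temporary
// directory, compiles it with the system compiler, loads the class with a
// fresh class loader, and instantiates it.
class CompilerSimpleton {
    static Runnable compileAndInstantiate(String className, String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) throw new IllegalStateException("no system compiler (running on a jre?)");
        try {
            Path dir = Files.createTempDirectory("simpleton");
            Path file = dir.resolve(className + ".java");
            Files.writeString(file, source);
            // null streams mean stdin/stdout/stderr; diagnostics go to stderr.
            int status = compiler.run(null, null, null, "-d", dir.toString(), file.toString());
            if (status != 0) throw new IllegalArgumentException("compilation failed for " + className);
            URLClassLoader loader = new URLClassLoader(new URL[] { dir.toUri().toURL() },
                    CompilerSimpleton.class.getClassLoader());
            Class<?> cls = loader.loadClass(className);
            return (Runnable) cls.getDeclaredConstructor().newInstance();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

inserting the new instance into the session (registering it with the pools, showing its frontend) is left out; the loader is deliberately left open because the simpleton may need to load further classes while it runs.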

similarly, when bryan clicks on controls in the simpleton instance running in his session, the simpleton generates events mirroring his clicks and throws them into bryan's event pool. as before, bryan's event pool checks which users' remote event pools, if any, are linked to it, then mirrors the new event there. each local copy of those remote event pools then sends a copy of the new event to its remote event pool over the network. each remote pool receives the event, which simpletons in its application layer can then notice and act on.
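here's a minimal sketch of that mirroring, with each remote pool's local copy played by another in-process pool. MirroringEventPool, link(), and the originSession field are hypothetical. the rule that only locally generated events get mirrored is an assumption of this sketch, not something the design above specifies; without some such rule, two pools linked to each other would echo every event back and forth forever. records need java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical event carrying the id of the session that first generated it.
record Event(String originSession, String description) {}

// Hypothetical mirroring event pool. Local subscribers see every event; events
// generated in this session are also mirrored to each linked remote pool.
// Events arriving from elsewhere are delivered locally but not mirrored again.
class MirroringEventPool {
    private final String sessionId;
    private final List<Consumer<Event>> subscribers = new ArrayList<>();
    private final List<MirroringEventPool> linkedRemotes = new ArrayList<>(); // stand-ins for network proxies

    MirroringEventPool(String sessionId) { this.sessionId = sessionId; }

    void subscribe(Consumer<Event> subscriber) { subscribers.add(subscriber); }

    void link(MirroringEventPool remote) { linkedRemotes.add(remote); }

    void publish(Event e) {
        for (Consumer<Event> s : subscribers) s.accept(e);
        if (e.originSession().equals(sessionId)) {
            for (MirroringEventPool remote : linkedRemotes) remote.publish(e);
        }
    }
}
```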

eating our own dog food

this sketch is only a first cut at an architecture. many other problems stand between it and a full-featured symphony (see below). however, we don't need to solve any of them for the very first version of symphony. i'm hoping to avoid having to solve too many hard problems (or indeed, any hard problems :) just to get the first version of symphony going. once we have the first version, later versions that improve the functionality will be much easier to produce. also, using the system to develop the system ("eating our own dog food") will make useful solutions to the many remaining problems much more obvious. and of course it will suggest the most productive avenues for future development of the system.

the central idea is that the very first symphony must give us a much more frictionless way to work together to solve any programming problem---including the principal remaining ones that are part of making symphony work well for developers. and the first version of symphony must not be too far away from the persistent Hydrogen we have now so that we can get there relatively soon.

remaining problems

permissions: talk about viruses! gordon really has to trust bryan! eventually we'll need to build some kind of sandbox for remote simpletons to play nice with the local flora and fauna. ideally simpletons should be like applets; there should be an editable policy file controlling what they can do through the security manager, but this would require object-level permissions and we don't have them yet. in the meantime we can make a start with pool-level and session-level permissions.
authentication: how is gordon to know that it really is bryan who's trying to send him stuff? should we have callbacks? what about penetrating firewalls for the callback?
firewall penetration: what to do if a user is behind a firewall but still wants to communicate with others? his session would have to pull remote data rather than have remote data pushed at it. what if both users are behind firewalls?
networking: how to handle 15 users all on at once who each want to team up with different subsets of the other 14? should we have a separate event pool for 'all registered users', as opposed to a distribution mechanism to all users satisfying some constraint (which could just be an enumeration of various single users)? ideally, the constraint-based solution sounds better, more flexible, and so more 'knownspacy' :).
offline access: how to handle access to the public parts of an offline user's session db? simplest solution: don't allow it :)
session polling: is it better to have port polling or remote registry polling? is there some other scheme? also, what to do about users presently offline? what about users who don't have static ips?
session registration: the registration protocol might be a problem if there are 50 users, 'cause then each session would need roughly 100 new (proxy) pools, an entity pool and an event pool for each of the other 49 users, if all those users want to talk to all of the others. on the other hand, it'll be some time before we have the luxury of worrying about 50 hardcore developers :). eventually we might want something a little more demand-oriented, maybe along the lines of freenet, where data flows only on demand, or perhaps something a bit more centralized, with one central server to handle distribution to connected users, although i dislike that idea because of its fragility: it's very dependent on one central point.
versioning: cvs gives us versioning for free, which we lose if we abandon cvs for dynamic development. maybe each developer could check their session into cvs at the end of the session? basically, an entire running session is storable, since it persists all the state needed to produce a new running knownspace.
session cvs checkin: it would be nice to check in only the "public" parts of a session rather than the entire session. of course it would be silly to do this by hand, but with pool-level permissions we could at least decide whether or not to check in an entire pool's worth. sounds like a job for a new simpleton...
connection stability: what should we use as the network infrastructure to ensure no data loss on network outage or error? tcp/ip seems like the natural choice, but do we need another layer on top of that that's more knownspace-aware?
simpleton versioning: what if gordon modifies bryan's original simpleton then sends that modified version to bryan? the simpleton will have the same class name as the simpleton instance running inside bryan's session. what happens when bryan tries to integrate it into his session? the jvm would barf. perhaps we should automatically change the simpleton's class name by appending the name of the user who last modified it along with the date? also, if we use such a protocol, how is bryan then to know which instance is which when frontends for both instances are being displayed inside his session?
(an idea!): why not also have a default SIMPLETON pool to act as a server of simpletons, local and remote, just as the default EventPool serves events, local and remote, and the default EntityPool serves entities, local and remote???? again, each such pool would have its own permissions and would be registerable with other pools of its type to transparently serve remote simpletons. we could also have pools for constraints... this can easily go too far, of course. ideally, if we need three or more pools, then we should be able to do the same job with one pool. the problem is that such a pool would have to accept multiple subtypes of some uniform type, and it's not clear what the supertype of simpleton, event, and entity is, short of Object, that is... also, note that a simpleton pool would suggest, by analogy with the entity and event pools, the creation of simpleton constraints, which would be nice because it would be a big step toward simpleton scheduling. of course we can create those without needing a separate simpleton pool...
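the renaming scheme floated under simpleton versioning, appending the last modifier's name and the date to the class name, might look like this. SimpletonRenamer is a hypothetical name, the yyyymmdd date format is an arbitrary choice, user names are assumed to be valid pieces of a java identifier, and the source rewrite is deliberately crude (a whole-word text replacement rather than a real parse).

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical renaming scheme: append the last modifier's name and the date to
// a simpleton's class name so a modified copy gets a distinct name.
class SimpletonRenamer {
    private static final DateTimeFormatter DATE = DateTimeFormatter.BASIC_ISO_DATE; // yyyyMMdd for a LocalDate

    static String versionedName(String className, String lastModifiedBy, LocalDate date) {
        // Strip any earlier _user_yyyyMMdd suffix so repeated edits keep just one suffix.
        String base = className.replaceFirst("_[A-Za-z0-9]+_\\d{8}$", "");
        return base + "_" + lastModifiedBy + "_" + date.format(DATE);
    }

    // Crude source rewrite: replace every whole-word occurrence of the old name
    // (class declaration, constructors, self-references). Longer names that merely
    // start with the old name are left alone.
    static String renameInSource(String source, String oldName, String newName) {
        return source.replaceAll("\\b" + Pattern.quote(oldName) + "\\b", Matcher.quoteReplacement(newName));
    }
}
```

because earlier suffixes are stripped before a new one is added, a simpleton passed back and forth several times keeps a single suffix, and that suffix could double as the title of its frontend so bryan can tell the two instances apart.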

kernel problems

(jack's idea!): too much of the architecture is being shunted off into the kernel. if we keep on this way the kernel will bloat, ruling out use of knownspace on small or slow client machines. ideally, much of the new functionality should be shunted off to the application layer, where users can pick and choose what they want to include in any particular session. jack's idea of having a layer around the kernel proper is very attractive. it would give us a place to put cerulean's kernelproxy, for example. we need simpletons that are intermediate in power or knowledge between tiny-brained, narrow-minded application simpletons and the full power of the kernel. people should be able to develop sensitive things (for example, behind-the-scenes remote pool machinery, a kernelproxy for user interfaces, or collector simpletons aware of the outside world and the remote pool hierarchy) without also having to know exactly how the true kernel (let's call it the 'microkernel') works.
it would be nice to have entity-level and thread-level permissions so that even within the application layer in one session on one machine a user can specify which simpletons can do what things to which entities so that imported simpletons have no chance to damage anything. this problem will likely be around for some time.
how do we constrain event propagation across sessions to avoid consuming resources needed for other things? this applies not just to computation and memory but also to bandwidth. should each event pool have incoming and outgoing constraints? that is, should each pool tell "upstream" event pools which events it wants to hear about, and feed "downstream" pools only the events they've asked for? whoever does this will be spending cycles checking the constraints against the events. how do we balance that cost against the bandwidth cost of doing no constraint checking and just shipping all the events?
Hydrogen's constraint unsubscription sucks: you have to unsubscribe from all subscribed constraints to unsubscribe from any one of them. presumably we can fix this with the move to active constraints (see below).
(matt's idea!): allow constraints to both serve and search the elements of the things they constrain. this is like making constraints into iterators as well as static descriptors of predicates. constraints will now be active constraints and will be a new class of actor inside knownspace. this takes away the need to have an explicit search() on either kind of pool. can this idea also help solve the other problems with constraints?
Hydrogen's constraints lack regular expression matching, so not all reasonable predicates are expressible.
currently there is still a big difference between simpletons and entities. active entities (or alek's idea of active entity values) seem like the natural thing for both of them to become...
we still don't have a simpleton scheduler other than the barest minimum.
the user has no access to the guts of the system as it's running (for example, to promote mail handling over engine activity for the moment, to give precedence to that new simpleton that bryan just shipped in, or even just to watch what it's doing and how it's consuming resources).
we have no ability to do a hot restart after a crash or power loss during a running session; that will require much more sophisticated journaling than merely recording which simpletons were running and what the most recent events were.
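matt's active constraints, together with regular-expression predicates and per-constraint unsubscription, might fit together as in this minimal sketch. ActiveConstraint, matches(), search(), subscribe(), and offer() are hypothetical names, entities are modelled as plain attribute maps, and a subscription's handle is simply a Runnable that cancels that one subscription.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical active constraint over entities modelled as attribute maps.
class ActiveConstraint implements Predicate<Map<String, String>> {
    private final Predicate<Map<String, String>> predicate;
    // Copy-on-write so a listener may cancel itself while being notified.
    private final List<Consumer<Map<String, String>>> listeners = new CopyOnWriteArrayList<>();

    private ActiveConstraint(Predicate<Map<String, String>> predicate) { this.predicate = predicate; }

    // Regular-expression match against one attribute's whole value.
    static ActiveConstraint matches(String attribute, String regex) {
        Pattern pattern = Pattern.compile(regex);
        return new ActiveConstraint(element -> {
            String value = element.get(attribute);
            return value != null && pattern.matcher(value).matches();
        });
    }

    @Override
    public boolean test(Map<String, String> element) { return predicate.test(element); }

    // Search: the constraint walks the elements itself, so a pool needs no search().
    List<Map<String, String>> search(Iterable<Map<String, String>> elements) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> element : elements) {
            if (test(element)) hits.add(element);
        }
        return hits;
    }

    // Serve: subscribe to future matches. The returned handle cancels only this subscription.
    Runnable subscribe(Consumer<Map<String, String>> listener) {
        listeners.add(listener);
        return () -> listeners.remove(listener);
    }

    // A pool would call this for each newly arriving element.
    void offer(Map<String, String> element) {
        if (test(element)) {
            for (Consumer<Map<String, String>> listener : listeners) listener.accept(element);
        }
    }
}
```

because each subscription gets its own handle, cancelling one leaves every other subscription on the same constraint untouched.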

application problems

how to handle multiple non-textual entityvalues: mp3s, jpgs, movs, whatever?
it would be nice to have a knownspace editor so that knownspace development can go on directly inside knownspace itself. but matt pointed out that this is a truly stupid idea. we shouldn't be reinventing well-honed ancient wheels. the real question is: how to get other applications to work with knownspace???
it would be nice to have a really flexible email handler. this isn't a 'reinventing the wheel' situation because, as far as i'm concerned, this particular wheel has yet to be invented. all mail handlers possess major suckage.
browsers: we still need a good 'entity browser'. right now there's no visual tool to examine all the attributes of an entity, apart from cerulean's ability to create entities with arbitrary attributes. we also need browsers for constraints and events.
it would be nice to be able to encrypt and decrypt entity values to add to knownspace security.
agents: theoretically it should be possible to take the human out of the simpleton distribution loop and have self-roaming simpletons moving across any registered user's running sessions...
(an idea!): with access to the code of each simpleton we now can build self-modifying simpletons! thus we can build in a genetic programming environment inside knownspace! yowsa! of course, that's a long way away...
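encrypting entity values needs nothing beyond the standard java cryptography classes. this minimal sketch uses AES in GCM mode, which both hides a value and detects tampering with it; the choice of cipher, the EntityValueCipher name, and the layout of storing the iv in front of the ciphertext are assumptions, and key storage and sharing between users are left out entirely.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical helper that encrypts and decrypts single entity values with AES-GCM.
class EntityValueCipher {
    private static final int IV_BYTES = 12;   // common nonce size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();
    private final SecretKey key;

    EntityValueCipher(SecretKey key) { this.key = key; }

    static SecretKey newKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(256);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns iv followed by ciphertext-with-tag; a fresh random iv is used for every value.
    byte[] encrypt(String value) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(value.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.allocate(iv.length + ct.length).put(iv).put(ct).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Throws IllegalStateException if the data was altered or the key is wrong.
    String decrypt(byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, data, 0, IV_BYTES));
            byte[] pt = cipher.doFinal(data, IV_BYTES, data.length - IV_BYTES);
            return new String(pt, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```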

large, long-term problems

server farms: it would be nice to have the interface, the engine, or any other factorable piece of the application layer be fully client-server, so that we could have one heavy-iron machine with big network pipes do the heavy lifting and just connect to it remotely to view and manipulate the resulting data. of course, this would make us vulnerable to outages if the server goes down or the network connection to it breaks. ideally, we should have backup communication paths and protocols so we can keep communicating with each other even without a central server.
distributed computation: how can we automatically distribute intensive computations over multiple processors in some form of server farm? load-balancing remote computations. owwww owww owww. my head hurts.
development support: as more people develop more simpletons we'll need a simpleton browser to see what's already available. we'll also need an event browser showing what kinds of events can be generated and by whom, so that developers can create new simpletons based on the events generated by old ones and so build more complex applications, while still letting simpleton developers work totally independently. that is, if we start building something that requires two developers to be in the same room to coordinate what they're each doing, then we've built the wrong thing. the same goes for constraints. basically, we'll have to build, piece by piece as the need for each piece becomes clear, a development environment that can scale beyond a few developers to hundreds.
scripting: still no way to script new simpletons so that naive users can roll their own desktop environment without knowledge of java. the ideal is to have something no more complicated than html but that can describe events and actions and components, not just appearance and data. this will likely be a long-standing problem.
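the event browser mentioned under development support could be backed by a catalog in which each simpleton declares what it generates and what it listens for. EventCatalog, declare(), and the event kind names used with it are hypothetical; a real catalog would presumably be filled in by the simpletons themselves rather than by hand.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical catalog behind an event browser: each simpleton declares the event
// kinds it generates and the kinds it listens for, so a developer can see what is
// available without coordinating with the other simpletons' authors.
class EventCatalog {
    private final Map<String, Set<String>> producers = new TreeMap<>(); // event kind -> simpletons
    private final Map<String, Set<String>> consumers = new TreeMap<>(); // event kind -> simpletons

    void declare(String simpleton, Set<String> generates, Set<String> listensFor) {
        for (String kind : generates) producers.computeIfAbsent(kind, k -> new TreeSet<>()).add(simpleton);
        for (String kind : listensFor) consumers.computeIfAbsent(kind, k -> new TreeSet<>()).add(simpleton);
    }

    Set<String> producersOf(String kind) { return producers.getOrDefault(kind, Set.of()); }

    Set<String> consumersOf(String kind) { return consumers.getOrDefault(kind, Set.of()); }

    // Event kinds that something generates but nothing yet listens for.
    Set<String> unclaimed() {
        Set<String> kinds = new TreeSet<>(producers.keySet());
        kinds.removeAll(consumers.keySet());
        return kinds;
    }
}
```

unclaimed() lists events that something generates but nothing yet listens for, which is exactly where a new, independently written simpleton could plug in.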