|
architecture sketch
the application layer is again isolated from the kernel, as in
Hydrogen, being served proxy entities from the EntityPool, and being
served events, possibly with a ttl value, from the EventPool. a
database lives behind the EntityPool, and possibly even the
EventPool. all entities are by default persisted. events are not
proxied but may be persisted, depending on their ttl. any pool can
register() with any other pool of its type---including remote
ones---and the consequence is that it becomes a server to the pool it
registers with for the kinds of things it serves. this creates chains
of servers, with the default EntityPool at the top of the entity pool
hierarchy and the default EventPool at the top of the event pool
hierarchy. thus, for any one session there could be a collection of
entity pools, each one representing local or remote sources of data: a
mail server, a news server, an ftp server, a website full of mp3s, a
remote database, another running knownspace session's entity pool, or
whatever. when two sessions register with each other the default
pool(s) of the first remain the roots of their separate hierarchies of
pools with knowledge of how to fetch things from each of the remote
servers. application simpletons are completely unaware of all of this
proxying, networking, and chained serving, since they only see the
local default EntityPool and the local default EventPool.
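a minimal java sketch of the register() chaining just described. everything in it is an assumption made for illustration: the EntityPool class, its put() and fetch() methods, and the string-valued entities are hypothetical stand-ins rather than Hydrogen's actual api, and a direct method call stands in for the network hop to a remote pool. the visited set is there so that a lookup doesn't loop forever when two pools register with each other.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

class PoolChainSketch {
    /** Hypothetical entity pool: a local store plus the pools that serve it. */
    static class EntityPool {
        final String name;
        private final Map<String, String> local = new LinkedHashMap<>();
        private final List<EntityPool> servers = new ArrayList<>();

        EntityPool(String name) { this.name = name; }

        void put(String id, String value) { local.put(id, value); }

        /** This pool becomes a server to the pool it registers with. */
        void register(EntityPool client) { client.servers.add(this); }

        /** Look locally first, then ask each registered server in turn. */
        Optional<String> fetch(String id) { return fetch(id, new HashSet<>()); }

        private Optional<String> fetch(String id, Set<EntityPool> visited) {
            if (!visited.add(this)) return Optional.empty(); // already asked this pool
            String hit = local.get(id);
            if (hit != null) return Optional.of(hit);
            for (EntityPool server : servers) {
                Optional<String> found = server.fetch(id, visited);
                if (found.isPresent()) return found;
            }
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        EntityPool root = new EntityPool("default");
        EntityPool mail = new EntityPool("mail server");
        EntityPool archive = new EntityPool("mail archive");
        mail.register(root);     // mail now serves the default pool
        archive.register(mail);  // archive serves mail, forming a chain
        archive.put("msg-1", "hello from the archive");
        // the application layer only ever talks to the default pool
        System.out.println(root.fetch("msg-1").orElse("not found"));
    }
}
```

the application code at the bottom never mentions mail or archive when it fetches; that is the sense in which simpletons stay unaware of the chain.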
to make a multi-user system possible we need a collection of entities
to manage personal information: person entities. each user would have
their own entity that stores information about them like their phone
numbers, picture, addresses, email addresses, personal websites,
knownspace signon id, names, nicknames, userids, static ip numbers,
sleeping habits, schedule, spouse's name, children's names, friends,
personal likes and dislikes, bad habits, employers, and workplace
address. the person entity could also know which emails that person
sent the user, which webpages that person recommended to the user,
which emails the user sent that person, which webpages the user
recommended to that person, what that person's rating of a particular
website is, what movies that person likes, and what other people are
associated with that person. each person entity could also have a link
to that person's own (remote) session when the person happens to be
online in a symphony session. when we have persons inside the system
the user can browse them, search them, organize and reorganize them,
and navigate through them just like any other entity. for further
expansion of this idea as sketched in the old knownspace, look here:
http://www.cs.indiana.edu/~rawlins/website/architecture/entities.html
note that there would have to be yet another simpleton---a user
registration simpleton to manage the task of keeping all person
entities up to date---person entity creation, editing, and so on. and
yet another simpleton that simply notes when any known user registers,
to then alter the 'last signon' information in the relevant person
entity. and yet another simpleton to check all such person entities
whenever new registration events enter the system and so display for
its user knowledge of someone of interest who just signed on. there
could also be yet another simpleton that lets its user browse all
person entities, and so on. if there were a simpleton that could
convert any entity to an xml intermediate form---something we'll need
anyway---and yet another simpleton to attach xslt information to that,
then we could have the browsing simpleton work for any arbitrary
entity, not just person entities, something that would be widely
useful. same for an 'edit entity' simpleton. by keeping the information
in separate related chunks, and relating them only with links to other
entities, and using only events to communicate between simpletons, and
throwing all entities into one entity pool, and all events into one
event pool, and using constraints to find them again, it's easy to
add new functionality to symphony as the need for it occurs. each new
simpleton is small and easy to think about and so easy to build. and
anyone anywhere can build it then share it with all. compare that with
the usual horrible alternative of continually having to break down and
rewrite one single huge, monolithic communication program. and of
course, in normal software engineering, you'd have to pre-think of all
the things you could possibly want to do with a person entity before
you ever created one single person entity to do something as simple as
a phone book! in symphony you could create lots of simple person entities
and do simple things with them, then when you want to create more
sophisticated data structures for persons, you just create more
entities and link them to the originals! or you could write a simpleton
that takes the old entities and creates new entities with the new
information already linked in, then it deletes the old entities. of
course, the cost of all this flexibility in computation and structure
is increased cycles and memory and hard disk space. fortunately, with
the ever escalating increase in price-performance of computer hardware,
even hardcore geeks will eventually realize that their time is much more
valuable than the computer's time. so in the long run this is the
direction that all software engineering has to go.
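to make the 'start simple, link in more later' idea above concrete, here is a small java sketch. the Entity class, its with(), link(), and linked() methods, and the attribute names are all hypothetical, chosen only for illustration; the real entity api may look quite different.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PersonEntitySketch {
    /** Hypothetical entity: a bag of attributes plus named links to other entities. */
    static class Entity {
        final Map<String, String> attributes = new LinkedHashMap<>();
        final Map<String, List<Entity>> links = new LinkedHashMap<>();

        Entity with(String key, String value) {
            attributes.put(key, value);
            return this;
        }

        void link(String relation, Entity target) {
            links.computeIfAbsent(relation, k -> new ArrayList<>()).add(target);
        }

        List<Entity> linked(String relation) {
            return links.getOrDefault(relation, Collections.emptyList());
        }
    }

    public static void main(String[] args) {
        // day one: a person entity that is barely more than a phone book entry
        Entity gordon = new Entity().with("type", "person").with("name", "gordon");

        // later: richer data arrives as a new entity linked to the original,
        // so nothing that already uses the simple entity has to change
        Entity schedule = new Entity().with("type", "schedule").with("sleeps", "02:00-10:00");
        gordon.link("schedule", schedule);

        System.out.println(gordon.linked("schedule").get(0).attributes.get("sleeps"));
    }
}
```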
so how might the communication channel between two symphony users
work? suppose that bryan wants to send gordon a new simpleton: he
calls for a communication simpleton inside cerulean, and tells it who
to try to talk to, and what to send. to find the right 'routing'
information, the communication simpleton generates events that are
noticed by the people simpleton, a simpleton that knows about person
entities. the people simpleton running on bryan's machine then presents
the choices of user targets for the requested communication to its
user, bryan, who chooses one, gordon, as the target for the
present communication. it then relays the details of how to
address online communications (that is, communications going directly
to the user's remote entity pool) to the communication simpleton by
generating some more events, and the communication simpleton then
packages up the information to send in a transportable form as an
entity and addresses it to the right registered user given the
information supplied by the people simpleton, then drops the whole
package into the entity pool. the entity pool then figures out which
local copy of which remote entity pool it needs to go to based on the
user's name registration information included in the user's person
entity and sends it there. the local copy of the remote user's default
entity pool then passes on anything sent to it to the true remote pool
over the network.
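the routing step on the sending side might be sketched like this in java. the names (Entity, EntityPool, registerRemote(), drop()) are hypothetical, the registered user name is assumed to be the routing key, and a plain method call stands in for shipping the package over the network.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RoutingSketch {
    /** Hypothetical transportable package: an addressee plus a payload. */
    static class Entity {
        final String addressee;
        final String payload;

        Entity(String addressee, String payload) {
            this.addressee = addressee;
            this.payload = payload;
        }
    }

    static class EntityPool {
        final String owner;
        final List<Entity> contents = new ArrayList<>();
        // local stand-ins for remote pools, keyed by the remote user's registered name
        private final Map<String, EntityPool> remoteByUser = new HashMap<>();

        EntityPool(String owner) { this.owner = owner; }

        void registerRemote(EntityPool remote) { remoteByUser.put(remote.owner, remote); }

        /** Keep the entity locally, or pass it on if it is addressed to a known remote user. */
        void drop(Entity e) {
            EntityPool remote = remoteByUser.get(e.addressee);
            if (remote == null) {
                contents.add(e);
            } else {
                remote.drop(e); // stands in for the network hop
            }
        }
    }

    public static void main(String[] args) {
        EntityPool bryan = new EntityPool("bryan");
        EntityPool gordon = new EntityPool("gordon");
        bryan.registerRemote(gordon);
        gordon.registerRemote(bryan);

        // the communication simpleton only drops the package into its own pool
        bryan.drop(new Entity("gordon", "source of a new simpleton"));
        System.out.println("in gordon's pool: " + gordon.contents.get(0).payload);
    }
}
```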
the remote pool on gordon's machine receives the message. meanwhile,
another instance of the same communication simpleton that bryan used is
running in gordon's session. it has registered itself with its local
copy of its event pool and so notices the event of something arriving
in its default entity pool. it fetches the message from its entity
pool, parses it, perhaps using yet other simpletons specialized to
parse messages wrapped in entities, and alerts gordon to its presence.
gordon then interacts with his communication simpleton to browse the
message or compile the included simpleton. if he decides to compile it,
the communication simpleton calls another simpleton---one that knows
about the compiler in tools.jar in the jdk. it compiles the code
presented to it, instantiates it, and inserts it in gordon's running
session.
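a sketch of the compile, instantiate, and insert step in java. it assumes the shipped simpleton arrives as source text, that its class is public with a no-argument constructor, and that it implements Runnable; none of that is established above, and the class and method names here are hypothetical. instead of reaching into tools.jar directly it uses the standard javax.tools api, which exposes the same jdk compiler. note that this runs whatever code it is handed, which is exactly the trust problem raised under permissions below.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class CompileSimpletonSketch {
    /** Compile the given source into a temp directory, load the class, and instantiate it. */
    static Runnable compileAndLoad(String className, String source) throws Exception {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("no compiler available: running on a jre, not a jdk");
        }
        Path dir = Files.createTempDirectory("simpletons");
        Path file = dir.resolve(className + ".java");
        Files.write(file, source.getBytes(StandardCharsets.UTF_8));

        // nulls mean: use the standard in/out/err streams; 0 means success
        int status = compiler.run(null, null, null, "-d", dir.toString(), file.toString());
        if (status != 0) {
            throw new IllegalArgumentException("compilation of " + className + " failed");
        }

        // the loader is left open on purpose: the instance may load more classes later
        URLClassLoader loader = URLClassLoader.newInstance(new URL[] { dir.toUri().toURL() });
        Class<?> cls = loader.loadClass(className);
        return (Runnable) cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        String source =
            "public class HelloSimpleton implements Runnable {\n"
          + "    public void run() { System.out.println(\"hello from a shipped simpleton\"); }\n"
          + "}\n";
        compileAndLoad("HelloSimpleton", source).run();
    }
}
```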
similarly, when bryan clicks on controls in the simpleton instance
running in his session, the simpleton generates events mirroring his
clicks. it throws those events into bryan's event pool. as before,
bryan's event pool looks to see which users' remote event pools are
linked to it, if any, then mirrors the new event there. each local copy
of those remote event pools then serves copies of the new event to the
respective remote event pools over the network. each of them receives
the event, which simpletons in their application layer can then notice
and act on.
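event mirroring between linked pools can be sketched in java as follows. the Event and EventPool classes and their methods are hypothetical, and the event id used to stop an event bouncing back and forth between two mutually linked pools is an assumption of this sketch, not something the design above specifies. direct calls again stand in for the network.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

class EventMirrorSketch {
    /** Hypothetical event: an id (assumed unique across sessions) plus a description. */
    static class Event {
        final String id;
        final String description;

        Event(String id, String description) {
            this.id = id;
            this.description = description;
        }
    }

    static class EventPool {
        private final List<Consumer<Event>> listeners = new ArrayList<>();
        private final List<EventPool> linked = new ArrayList<>();
        private final Set<String> seen = new HashSet<>();

        /** A simpleton registers its interest in events from this pool. */
        void subscribe(Consumer<Event> simpleton) { listeners.add(simpleton); }

        /** Link a (proxy for a) remote pool so that new events are mirrored there. */
        void link(EventPool remote) { linked.add(remote); }

        void post(Event e) {
            if (!seen.add(e.id)) return;                          // already handled here
            for (Consumer<Event> l : listeners) l.accept(e);      // serve local simpletons
            for (EventPool pool : linked) pool.post(e);           // mirror to linked pools
        }
    }

    public static void main(String[] args) {
        EventPool bryan = new EventPool();
        EventPool gordon = new EventPool();
        bryan.link(gordon);
        gordon.link(bryan);

        gordon.subscribe(e -> System.out.println("gordon's session saw: " + e.description));
        bryan.post(new Event("evt-1", "bryan clicked 'send'"));
    }
}
```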
eating our own dog food
this sketch is only a first cut at an architecture. there are numerous
other problems between this and a full-featured symphony (see below).
however, we don't need to solve any of the following problems for the
very first version of symphony. i'm hoping to avoid having to solve too
many hard problems (or indeed, any hard problems :) just to get the
first version of symphony going. once we have the first version, later
versions that improve the functionality will be much easier to
generate. also, use of the system to develop the system ("eating our
own dog food") will make useful solutions to the many remaining
problems much more obvious. and of course it will suggest the best
avenues for the most productive future development of the system.
the central idea is that the very first symphony must give us a much
more frictionless way to work together to solve any programming
problem---including the principal remaining ones that are part of
making symphony work well for developers. and the first version of
symphony must not be too far away from the persistent Hydrogen we have
now so that we can get there relatively soon.
remaining problems
-
permissions: talk about viruses! gordon really has to trust bryan!
eventually we'll need to build some kind of sandbox for remote
simpletons to play nice with the local flora and fauna. ideally
simpletons should be like applets; there should be an editable policy
file controlling what they can do through the security manager, but
this would require object-level permissions and we don't have them
yet. in the meantime we can make a start with pool-level and
session-level permissions.
-
authentication: how is gordon to know that it really is bryan who's
trying to send him stuff? should we have callbacks? what about
penetrating firewalls for the callback?
-
firewall penetration: what to do if a user is behind a firewall but
still wants to communicate with others? his session would have to pull
remote data rather than have remote data pushed at it. what if both
users are behind firewalls?
-
networking: how to handle 15 users all on at once who're each wishing
to team with different subsets of the other 14? should we have a
separate event pool for 'all registered users', as opposed to a
distribution mechanism to all users satisfying some constraint (which
could just be an enumeration of various single users)? ideally, the
constraint-based solution sounds better, more flexible, and so more
'knownspacy' :).
-
offline access: how to handle access to the public parts of an offline
user's session db? simplest solution: don't allow it :)
-
session polling: is it better to have port polling or remote registry
polling? is there some other scheme? also, what to do about users
presently offline? what about users who don't have static ips?
-
session registration: the registration protocol might be a problem if
there are 50 users, 'cause then each session would have to have 100 new
(proxy) pools each if all those users want to talk to all of the
others. on the other hand, it'll be some time before we have to worry
about having the luxury of 50 hardcore developers :). we might want to
think about having something a little more demand-oriented eventually,
maybe along the lines of freenet where there is flow based only on
demand, or perhaps something a bit more centralized---with one central
server to handle distribution to connected users, although i dislike
that idea because of its fragility---it's very dependent on one central
point.
-
versioning: cvs gives us versioning for free, which we abandon if we
abandon cvs for dynamic development. maybe users can check in their
session to cvs on end of session? basically, an entire running session
is storable since it persists all the state needed to produce a new
running knownspace.
-
session cvs checkin: it would be nice to not have the entire session
be checked into cvs but only the "public" parts. of course it would be
silly to do this by hand, but with pool-level permissions we could at
least check in an entire pool's worth or not. sounds like a job for a
new simpleton...
-
connection stability: what should we use as the network infrastructure
to ensure no data loss on network outage or error? tcp/ip seems like
the natural choice, but do we need another layer on top of that that's
more knownspace-aware?
-
simpleton versioning: what if gordon modifies bryan's original
simpleton then sends that modified version to bryan? the simpleton will
have the same class name as the simpleton instance running inside
bryan's session. what happens when bryan tries to integrate it into his
session? the jvm would barf. perhaps we should automatically change the
simpleton's class name by appending the name of the user who last
modified it along with the date? also, if we use such a protocol, how
is bryan then to know which instance is which when frontends for both
instances are being displayed inside his session?
-
(an idea!): why not also have a default SIMPLETON pool to act as a
server of simpletons, local and remote, just as the default EventPool
acts as server of events, local and remote, and the default EntityPool
acts as server of entities, local and remote???? again, each such pool
would have its own permissions and would be registerable with other
pools of its type to transparently serve remote simpletons. we could
also have pools for constraints...this can easily go too far, of
course. ideally, if we need three or more pools, then we should be able
to do the same job with one pool. the problem is that such a pool would
have to accept multiple subtypes of some uniform type, and it's not
clear what the supertype of simpleton, event, and entity is, short
of Object that is... also, note that a simpleton pool would suggest, by
analogy with entity and event pool, the creation of simpleton
constraints, which would be nice because it would be a big step toward
simpleton scheduling. of course we can create them without the need to
have a separate simpleton pool...
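on the simpleton versioning item above, the 'append the user and the date' idea could look like this in java. the naming scheme (underscores, a yyyyMMdd date) and the versionedName() helper are hypothetical, and the names and date in main are made up for the example. two edits by the same user on the same day would still collide, so a real scheme would need something finer than a date.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class SimpletonNameSketch {
    /** Build a class name that records who last modified the simpleton, and when. */
    static String versionedName(String baseName, String user, LocalDate modified) {
        // keep only characters that are safe inside a java identifier
        String safeUser = user.replaceAll("[^A-Za-z0-9]", "");
        String date = modified.format(DateTimeFormatter.BASIC_ISO_DATE); // yyyyMMdd
        return baseName + "_" + safeUser + "_" + date;
    }

    public static void main(String[] args) {
        // prints MailSimpleton_gordon_20010314
        System.out.println(versionedName("MailSimpleton", "gordon", LocalDate.of(2001, 3, 14)));
    }
}
```

the same string could double as the label on each frontend, which is one possible answer to telling the two instances apart.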
other problems
kernel problems
-
(jack's idea!): too much of the architecture is being shunted off into
the kernel. if we keep on this way the kernel will bloat, disallowing
use of knownspace on small or slow client machines. ideally, much of
the new functionality should be shunted off to the application layer
where users can pick and choose what they want to include in any
particular session. jack's idea of having a layer around the kernel
proper is very attractive. it would give us a place to put cerulean's
kernelproxy, for example. we need simpletons that are intermediate in
power or knowledge between tiny-brained, narrow-minded application
simpletons and the full power of the kernel. people should be able to
develop sensitive things (example, behind-the-scenes remote pool
machinery, or a kernelproxy for user interfaces, or collector
simpletons aware of the outside world and the remote pool hierarchy)
without also having to know exactly how the true kernel---let's call it
the 'microkernel'---works.
-
it would be nice to have entity-level and thread-level permissions so
that even within the application layer in one session on one machine a
user can specify which simpletons can do what things to which entities
so that imported simpletons have no chance to damage anything. this
problem will likely be around for some time.
-
how to constrain event propagation across sessions to avoid consuming
resources needed for other things? this applies not just to computation
or memory, but also to bandwidth. should each event pool have incoming
and outgoing constraints? that is, it tells "upstream" event pools
which events it wants to listen to and only feeds "downstream" pools
those events it's asked to get? whoever is doing this will be using
cycles to check the constraints against the events. how to balance that
cost against the bandwidth costs of doing no constraint checking and
just shipping all the events?
-
Hydrogen's constraint unsubscription sucks: you have to unsubscribe
from all subscribed constraints to unsubscribe from any one of them.
presumably we can fix this with the move to active constraints (see
below).
-
(matt's idea!): allow constraints to both serve and search the
elements of the things they constrain. this is like making constraints
into iterators as well as static descriptors of predicates. constraints
will now be active constraints and will be a new class of actor inside
knownspace. this takes away the need to have an explicit search() on
either kind of pool. can this idea also help solve the other problems
with constraints?
-
Hydrogen's constraints lack regular expression matching, so not all
reasonable predicates are expressible.
-
currently there is still a big difference between simpletons and
entities. active entities (or alek's idea of active entity values) seem
like the natural class for them to both become...
-
we still don't have a simpleton scheduler other than the barest
minimum.
-
the user has no access to the guts of the system as it's running (to,
for example, promote mail handling over engine activity for the moment,
or to give precedence to that new simpleton that bryan just shipped in,
or even just to watch what it's doing and how it's consuming
resources).
-
we have no ability to do a hot restart after a crash or power loss
during a running session; that will require much more sophisticated
journaling than merely recording which simpletons were running and what
the most recent events were.
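two of the constraint items above (active constraints that both describe and serve what they match, and regular expression matching) can be sketched together in java. ActiveConstraint, matches(), and the use of attribute maps as entities are all hypothetical; the addresses are made up. this is one possible reading of the idea, not a description of how Hydrogen's constraints work.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.regex.Pattern;

class ActiveConstraintSketch {
    /** Hypothetical active constraint: a predicate that can also iterate over its matches. */
    static class ActiveConstraint<T> implements Predicate<T>, Iterable<T> {
        private final Collection<T> pool;
        private final Predicate<T> predicate;

        ActiveConstraint(Collection<T> pool, Predicate<T> predicate) {
            this.pool = pool;
            this.predicate = predicate;
        }

        @Override
        public boolean test(T candidate) { return predicate.test(candidate); }

        /** Serves the pool's current matches, so no separate search() on the pool is needed. */
        @Override
        public Iterator<T> iterator() { return pool.stream().filter(predicate).iterator(); }
    }

    /** A predicate that holds when the named attribute fully matches the regular expression. */
    static Predicate<Map<String, String>> matches(String attribute, String regex) {
        Pattern pattern = Pattern.compile(regex);
        return entity -> {
            String value = entity.get(attribute);
            return value != null && pattern.matcher(value).matches();
        };
    }

    public static void main(String[] args) {
        List<Map<String, String>> pool = new ArrayList<>();
        ActiveConstraint<Map<String, String>> atExample =
            new ActiveConstraint<>(pool, matches("email", ".*@example\\.org"));

        pool.add(Collections.singletonMap("email", "bryan@example.org"));
        pool.add(Collections.singletonMap("email", "gordon@example.com"));

        // entities added after the constraint was created are still served
        for (Map<String, String> entity : atExample) {
            System.out.println(entity.get("email"));
        }
    }
}
```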
application problems
-
how to handle multiple non-textual entityvalues: mp3s, jpgs, movs,
whatever?
-
it would be nice to have a knownspace editor so that knownspace
development can go on directly inside knownspace itself. but matt
pointed out that this is a truly stupid idea. we shouldn't be
reinventing well-honed ancient wheels. the real question is: how to get
other applications to work with knownspace???
-
it would be nice to have a really flexible email handler. this isn't a
'reinventing the wheel' situation because, as far as i'm concerned,
this particular wheel has yet to be invented. all mail handlers possess
major suckage.
-
browsers: still need a good 'entity browser'. right now there's no
visual tool to examine all the attributes of an entity, outside of
cerulean's ability to create entities with arbitrary attributes. we
also need browsers for constraints, and events.
-
it would be nice to be able to encrypt and decrypt entity values to
add to knownspace security.
-
agents: theoretically it should be possible to take the human out of
the simpleton distribution loop and have self-roaming simpletons moving
across any registered user's running sessions...
-
(an idea!): with access to the code of each simpleton we now can build
self-modifying simpletons! thus we can build in a genetic programming
environment inside knownspace! yowsa! of course, that's a long way
away...
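for the encrypted entity values item above, one possible shape in java, using the standard javax.crypto classes. the choice of aes in gcm mode, the layout of the stored bytes (iv first, then ciphertext), and the class and method names are all assumptions of this sketch; key storage and sharing between users are left out entirely.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;

class EncryptedValueSketch {
    private static final int IV_BYTES = 12;   // usual iv size for gcm
    private static final int TAG_BITS = 128;  // authentication tag length

    /** Returns iv followed by ciphertext, ready to store as an entity value. */
    static byte[] encrypt(SecretKey key, byte[] value) throws Exception {
        byte[] iv = new byte[IV_BYTES];
        new SecureRandom().nextBytes(iv);      // a fresh iv for every value
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] sealed = cipher.doFinal(value);
        byte[] stored = Arrays.copyOf(iv, iv.length + sealed.length);
        System.arraycopy(sealed, 0, stored, iv.length, sealed.length);
        return stored;
    }

    /** Reverses encrypt(); throws if the stored bytes were tampered with. */
    static byte[] decrypt(SecretKey key, byte[] stored) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
        return cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        SecretKey key = generator.generateKey();

        byte[] stored = encrypt(key, "a private entity value".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(decrypt(key, stored), StandardCharsets.UTF_8));
    }
}
```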
large, long-term problems
-
server farms: it would be nice to have the interface, the engine, or any
factorable piece of the application layer be fully client-server so
that we could have one heavy-iron machine with big network pipes to do
the heavy lifting and just connect to it remotely to view and
manipulate the resulting data. of course, this would make us vulnerable
to outage if the server goes down, or the network connection to it
breaks. ideally, we should have backup communication paths and
protocols to continue to communicate with each other even without a
central server.
-
distributed computation: how can we automatically distribute intensive
computations over multiple processors in some form of server farm?
load-balancing remote computations. owwww owww owww. my head hurts.
-
development support: as more people develop more simpletons we'll need
a simpleton browser to see what's already available. we'll also need an
event browser to see what kinds of events can be generated and by whom,
so that developers can create new simpletons based on the events
generated by old simpletons and so build more complex applications
while still working totally independently of each other---that is, if
we start building something that requires that two developers be in the
same room to coordinate what they're each doing then we've built the
wrong thing. same for constraints. basically, we'll have to build a
development environment that can scale beyond a few developers to
hundreds, and build it in pieces as the pieces we need become clearer.
-
scripting: still no way to script new simpletons so that naive users
can roll their own desktop environment without knowledge of java. the
ideal is to have something no more complicated than html but that can
describe events and actions and components, not just appearance and
data. this will likely be a long-standing problem.
|