Indiana University Bloomington

School of Informatics and Computing

Technical Report TR659:
A Scalable and Robust Coordination Architecture for Distributed Management

Srinath Perera, Dennis Gannon
(Feb 2008), 10 pages
[Under revision for publication]
While opening avenues for unlimited possibilities, distributed systems have also introduced management complexity as an unfavorable side effect. Therefore, as distributed systems become commonplace, automating system management has become a primary challenge in information technology. In the state of the art in system management, each managed resource is assigned to an external entity (a manager) that monitors, analyzes, and controls it, and a collection of such managers manages the system. In such settings, each manager acts with only partial knowledge of the system; to keep the system as a whole in an acceptable state, the managers must be controlled and coordinated. This paper presents a scalable and robust coordination architecture for distributed management. The proposed architecture consists of a cloud of managers placed on a P2P network and a coordinator, which is re-elected on failure. Each resource in the system is assigned to a manager, and the managers monitor the system and maintain a distributed data model that reflects the system state (a meta-model). Using the meta-model, each manager enforces a set of user-defined management rules to implement resource-level management, while global coordination is achieved through user-defined global management rules enforced by the coordinator. The main contributions of the paper are a coordination architecture for distributed management that supports election-based recovery, a meta-model that reflects the system state, and the application of rules on top of the meta-model to achieve manager coordination.
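To make the split between resource-level rules and global rules concrete, here is a minimal Python sketch of one coordination round. It rests on explicit assumptions and is not the authors' design. The names `Manager`, `build_meta_model`, `elect_coordinator`, `min_running`, and `coordination_round` are hypothetical. The "lowest live id wins" election policy and both example rules are illustrative choices. The meta-model is also merged in one place for brevity, whereas the report describes it as distributed across the managers.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical resource states used throughout this sketch.
RUNNING, FAILED = "running", "failed"


@dataclass
class Manager:
    """One manager in the cloud; holds its partial view of assigned resources."""
    mid: int
    resources: Dict[str, str] = field(default_factory=dict)
    alive: bool = True

    def apply_local_rules(self) -> List[str]:
        # Hypothetical resource-level rule: restart any failed resource.
        actions = []
        for name, state in self.resources.items():
            if state == FAILED:
                self.resources[name] = RUNNING
                actions.append(f"restart {name}")
        return actions


def build_meta_model(managers: List[Manager]) -> Dict[str, str]:
    """Merge each live manager's partial view into one system-wide view."""
    model: Dict[str, str] = {}
    for m in managers:
        if m.alive:
            model.update(m.resources)
    return model


def elect_coordinator(managers: List[Manager]) -> Manager:
    """Assumed election policy: the live manager with the lowest id wins."""
    live = [m for m in managers if m.alive]
    if not live:
        raise RuntimeError("no live managers to elect")
    return min(live, key=lambda m: m.mid)


GlobalRule = Callable[[Dict[str, str]], List[str]]


def min_running(prefix: str, n: int) -> GlobalRule:
    """Hypothetical global rule: keep at least n resources named prefix* running."""
    def rule(model: Dict[str, str]) -> List[str]:
        running = sum(
            1 for name, state in model.items()
            if name.startswith(prefix) and state == RUNNING
        )
        return [f"start new {prefix}"] * max(0, n - running)
    return rule


def coordination_round(
    managers: List[Manager], global_rules: List[GlobalRule]
) -> Tuple[int, List[str]]:
    """Run local rules, (re-)elect a coordinator, then evaluate global rules."""
    for m in managers:
        if m.alive:
            m.apply_local_rules()  # local actions are applied in place
    coordinator = elect_coordinator(managers)
    model = build_meta_model(managers)
    actions = [a for rule in global_rules for a in rule(model)]
    return coordinator.mid, actions


# Usage: manager 1 has failed, so a new coordinator must be elected.
managers = [
    Manager(1, {"web-a": RUNNING}),
    Manager(2, {"web-b": FAILED}),
    Manager(3, {"db": RUNNING}),
]
managers[0].alive = False
coord, actions = coordination_round(managers, [min_running("web", 2)])
print(coord, actions)
```

In this run, manager 1 is down, so manager 2 becomes coordinator. Manager 2's local rule restarts `web-b`. The global rule still requests one more `web` instance because the failed manager's `web-a` is no longer visible in the meta-model. This illustrates why some decisions need the global view rather than any single manager's partial knowledge.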
