Web service access to data streams for data driven applications
The use of real-time streaming data in scientific computations is growing in prevalence as greater amounts of data are generated by sensors and instruments and more systems are monitored in real time.
This project enables data-driven grid application services to retrieve and operate on data flowing in real time from a data stream. Our prototype continuous query grid service, Calder, lets application web services submit long-running, continuously executing queries to the system.
A query can aggregate, filter, and transform one or more data streams on behalf of the application, generating a new stream tailored to the needs of the application service. Calder buffers the resulting stream, enabling temporal synchronization between the stream and the application service.
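As a rough illustration of the idea above, the following Python sketch filters and transforms a stream of events and keeps the most recent results in a bounded buffer that a consumer can look up by timestamp. It is not Calder's interface: the names `continuous_query` and `ResultBuffer`, the event fields `ts`, `temp_c`, and `temp_f`, and the sample readings are all hypothetical, chosen only to show the pattern.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable, Iterator, Optional

Event = Dict[str, float]  # hypothetical event shape: field name -> numeric value


def continuous_query(
    stream: Iterable[Event],
    predicate: Callable[[Event], bool],
    transform: Callable[[Event], Event],
) -> Iterator[Event]:
    """Lazily filter and transform events as they arrive on the stream."""
    for event in stream:
        if predicate(event):
            yield transform(event)


class ResultBuffer:
    """Bounded buffer holding the most recent query results (illustrative only)."""

    def __init__(self, capacity: int) -> None:
        # A deque with maxlen silently drops the oldest entry when full.
        self._events: Deque[Event] = deque(maxlen=capacity)

    def push(self, event: Event) -> None:
        self._events.append(event)

    def latest_at_or_before(self, timestamp: float) -> Optional[Event]:
        """Return the newest buffered event whose 'ts' is <= timestamp."""
        for event in reversed(self._events):
            if event["ts"] <= timestamp:
                return event
        return None


# Hypothetical sensor readings, ordered by timestamp.
readings = [
    {"ts": 1.0, "temp_c": 18.0},
    {"ts": 2.0, "temp_c": 35.0},
    {"ts": 3.0, "temp_c": 40.0},
    {"ts": 4.0, "temp_c": 20.0},
]

buffer = ResultBuffer(capacity=2)
query = continuous_query(
    readings,
    predicate=lambda e: e["temp_c"] > 30.0,  # filter: keep hot readings only
    transform=lambda e: {"ts": e["ts"], "temp_f": e["temp_c"] * 9 / 5 + 32},
)
for result in query:
    buffer.push(result)
```

A consumer that has reached time 2.5 in its own computation could call `buffer.latest_at_or_before(2.5)` to pick up the result that matches its clock, which is one simple way to picture temporal synchronization between a stream and an application.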
Because the Calder system operates over a realistic stream load generated by a computational science application, it provides a realistic framework in which to investigate a number of timely research issues in stream query processing:
- query distribution that is sensitive to metrics such as minimized global network bandwidth consumption
- approximation of query results under conditions of stream bursts
- stream discovery
- temporal and spatial aggregation operators that minimize CPU and network bandwidth consumption.
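To make the last item concrete, here is a minimal sketch of one common temporal aggregation, a tumbling-window mean, which forwards one summary per time window instead of every raw event. It is an assumption-laden example, not an operator taken from Calder: the function name `tumbling_mean`, the `(timestamp, value)` event shape, and the sample data are hypothetical.

```python
from typing import Iterable, Iterator, Optional, Tuple


def tumbling_mean(
    events: Iterable[Tuple[float, float]], width: float
) -> Iterator[Tuple[float, float]]:
    """Yield (window_start, mean) for each non-empty window of the given width.

    Assumes events are (timestamp, value) pairs ordered by timestamp.
    """
    window_start: Optional[float] = None
    total = 0.0
    count = 0
    for ts, value in events:
        start = (ts // width) * width  # start of the window containing ts
        if window_start is not None and start != window_start:
            # The previous window is complete: emit its summary and reset.
            yield window_start, total / count
            total, count = 0.0, 0
        window_start = start
        total += value
        count += 1
    if count:
        # Flush the final, possibly partial, window.
        yield window_start, total / count


# Hypothetical samples: five raw events spanning two 5-second windows.
samples = [(0.0, 10.0), (1.0, 20.0), (2.5, 30.0), (5.0, 40.0), (5.5, 60.0)]
summaries = list(tumbling_mean(samples, width=5.0))
```

Here five input events collapse into two summaries, so less data crosses the network downstream of the operator; the trade-off is that individual readings inside a window are no longer visible to the application.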
Calder extends dQUOB (http://www.cs.indiana.edu/dde/projects/dquob.html). dQUOB v1.0, which is available for release, includes the stream processing system components of Calder.
Calder has two subsystems:
1. Data management subsystem - comprising the grid data service, the planning service, and the rowset service. A registry service and a provenance service are currently under development.
2. Query processing subsystem, also called the Query Processing Engine (QPE).
Base Calder functionality for query planning and distribution, discovery, service interface, and optimization will be available as part of Calder v1.0 and beyond. Calder v1.0 is slated for release in December 2005.
'Calder' is an old English word with Celtic/Gaelic origins meaning 'streams'.
- Ying Liu, Nithya N. Vijayakumar, Beth Plale, Stream Processing in Data-driven Computational Science, to appear, 7th IEEE/ACM International Conference on Grid Computing (Grid'06), Barcelona, September 2006 (18% acceptance rate).
- Nithya N. Vijayakumar, Ying Liu, Beth Plale, Calder Query Grid Service: Insights and Experimental Evaluation, Short Paper, IEEE Cluster Computing and Grid (CCGrid), May 2006.
- Nithya Vijayakumar, Ying Liu, Beth Plale, Calder: Enabling Grid Access to Data Streams, Poster, HPDC, July 2005.
- Ying Liu, Beth Plale, Nithya Vijayakumar, Distributed Streaming Query Planner in Calder System, Poster, HPDC, July 2005.