Yogesh L. Simmhan: Publication List. Generated on 2007-06-07.
Journal/Professional Magazine
Karma2: Provenance Management for Data Driven Workflows
Simmhan, Y.L.; Plale, B. & Gannon, D.
International Journal of Web Services Research, Idea Group Publishing, Vol. 5, pp. 1, 2008
To Appear.
Abstract. The increasing ability for the sciences to sense the world around us is resulting in a growing need for data driven applications that are under the control of workflows composed of services on the Grid. The focus of our work is on provenance collection for these workflows, necessary to validate the workflow and to determine quality of generated data products. The challenge we address is to record uniform and usable provenance metadata that meets the domain needs while minimizing the modification burden on the service authors and the performance overhead on the workflow engine and the services. The framework is based on generating discrete provenance activities during the lifecycle of a workflow execution that can be aggregated to form complex data and process provenance graphs that can span across workflows. The implementation uses a loosely-coupled publish-subscribe architecture for propagating these activities and the capabilities of the system satisfy the needs of detailed provenance collection. A performance evaluation of a prototype finds a minimal performance overhead (in the range of 1% for an eight service workflow using 271 data products).
Journal/Professional Magazine
Query capabilities of the Karma provenance framework
Simmhan, Y.L.; Plale, B. & Gannon, D.
Concurrency and Computation: Practice and Experience, Wiley InterScience, 2007
In Press
Abstract. Provenance metadata in e-Science captures the derivation history of data products generated from scientific workflows. Provenance forms a glue linking workflow execution with associated data products, and finds use in determining the quality of derived data, tracking resource usage, and verifying and validating scientific experiments. In this article, we discuss the scope of provenance collected in the Karma provenance framework used in the LEAD Cyberinfrastructure project, distinguishing provenance metadata from generic annotations. We further describe our approaches to querying for different forms of provenance in Karma in the context of queries in the first provenance challenge. We use an incremental, building-block method to construct provenance queries based on the fundamental querying capabilities provided by the Karma service centered on the provenance data model. This has the advantage of keeping the Karma service generic and simple while still supporting a wide range of queries. Karma successfully answers all but one challenge query.
Conference/Workshop ramakrishnan2007Realization
Realization of Dynamically Adaptive Weather Analysis and Forecasting in LEAD
Ramakrishnan, L.; Simmhan, Y. & Plale, B.
Dynamic Data Driven Applications Systems Workshop (DDDAS) in conjunction with ICCS, 2007
Invited
Abstract. Linked Environments for Atmospheric Discovery (LEAD) is a large-scale cyberinfrastructure effort in support of mesoscale meteorology. One of the primary goals of the infrastructure is support for real-time dynamic, adaptive response to severe weather. Specifically, the service framework must be able to respond to weather conditions by detecting the condition in the first place, then directing and allocating resources to collect more information about the weather and generate forecasts. In this paper we revisit the conception of dynamic adaptivity as was presented in our 2005 DDDAS workshop paper [1], and discuss changes since the original conceptualization, and the lessons we have learned to date in working with a complex service oriented architecture in support of data driven science.
Journal/Professional Magazine
The First Provenance Challenge
Moreau, L.; Ludäscher, B.; Altintas, I.; Barga, R.S.; Bowers, S.; Callahan, S.; Chin Jr., G.; Clifford, B.; Cohen, S.; Cohen-Boulakia, S.; Davidson, S.; Deelman, E.; Digiampietri, L.; Foster, I.; Freire, J.; Frew, J.; Futrelle, J.; Gibson, T.; Gil, Y.; Goble, C.; Golbeck, J.; Groth, P.; Holland, D.A.; Jiang, S.; Kim, J.; Koop, D.; Krenek, A.; McPhillips, T.; Mehta, G.; Miles, S.; Metzger, D.; Munroe, S.; Myers, J.; Plale, B.; Podhorszki, N.; Ratnakar, V.; Santos, E.; Scheidegger, C.; Schuchardt, K.; Seltzer, M.; Simmhan, Y.L.; Silva, C.; Slaughter, P.; Stephan, E.; Stevens, R.; Turi, D.; Vo, H.; Wilde, M.; Zhao, J. & Zhao, Y.
Concurrency and Computation: Practice and Experience, Wiley InterScience, 2007
In Press
Abstract. The first Provenance Challenge was set up in order to provide a forum for the community to understand the capabilities of different provenance systems and the expressiveness of their provenance representations. To this end, a Functional Magnetic Resonance Imaging workflow was defined, which participants had to either simulate or run in order to produce some provenance representation, from which a set of identified queries had to be implemented and executed. Sixteen teams responded to the challenge, and submitted their inputs. In this paper, we present the challenge workflow and queries, and summarise the participants' contributions.
Book Chapter
Dynamic, Adaptive Workflows for Mesoscale Meteorology
Gannon, D.; Plale, B.; Marru, S.; Kandaswamy, G.; Simmhan, Y. & Shirasuna, S.
In Book: Workflows for eScience: Scientific Workflows for Grids
Eds: Gannon, D.; Deelman, E.; Shields, M. & Taylor, I.
Springer-Verlag, 2007
In Press
Book Chapter
Building Grid Portals for e-Science: A Service Oriented Architecture
Gannon, D.; Plale, B.; Christie, M.; Huang, Y.; Jensen, S.; Liu, N.; Marru, S.; Pallickara, S.L.; Perera, S.; Shirasuna, S.; Simmhan, Y.; Slominski, A.; Sun, Y. & Vijayakumar, N.
In Book: High Performance Computing and Grids in Action
Eds: Grandinetti, L.
IOS Press, 2007
To Appear
Abstract. Grids are built by communities who need a shared cyberinfrastructure to make progress on the critical problems they are currently confronting. A Grid portal is a conventional Web portal that sits on top of a rich collection of web services that allow a community of users access to shared data and application resources without exposing them to the details of Grid computing. In this chapter we describe a service-oriented architecture to support this type of portal.
Conference/Workshop simmhan2006Performance
Performance Evaluation of the Karma Provenance Framework for Scientific Workflows
Simmhan, Y.L.; Plale, B.; Gannon, D. & Marru, S.
International Provenance and Annotation Workshop (IPAW), Chicago, IL, Vol. 4145, pp. 222-236, 2006
Abstract. Provenance about workflow executions and data derivations in scientific applications helps estimate data quality, track resources, and validate in silico experiments. The Karma provenance framework provides a means to collect workflow, process, and data provenance from data-driven scientific workflows and is used in the Linked Environments for Atmospheric Discovery (LEAD) project. This paper presents a performance analysis of the Karma service as compared against the contemporary PReServ provenance service. Our study finds that Karma scales exceedingly well for collecting and querying provenance records, showing linear or sub-linear scaling with increasing number of provenance records and clients when tested against workloads in the order of 10,000 application-service invocations and over 36 concurrent clients.
Conference/Workshop simmhan2006Framework
A Framework for Collecting Provenance in Data-Centric Scientific Workflows
Simmhan, Y.L.; Plale, B. & Gannon, D.
IEEE International Conference on Web Services (ICWS), Chicago, IL, 2006
(18% acceptance)
Abstract. The increasing ability for the earth sciences to sense the world around us is resulting in a growing need for data-driven applications that are under the control of data-centric workflows composed of grid and web services. The focus of our work is on provenance collection for these workflows, necessary to validate the workflow and to determine quality of generated data products. The challenge we address is to record uniform and usable provenance metadata that meets the domain needs while minimizing the modification burden on the service authors and the performance overhead on the workflow engine and the services. The framework, based on a loosely-coupled publish-subscribe architecture for propagating provenance activities, satisfies the needs of detailed provenance collection while a performance evaluation of a prototype finds a minimal performance overhead (in the range of 1% for an eight service workflow using 271 data products).
Technical Report
Resource Catalog: An Information Service for Community Resources in LEAD
Simmhan, Y.L.; Plale, B. & Gannon, D.
Technical report 002, Linked Environments for Atmospheric Discovery, 2006
In progress
Conference/Workshop simmhan2006Towards
Towards a Quality Model for Effective Data Selection in Collaboratories
Simmhan, Y.L.; Plale, B. & Gannon, D.
IEEE International Workshop on Workflow and Data Flow for Scientific Applications (SciFlow) in conjunction with ICDE, Atlanta, GA, 2006
Abstract. Data-driven scientific applications utilize workflow frameworks to execute complex dataflows, resulting in derived data products of unknown quality. We discuss our on-going research on a quality model that provides users with an integrated estimate of the data quality that is tuned to their application needs, and is available as a numerical quality score that enables uniform comparison of datasets, and increases the community’s trust in derived data.
Conference/Workshop simmhan2006Data
Data Management in Dynamic Environment-driven Computational Science
Simmhan, Y.L.; Pallickara, S.L.; Vijayakumar, N.N. & Plale, B.
IFIP Working Conference on Grid-Based Problem Solving Environments (WoCo9), Prescott, AZ, 2006
To appear in Springer Lecture Notes in Computer Science (LNCS).
Abstract. Advances in numerical modeling, computational hardware, and problem solving environments have driven the growth of computational science over the past decades. Science gateways, based on service oriented architectures and scientific workflows, provide yet another step in democratizing access to advanced numerical and scientific tools, computational resources and massive data storage, and fostering collaborations. Dynamic, data-driven applications, such as those found in weather forecasting, present interesting challenges to Science Gateways, which are being addressed as part of the LEAD Cyberinfrastructure project. In this article, we discuss three important data related problems faced by such adaptive data-driven environments: managing a user's personal workspace and metadata on the Grid, tracking the provenance of scientific workflows and data products, and continuous data mining over observational weather data.
Technical Report
A Survey of Data Provenance Techniques
Simmhan, Y.L.; Plale, B. & Gannon, D.
Technical report 612, Computer Science Department, Indiana University, 2005
Abstract. Data management is growing in complexity as large-scale applications take advantage of the loosely coupled resources brought together by grid middleware and by abundant storage capacity. Metadata describing the data products used in and generated by these applications is essential to disambiguate the data and enable reuse. Data provenance, one kind of metadata, pertains to the derivation history of a data product starting from its original sources. The provenance of data products generated by complex transformations such as workflows is of considerable value to scientists. From it, one can ascertain the quality of the data based on its ancestral data and derivations, track back sources of errors, allow automated re-enactment of derivations to update data, and provide attribution of data sources. Provenance is also essential to the business domain where it can be used to drill down to the source of data in a data warehouse, track the creation of intellectual property, and provide an audit trail for regulatory purposes. In this paper we create a taxonomy of data provenance techniques, and apply the classification to current research efforts in the field. The main aspect of our taxonomy categorizes provenance systems based on why they record provenance, what they describe, how they represent and store provenance, and ways to disseminate it. Our synthesis can help those building scientific and business metadata-management systems to understand existing provenance system designs. The survey culminates with an identification of open research problems in the field.
Journal/Professional Magazine
A Survey of Data Provenance in e-Science
Simmhan, Y.; Plale, B. & Gannon, D.
SIGMOD Record, Vol. 34, No. 3, pp. 31-36, 2005
Abstract. Data management is growing in complexity as large-scale applications take advantage of the loosely coupled resources brought together by grid middleware and by abundant storage capacity. Metadata describing the data products used in and generated by these applications is essential to disambiguate the data and enable reuse. Data provenance, one kind of metadata, pertains to the derivation history of a data product starting from its original sources. In this paper we create a taxonomy of data provenance characteristics and apply it to current research efforts in e-science, focusing primarily on scientific workflow approaches. The main aspect of our taxonomy categorizes provenance systems based on why they record provenance, what they describe, how they represent and store provenance, and ways to disseminate it. The survey culminates with an identification of open research problems in the field.
Conference/Workshop gannon2005Service
Service Oriented Architectures for Science Gateways on Grid Systems
Gannon, D.; Plale, B.; Christie, M.; Fang, L.; Huang, Y.; Jensen, S.; Kandaswamy, G.; Marru, S.; Pallickara, S.L.; Shirasuna, S.; Simmhan, Y.; Slominski, A. & Sun, Y.
International Conference on Service Oriented Computing (ICSOC), Amsterdam, Netherlands, Vol. 3826, pp. 21-32, 2005
Abstract. Grid computing is about allocating distributed collections of resources including computers, storage systems, networks and instruments to form a coherent system devoted to a “virtual organization” of users who share a common interest in solving a complex problem or building an efficient agile enterprise. Service oriented architectures have emerged as the standard way to build Grids. This paper provides a brief look at the Open Grid Service Architecture, a standard being proposed by the Global Grid Forum, which provides the foundational concepts of most Grid systems. Above this Grid foundation is a layer of application-oriented services that are managed by workflow tools and “science gateway” portals that provide users transparent access to the applications that use the resources of a Grid. In this paper we will also describe these Gateway framework services and discuss how they relate to and use Grid services.
Journal/Professional Magazine
Building Grid Portal Applications from a Web-Service Component Architecture
Gannon, D.; Alameda, J.; Chipara, O.; Christie, M.; Dukle, V.; Fang, L.; Farrellee, M.; Fox, G.; Hampton, S.; Kandaswamy, G.; Kodeboyina, D.; Moad, C.; Pierce, M.; Plale, B.; Rossi, A.; Simmhan, Y.; Sarangi, A.; Slominski, A.; Shirasuna, S. & Thomas, T.
Proceedings of the IEEE, Vol. 93, No. 3, pp. 551-563, 2005
Abstract. This paper describes an approach to building Grid applications based on the premise that users who wish to access and run these applications prefer to do so without becoming experts on Grid technology. We describe an application architecture based on wrapping user applications and application workflows as web services and web service resources. These services are visible to the users and to resource providers through a family of Grid portal components that can be used to configure, launch and monitor complex applications in the scientific language of the end user. The applications in this model are instantiated by an application factory service. The layered design of the architecture makes it possible for an expert to configure an application factory service with a custom user interface client that may be dynamically loaded into the portal.
Conference/Workshop gannon2004Building
On Building Parallel & Grid Applications: Component Technology and Distributed Services
Gannon, D.; Krishnan, S.; Fang, L.; Kandaswamy, G.; Simmhan, Y. & Slominski, A.
IEEE Challenges of Large Applications in Distributed Environments (CLADE), Honolulu, HI, pp. 44, 2004
Abstract. Software Component Frameworks are well known in the commercial business application world and now this technology is being explored with great interest as a way to build large-scale scientific applications on parallel computers. In the case of Grid systems, the current architectural model is based on the emerging web services framework. In this paper we describe progress that has been made on the Common Component Architecture model (CCA) and discuss its success and limitations when applied to problems in Grid computing. Our primary conclusion is that a component model fits very well with a services-oriented Grid, but the model of composition must allow for a very dynamic (both in space and in time) control of composition. We note that this adds a new dimension to conventional service workflow and it extends the “Inversion of Control” aspects of most component systems.
Conference/Workshop gannon2003Building
Building Grid Services for User Portals
Gannon, D.; Christie, M.; Chipara, O.; Fang, L.; Farrellee, M.; Kandaswamy, G.; Lu, W.; Plale, B.; Slominski, A.; Sarangi, A. & Simmhan, Y.L.
GGF Workshop on Designing and Building Grid Services, Chicago, IL, 2003
Technical Report
XEvents/XMessages: Application Events and Messaging Framework for Grid
Slominski, A.; Simmhan, Y.; Rossi, A.L.; Farrellee, M. & Gannon, D.
Technical report, Extreme! Computing Lab, Indiana University, 2002
Journal/Professional Magazine
The XCAT Science Portal
Krishnan, S.; Bramley, R.; Gannon, D.; Ananthakrishnan, R.; Govindaraju, M.; Slominski, A.; Simmhan, Y.; Alameda, J.; Alkire, R.; Drews, T. & Webb, E.
Scientific Programming, IOS Press, Vol. 10, No. 4, pp. 303-317, 2002
Abstract. This paper describes the design and prototype implementation of the XCAT Grid Science Portal. The portal lets grid application programmers script complex distributed computations and package these applications with simple interfaces for others to use. Each application is packaged as a notebook which consists of webpages and editable parameterized scripts. The portal is a workstation-based specialized personal web server, capable of executing the application scripts and launching remote grid applications for the user. The portal server can receive event streams published by the application and grid resource information published by Network Weather Service (NWS) or Autopilot sensors. Notebooks can be published and stored in web based archives for others to retrieve and modify. The XCAT Grid Science Portal has been tested with various applications, including the distributed simulation of chemical processes in semiconductor manufacturing and collaboratory support for X-ray crystallographers.
Journal/Professional Magazine
Programming the Grid: Distributed Software Components, P2P and Grid Web Services for Scientific Applications
Gannon, D.; Bramley, R.; Fox, G.; Smallen, S.; Rossi, A.; Ananthakrishnan, R.; Bertrand, F.; Chiu, K.; Farrellee, M.; Govindaraju, M.; Krishnan, S.; Ramakrishnan, L.; Simmhan, Y.; Slominski, A.; Ma, Y.; Olariu, C. & Rey-Cenvaz, N.
Cluster Computing, Springer, Vol. 5, No. 3, pp. 325-336, 2002
Abstract. Computational Grids have become an important asset in large-scale scientific and engineering research. By providing a set of services that allow a widely distributed collection of resources to be tied together into a relatively seamless computing framework, teams of researchers can collaborate to solve problems that they could not have attempted before. Unfortunately the task of building Grid applications remains extremely difficult because there are few tools available to support developers. To build reliable and re-usable Grid applications, programmers must be equipped with a programming framework that hides the details of most Grid services and allows the developer a consistent, non-complex model in which applications can be composed from well tested, reliable sub-units. This paper describes experiences with using a software component framework for building Grid applications. The framework, which is based on the DOE Common Component Architecture (CCA), allows individual components to export function/service interfaces that can be remotely invoked by other components. The framework also provides a simple messaging/event system for asynchronous notification between application components. The paper also describes how the emerging Web-Services model fits with a component-oriented application design philosophy. To illustrate the connection between web services and Grid application programming we describe a simple design pattern for application factory services which can be used to simplify the task of building reliable Grid programs. Finally we address several issues of Grid programming that are better understood from the perspective of Peer-to-Peer (P2P) systems. In particular we describe how models for collaboration and resource sharing fit well with many grid application scenarios.