Yuan Luo [骆远]
IU Innovation Center 131
Office: (area code) 855-8305
Yuan Luo is a Computer Science Ph.D. candidate and recipient of the K. Jon Barwise Fellowship in the School of Informatics and Computing at Indiana University. His research committee members are Prof. Beth Plale, Prof. Geoffrey Fox, Prof. Judy Qiu, and Prof. Yuqing Wu. He works as a Research Assistant in the Data to Insight Center at Indiana University. He was a Research Intern at IBM T. J. Watson Research Center in 2012 and an intern in the Center for Research in Biological Systems (CRBS) at UCSD in 2009. He was a member of the Extreme Computing Lab at IUCS under Dr. Dennis Gannon. Yuan Luo received his BS and MS degrees in computer science from Jilin University in 2005 and 2008, respectively. He was a visiting scholar at the University of California, San Diego. Since 2005, Yuan Luo has been actively involved in PRAGMA (Pacific Rim Applications and Grid Middleware Assembly). He served as Poster Session Chair of the 23rd and 24th PRAGMA Workshops and as a Program Committee member of PRAGMA24. He is a co-founder and steering committee member of PRAGMA Students. He was an instructor at the National Biomedical Computation Resource (NBCR) Summer Institute in 2006 and 2009.
Grid Computing, Cloud Computing, Data-Intensive Distributed Computing
* Hierarchical MapReduce
Role: Project Creator, Principal Designer and Developer
MapReduce is a programming model for processing huge datasets with embarrassingly parallel applications on a large number of compute resources. Typical MapReduce frameworks, however, only schedule jobs within a single cluster; a single cluster is not easy to scale, and the input dataset may be widely distributed across multiple clusters. We extend the MapReduce framework to a hierarchical framework that gathers compute resources from different clusters and runs MapReduce jobs across them. Applications implemented in this framework adopt the "Map-Reduce-GlobalReduce" model, in which computations are expressed as three functions: Map, Reduce, and GlobalReduce. The global controller in our framework splits the dataset and maps the partitions onto multiple "local" MapReduce clusters, which run the Map and Reduce functions; the local results are then returned to the global controller, which runs the GlobalReduce function.
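The three-function model can be sketched in Python. This is a simplified, single-process illustration (the real framework dispatches partitions to actual MapReduce clusters); the function names mirror the model, and word counting stands in for a real application:

```python
from collections import defaultdict

def map_fn(record):
    # Map: emit (word, 1) pairs for each word in a line of text
    return [(word, 1) for word in record.split()]

def reduce_fn(key, values):
    # Local Reduce: sum the counts for one word within one cluster
    return sum(values)

def global_reduce_fn(key, partials):
    # GlobalReduce: merge the per-cluster partial counts
    return sum(partials)

def run_local_mapreduce(partition):
    # One "local" MapReduce cluster: map every record, group by key, reduce
    grouped = defaultdict(list)
    for record in partition:
        for key, value in map_fn(record):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}

def global_controller(dataset, num_clusters):
    # Split the dataset and dispatch one partition per local cluster
    partitions = [dataset[i::num_clusters] for i in range(num_clusters)]
    local_results = [run_local_mapreduce(p) for p in partitions]
    # Gather the partial results and apply GlobalReduce per key
    merged = defaultdict(list)
    for result in local_results:
        for key, value in result.items():
            merged[key].append(value)
    return {key: global_reduce_fn(key, vals) for key, vals in merged.items()}

counts = global_controller(["a b a", "b c", "a c c"], num_clusters=2)
```

Only the per-key partial results cross cluster boundaries, which is what makes the hierarchical scheme attractive when the input data is spread across sites.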
See the Hierarchical MapReduce project page for more information.
* Karma Provenance Collection Tool
Role: Messaging System Designer, Core Developer
Provenance (also called lineage or trace) of digital scientific data is a critical component in broadening the sharing and reuse of scientific data. Provenance captures the information needed to attribute ownership and to determine, among other things, the quality of a particular dataset. Provenance collection is often a tightly coupled part of a cyberinfrastructure system, but it is better served as a standalone tool. Karma is such a standalone tool: it can be added to existing cyberinfrastructure to collect and represent provenance data. Karma utilizes a modular architecture that supports multiple instrumentation plugins, making it usable in different architectural settings.
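The plugin idea can be illustrated with a minimal sketch. The class and method names here are hypothetical, not Karma's actual API: each instrumentation plugin normalizes events from one source into provenance records, and the store fans events out to every registered plugin:

```python
from abc import ABC, abstractmethod

class ProvenanceListener(ABC):
    # Hypothetical plugin interface: each instrumentation plugin
    # translates raw events from one system into provenance records.
    @abstractmethod
    def on_event(self, event: dict) -> dict:
        ...

class WorkflowPlugin(ProvenanceListener):
    def on_event(self, event):
        # Record which activity produced which data entity
        return {"activity": event["step"], "generated": event["output"]}

class ProvenanceStore:
    def __init__(self):
        self.records = []
        self.plugins = []

    def register(self, plugin):
        self.plugins.append(plugin)

    def notify(self, event):
        # Fan the raw event out to every registered plugin and keep
        # the normalized provenance records they produce
        for plugin in self.plugins:
            self.records.append(plugin.on_event(event))

store = ProvenanceStore()
store.register(WorkflowPlugin())
store.notify({"step": "forecast", "output": "wrf_output.nc"})
```

Because the store only depends on the abstract interface, a new architectural setting needs only a new plugin, not changes to the collection core.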
See Karma Provenance Collection Tool for more information.
* InstantKarma
Role: Core Developer
The project will improve the collection, preservation, utility, and dissemination of provenance information within the NASA Earth Science community. It will customize and integrate Karma, a proven provenance tool, into NASA data production by collecting and disseminating provenance of Advanced Microwave Scanning Radiometer - Earth Observing (AMSR-E) standard data products, initially focusing on Sea Ice. The plan is to engage the Sea Ice science team and user community and to adhere to the Open Provenance Model (OPM).
See InstantKarma for more information.
* NetKarma: GENI Provenance Registry
Role: Core Developer
The GENI Provenance Registry (NetKarma) project, funded in October 2009, provides a tool for capturing the workflow of GENI slice creation, the topology of the slice, operational status, and other measurement statistics, and for correlating them with the experimental data. The tool, NetKarma, allows researchers to see the exact state of the network and to store the configuration of the experiment and its slice. The provenance of the data is stored and visualized through a data portal. Researchers can use the provenance data to analyze their results, to suspend and resume an experiment, and as a single reference for finding the details and data collected in an experiment. NetKarma is based on the Karma provenance architecture, which has been used to collect scientific workflow provenance in domains as diverse as meteorology and the life sciences.
See NetKarma for more information.
* PRAGMA Cloud
Role: Technical Lead and principal developer at Indiana University
* Linked Environments for Atmospheric Discovery (LEAD)
Role: Experiment Builder Developer
Linked Environments for Atmospheric Discovery (LEAD) makes meteorological data, forecast models, and analysis and visualization tools available to anyone who wants to interactively explore the weather as it evolves. The LEAD Portal brings together all the necessary resources at one convenient access point, supported by high-performance computing systems. With LEAD, meteorologists, researchers, educators, and students are no longer passive bystanders limited to static data or pre-generated images; they are active participants who can acquire and process their own data. LEAD software enhances the experimental process by automating many of the time-consuming and complicated tasks associated with meteorological science. The "workflow" tool links data management, assimilation, forecasting, and verification applications into a single experiment. The experiment's output also includes detailed descriptions of the products, also called "metadata."
See LEAD Portal for more information.
* Opal Toolkit
Role: Job Manager (CSF4 Meta-scheduler and Sigiri Job Manager) Developer
Grid-based infrastructure enables large-scale scientific applications to be run on distributed resources and coupled in innovative ways. In practice, however, Grid resources are not easy for end users to work with: they must learn how to generate security credentials, stage inputs and outputs, access Grid-based schedulers, and install complex client software. There is a pressing need to provide transparent access to these resources, so that end users are shielded from the complicated details and free to concentrate on their domain science. Scientific applications wrapped as Web services alleviate some of these problems by hiding the complexities of the back-end security and computational infrastructure, exposing only a simple SOAP API that can be accessed programmatically by application-specific user interfaces. However, writing the application services that access Grid resources can be quite complicated, especially if the effort has to be replicated for every application. To that end, we have implemented Opal, a toolkit for wrapping scientific applications as Web services in a matter of hours, providing features such as job scheduling, standards-based Grid security, and data management in an easy-to-use and configurable manner.
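The wrapping idea can be sketched as follows. This is an illustration of the pattern, not Opal's real API: a command-line application is hidden behind launch/status/output operations, so clients never touch credentials, staging, or schedulers directly. Here the "application" is just `echo` and jobs run locally rather than being submitted to a Grid scheduler:

```python
import subprocess
import uuid

class AppService:
    # Illustrative sketch (hypothetical names, not Opal's actual API):
    # wrap one command-line application behind simple service operations.
    def __init__(self, binary):
        self.binary = binary
        self.jobs = {}

    def launch_job(self, args):
        # A real deployment would submit to a Grid scheduler and return
        # immediately; here we run the binary locally for simplicity.
        job_id = str(uuid.uuid4())
        proc = subprocess.run([self.binary, *args],
                              capture_output=True, text=True)
        self.jobs[job_id] = {
            "status": "DONE" if proc.returncode == 0 else "FAILED",
            "stdout": proc.stdout,
        }
        return job_id

    def query_status(self, job_id):
        return self.jobs[job_id]["status"]

    def get_output(self, job_id):
        return self.jobs[job_id]["stdout"]

service = AppService("echo")
job = service.launch_job(["hello", "grid"])
```

The service is configured per application (here, just a binary name), which is what lets a generic toolkit wrap a new application without new service code.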
See Opal Website for more information.
* Community Scheduler Framework 4 (CSF4) Meta-scheduler & CSF4 Portlet (Since 2004)
Role: CSF4 Developer, CSF Portlet Designer/Developer
Community Scheduler Framework 4 (CSF4) is the first WSRF-compliant community meta-scheduler, released as an execution management service of Globus Toolkit 4. Using CSF4, users can work with different local job schedulers, such as LSF, PBS, Condor, and SGE, which may belong to different domains. The CSF4 Portlet, first developed in 2006 through a collaboration between Jilin University and the University of California, San Diego (UCSD), is a Java-based web application for dispatching jobs to remote job schedulers through a web browser, without requiring an understanding of the underlying Grid services.
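A meta-scheduler's core job, dispatching work across heterogeneous local schedulers through one common interface, can be sketched in a few lines. The adapter classes and the load-based placement policy below are illustrative assumptions, not CSF4's actual implementation:

```python
class LocalScheduler:
    # Stand-in for an LSF/PBS/Condor/SGE adapter; a real meta-scheduler
    # translates submissions into each scheduler's native commands.
    def __init__(self, name, free_slots):
        self.name = name
        self.free_slots = free_slots

    def submit(self, job):
        self.free_slots -= 1
        return f"{self.name}:{job}"

class MetaScheduler:
    # Illustrative dispatch policy: send each job to the local scheduler
    # with the most free slots, regardless of which domain it belongs to.
    def __init__(self, schedulers):
        self.schedulers = schedulers

    def dispatch(self, job):
        target = max(self.schedulers, key=lambda s: s.free_slots)
        return target.submit(job)

meta = MetaScheduler([LocalScheduler("LSF", 4),
                      LocalScheduler("PBS", 8),
                      LocalScheduler("Condor", 2)])
placements = [meta.dispatch(f"job{i}") for i in range(3)]
```

The user submits to one endpoint; the adapter layer is what hides the per-scheduler differences that make multi-domain Grids hard to use directly.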
* Avian Flu Grid (Since March 2007)
Role: PRAGMA Portal Developer, CSF4 Developer, CSF Portlet Designer/Developer
This project aims to use the grid and high performance computing infrastructure to develop a model for global collaboration in the fight against the pandemic threat of avian flu and other emerging infectious diseases. Through a global partnership forged over the PRAGMA grid development activities, we now aim to build a scalable, global, and open knowledge environment for developing novel inhibitors to avian flu.
The Avian Flu Grid is an integrative effort based on technology developed by several member institutes to support advanced scientific research on avian flu. Calculations based on these state-of-the-art computational approaches are managed by the CSF4 meta-scheduler, through either the PRAGMA Portal or Opal-based application-specific web services that leverage CSF4 for job distribution.
My work supports multi-cluster scheduling with CSF4, distributing jobs transparently to multiple sites around the region.
Presentations (excluding paper presentations)