Yuan Luo [骆远]

Ph.D. Candidate in Computer Science
K. Jon Barwise Fellow
School of Informatics and Computing
Indiana University

Research Assistant
Data to Insight Center
Indiana University

IU Innovation Center 131
2719 East 10th Street
Bloomington, IN, 47408, USA


Office: (Area Code) 855-8305
Cellphone: (Area Code) 272-0208
Area Code = 812

http://www.yuanluo.net




Biography

Yuan Luo is a Computer Science Ph.D. candidate and recipient of the K. Jon Barwise Fellowship in the School of Informatics and Computing at Indiana University. His research committee members are Prof. Beth Plale, Prof. Geoffrey Fox, Prof. Judy Qiu, and Prof. Yuqing Wu. He works as a Research Assistant in the Data to Insight Center at Indiana University. He was a Research Intern at the IBM T. J. Watson Research Center in 2012 and an intern in the Center for Research in Biological Systems (CRBS) at UCSD in 2009. He was a member of the Extreme Computing Lab at IUCS under Dr. Dennis Gannon. Yuan Luo received his BS and MS degrees in computer science from Jilin University in 2005 and 2008, respectively. He was a visiting scholar at the University of California, San Diego. Since 2005, Yuan Luo has been actively involved in PRAGMA (Pacific Rim Applications and Grid Middleware Assembly). He served as Poster Session Chair of the 23rd and 24th PRAGMA Workshops and as a Program Committee member of PRAGMA24. He is a co-founder and steering committee member of PRAGMA Students. He was an instructor at the National Biomedical Computation Resource (NBCR) Summer Institute in 2006 and 2009.


Research Interests

Grid Computing, Cloud Computing, Data Intensive Distributed Computing,
Web Services and Workflows, Cyberinfrastructure, Data Provenance, etc.


Teaching

Spring 2010:
CSCI B534: Distributed Systems (Meets with CSCI B490: Seminar in Computer Science)
Tue and Thur 5:30pm-6:45pm, Informatics East Room 130
Office Hours: Thursday 2:30pm-3:50pm at LH301H

Fall 2009:
CSCI A110: Introduction to Computers and Computing, Undergraduate Course
CSCI B503: Algorithms Design and Analysis, Graduate Course
Office Hours: By appointment.


Projects

* Hierarchical MapReduce

Role: Project Creator, Principal Designer and Developer

MapReduce is a programming model for processing huge datasets in embarrassingly parallel applications using a large number of compute resources. Typical MapReduce frameworks, however, are limited to scheduling jobs within a single cluster. A single cluster is not easy to scale, and the input dataset may be widely distributed across multiple clusters. We extend the MapReduce framework to a hierarchical framework that gathers computation resources from different clusters and runs MapReduce jobs across them. Applications implemented in this framework adopt the "Map-Reduce-Global Reduce" model, in which computations are expressed as three functions: Map, Reduce, and Global Reduce. The global controller in our framework splits the data set and maps the partitions onto multiple "local" MapReduce clusters, which run the Map and Reduce functions; the local results are then returned to the global controller, which runs the Global Reduce function.
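The "Map-Reduce-Global Reduce" flow can be illustrated with a minimal, single-process Python sketch. It is not the project's implementation: the function names (run_local_mapreduce, hierarchical_mapreduce, split_dataset), the round-robin split, and the word-count workload are all assumptions chosen for illustration, and each "local cluster" is simulated by an ordinary function call rather than a real MapReduce deployment.

```python
from collections import defaultdict

# Hypothetical user-supplied functions for a word-count job.
def map_fn(line):
    """Map: emit (word, 1) for every word in one input record."""
    for word in line.split():
        yield word.lower(), 1

def reduce_fn(key, values):
    """Reduce: runs inside one local cluster, yields a partial count."""
    return sum(values)

def global_reduce_fn(key, values):
    """Global Reduce: combines the partial counts from all clusters."""
    return sum(values)

def run_local_mapreduce(records, map_fn, reduce_fn):
    """Stand-in for one local MapReduce cluster (assumption: in-process)."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}

def split_dataset(records, n_clusters):
    """Round-robin split; a real controller might split by data locality."""
    return [records[i::n_clusters] for i in range(n_clusters)]

def hierarchical_mapreduce(records, n_clusters, map_fn, reduce_fn, global_reduce_fn):
    """Global controller: split, dispatch to local clusters, then Global Reduce."""
    partitions = split_dataset(records, n_clusters)
    local_results = [run_local_mapreduce(p, map_fn, reduce_fn) for p in partitions]
    merged = defaultdict(list)
    for result in local_results:
        for key, value in result.items():
            merged[key].append(value)
    return {key: global_reduce_fn(key, values) for key, values in merged.items()}

lines = ["a b a", "b c", "a c c", "d"]
counts = hierarchical_mapreduce(lines, 2, map_fn, reduce_fn, global_reduce_fn)
print(sorted(counts.items()))
```

In this sketch the Global Reduce function receives one partial result per cluster for each key, so only the locally reduced output, not the raw input data, crosses cluster boundaries.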

See the Hierarchical MapReduce project page for more information.

* Karma Provenance Collection Tool

Role: Messaging System Designer, Core Developer

Provenance (or lineage, or trace) of digital scientific data is critical to broadening the sharing and reuse of scientific data. Provenance captures the information needed to attribute ownership and to determine, among other things, the quality of a particular data set. Provenance collection is often a tightly coupled part of a cyberinfrastructure system, but it is better served by a standalone tool. Karma is a standalone tool that can be added to existing cyberinfrastructure to collect and represent provenance data. Karma uses a modular architecture that supports multiple instrumentation plugins, which make it usable in different architectural settings.

See Karma Provenance Collection Tool for more information.

* NASA-InstantKarma

Role: Core Developer

The project will improve the collection, preservation, utility, and dissemination of provenance information within the NASA Earth Science community. It will customize Karma, a proven provenance tool, and integrate it into NASA data production by collecting and disseminating provenance of Advanced Microwave Scanning Radiometer - Earth Observing System (AMSR-E) standard data products, initially focusing on Sea Ice. The plan is to engage the Sea Ice science team and user community and to adhere to the Open Provenance Model (OPM).

See InstantKarma for more information.

* GENI-NetKarma

Role: Core Developer

The GENI Provenance Registry (NetKarma) project, funded in October 2009, provides a tool for capturing the workflow of GENI slice creation, the topology of the slice, its operational status, and other measurement statistics, and for correlating them with the experimental data. The tool, NetKarma, allows researchers to see the exact state of the network and to store the configuration of the experiment and its slice. The provenance of the data will be stored and visualized through a data portal. Researchers can use the provenance data to analyze their data, to suspend and resume an experiment, and as a single reference for the details and data collected in an experiment. NetKarma is based on the Karma provenance architecture, which has been used to collect provenance from scientific workflows in diverse domains such as meteorology and life science.

See NetKarma for more information.

* PRAGMA Cloud

Role: Technical Lead and principal developer at Indiana University

IU servers are part of the PRAGMA Cloud: IU@PRAGMA

* Linked Environments for Atmospheric Discovery (LEAD)

Role: Experiment Builder Developer

Linked Environments for Atmospheric Discovery (LEAD) makes meteorological data, forecast models, and analysis and visualization tools available to anyone who wants to interactively explore the weather as it evolves. The LEAD Portal brings together all the necessary resources at one convenient access point, supported by high-performance computing systems. With LEAD, meteorologists, researchers, educators, and students are no longer passive bystanders limited to static data or pre-generated images; they are active participants who can acquire and process their own data. LEAD software enhances the experimental process by automating many of the time-consuming and complicated tasks associated with meteorological science. The "workflow" tool links data management, assimilation, forecasting, and verification applications into a single experiment. The experiment's output also includes detailed descriptions of the product, also called "metadata."

See LEAD Portal for more information.

* Opal Toolkit

Role: Job Manager (CSF4 Meta-scheduler, and Sigiri Job Manager) Developer

Grid-based infrastructure enables large-scale scientific applications to be run on distributed resources and coupled in innovative ways. In practice, however, Grid resources are not easy to use for end users, who have to learn how to generate security credentials, stage inputs and outputs, access Grid-based schedulers, and install complex client software. There is a pressing need to provide transparent access to these resources so that end users are shielded from the complicated details and free to concentrate on their domain science. Scientific applications wrapped as Web services alleviate some of these problems by hiding the complexities of the back-end security and computational infrastructure, exposing only a simple SOAP API that can be accessed programmatically by application-specific user interfaces. However, writing application services that access Grid resources can be quite complicated, especially if the work has to be replicated for every application. To that end, we have implemented Opal, a toolkit for wrapping scientific applications as Web services in a matter of hours, providing features such as scheduling, standards-based Grid security, and data management in an easy-to-use and configurable manner.

See Opal Website for more information.

* Community Scheduler Framework 4 (CSF4) Meta-scheduler & CSF4 Portlet (Since 2004)

Role: CSF4 Developer, CSF Portlet Designer/Developer

Community Scheduler Framework 4 (CSF4) is the first WSRF-compliant community meta-scheduler and was released as an execution management service of Globus Toolkit 4. Using CSF4, users can work with different local job schedulers, such as LSF, PBS, Condor, and SGE, which may belong to different domains. CSF4 Portlet, first developed in 2006 through a collaboration between Jilin University and the University of California, San Diego (UCSD), is a Java-based web application for dispatching jobs to remote job schedulers through a web browser, without requiring an understanding of the underlying Grid services.

The source code is available from SourceForge and the JLU Grid Team.

* Avian Flu Grid (Since March 2007)

Role: PRAGMA Portal Developer, CSF4 Developer, CSF Portlet Designer/Developer

This project aims to use the grid and high performance computing infrastructure to develop a model for global collaboration in the fight against the pandemic threat of avian flu and other emerging infectious diseases. Through a global partnership forged over the PRAGMA grid development activities, we now aim to build a scalable, global, and open knowledge environment for developing novel inhibitors to avian flu.

The Avian Flu Grid is an integrative effort based on technology developed by several member institutes to support advanced scientific research on avian flu. Calculations based on these state-of-the-art computational approaches are managed by the CSF4 meta-scheduler through either the PRAGMA Portal or Opal-based application-specific web services, which leverage CSF4 for job distribution.

My work supports scheduling across multiple clusters with CSF4, so that jobs are distributed transparently to multiple sites around the region.


Publications

Journals

  • Hongliang Li, Xiaohui Wei, Qingwu Fu, Yuan Luo. (2013) MapReduce Delay Scheduling with Deadline Constraint, Concurrency and Computation: Practice and Experience, [DOI] (Impact Factor: 0.636)
  • Scott Jensen, Beth Plale, Mehmet Aktas, Yuan Luo, Peng Chen, and Helen Conover. Provenance Capture and Use in a Satellite Data Processing Pipeline, IEEE Transactions on Geoscience and Remote Sensing, accepted, 2013 (Impact Factor: 2.895)
  • Yuan Luo, Beth Plale, Zhenhua Guo, Wilfred Li, Judy Qiu, Yiming Sun. (2012) Hierarchical MapReduce: Towards Simplified Cross-Domain Data Processing, Concurrency and Computation: Practice and Experience [DOI] (Impact Factor: 0.636)
  • Ding, Z.; Wei, X.; Luo, Y.; Ma D; Li, W. W.; Arzberger, P. W., Customized Plug-in Modules in Metascheduler CSF4 for Life Sciences Applications, New Generation Computing, Vol.25 No.4 2007. [pdf][DOI] (Impact Factor: 0.941)
  • Ding, Z.; Wei, X.; Luo, Y.; et al. A Virtual Job Model to Support Cross-Domain Synchronized Resource Allocation, Journal of Jilin University (Science Edition), Vol. 46 No.2, Mar 26, 2008. (In Chinese with English Abstract). [pdf]

Conferences/Workshops

  • Peng Chen, Beth Plale, You-Wei Cheah, Devarshi Ghoshal, Scott Jensen, and Yuan Luo. Visualization of Network Data Provenance, Workshop on Massive Data Analytics on Scalable Systems, co-located with High Performance Computing Conference (HiPC), Pune, India, December 18th - 21st, 2012.
  • Plale, B., Withana, E. C., Herath, C., Chandrasekar, K., Luo, Y. Effectiveness of Hybrid Workflow Systems for Computational Science, International Conference on Computational Science (ICCS), Omaha, Nebraska, Jun 4-6, 2012. [DOI]
  • Luo, Y. and Plale, B. Hierarchical MapReduce Programming Model and Scheduling Algorithms, In Proceedings of the 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012), Ottawa, Canada, May 13-16, 2012. [DOI][pdf]
  • Yuan Luo, Zhenhua Guo, Yiming Sun, Beth Plale, Judy Qiu, Wilfred Li. 2011. A Hierarchical Framework for Cross-Domain MapReduce Execution. In Proceedings of the second international workshop on Emerging computational methods for the life sciences (ECMLS '11). ACM, New York, NY, USA, 15-22. DOI=10.1145/1996023.1996026 [pdf][DOI][ECMLS2011 Workshop talk in HPDC]
  • Xiaohui Wei, Yuan Luo, Jishan Gao, et al. The Session Based Fault Tolerance Algorithm of Platform EGO Web Service Gateway, Proceedings of International Symposium on Grid Computing (ISGC2007), Academia Sinica, Taipei, Taiwan, March 26-29, 2007.[pdf][DOI]
  • Ding, Z.; Luo, Y.; Wei, X.; Misleh, C.; Li, W. W.; Arzberger, P. W.; Tatebe, O. My WorkSphere: Integrative Work Environment for Grid-unaware Biomedical Researchers and Applications, Proceedings of 2nd Grid Computing Environment Workshop, Supercomputing Conference 2006(SC06), Tampa, Florida, 2006.[pdf][RIT Digital Media Library]

Posters

  • Vortex2 Metadata Management on PRAGMA Cloud: A GeoPortal Experience. PRAGMA 22 Workshop, Monash University, Melbourne, Australia, April 17-19, 2012.
  • A Hierarchical MapReduce Framework, PRAGMA 22 Workshop, Monash University, Melbourne, Australia, April 17-19, 2012.
  • Improving Twister Messaging System Using Apache Avro, The 2nd IEEE International Conference on Cloud Computing Technology and Science (CloudCom 2010), Indianapolis, USA, Nov 30 - Dec 3, 2010. [link] [CloudCom Abstract]
  • Karma: Provenance Aggregation Across Layers of GENI Experimental Networks, PRAGMA 19 Workshop, Changchun, China, Sept 13-15, 2010. [link]
  • GDIA: A Scalable Grid Infrastructure for Data Intensive Applications, National Biomedical Computation Resource Summer Institute 06, San Diego, Aug. 2006.[link]
  • My WorkSphere: Integrated and Transparent Access to Gfarm Computational Data Grid through GridSphere Portal with Metascheduler CSF4, 3rd International Life Sciences Grid Workshop, Yokohama, Japan, 2006. [pdf]

Presentations (excluding paper presentations)

  • Introduction to MapReduce and Hierarchical MapReduce, Guest Lecture in Scientific Data Management and Preservation Class (CSCI-B669), Indiana University, April 10, 2013
  • A Hierarchical MapReduce Framework, Invited talk at IBM Student Workshop for Frontiers of Cloud Computing 2012, IBM's Thomas J. Watson Research Center in Hawthorne, New York, July 30-31, 2012
  • Hierarchical MapReduce: Towards Simplified Cross-Domain Data Processing, Invited talk at Cloud Computing Lecture, Indiana University, Oct 12, 2011.
  • Opal-Sigiri: Software as a Service on PRAGMA Testbed, PRAGMA 20 Workshop, Hong Kong, China, March 2-4, 2011. [Slides]
  • Metascheduling using the Community Scheduler Framework (CSF4), NBCR Summer Institute 2009, UCSD, Aug 3-7th 2009. [Slides]
  • Software as a Service (SaaS) for Drug Discovery Workflows, with Wilfred W. Li, Sriram Krishnan, Jane Ren, Luca Clement, Kevin Dong, at UCSD, June 10th 2009.
  • My WorkSphere: Integrated and Transparent Access to Gfarm Computational Data Grid through GridSphere Portal with Meta-scheduler CSF4, NBCR Special Seminar, UCSD, Aug 28th 2006
  • Cluster and Grid Computing: Transparent Access and workflow management, NBCR Summer Institute 2006, UCSD, Aug 7-11th 2006

Services

  • Poster Session Chair and Students Workshop Chair, The 24th Workshop of the Pacific Rim Applications and Grid Middleware Assembly (PRAGMA24), Bangkok, Thailand, 03/2013
  • Poster Session Chair, The 23rd Workshop of the Pacific Rim Applications and Grid Middleware Assembly (PRAGMA23), Seoul, Korea, 10/2012
  • Journal Reviewer, IEEE Systems Journal, 2013
  • Journal Reviewer, Scalable Computing, 2013
  • External Reviewer, The 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Ottawa, Canada, May 2012
  • Journal Reviewer, Concurrency and Computation: Practice and Experience 2010-present
  • Co-founder and Steering Committee member, PRAGMA Students, 04/2012 - present
  • Volunteer, The 2nd IEEE International Conference on Cloud Computing Technology and Science, Indianapolis, IN, USA, 12/2010

Last updated: April 9, 2013
