
The Nile Fast-Track Implementation: Fault-Tolerant Parallel Processing of Legacy CLEO Data

            The Nile Collaboration (affiliated with CLEO)

Kenneth Birman, David Cassel, Ray Helmke, David Kreinick, Dan Riley,
                         Mark Rondinaro
             Cornell University, Ithaca NY

                   Andy Calkins, Keith Marzullo
            University of California at San Diego

       Michael Athanas, Paul Avery, Wade Hong, Theodore Johnson
              University of Florida, Gainesville

        Chun Chan, Michael Ogg, Aleta Ricciardi, Eric Rothfus
                University of Texas at Austin
                    Abstract
    
    Nile is a multi-disciplinary project to build a distributed computing environment for high-energy physics (HEP). Nile will provide fault-tolerant, integrated access to processing and data resources for collaborators of the CLEO experiment, though its goals and principles apply to many domains. Nile currently has three main objectives: a realistic distributed system architecture design, the design of a robust data model, and a Fast-Track implementation that provides a prototype design environment and will also be used by CLEO physicists. This talk will focus on the Fast-Track implementation.

    The goal of the Fast-Track implementation is to provide a fault-tolerant local-area analysis system for CLEO data, compatible with the pre-existing CLEO data format and analysis codes. The Fast-Track system implements parts of the local-area job management features of the Nile system architecture, providing a testbed for design ideas and an opportunity to gain early real-world experience with local-area job scheduling and management issues. The Fast-Track will not incorporate the full Data Model. However, we are investigating other data storage and access methods, such as persistent object stores (e.g. Ptool++) and CORBA-compliant toolkits. This testbed will provide valuable information needed for the Data Model design. In terms of the overall Nile system architecture, the Fast-Track includes prototype implementations of the Provider and the Local Resource Manager:

  • Provider. The Provider is the representative of an individual processor node. It is responsible for providing machine-specific services to the Local Resource Manager, including reporting the local machine state and implementing the execution service which runs user data analysis jobs in the Nile system.
  • Local Resource Manager. The LRM is responsible for monitoring, allocating, and coordinating resources at a geographic locality. It maintains complete information about local processors and data use. A user submits a job to the LRM, which decides where the job can be processed. The LRM breaks down the job and allocates it to processors, exploiting both parallel execution and data locality.
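
    The division of labor between the Provider and the LRM described above can be illustrated with a minimal sketch. This is not the Nile or ISIS implementation: the Provider class, the allocate function, the node and file names, and the choice of one sub-job per data file with a least-loaded tie-break are all assumptions made for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Provider:
    """Hypothetical stand-in for a Provider: represents one processor node."""
    name: str
    local_files: set = field(default_factory=set)  # data files held on this node
    load: int = 0  # number of sub-jobs assigned to this node so far


def allocate(job_files, providers):
    """Hypothetical LRM step: split a job into sub-jobs and place them.

    Assumes one sub-job per data file. A node that already holds the file is
    preferred (data locality); among candidates, the least-loaded node is
    chosen so that work spreads across nodes (parallel execution).
    """
    assignment = defaultdict(list)
    for f in job_files:
        local = [p for p in providers if f in p.local_files]
        candidates = local or providers  # no local copy: any node may take it
        chosen = min(candidates, key=lambda p: (p.load, p.name))
        chosen.load += 1
        assignment[chosen.name].append(f)
    return dict(assignment)


# Illustrative node and file names, not taken from the CLEO system.
providers = [
    Provider("node-a", {"run1.dat", "run2.dat"}),
    Provider("node-b", {"run2.dat", "run3.dat"}),
]
plan = allocate(["run1.dat", "run2.dat", "run3.dat", "run4.dat"], providers)
print(plan)
```

    In this sketch, files held on only one node stay on that node, while a file held on both nodes, or on neither, goes to whichever node currently has less work. A real scheduler would also have to account for the machine state reported by each Provider and for node failures, which this sketch omits.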

    The Fast-Track is being implemented using the ISIS toolkit. Our talk will discuss the design of the Fast-Track system, design issues arising from the requirements of backwards compatibility with existing CLEO code and data, performance of the Fast-Track implementation compared to the existing CLEO data analysis system, and the contributions of the Fast-Track implementation to the ultimate Nile system architecture.


    Speaker: Michael Athanas
    Contact: Michael Ogg
    University of Texas at Austin
    Department of Electrical and Computer Engineering, C0803
    Austin, TX 78712-1084
    
    ogg@ece.utexas.edu
    tel: 512-471-2328
    fax: 512-471-5532
    