
Scalable Digital Libraries of Event Data

Stuart Bailey, Robert Grossman, and David Hanley[1]
            Don Benton and Bob Hollebeek[2]


      [1]University of Illinois at Chicago
      [2]University of Pennsylvania
       National Scalable Cluster Project (NSCP)

  • Paper (Postscript)
  • Paper (PDF)
                    
                    Abstract
    
    In this paper we present the design, implementation, and experimental results of a system to mine and visualize event data using cluster computing built upon an ATM network. Our approach is to build a system using light weight, modular software tools for data management, resource management, data analysis and visualization developed for local, campus and wide area clusters of workstations.

    This work was done as part of the National Scalable Cluster Project, whose goal is to develop algorithms, software and model applications exploiting high performance broadband networking in support of local and wide area clusters of workstations and high performance computers. Currently, the collaboration involves UIC, UMD, and PENN, as well as IBM, Xerox and other corporate sponsors. The project is in its initial phase and consists of campus ATM clusters at UIC and PENN as well as an IBM SP-2 at PENN. In the next phase, clusters will be added at UMD, and all three clusters will be connected to form a meta-cluster using an ATM cloud. The meta-cluster will be operated as a shared resource.

    The architectural design of the system is based upon the following ideas:

    Light Weight Object Management. We employ an underlying object data model for the event data and manage the events with a light weight persistent object manager instead of a full-function object-oriented database. In essence, we exploit the fact that HEP event data is essentially read-only to trade functionality for performance. For this project, a version of PTool optimized for clusters of workstations was developed.

    Formal Modular Components. We are beginning to experiment with using formal methods to define the interfaces between different components in our system. For example, the APIs provided by light weight object managers may be all that is needed for some data analysis and visualization applications, while others may require the additional functionality provided by a CORBA interface. Formal methods provide an easy way for different components to access the underlying persistent object store through a variety of interfaces.

    Resource Management. Jobs submitted to the meta-cluster are provided the necessary nodes and storage resources using resource management software developed by Platform Computing.
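
    The trade of functionality for performance described under Light Weight Object Management can be illustrated with a minimal sketch. It is not PTool and does not reflect PTool's API. The record layout (event id, energy, track count), the file name, and the names write_store and ReadOnlyStore are all assumptions made for illustration. The sketch writes fixed-size records once and then serves them from a read-only memory map, with no locking, logging, or transactions.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-size event record: event id, energy, track count.
# "<" selects little-endian with no padding, so each record is 16 bytes.
EVENT = struct.Struct("<idi")


def write_store(path, events):
    """Write the events once; the store is never updated afterwards."""
    with open(path, "wb") as f:
        for event in events:
            f.write(EVENT.pack(*event))


class ReadOnlyStore:
    """Read-only view of the store: no locking, logging, or transactions."""

    def __init__(self, path):
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def __len__(self):
        return len(self._map) // EVENT.size

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Fixed-size records make lookup a single offset computation.
        return EVENT.unpack_from(self._map, index * EVENT.size)

    def close(self):
        self._map.close()
        self._file.close()


def demo():
    """Build a small store in a temporary directory and read one record back."""
    path = os.path.join(tempfile.mkdtemp(), "events.dat")
    write_store(path, [(i, i * 1.5, i % 4) for i in range(1000)])
    store = ReadOnlyStore(path)
    try:
        return len(store), store[10]
    finally:
        store.close()
```

    Because the data never changes after it is written, the sketch can omit the concurrency control and recovery machinery that a full database would need, which is the kind of simplification the read-only assumption permits.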

    In the paper, we describe our experimental results, which use PTool to create persistent object stores of HEP data on a cluster of Unix workstations connected with an ATM switch. We compare queries on this cluster to queries on the high performance cluster provided by the PENN IBM SP-2. To improve performance, we striped the HEP data across the cluster. To visualize the data we used Histoscope, developed at Fermi National Accelerator Laboratory, and a C++ interface for Histoscope developed at PENN. We verified that we were able to analyze gigabytes of event data without performance degradation and that striping improved performance linearly up to the bandwidth of the equipment.
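
    Striping event data across a cluster can be sketched as follows. This is an illustration of the data layout only, under stated assumptions: the abstract does not say which striping scheme was used, so round-robin assignment is assumed here, and the names stripe, scan, and striped_query are hypothetical. Threads stand in for cluster nodes, so the sketch shows how a query fans out over stripes and merges results, not the performance gain itself.

```python
from concurrent.futures import ThreadPoolExecutor


def stripe(events, num_nodes):
    """Assign event i to node i % num_nodes (round-robin striping, assumed)."""
    stripes = [[] for _ in range(num_nodes)]
    for i, event in enumerate(events):
        stripes[i % num_nodes].append(event)
    return stripes


def scan(stripe_events, predicate):
    """Scan one stripe; on a real cluster this would run on the node holding it."""
    return [e for e in stripe_events if predicate(e)]


def striped_query(stripes, predicate):
    """Run the same scan over every stripe concurrently and merge the results."""
    with ThreadPoolExecutor(max_workers=len(stripes)) as pool:
        parts = pool.map(lambda s: scan(s, predicate), stripes)
        return sorted(e for part in parts for e in part)
```

    A query over events of the hypothetical form (id, energy) would then be issued as striped_query(stripe(events, 4), lambda e: e[1] > 7.0). Each node reads only its own share of the data, which is why aggregate read bandwidth can grow with the number of nodes until the network or disks saturate.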


    Submitter's Name: Robert Grossman
    Submitter's Institution: University of Illinois at Chicago
    Address of Institution: Laboratory for Advanced Computing (M/C 249)
                            851 S. Morgan Street
                            Chicago, IL 60607
    
    Submitter's EMAIL address: grossman@uic.edu
    Submitter's telephone number: 312 413 2176
    Fax number: 312 996 1491