Stuart Bailey, Robert Grossman, and David Hanley[1]
Don Benton and Bob Hollebeek[2]
[1] University of Illinois at Chicago
[2] University of Pennsylvania
National Scalable Cluster Project (NSCP)
Abstract
In this paper we present the design, implementation, and
experimental results of a system to mine and visualize event
data using cluster computing built upon an ATM
network. Our approach is to build the system from lightweight,
modular software tools for data management, resource management,
data analysis, and visualization, developed
for local, campus, and wide area clusters of workstations.
This work was done as part of the National Scalable Cluster Project, whose goal is to develop algorithms, software, and model applications that exploit high performance broadband networking in support of local and wide area clusters of workstations and high performance computers. Currently, the collaboration involves UIC, UMD, and PENN, as well as IBM, Xerox, and other corporate sponsors. The project is in its initial phase and consists of campus ATM clusters at UIC and PENN as well as an IBM SP-2 at PENN. In the next phase, a cluster will be added at UMD and all three clusters will be connected to form a meta-cluster using an ATM cloud. The meta-cluster will be operated as a shared resource.
The architectural design of the system is based upon the following ideas:
Light Weight Object Management. We employ an underlying object data model for the event data and manage the events with a lightweight persistent object manager instead of a full-function object-oriented database. In essence, we exploit the fact that high energy physics (HEP) event data is essentially read-only to trade functionality for performance. For this project, we developed a version of PTool optimized for clusters of workstations.
Formal Modular Components. We are beginning to experiment with using formal methods to define the interfaces between different components in our system. For example, the APIs provided by lightweight object managers may be all that is needed for some data analysis and visualization applications, while others may require the additional functionality provided by a CORBA interface. Formal methods provide an easy way for different components to access the underlying persistent object store through a variety of interfaces.
Resource Management. Jobs submitted to the meta-cluster are provided the necessary nodes and storage resources using resource management software developed by Platform Computing.
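The trade of functionality for performance in the light weight object management idea can be illustrated with a minimal sketch. It assumes a hypothetical fixed-size event record and a write-once file that is afterwards memory-mapped read-only; the names EVENT, write_store, and ReadOnlyStore are illustrative only and are not PTool's format or API.

```python
import mmap
import struct

# Hypothetical fixed-size event record: event id, energy, momentum.
# This layout is an assumption for illustration, not the PTool format.
EVENT = struct.Struct("<idd")


def write_store(path, events):
    """Write the events once; the store is treated as read-only afterwards."""
    with open(path, "wb") as f:
        for event in events:
            f.write(EVENT.pack(*event))


class ReadOnlyStore:
    """Maps the store into memory and decodes records on demand.

    Because the data is never updated, no locking, logging, or
    transaction machinery is needed.
    """

    def __init__(self, path):
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def __len__(self):
        return len(self._map) // EVENT.size

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        return EVENT.unpack_from(self._map, index * EVENT.size)

    def close(self):
        self._map.close()
        self._file.close()
```

A reader opens the store with ReadOnlyStore(path) and indexes events directly, for example store[0]; there is no query language or update path, which is the functionality given up in exchange for simple, fast access.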
In the paper, we describe our experimental results, which use PTool to create persistent object stores of HEP data on a cluster of Unix workstations connected with an ATM switch. We compare queries on this cluster to queries on the high performance cluster provided by the PENN IBM SP-2. To improve performance, we striped the HEP data across the cluster. To visualize the data, we used Histoscope, developed at Fermi National Accelerator Laboratory, and a C++ interface for Histoscope developed at PENN. We verified that we were able to analyze gigabytes of event data without performance degradation and that striping improved performance linearly up to the bandwidth of the equipment.
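The striping mentioned above can be sketched as follows. The abstract does not say which placement scheme was used, so round-robin placement is an assumption here, and threads stand in for cluster nodes; the function names stripe, scan, and striped_query are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def stripe(events, num_nodes):
    """Assign event i to stripe i % num_nodes (round-robin placement, assumed)."""
    stripes = [[] for _ in range(num_nodes)]
    for i, event in enumerate(events):
        stripes[i % num_nodes].append(event)
    return stripes


def scan(stripe_events, predicate):
    """Scan one stripe; stands in for a query running on a single node."""
    return [e for e in stripe_events if predicate(e)]


def striped_query(stripes, predicate):
    """Run the same scan over every stripe concurrently and merge the hits."""
    with ThreadPoolExecutor(max_workers=len(stripes)) as pool:
        parts = pool.map(scan, stripes, [predicate] * len(stripes))
        return [e for part in parts for e in part]
```

Each stripe holds roughly 1/N of the events, so with N nodes each scan touches about 1/N of the data. This is why throughput can be expected to grow with N until a shared resource, such as network bandwidth, becomes the limit. The merged result is grouped by stripe, so callers that need the original event order must sort it.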
Submitter's Name: Robert Grossman
Submitter's Institution: University of Illinois at Chicago
Address of Institution: Laboratory for Advanced Computing (M/C 249)
851 S. Morgan Street
Chicago, IL 60607
Submitter's EMAIL address: grossman@uic.edu
Submitter's telephone number: 312 413 2176
Fax number: 312 996 1491