N.Smirnov and E.Tcherniaev
Institute for High Energy Physics
142284 Protvino, Moscow region, Russia
DELPHI collaboration
Abstract
An application of general data compression methods to the data used in physics analysis at the DELPHI experiment is considered. The main goal is to save disk space without essential changes to the data processing chain.
The DELPHI data processing chain consists of the following experimental and simulated data types: RAW data, Full DST, Long or Leptonic DST (LDST), Short DST (SDST), and mini DST (mDST). The data most essential for physics analysis (LDST, SDST and mDST) should clearly be kept on disk. At present this requires 250 Gbytes of disk space. The 1995 data will require approximately the same space. Such a volume of data causes real difficulties even for large computer centres like the DELPHI off-line analysis centre at CERN, and for home labs keeping all of it on disk can be a serious problem.
One reasonable way to solve this problem is to apply general data compression methods. Such an approach has been implemented within the PHDST I/O package, which is being developed for the DELPHI experiment to provide user-friendly access to the data with a computer-independent specification of external media.
The PHDST package uses the ZEBRA memory management system to manipulate internal data structures and for computer-independent input/output. The implementation of data compression in PHDST relies on the ability of the ZEBRA package to read and write data not only from external media (disks, tapes) but also from the internal memory of the program. This makes it possible to introduce data compression in a very natural way, without visible changes in the user interface. To work with compressed data, the user only needs to relink the program with the new library.
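The idea of placing compression between the external medium and an in-memory reader can be illustrated with a minimal Python sketch. It is not the PHDST or ZEBRA implementation: the length-prefixed record layout and the names `write_record` and `read_record` are assumptions made for illustration, and Python's `zlib` module stands in for the compression routines.

```python
import io
import struct
import zlib


def write_record(stream, payload: bytes) -> None:
    """Compress one record and write it with a 4-byte length prefix (hypothetical layout)."""
    packed = zlib.compress(payload)
    stream.write(struct.pack("<I", len(packed)))
    stream.write(packed)


def read_record(stream):
    """Read one compressed record and return it as an in-memory stream, or None at end of data."""
    header = stream.read(4)
    if len(header) < 4:
        return None
    (size,) = struct.unpack("<I", header)
    packed = stream.read(size)
    # The caller reads from memory exactly as it would read from the medium.
    return io.BytesIO(zlib.decompress(packed))


# A BytesIO object stands in for the external medium (disk or tape).
medium = io.BytesIO()
events = [b"event-1 " * 50, b"event-2 " * 80]
for event in events:
    write_record(medium, event)

medium.seek(0)
decoded = []
while (buffer := read_record(medium)) is not None:
    decoded.append(buffer.read())
```

Because the reading code only ever sees an uncompressed in-memory buffer, the code that interprets the records does not need to change, which mirrors the "relink only" property described above.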
We considered several data compression methods as candidates for use in PHDST, but the final choice was more or less evident: the deflate/inflate method available in GZIP and some other programs. Based on the GZIP sources, two routines were implemented for in-memory compression and decompression, which are suitable for use inside a FORTRAN program.
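A pair of in-memory compression and decompression routines of this kind might look as follows. This is only a sketch in Python using the standard `zlib` module, which implements the same deflate/inflate method. The function names `compress_buffer` and `decompress_buffer` are hypothetical and are not the routines derived from the GZIP sources.

```python
import zlib


def compress_buffer(data: bytes, level: int = 6) -> bytes:
    """Deflate a memory buffer and return the compressed bytes."""
    # zlib.compress wraps the deflate stream in a zlib header and checksum,
    # which differs from the gzip file header.
    return zlib.compress(data, level)


def decompress_buffer(blob: bytes) -> bytes:
    """Inflate a buffer produced by compress_buffer."""
    return zlib.decompress(blob)


record = b"DELPHI " * 1000
packed = compress_buffer(record)
restored = decompress_buffer(packed)
```

Both routines operate purely on buffers in memory, with no file access, which is the property needed to call them from within an I/O layer.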
In addition to the technical details of the implementation of data compression in the PHDST package, the article contains several tables with I/O timing and compression ratios for different kinds of data. The compression ratio varies between 30% and 50% for different files and is equal to 45% on average. Some possibilities for further improvement of data compression are also discussed.
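Measurements of this kind can be sketched as below. The definition of the compression ratio used here (compressed size as a percentage of the original size) is an assumption, since the abstract does not state which convention is used. The helper names and the sample data are hypothetical, and Python's `zlib` stands in for the actual routines.

```python
import time
import zlib


def compression_ratio_percent(original_size: int, compressed_size: int) -> float:
    """Compressed size as a percentage of the original size (assumed definition)."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return 100.0 * compressed_size / original_size


def measure(data: bytes, level: int = 6):
    """Return (ratio in percent, compression seconds, decompression seconds)."""
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    middle = time.perf_counter()
    zlib.decompress(packed)
    end = time.perf_counter()
    ratio = compression_ratio_percent(len(data), len(packed))
    return ratio, middle - start, end - middle


# Artificial, highly repetitive sample; real DST files will behave differently.
sample = bytes(range(256)) * 64
ratio, compress_seconds, decompress_seconds = measure(sample)
```

Under this assumed definition, a file of 200 units stored in 90 units has a ratio of 45%.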
Submitter's Name: Nikolai Smirnov
Submitter's Institution: CERN
Address of Institution: CH-1211 Geneve 23, Switzerland
Submitter's EMAIL address: nsmirnov@vxcern.cern.ch
Submitter's telephone number: +41 22 767-42-55
Fax number: +41 22 782-30-84
Title: Data compression in the DELPHI experiment
Nikolai Smirnov, IHEP, Protvino, Russia (DELPHI collaboration)
Evgueni Tcherniaev, IHEP, Protvino, Russia (DELPHI collaboration)