Gene Oleynik
Fermilab
Abstract
DART is the high-speed, Unix-based data acquisition system being developed by the Fermilab Computing Division in collaboration with eight High Energy Physics experiments.
This paper describes DART run-control, which implements flexible, distributed, extensible and portable paradigms for the control and monitoring of data acquisition systems. We discuss the unique and interesting aspects of the run-control: why we chose the concepts we did, the benefits we have seen from those choices, and our experiences in deploying and supporting it for experiments during their commissioning and sub-system testing phases. We emphasize the software and techniques we believe can be carried forward to future use, and we outline potential modifications and extensions for those we feel cannot.
For run-control command communication, we developed a general group multicasting server using ideas from the ISIS Distributed Toolkit, which was designed for developing general fault-tolerant distributed applications. We felt that the group multicasting paradigm mapped onto control applications much better than conventional RPC-like techniques. This paradigm allows parallel execution of commands where appropriate, while still allowing sequential execution, where needed, by sending a command to a series of groups in order. It also provides nameserver-like network "transparency", so programs deal with functionally mnemonic group names rather than IP addresses and process IDs.
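The parallel-within-a-group, sequential-across-groups behaviour described above can be illustrated with a minimal in-process sketch. It is written in Python purely for illustration and is not the DART server: the class `GroupMulticaster`, its methods, the group names and the `begin_run` command are all hypothetical, and ordinary function calls stand in for network delivery to remote processes.

```python
from concurrent.futures import ThreadPoolExecutor


class GroupMulticaster:
    """Toy, in-process stand-in for a group multicasting server (hypothetical API)."""

    def __init__(self):
        self._groups = {}  # group name -> list of member handlers

    def join(self, group, handler):
        """Register a handler (standing in for a remote process) under a group name."""
        self._groups.setdefault(group, []).append(handler)

    def multicast(self, group, command):
        """Deliver one command to every member of a group in parallel; return the replies."""
        members = self._groups.get(group, [])
        if not members:
            return []
        with ThreadPoolExecutor(max_workers=len(members)) as pool:
            # pool.map returns replies in member order, whatever order they finish in
            return list(pool.map(lambda member: member(command), members))

    def sequence(self, groups, command):
        """Multicast to each group in turn; a group starts only after the previous one replied."""
        return [(group, self.multicast(group, command)) for group in groups]


def make_member(name, log):
    """Build a handler that records the command it receives and acknowledges it."""
    def handle(command):
        log.append((name, command))
        return f"{name}:{command}:ok"
    return handle


log = []
mc = GroupMulticaster()
mc.join("readout", make_member("readout-1", log))
mc.join("readout", make_member("readout-2", log))
mc.join("loggers", make_member("logger-1", log))

# Callers name groups only; they never see addresses or process IDs.
replies = mc.sequence(["loggers", "readout"], "begin_run")
```

In this sketch the `loggers` group finishes handling `begin_run` before any `readout` member sees it, while the two `readout` members are handled concurrently.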
We have based our operator control program (OCP), which controls the DA, on the public domain TCL interpreter for command line processing, and the companion TK software for graphical control. We chose TCL because of its extensible interpretive procedures. For graphics, we chose TK because it is well integrated with TCL, is extensible, and, in our experience, interfaces can be built more quickly from TK and wish than from X and its toolkit or Motif. We have seen a rise in acceptance of the use of such freeware in the HEP community because of its accessibility and relative ease of use. Our strategy is to use TK for the graphics, and TCL wherever an interpreter is required, across the DA software.
We have developed a highly configurable and extensible interface for the OCP. It provides procedures to bind DA commands to TK buttons and place them in a window, procedures to display information service parameters for modification, and tailorable startup scripts that contain all of these procedure and TK binding definitions, plus standard command aliases and option flags. All operator commands are implemented as TCL procedures which multicast commands and/or fetch parameters. Since these commands are procedure based, an existing command can easily be modified to multicast to a new group, and a new command can easily be added. These features, in combination with template startup scripts, make the operator control very flexible, yet suitable for most experiments as is.
We report on our design and implementation of a distributed information services system, which we use to provide functionality in three areas: providing parameters to the various DA applications, recording a run history, and providing a repository for DA rates and statistics used by monitoring display programs. Information is stored by keynames, and the fast keyed database can be located on disk or in memory. The latter is used for storage of the transitory monitoring statistics. These services are widely applicable to distributed applications that have straightforward storage and retrieval needs, that is, applications whose data items have no relationships more complicated than loose association.
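The keyed storage idea above, with the same interface over an in-memory or an on-disk store, can be sketched as follows. This is an illustration in Python under stated assumptions, not the DART information server: the class `InfoService`, its methods and all keynames are hypothetical, a plain `dict` stands in for the in-memory database, the standard `shelve` module stands in for the disk database, and shell-style patterns from `fnmatch` stand in for whatever wildcard syntax the real service accepts.

```python
import fnmatch
import os
import shelve
import tempfile


class InfoService:
    """Toy keyed store; the backend decides whether data lives in memory or on disk."""

    def __init__(self, backend):
        self._db = backend  # any mapping with string keys: a dict or a shelve.Shelf

    def put(self, key, value):
        self._db[key] = value

    def get(self, key):
        return self._db[key]

    def fetch(self, pattern):
        """Return every entry whose keyname matches a shell-style wildcard pattern."""
        names = sorted(k for k in self._db.keys() if fnmatch.fnmatchcase(k, pattern))
        return {k: self._db[k] for k in names}


# In-memory store for transitory monitoring statistics (hypothetical keynames).
stats = InfoService({})
stats.put("rates/readout-1/events_per_sec", 1200)
stats.put("rates/readout-2/events_per_sec", 1150)
stats.put("run/number", 42)
rates = stats.fetch("rates/*/events_per_sec")

# Disk-backed store for parameters that must outlive the process that wrote them.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "params")
    with shelve.open(path) as shelf:
        InfoService(shelf).put("daq/buffer_size", 65536)
    with shelve.open(path) as shelf:
        reloaded = InfoService(shelf).get("daq/buffer_size")
```

Because entries are related only through their keynames, a flat keyed store of this kind needs no schema; grouping is expressed entirely by naming convention and wildcard fetches.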
We base DA monitoring graphics on TCL/TK, the information services described above, and the "blt" public domain TK widget set. From blt we build strip-charts, bar-charts, and labels for displaying DA rate and instantaneous values.
Additionally, we report on our development and use of an rlogin session multiplexor, which is used to start up applications on the DA network from a single program while capturing any terminal output they produce into logfiles or Xterm displays. This program can be driven from a single script containing the information required to start up all DA applications. Among its features, a shell command addressed to a wildcarded session name is sent simultaneously to all matching sessions. We provide a "fresh_start" script which, in combination with this multiplexor and a standard organization of per-node startup files and a node list, is used to automatically start up all applications in the DA system.
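The wildcarded dispatch described above can be sketched in a few lines. This is a Python illustration only, not the DART multiplexor: the classes `Session` and `Multiplexor`, the session names and the command string are hypothetical, no remote login is performed, and each session simply records what it is sent in a list standing in for its logfile. Shell-style patterns from `fnmatch` are assumed for the wildcards.

```python
import fnmatch


class Session:
    """Stand-in for one remote login session; it only records what it is sent."""

    def __init__(self, name):
        self.name = name
        self.log = []  # stand-in for the per-session logfile

    def send(self, command):
        self.log.append(command)


class Multiplexor:
    """Toy session multiplexor that addresses sessions by wildcarded name."""

    def __init__(self):
        self._sessions = {}  # session name -> Session

    def open(self, name):
        self._sessions[name] = Session(name)
        return self._sessions[name]

    def send(self, pattern, command):
        """Send one command to every session whose name matches the pattern."""
        matched = sorted(fnmatch.filter(self._sessions, pattern))
        for name in matched:
            self._sessions[name].send(command)
        return matched


mux = Multiplexor()
sessions = {name: mux.open(name) for name in ("readout-1", "readout-2", "logger-1")}

# One wildcarded command reaches both readout sessions and skips the logger.
hit = mux.send("readout-*", "start_readout")
```

A startup script for such a tool would then reduce to a list of session names to open followed by a handful of wildcarded sends.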
All components use our murmur error generation, reporting, and displaying system. We support the Unix and VxWorks operating systems.
All of our servers are written in C++. We use the tools.h++ class library extensively. It has hash-dictionary classes we use for name look-ups, file management classes we use for managing "keyed databases" on disk, and, built into these classes, regular expression matching we use for dbs session wildcarding and wildcarded parameter fetching. Use of the tools.h++ class library significantly reduced the production time of our server software.
1) Submitter's Name: Gene A. Oleynik
2) Submitter's Institution: Fermilab
3) Submitter's EMAIL address: oleynik@fnal.gov
4) Submitter's phone number: (708)-840-2430
5) Intended Speaker's Name: Gene Oleynik