Managing and moving massive data for NOvA

The NOvA experiment’s new production capabilities run jobs on Open Science Grid resources using large-scale software distribution and large-file handling. Over two weeks, NOvA jobs ran on 10 OSG clusters (shown in different colors) at five sites and on FermiCloud for a total of 90,000 CPU hours. Jobs were submitted in three “waves.”

Members of the NOvA experiment, along with personnel from the Scientific Computing Division and systems administrators from participating Open Science Grid institutions, recently deployed the experiment’s large-scale C++ analysis code to run on demand on participating OSG sites. The resulting production campaign was the culmination of several months of work.

NOvA is attempting to observe the appearance of electron neutrinos as a result of neutrino oscillations within the NuMI beam. The NOvA far detector is usually run with two main independent output-event streams: a cosmic trigger and a beam trigger, the latter of which contains the oscillated neutrino signal. The data is transferred back to Fermilab as soon as it becomes available and is cataloged and archived for permanent storage. To extract the oscillated neutrino signal from this data, it is critical to understand the cosmic-ray background in great detail. This is done through computer simulations, which are the focus of the computational activity on OSG.

The deployment team faced two major obstacles: deploying a consistent version of rapidly changing software to many different OSG sites and efficiently transferring large amounts of data to these sites. To overcome the first challenge, they used the CERN Virtual Machine File System (CVMFS). CVMFS stores an experiment’s entire software suite, including all of the external dependencies, on a set of distribution servers. As individual worker nodes require access to software libraries, they download the needed libraries and store them in a local cache. The system downloads only the software that is needed for an individual job rather than the entire suite.
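The on-demand behavior described above can be illustrated with a minimal Python sketch. This is a toy model only, not the CVMFS client or its API: the class `OnDemandCache`, the `server` dictionary and the library paths are all hypothetical names chosen for illustration.

```python
class OnDemandCache:
    """Toy model of a worker node's local software cache (not the CVMFS API)."""

    def __init__(self, server):
        self.server = server   # stands in for the distribution servers
        self.local = {}        # stands in for the node's local cache
        self.downloads = 0     # number of fetches from the server

    def open(self, path):
        # Fetch from the server only on first access; reuse the cached copy after that.
        if path not in self.local:
            self.local[path] = self.server[path]
            self.downloads += 1
        return self.local[path]


# Hypothetical software suite with three libraries.
server = {
    "lib/libSim.so": b"simulation library",
    "lib/libReco.so": b"reconstruction library",
    "lib/libAna.so": b"analysis library",
}

node = OnDemandCache(server)
node.open("lib/libSim.so")   # first access: downloaded from the server
node.open("lib/libSim.so")   # second access: served from the local cache

print(node.downloads, len(node.local), len(server))
```

In this sketch the job touches one library out of three, so only that one is downloaded and cached; the other two never leave the server.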

The second challenge, efficiently transferring data to sites, was tackled with dCache, the software that serves as the file-cataloging front end to Fermilab’s ENSTORE mass storage system. From dCache, the sequential access via metadata (SAM) data management system retrieves the input files and transfers them in series to the worker nodes, which process them. Once the processing is completed, output files are automatically transferred back to Fermilab for cataloging and archiving. The NOvA collaboration targeted the generation of 1,000,000 events, about three times as many events as have ever been produced in the past. This was achieved by running 10,000 jobs, requiring almost 90,000 CPU hours and producing 2 terabytes of data during two weeks of operations.
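The campaign totals quoted above imply some useful per-job averages. The short Python calculation below is a back-of-the-envelope sketch, assuming the rounded figures from the text, decimal units (1 terabyte = 1,000,000 megabytes) and a 14-day campaign; the variable names are illustrative, and actual per-job values would have varied.

```python
# Rounded campaign totals taken from the text.
jobs = 10_000
events = 1_000_000
cpu_hours = 90_000      # "almost 90,000", so treat this as an upper estimate
output_tb = 2
days = 14               # assumption: "two weeks" taken as exactly 14 days

events_per_job = events / jobs
cpu_hours_per_job = cpu_hours / jobs
output_mb_per_job = output_tb * 1_000_000 / jobs   # assumes decimal terabytes
avg_busy_cores = cpu_hours / (days * 24)           # average cores busy around the clock

print(f"events per job:     {events_per_job:.0f}")
print(f"CPU hours per job:  {cpu_hours_per_job:.0f}")
print(f"output per job:     {output_mb_per_job:.0f} MB")
print(f"average busy cores: {avg_busy_cores:.0f}")
```

On these assumptions, each job generated about 100 events in roughly 9 CPU hours and returned about 200 megabytes of output, equivalent to keeping about 270 cores busy continuously for the two weeks.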

NOvA’s Andrew Norman and Fermilab’s Gabriele Garzoglio spearheaded this effort, along with collaborators from Southern Methodist University, University of Nebraska-Lincoln, University of Chicago, University of California, San Diego, University of Wisconsin – Madison and FermiCloud. The lessons learned and the success of this campaign serve as a good precedent for use of the OSG by other, similar experiments, such as LBNE.

Nathan Mayer, Tufts University, and Gavin Davies, Iowa State University