Fermilab computing experts bolster NOvA evidence, 1 million cores consumed

How do you arrive at the physical laws of the universe when you’re given experimental data on a renegade particle that interacts so rarely with matter, it can cruise through light-years of lead? You call on the power of advanced computing.

The NOvA neutrino experiment, in collaboration with the Department of Energy’s Scientific Discovery through Advanced Computing (SciDAC-4) program and the HEPCloud program at DOE’s Fermi National Accelerator Laboratory, was able to perform the largest-scale analysis ever to support the recent evidence of antineutrino oscillation, a phenomenon that may hold clues to how our universe evolved.

Using Cori, the newest supercomputer at the National Energy Research Scientific Computing Center (NERSC), located at Lawrence Berkeley National Laboratory, NOvA used over 1 million computing cores, or CPUs, between May 14 and 15 and over a short timeframe one week later. This is the largest number of CPUs ever used concurrently over this duration (about 54 hours) for a single high-energy physics experiment. This unprecedented amount of computing enabled scientists to carry out some of the most complicated techniques used in neutrino physics, allowing them to dig deeper into the seldom-seen interactions of neutrinos. This Cori allocation was more than 400 times the amount of Fermilab computing allocated to the NOvA experiment and 50 times the total computing capacity at Fermilab allocated for all of its rare-physics experiments. A continuation of the analysis was performed on NERSC’s Cori and Edison supercomputers one week later. In total, nearly 35 million core-hours were consumed by NOvA in the 54-hour period. Executing the same analysis on a single desktop computer would take about 4,000 years.
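The 4,000-year figure can be checked with a quick back-of-the-envelope calculation. The sketch below is illustrative only: it assumes the desktop runs one core nonstop at the same per-core speed as a supercomputer core, which the article does not state, and the variable names are chosen here for illustration.

```python
# Rough sanity check of the article's figures (a sketch, not NOvA's own accounting).
HOURS_PER_YEAR = 24 * 365          # ignores leap years

core_hours = 35_000_000            # "nearly 35 million core-hours" (from the article)
wall_hours = 54                    # "about 54 hours" (from the article)

# Assumption: a desktop contributes one core running around the clock
# at the same speed as one supercomputer core.
single_core_years = core_hours / HOURS_PER_YEAR

# Average number of cores kept busy across the whole window.
# This is an average only; it says nothing about the peak core count.
average_busy_cores = core_hours / wall_hours

print(f"single-core runtime: about {single_core_years:,.0f} years")
print(f"average busy cores:  about {average_busy_cores:,.0f}")
```

Under those assumptions, the single-core runtime comes out just under 4,000 years, in line with the figure quoted above.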

The Cori supercomputer at NERSC was used to perform a complex computational analysis for NOvA. NOvA used over 1 million computing cores, the largest amount ever used concurrently in a 54-hour period. Photo: Roy Kaltschmidt, Lawrence Berkeley National Laboratory

“The special thing about NERSC is that it enabled NOvA to do the science at a new level of precision, a much finer resolution with greater statistical accuracy within a finite amount of time,” said Andrew Norman, NOvA physicist at Fermilab. “It facilitated doing analysis of real data coming off the detector at a rate 50 times faster than that achieved in the past. The first round of analysis was done within 16 hours. Experimenters were able to see what was coming out of the data, and in less than six hours everyone was looking at it. Without these types of resources, we, as a collaboration, could not have turned around results as quickly and understood what we were seeing.”

The experiment presented the latest finding from the recently collected data at the Neutrino 2018 conference in Germany on June 4.

“The speed with which NERSC allowed our analysis team to run sophisticated and intense calculations needed to produce our final results has been a game-changer,” said Fermilab scientist Peter Shanahan, NOvA co-spokesperson. “It accelerated our time-to-results on the last step in our analysis from weeks to days, and that has already had a huge impact on what we were able to show at Neutrino 2018.”

In addition to the state-of-the-art NERSC facility, NOvA relied on work done within the SciDAC HEP Data Analytics on HPC (high-performance computers) project and the Fermilab HEPCloud facility. Both efforts are led by Fermilab scientific computing staff, and both worked with researchers at NERSC to support NOvA’s antineutrino oscillation evidence.

The current standard practice for Fermilab experimenters is to perform similar analyses using less complex calculations through a combination of both traditional high-throughput computing and the distributed computing provided by Open Science Grid, a national partnership between laboratories and universities for data-intensive research. These are substantial resources, but they use a different model: Both use a large amount of computing resources over a long period of time. For example, some resources are offered only at a low priority, so their use may be preempted by higher-priority demands. But for complex, time-sensitive analyses such as NOvA’s, researchers need the faster processing enabled by modern, high-performance computing techniques.

SciDAC-4 is a DOE Office of Science program that funds collaboration between experts in mathematics, physics and computer science to solve difficult problems. The HEP on HPC project was funded specifically to explore computational analysis techniques for doing large-scale data analysis on DOE-owned supercomputers. Running the NOvA analysis at NERSC, the mission supercomputing facility for the DOE Office of Science, was a task perfectly suited for this project. Fermilab’s Jim Kowalkowski is the principal investigator for HEP on HPC, which also has collaborators from DOE’s Argonne National Laboratory, Berkeley Lab, University of Cincinnati and Colorado State University.

“This analysis forms a kind of baseline. We’re just ramping up, just starting to exploit the other capabilities of NERSC at an unprecedented scale,” Kowalkowski said.

The project’s goal for its first year is to take compute-heavy analysis jobs like NOvA’s and enable them on supercomputers. That means not just running the analysis, but also changing how calculations are done and learning how to revamp the tools that manipulate the data, all in an effort to improve techniques used for doing these analyses and to leverage the full computational power and unique capabilities of modern high-performance computing facilities. In addition, the project seeks to consume all computing cores at once to shorten the time to results.

The Fermilab HEPCloud facility provides cost-effective access to compute resources by optimizing usage across all available types and elastically expanding the resource pool on short notice by, for example, renting temporary resources on commercial clouds or using high-performance computers. HEPCloud enables NOvA and physicists from other experiments to use these compute resources in a transparent way.

For this analysis, “NOvA experimenters didn’t have to change much in terms of business as usual,” said Burt Holzman, HEPCloud principal investigator. “With HEPCloud, we simply expanded our local on-site-at-Fermilab facilities to include Cori and Edison at NERSC.”

At the Neutrino 2018 conference, Fermilab's NOvA neutrino experiment announced that it had seen strong evidence of muon antineutrinos oscillating into electron antineutrinos over long distances. NOvA collaborated with the Department of Energy’s Scientific Discovery through Advanced Computing program and Fermilab's HEPCloud program to perform the largest-scale analysis ever to support the recent evidence. Photo: Reidar Hahn

Building on work the Fermilab HEPCloud team has been doing with researchers at NERSC to optimize high-throughput computing in general, the HEPCloud team was able to leverage the facility to achieve the million-core milestone. Thus, HEPCloud holds the record for the most resources ever provisioned concurrently at a single facility to run experimental HEP workflows.

“This is the culmination of more than a decade of R&D we have done at Fermilab under SciDAC and the first taste of things to come, using these capabilities and HEPCloud,” said Panagiotis Spentzouris, head of the Fermilab Scientific Computing Division and HEPCloud sponsor.

“NOvA is an experimental facility located more than 2,000 miles away from Berkeley Lab, where NERSC is located. The fact that we can make our resources available to the experimental researchers near real-time to enable their time-sensitive science that could not be completed otherwise is very exciting,” said Wahid Bhimji, a NERSC data architect at Berkeley Lab who worked with the NOvA team. “Led by colleague Lisa Gerhardt, we’ve been working closely with the HEPCloud team over the last couple of years, also to support physics experiments at the Large Hadron Collider. The recent NOvA results are a great example of how the infrastructure and capabilities that we’ve built can benefit a wide range of high energy experiments.”

Going forward, Kowalkowski, Holzman and their associated teams will continue building on this achievement.

“We’re going to keep iterating,” Kowalkowski said. “The new facilities and procedures were enthusiastically received by the NOvA collaboration. We will accelerate other key analyses.”

NERSC is a DOE Office of Science user facility.

Fermilab is America’s premier national laboratory for particle physics and accelerator research. A U.S. Department of Energy Office of Science laboratory, Fermilab is located near Chicago, Illinois, and operated under contract by the Fermi Research Alliance LLC, a joint partnership between the University of Chicago and the Universities Research Association Inc. Visit Fermilab’s website at www.fnal.gov and follow us on Twitter at @Fermilab.

DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.