DUNE prepares for data onslaught

The international Deep Underground Neutrino Experiment, hosted by Fermilab, will be one of the most ambitious attempts ever made at understanding some of the most fundamental questions about our universe. Currently under construction at the Sanford Underground Research Facility in South Dakota, DUNE will provide a massive target for neutrinos. When it’s operational, DUNE will comprise around 70,000 tons of liquid argon — more than enough to fill a dozen Olympic-sized swimming pools — contained in cryogenic tanks nearly a mile underground.

Neutrinos are ubiquitous. They were formed in the first seconds after the Big Bang, even before atoms could form, and they are constantly being produced by nuclear reactions in stars. When massive stars explode and become supernovae, the vast majority of the energy given off in the blast is released as a burst of neutrinos.

In the laboratory, scientists use particle accelerators to make neutrinos. In DUNE’s case, Fermilab accelerators will generate the world’s most powerful high-energy neutrino beam, aiming it at the DUNE neutrino detector 800 miles (1,300 kilometers) away in South Dakota.

When any of these neutrinos, whether star-born or terrestrial, strikes one of the argon atoms in the DUNE detector, a cascade of particles results. Every time this happens, billions of detector digits are generated, which must be saved and analyzed further by collaborators around the world. The volume of data churned out by the detector will be immense. So, while construction continues in South Dakota, scientists around the world are hard at work developing the computing infrastructure necessary to handle it.

The first step is ensuring that DUNE is connected to Fermilab with the kind of bandwidth that can carry tens of gigabits of data per second, said Liz Sexton-Kennedy, Fermilab’s chief information officer. As with other aspects of the collaboration, it requires “a well-integrated partnership,” she said. Each neutrino collision in the detector will produce an array of information to be analyzed.

“When there’s a quantum interaction at the center of the detector, that event is physically separate from the next one that happens,” Sexton-Kennedy said. “And those two events can be processed in parallel. So, there has to be something that creates more independence in the computing workflow that can split up the work.”
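The independence Sexton-Kennedy describes is what lets the work be split up. A minimal sketch of the idea, assuming a hypothetical `reconstruct` function and a made-up event layout (neither comes from DUNE software), and using local threads as a stand-in for machines on a grid:

```python
from concurrent.futures import ThreadPoolExecutor


def reconstruct(event):
    """Hypothetical stand-in for per-event reconstruction.

    It only sums the readout samples; real reconstruction is far more involved.
    """
    return {"event_id": event["event_id"], "total_charge": sum(event["samples"])}


def process_events(events, max_workers=4):
    # No event depends on any other, so each can be handed to a separate worker.
    # pool.map() returns results in the same order as the input events.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(reconstruct, events))


# Toy events invented for illustration only.
events = [{"event_id": i, "samples": [i, i + 1, i + 2]} for i in range(5)]
results = process_events(events)
```

Because no worker needs another worker's output, the same pattern scales in principle from threads on one machine to jobs spread across many sites.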

Sharing the load

One way to approach this challenge is by distributing the workflow around the world. Mike Kirby of Fermilab and Andrew McNab of the University of Manchester in the UK are the technical leads of the DUNE Computing Consortium, a collective effort by members of the DUNE collaboration and computing experts at partner institutions. Their goal is to establish a global computing network that can handle the massive data dumps DUNE will produce by distributing them across the grid.

“We’re trying to work out a roadmap for DUNE computing in the next 20 years that can do two things,” Kirby said. “One is an event data model,” which means figuring out how to handle the data the detector produces when a neutrino collision occurs, “and the second is coming up with a computing model that can use the conglomerations of computing resources around the world that are being contributed by different institutions, universities and national labs.”

It’s no small task. The consortium includes dozens of institutions, and the challenge is ensuring the computers and servers at each are orchestrated together so that everyone on the project can carry out their analyses of the data. A basic challenge, for example, is making sure a computer in Switzerland or Brazil recognizes a login from a computer at Fermilab.

Coordinating computing resources across a distributed grid has been done before, most notably by the Worldwide LHC Computing Grid, which federates the United States’ Open Science Grid and others around the world. But this is the first time an experiment at this scale led by Fermilab has used this distributed approach.

“Much of the Worldwide LHC Computing Grid design assumes data originates at CERN and that meetings will default to CERN, but as DUNE now has an associate membership of WLCG things are evolving,” said Andrew McNab, DUNE’s international technical lead for computing. “One of the first steps was hosting the monthly WLCG Grid Deployment Board town hall at Fermilab last September, and DUNE computing people are increasingly participating in WLCG’s task forces and working groups.”

“We’re trying to build on a lot of the infrastructure and software that’s already been developed in conjunction with those two efforts and extend it a little bit for our specific needs,” Kirby said. “It’s a great challenge to coordinate all of the computing around the world. In some sense, we’re kind of blazing a new trail, but in many ways, we are very much reliant on a lot of the tools that were already developed.”

Supernova signals

Another challenge is that DUNE has to organize the data it collects differently from particle accelerator physics experiments.

“For us, a typical neutrino event from the accelerator beam is going to generate something on the order of six gigabytes of data,” Kirby said. “But if we get a supernova neutrino alert,” in which a neutrino burst from a supernova arrives, signaling the cosmic explosion before light from it arrives at Earth, “a single supernova burst record could be as much as 100 terabytes of data.”

One terabyte equals one trillion bytes, an amount of data equal to about 330 hours of Netflix movies. Created in a few seconds, that amount of data is a huge challenge because of the computer processing time needed to handle it. DUNE researchers must begin recording data soon after a neutrino alert is triggered, and the data adds up quickly. But that data will also offer an opportunity to learn about neutrino interactions that take place inside supernovae while they’re exploding.

McNab said DUNE’s computing requirements are also slightly different because each event it captures is typically 100 times larger than those recorded by LHC experiments like ATLAS or CMS.

“So, the computers need more memory — not 100 times more, because we can be clever about how we use it, but we’re pushing the envelope certainly,” McNab said. “And that’s before we even start talking about the huge events if we see a supernova.”

Georgia Karagiorgi, a physicist at Columbia University who leads data selection efforts for the DUNE Data Acquisition Consortium, said a nearby supernova will generate up to thousands of interactions in the DUNE detector.

“That will allow us to answer questions we have about supernova dynamics and about the properties of neutrinos themselves,” she said.

To do so, DUNE scientists will have to combine data on the timing of neutrino arrival, their abundance and what kinds of neutrinos are present.

“If neutrinos have weird, new types of interactions as they’re propagating through the supernova during the explosion, we might expect modifications to the energy distribution of those neutrinos as a function of time” as they are picked up by the detector, Karagiorgi said. “That goes hand-in-hand with very detailed, and also quite computationally intensive, simulations, with different theoretical assumptions going into them, to actually be able to extract our science. We need both the theoretical simulations and the actual data to make progress.”

Gathering that data is a huge endeavor. When a supernova event occurs, “we read out our far-detector modules for about 100 seconds continuously,” Kirby said.
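To put the quoted figures side by side, here is a rough back-of-the-envelope calculation. It assumes, purely for illustration, that the full 100-terabyte record accumulates evenly over the roughly 100-second readout; the article does not say that it does.

```python
TB = 10**12  # one terabyte = one trillion bytes, as defined in the article
GB = 10**9

supernova_record_bytes = 100 * TB  # upper figure quoted for one burst record
readout_seconds = 100              # approximate continuous readout time
beam_event_bytes = 6 * GB          # typical accelerator-beam neutrino event

# Assumption: data accumulates at a constant rate during the readout.
avg_rate_tb_per_s = supernova_record_bytes / readout_seconds / TB

# How many typical beam events would add up to one supernova record.
beam_events_equivalent = supernova_record_bytes / beam_event_bytes

print(f"Average readout rate: {avg_rate_tb_per_s:.1f} TB/s")
print(f"Equivalent beam events: {beam_events_equivalent:,.0f}")
```

Under that assumption the average works out to about one terabyte per second, and a single supernova record is comparable in size to roughly 16,700 typical beam events.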

Because the scientists don’t know when a supernova will happen, they have to start collecting data as soon as an alert occurs and could be waiting for 30 seconds or longer for the neutrino burst to conclude. All the while, data could be piling up.

To prevent too much buildup, Kirby said, the experiment will use an approach called a circular buffer, in which memory that doesn’t include neutrino hits is reused, not unlike rewinding and recording over the tape in a video cassette.
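The idea of a circular buffer can be sketched in a few lines. This is a simplified illustration, not DUNE's data acquisition code: the class name and capacity are hypothetical, and it overwrites the oldest samples unconditionally rather than checking whether they contain neutrino hits.

```python
from collections import deque


class CircularBuffer:
    """Fixed-size buffer that records over its oldest entry once full."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest item on overflow,
        # much like recording over the start of a tape.
        self._slots = deque(maxlen=capacity)

    def record(self, sample):
        self._slots.append(sample)

    def snapshot(self):
        # Copy out the current contents, oldest first, e.g. when an alert fires.
        return list(self._slots)


buf = CircularBuffer(capacity=3)
for sample in range(5):  # samples 0..4 arrive, but only 3 slots exist
    buf.record(sample)

saved = buf.snapshot()  # the two oldest samples have been overwritten
```

Memory use stays constant no matter how long the detector waits, and whatever is in the buffer at the moment of an alert can be copied out for permanent storage.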

McNab said the supernova aspect of DUNE is also presenting new opportunities for computing collaboration.

“I’m a particle physicist by training, and one of my favorite aspects about working on this project is the way that it connects to other scientific disciplines, particularly astronomy,” he said. In the UK, particle physics and astronomy computing are collectively providing support for DUNE, the Vera C. Rubin Observatory Legacy Survey of Space and Time, and the Square Kilometer Array radio telescopes on the same computers. “And then we have the science aspect that, if we do see a supernova, then we will hopefully be viewing it with multiple wavelengths using these different instruments. DUNE provides an excellent pathfinder for the computing, because we already have real data coming from DUNE’s prototype detectors that needs to be processed.”

Kirby said that the computing effort is leading to exciting new developments in applications on novel architectures, artificial intelligence and machine learning on diverse computer platforms.

“In the past, we’ve focused on doing all of our data processing and analysis on CPUs and standard Intel and PC processors,” he said. “But with the rise of GPUs [graphics processing units] and other computing hardware accelerators such as FPGAs [field-programmable gate arrays] and ASICs [application-specific integrated circuits], software has been written specifically for those accelerators. That really has changed what’s possible in terms of event identification algorithms.”

These technologies are already in use in the on-site data acquisition system, where they reduce the terabytes per second generated by the detectors to the gigabytes per second transferred offline. The remaining challenge for offline computing is figuring out how to centrally manage these applications across the entire collaboration and get answers back from distributed centers across the grid.

“How do we stitch all of that together to make a cohesive computing model that gets us to physics as fast as possible?” Kirby said. “That’s a really incredible challenge.”

This work is supported by the Department of Energy Office of Science.

Fermilab is supported by the Office of Science of the U.S. Department of Energy. The Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, please visit energy.gov/science.