To meet the evolving needs of high-energy physics experiments, the underlying computing infrastructure must also evolve. Say hi to HEPCloud, the new, flexible way of meeting the peak computing demands of high-energy physics experiments using supercomputers, commercial services and other resources.
Five years ago, Fermilab scientific computing experts began addressing the computing resource requirements for research occurring today and in the next decade. Back then, in 2014, some of Fermilab’s neutrino programs were just starting up. Looking further into the future, plans were under way for two big projects. One was Fermilab’s participation in the future High-Luminosity Large Hadron Collider at the European laboratory CERN. The other was the expansion of the Fermilab-hosted neutrino program, including the international Deep Underground Neutrino Experiment. All of these programs would be accompanied by unprecedented data demands.
To meet these demands, the experts had to change the way they did business.
HEPCloud, the flagship project pioneered by Fermilab, changes the computing landscape because it employs an elastic computing model. Tested successfully over the last couple of years, it officially went into production as a service for Fermilab researchers this spring.

Scientists on Fermilab’s NOvA experiment were able to execute around 2 million hardware threads at a supercomputer at the Office of Science’s National Energy Research Scientific Computing Center. And scientists on the CMS experiment have been running workflows using HEPCloud at NERSC as a pilot project. Photo: Roy Kaltschmidt, Lawrence Berkeley National Laboratory
Experiments currently have some fixed computing capacity that meets, but doesn’t overshoot, their everyday needs. For times of peak demand, HEPCloud enables elasticity, allowing experiments to rent computing resources from other sources, such as supercomputers and commercial clouds, and managing those resources to satisfy peak demand. The prior method was to purchase local resources that, on a day-to-day basis, overshot the needs. In this new way, HEPCloud reduces the cost of providing computing capacity.
“Traditionally, we would buy enough computers for peak capacity and put them in our local data center to cover our needs,” said Fermilab scientist Panagiotis Spentzouris, former HEPCloud project sponsor and a driving force behind HEPCloud. “However, the needs of experiments are not steady. They have peaks and valleys, so you want an elastic facility.”
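The peaks-and-valleys logic Spentzouris describes can be illustrated with a toy provisioning sketch. All names and numbers below are hypothetical, invented for illustration; HEPCloud’s actual decision engine is far more sophisticated:

```python
# Toy sketch of elastic provisioning: a fixed local pool sized for
# everyday demand, with overflow "rented" from outside resources only
# during peaks. Hypothetical numbers, not HEPCloud's real interfaces.

LOCAL_CAPACITY = 10_000  # cores owned on site, sized for typical load

def provision(demand_cores: int) -> dict:
    """Split a demand for cores between local and rented capacity."""
    local = min(demand_cores, LOCAL_CAPACITY)
    rented = max(0, demand_cores - LOCAL_CAPACITY)  # rent only the overflow
    return {"local": local, "rented": rented}

# A valley fits entirely on site; a peak spills into rented resources.
print(provision(6_000))    # -> {'local': 6000, 'rented': 0}
print(provision(160_000))  # -> {'local': 10000, 'rented': 150000}
```

The point of the elastic model is in the second call: the experiment pays for the extra 150,000 cores only while the peak lasts, instead of owning them year-round.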
In addition, HEPCloud optimizes resource usage across all types, whether these resources are on site at Fermilab, on a grid such as Open Science Grid, in a cloud such as Amazon or Google, or at supercomputing centers like those run by the DOE Office of Science Advanced Scientific Computing Research program (ASCR). And it provides a uniform interface for scientists to easily access these resources without needing expert knowledge about where and how best to run their jobs.
The idea to create a virtual facility to extend Fermilab’s computing resources began in 2014, when Spentzouris and Fermilab scientist Lothar Bauerdick began exploring ways to best provide resources for experiments at CERN’s Large Hadron Collider. The idea was to provide those resources based on the overall experiment needs rather than a certain amount of horsepower. After many planning sessions with computing experts from the CMS experiment at the LHC and beyond, and after a long period of hammering out the idea, a scientific facility called “One Facility” was born. DOE Associate Director of Science for High Energy Physics Jim Siegrist coined the name “HEPCloud” — a computing cloud for high-energy physics — during a general discussion about a solution for LHC computing demands. But interest beyond high-energy physics was also significant. DOE Associate Director of Science for Advanced Scientific Computing Research Barbara Helland was interested in HEPCloud for its relevancy to other Office of Science computing needs.

The CMS detector at CERN collects data from particle collisions at the Large Hadron Collider. Now that HEPCloud is in production, CMS scientists will be able to run all of their physics workflows on the expanded resources made available through HEPCloud. Photo: CERN
The project was a collaborative one. In addition to many individuals at Fermilab, Miron Livny at the University of Wisconsin-Madison contributed to the design, enabling HEPCloud to use the workload management system known as Condor (now HTCondor), which is used for all of the lab’s current grid activities.
Since its inception, HEPCloud has achieved several milestones as it moved through the several development phases leading up to production. The project team first demonstrated the use of cloud computing on a significant scale in February 2016, when the CMS experiment used HEPCloud to achieve about 60,000 cores on the Amazon cloud, AWS. In November 2016, CMS again used HEPCloud to run 160,000 cores using Google Cloud Services, doubling the total size of CMS’s computing worldwide. Most recently, in May 2018, NOvA scientists were able to execute around 2 million hardware threads at a supercomputer at the Office of Science’s National Energy Research Scientific Computing Center (NERSC), increasing both the scale and the amount of resources provided. During these activities, the experiments were executing and benefiting from real physics workflows. NOvA was even able to report significant scientific results at the Neutrino 2018 conference in Germany, one of the most attended conferences in neutrino physics.
CMS has been running workflows using HEPCloud at NERSC as a pilot project. Now that HEPCloud is in production, CMS scientists will be able to run all of their physics workflows on the expanded resources made available through HEPCloud.
Next, HEPCloud project members will work to expand the reach of HEPCloud even further, enabling experiments to use the leadership-class supercomputing facilities run by ASCR at Argonne National Laboratory and Oak Ridge National Laboratory.
Fermilab experts are working to ensure that, eventually, all Fermilab experiments are configured to use these extended computing resources.
This work is supported by the DOE Office of Science.
Editor’s note: This article has been corrected. CMS’s November 2016 use of HEPCloud doubled the size of the CMS experiment’s computing worldwide, not the size of the LHC’s computing worldwide.

Karen Kosky and her team maintain the Fermilab site and keep the lab’s conventional facilities and property operations running smoothly. And there is a reason she keeps a large pipe on her desk. Photo: Reidar Hahn
How long have you been at Fermilab?
About two and a half years. I came in as the deputy head of the Facilities Engineering Services Section and was promoted to head about a year ago.
What is your role at Fermilab?
That’s a tough one, you know.
The Facilities Engineering Services Section is a group that maintains all conventional facilities and utilities across our site. We’re a team that designs, constructs, operates and maintains buildings and utilities. Our team also manages all of the open space on site. We even manage the bison.
And then we have a team dedicated to managing personal property, which includes shipping, receiving, tracking and disposing of things you can move, like laptops, chairs, oscilloscopes and vehicles.
It’s a diverse team. And my role is really just overseeing all of those operations and making sure that we are paying attention to meeting laboratory goals and contract requirements.
What is a day in your life like?
A neat part of the job is its variety. We could be meeting on any given day to discuss employee development strategies, bison herd issues or long-term strategic financial planning for site infrastructure.
It’s a pretty diverse set of day-to-day experiences.
What are you working on now?
This is a unique time to be working in conventional facilities at Fermilab. Our facilities have reached an age where they need a lot of attention. We are in the middle of putting together some proposals for significant infrastructure funding to bring more resources to the site so we can refresh it and renew some of the infrastructure that keeps this site running.
What is your favorite thing about working at Fermilab?
It’s got to be the people, right? I mean, what a fascinating place to work, with brilliant people from all over the world pursuing these fundamental questions about life and the universe.
You know, I work in a kind of ordinary field of conventional facilities and property management. But when you do that in the environment of this fascinating science laboratory, it’s just a neat place to come to work every day.
What is something that might surprise other Fermilab employees?
The bison are a fairly low-maintenance activity on site, actually. But they do require annual inoculations.
This is something that few people on site ever get a chance to see. But the grounds team goes through a process of corralling the bison into a series of chutes that eventually bring one bison at a time through a holding pen where they administer shots to the bison.
It’s a really fascinating process to watch. Because how often at a national science lab do you get to see folks corralling giant mammals through wooden chutes to be administered vaccinations?
Why do you have a large, rusty pipe on the counter of your office?
The former head of the Facilities Engineering Services Section, Kent Collins, had a veritable museum of misfit infrastructure pieces. When he retired, he graciously offered to pass on the museum, and I declined most of the pieces, except this one. I thought it was fitting to keep at least one symbol.
It’s an elbow joint for a ductile-iron drainage pipe. And it’s a good example of infrastructure that was installed in the most frugal way possible. It should have had something on it called cathodic protection, which shields the pipe from the damaging effects of conductivity in the soil, and it didn’t. So it degraded much more quickly than it should have.
A little bit, it just reminds me of Kent Collins, who’s a fun guy to know. But a little bit also, it reminds me that we want to do things in the future in the right way and not cut corners so that we’re investing our limited dollars in the best way possible.
It’s 2019. We want our cell phones fast, our computers faster and screens so crisp they rival a morning in the mountains. We’re a digital society, and blurry photos from potato-cameras won’t cut it for the masses. Physicists, it turns out, aren’t any different — and they want that same sharp snap from their neutrino detectors.
Cue ArgonCube: a prototype detector under development that’s taking a still-burgeoning technology to new heights with a plan to capture particle tracks worthy of that 4K TV. The secret at its heart? It’s all about the pixels.
But let’s take two steps back. Argon is an element that makes up about 1 percent of that sweet air you’re breathing. Over the past several decades, the liquid form of argon has grown into the medium of choice for neutrino detectors. Neutrinos are those pesky fundamental particles that rarely interact with anything but could be the key to understanding why there’s so much matter in the universe.
Big detectors full of cold, dense argon provide lots of atomic nuclei for neutrinos to bump into and interact with — especially when accelerator operators are sending beams containing trillions of the little things. When the neutrinos interact, they create showers of other particles and light that the electronics in the detector capture and transform into images.
Each image is a snapshot that captures an interaction by one of the most mysterious, flighty, elusive particles out there, a particle that caused Wolfgang Pauli, upon proposing it in 1930, to lament that he thought experimenters would never be able to detect it.

Scientists are testing the ArgonCube technology in a prototype constructed at the University of Bern in Switzerland. Photo: James Sinclair
Current state-of-the-art liquid-argon neutrino detectors — big players like MicroBooNE, ICARUS and ProtoDUNE — use wires to capture the electrons knocked loose by neutrino interactions. Vast planes of thousands of wires crisscross the detectors, each set collecting coordinates that are combined by algorithms into 3-D reconstructions of a neutrino’s interaction.
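The principle behind these wire-based detectors can be sketched in a few lines: the wire planes give two transverse coordinates, and the time it takes the knocked-loose electrons to drift to the wires gives the third. The function name and values below are illustrative; the drift velocity is a typical figure for liquid argon under a standard drift field, not a number from any specific detector:

```python
# Toy 3-D reconstruction for a wire-based liquid-argon detector.
# Two wire planes give two transverse coordinates; the electron drift
# time gives the coordinate along the drift direction.
# Illustrative sketch only; real reconstruction algorithms are far
# more involved (angled wire planes, ambiguity resolution, calibration).

DRIFT_VELOCITY_MM_PER_US = 1.6  # typical order of magnitude in liquid argon

def hit_position(y_wire_mm: float, z_wire_mm: float, drift_time_us: float):
    """Combine two wire coordinates and a drift time into a 3-D point (mm)."""
    x_mm = DRIFT_VELOCITY_MM_PER_US * drift_time_us  # distance drifted
    return (x_mm, y_wire_mm, z_wire_mm)

print(hit_position(10.0, 20.0, 100.0))  # -> (160.0, 10.0, 20.0)
```

Stringing many such hits together along a track is what produces the 3-D reconstructions of a neutrino interaction.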
These setups are effective, well-understood and a great choice for big projects — and you don’t get much bigger than the international Deep Underground Neutrino Experiment hosted by Fermilab.
DUNE will examine how the three known types of neutrinos change as they travel long distances, further exploring a phenomenon called neutrino oscillations. Scientists will send trillions of neutrinos from Fermilab every second on a 1,300-kilometer journey through the earth — no tunnel needed — to South Dakota. DUNE will use wire chambers in some of the four enormous far detector modules, each one holding more than 17,000 tons of liquid argon.
But scientists also need to measure the beam of neutrinos as it leaves Fermilab, where the DUNE near detector will be close to the neutrino source and see more interactions.
“We expect the beam to be so intense that you will have a dozen neutrino interactions per beam pulse, and these will all overlap within your detector,” said Dan Dwyer, a scientist at Lawrence Berkeley National Laboratory who works on ArgonCube. Trying to disentangle a huge number of events using the 2-D wire imaging is a challenge. “The near detector will be a new range of complexity.”
And new complexity, in this case, means developing a new kind of liquid-argon detector.
Pixel me this
People had thought about making a pixelated detector before, but it never got off the ground.
“This was a dream,” said Antonio Ereditato, father of the ArgonCube collaboration and a scientist at the University of Bern in Switzerland. “We developed this original idea in Bern, and it was clear that it could fly only with the proper electronics. Without it, this would have been just wishful thinking. Our colleagues from Berkeley had just what was required.”
Pixels are small, and neutrino detectors aren’t. You can fit roughly 100,000 pixels per square meter. Each one is a unique channel that — once it is outfitted with electronics — can provide information about what’s happening in the detector. To be sensitive enough, the tiny electronics need to sit right next to the pixels inside the liquid argon. But that poses a challenge.
“If they used even the power from your standard electronics, your detector would just boil,” Dwyer said. And a liquid-argon detector only works when the argon remains … well, liquid.
So Dwyer and ASIC engineer Carl Grace at Berkeley Lab proposed a new approach: What if they left each pixel dormant?
“When the signal arrives at the pixel, it wakes up and says, ‘Hey, there’s a signal here,’” Dwyer explained. “Then it records the signal, sends it out and goes back to sleep. We were able to drastically reduce the amount of power.”
At less than 100 microwatts per pixel, this solution seemed like a promising design that wouldn’t turn the detector into a tower of gas. They pulled together a custom prototype circuit and started testing. The new electronics design worked.
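The wake-up scheme Dwyer describes can be modeled as a simple state machine: a channel stays dormant until its input crosses a threshold, records while the signal persists, then goes back to sleep. This is a toy software model, not the Berkeley Lab ASIC design, and the threshold value is arbitrary:

```python
# Toy model of a self-triggering pixel channel: dormant until the input
# crosses a threshold, then record the signal and go back to sleep.
# Illustrative only; the real ASIC implements this in low-power hardware.

THRESHOLD = 5.0  # arbitrary units for this sketch

def read_out(samples):
    """Return only the samples recorded while the pixel is awake."""
    hits = []
    awake = False
    for s in samples:
        if not awake and s > THRESHOLD:
            awake = True           # "Hey, there's a signal here"
        if awake:
            hits.append(s)         # record while the pulse is present
            if s <= THRESHOLD:
                awake = False      # pulse over: back to sleep
    return hits

# Baseline noise is ignored; only the pulse (and its falling edge) is kept.
print(read_out([0.1, 0.2, 7.5, 9.0, 4.0, 0.3, 0.2]))  # -> [7.5, 9.0, 4.0]
```

The power argument also follows from the numbers in the article: at under 100 microwatts per pixel, even a full square meter of roughly 100,000 pixels dissipates on the order of 10 watts, little enough to keep the argon liquid.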
The first test was a mere 128 pixels, but things scaled quickly. The team started working on the pixel challenge in December 2016. By January 2018 they had traveled with their chips to Switzerland, installed them in the liquid-argon test detector built by the Bern scientists and collected their first 3-D images of cosmic rays.
“It was shock and joy,” Dwyer said.
For the upcoming installation at Fermilab, collaborators will need even more electronics. The next step is to work with manufacturers in industry to commercially fabricate the chips and readout boards that will sustain around half a million pixels. And Dwyer has received a Department of Energy Early Career Award to continue his research on the pixel electronics, complementing the Swiss SNSF grant for the Bern group.
“We’re trying to do this on a very aggressive schedule — it’s another mad dash,” Dwyer said. “We’ve put together a really great team on ArgonCube and done a great job of showing we can make this technology work for the DUNE near detector. And that’s important for the physics, at the end of the day.”

Samuel Kohn, Gael Flores, and Dan Dwyer work on ArgonCube technology at Lawrence Berkeley National Laboratory. Photo: Marilyn Chung, Lawrence Berkeley National Laboratory
More innovations ahead
While the pixel-centered electronics of ArgonCube stand out, they aren’t the only technological innovations that scientists are planning to implement for the upcoming near detector of DUNE. There’s research and development on a new kind of light detection system and new technology to shape the electric field that draws the signal to the electronics. And, of course, there are the modules.
Most liquid-argon detectors use a large container filled with the argon and not too much else. The signals drift long distances through the fluid to the long wires strung across one side of the detector. But ArgonCube is going for something much more modular, breaking the detector up into smaller units still contained within the surrounding cryostat. This has certain perks: The signal doesn’t have to travel as far, the argon doesn’t have to be as pure for the signal to reach its destination, and scientists could potentially retrieve and repair individual modules if required.
“It’s a little more complicated than the typical, wire-based detector,” said Min Jeong Kim, who leads the team at Fermilab working on the cryogenics and will be involved with the mechanical integration of the ArgonCube prototype test stand. “We have to figure out how these modules will interface with the cryogenic system.”
That means figuring out everything from filling the detector with liquid argon and maintaining the right pressure during operation to properly filtering impurities from the argon and circulating the fluid around (and through) the modules to maintain an even temperature distribution.

Researchers assemble components in the test detector at the University of Bern. Photo: James Sinclair
The ArgonCube prototype under assembly at the University of Bern will run until the end of the year before being shipped to Fermilab and installed 100 meters underground, making it the first large prototype for DUNE sent to Fermilab and tested with neutrinos. After working out its kinks, researchers can finalize the design and build the full ArgonCube detector.
Additional instrumentation and components such as a gas-argon chamber and a beam spectrometer will round out the near detector.
It’s an exciting time for the 100-some physicists from 23 institutions working on ArgonCube — and for the more than 1,000 neutrino physicists from over 30 countries working on DUNE. What started as wishful thinking has become a reality — and no one knows how far the pixel technology might go.
Ereditato even dreams of replacing the design of one of the four massive DUNE far detector modules with a pixelated version. But one thing at a time, he says.
“Right now we’re concentrating on building the best possible near detector for DUNE,” Ereditato said. “It’s been a long path, with many people involved, but the liquid-argon technology is still young. ArgonCube technology is the proof that the technique has the potential to perform even better in the future.”
The pre-excavation work for the South Dakota portion of the Long-Baseline Neutrino Facility reached another milestone. In June, construction workers finished securing the portal of the old tramway tunnel. The tunnel will house the conveyor system that will move about 800,000 tons of rock — excavated a mile underground to create the caverns for the Fermilab-hosted Deep Underground Neutrino Experiment — to its final resting place in the Open Cut, a former open pit mining area. The photo gallery below highlights various stages of this work.
The Homestake mining company had stopped using the tramway tunnel when it ceased mining operations in Lead, South Dakota, in 2002. Today the tunnel is part of the Sanford Underground Research Facility. The LBNF team is now in the process of rehabilitating the tunnel to get it ready for the installation of a conveyor system that will run from the Ross Shaft, exit through the rebuilt portal and extend to the Open Cut (see graphic). When the work is complete, the tunnel will house about 2,300 feet of the 4,250-foot-long conveyor system.