
Analyzing the mountains of data generated by the Large Hadron Collider at the European laboratory CERN takes so much time that even the computers need coffee. Or rather, Coffea — Columnar Object Framework for Effective Analysis.

A package in the programming language Python, Coffea (pronounced like the stimulating beverage) speeds up the analysis of massive data sets in high-energy physics research. Although Coffea streamlines computation, the software’s primary goal is to optimize scientists’ time.

“The efficiency of a human being in producing scientific results is of course affected by the tools that you have available,” said Matteo Cremonesi, a postdoc at the U.S. Department of Energy’s Fermi National Accelerator Laboratory. “If it takes more than a day for me to get a single number out of a computation — which often happens in high-energy physics — that’s going to hamper my efficiency as a scientist.”

Frustrated by the tedious manual work they faced when writing computer code to analyze LHC data, Cremonesi and Fermilab scientist Lindsey Gray assembled a team of Fermilab researchers in 2018 to adapt cutting-edge big data techniques to solve the most challenging questions in high-energy physics. Since then, around a dozen research groups on the CMS experiment — one of the LHC’s two large general-purpose detectors — have adopted Coffea for their work.

Around a dozen research groups on the CMS experiment at the Large Hadron Collider have adopted the Coffea data analysis tool for their work. Starting from information about the particles generated in collisions, Coffea enables large statistical analyses that hone researchers’ understanding of the underlying physics, enabling faster run times and more efficient use of computing resources. Photo: CERN

Starting from information about the particles generated in collisions, Coffea enables large statistical analyses that hone researchers’ understanding of the underlying physics. (Data processing facilities at the LHC carry out the initial conversion of raw data into a format particle physicists can use for analysis.) A typical analysis on the current LHC data set involves processing an astounding 10 billion or so particle events, which can add up to more than 50 terabytes of data. That’s the data equivalent of approximately 25,000 hours of streaming video on Netflix.
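As a rough consistency check, the figures above can be turned into per-event and per-hour numbers. This is a back-of-the-envelope sketch that assumes decimal units (1 TB = 10^12 bytes) and takes the text's round numbers at face value. The variable names are illustrative only.

```python
# Back-of-the-envelope check of the figures quoted in the text.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 GB = 10**9 bytes).
TERABYTE = 10**12
GIGABYTE = 10**9
KILOBYTE = 10**3

n_events = 10_000_000_000      # roughly 10 billion collision events
total_bytes = 50 * TERABYTE    # roughly 50 TB for a typical analysis
streaming_hours = 25_000       # roughly 25,000 hours of streaming video

# Average storage per collision event, in bytes.
bytes_per_event = total_bytes / n_events

# Streaming rate implied by equating 50 TB with 25,000 hours of video.
gb_per_streaming_hour = total_bytes / streaming_hours / GIGABYTE

print(f"average size per event: {bytes_per_event / KILOBYTE:.1f} kB")
print(f"implied streaming rate: {gb_per_streaming_hour:.1f} GB per hour")
```

Under those assumptions, an average event occupies about 5 kilobytes, and the streaming comparison corresponds to roughly 2 gigabytes of video per hour.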

At the heart of Fermilab’s analysis tool lies a shift from a method known as event loop analysis to one called columnar analysis.

“You have a choice whether you want to iterate over each row and do an operation within the columns or if you want to iterate over the operations you’re doing and attack all the rows at once,” explained Fermilab postdoctoral researcher Nick Smith, the main developer of Coffea. “It’s sort of an order-of-operations thing.”

For example, imagine that for each row, you want to add together the numbers in three columns. In event loop analysis, you would start by adding together the three numbers in the first row. Then you would add together the three numbers in the second row, then move on to the third row, and so on. With a columnar approach, by contrast, you would start by adding the first and second columns for all the rows. Then you would add that result to the third column for all the rows.
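The two orderings can be sketched in a few lines of plain Python. The function names below are hypothetical and are not part of Coffea; the toy table is stored as one list per column. Both functions return the same sums, which is Smith’s point that the end result is identical.

```python
# A toy table: each row is one event, each column one quantity.
# Stored column-wise (one list per column), as a columnar tool would store it.
col_a = [1.0, 2.0, 3.0, 4.0]
col_b = [10.0, 20.0, 30.0, 40.0]
col_c = [100.0, 200.0, 300.0, 400.0]


def event_loop_sum(a, b, c):
    """Row by row: finish all the work for one row before moving to the next."""
    result = []
    for i in range(len(a)):
        result.append(a[i] + b[i] + c[i])
    return result


def add_columns(x, y):
    """One operation applied to every row (an element-wise column add)."""
    return [xi + yi for xi, yi in zip(x, y)]


def columnar_sum(a, b, c):
    """Operation by operation: add columns a and b for all rows, then add c."""
    partial = add_columns(a, b)
    return add_columns(partial, c)


print(event_loop_sum(col_a, col_b, col_c))
print(columnar_sum(col_a, col_b, col_c))
```

Written in pure Python like this, the columnar version is not itself faster, because `add_columns` still steps through rows in the interpreter. The payoff comes when each whole-column operation is handed to a compiled array library such as NumPy, where `a + b` on two arrays runs as a single loop in compiled code. That is, roughly, the source of the speedups columnar analysis offers.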

“In both cases, the end result would be the same,” Smith said. “But there are some trade-offs you make under the hood, in the machine, that have a big impact on efficiency.”

In data sets with many rows, columnar analysis runs around 100 times faster than event loop analysis in Python. Yet prior to Coffea, particle physicists primarily used event loop analysis in their work — even for data sets with millions or billions of collisions.

The Fermilab researchers decided to pursue a columnar approach, but they faced a glaring challenge: High-energy physics data cannot easily be represented as a table with rows and columns. One particle collision might generate a slew of muons and few electrons, while the next might produce no muons and many electrons. Building on a library of Python code called Awkward Array, the team devised a way to convert the irregular, nested structure of LHC data into tables compatible with columnar analysis. Generally, each row corresponds to one collision, and each column corresponds to a property of a particle created in the collision.
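Broadly, Awkward Array handles variable-length lists by storing one flat array of values plus an array of offsets that marks where each event’s values begin and end. The plain-Python sketch below illustrates that general idea with made-up muon transverse-momentum values. It does not use Awkward Array or Coffea, and `to_offsets_and_content` is a hypothetical helper.

```python
# Hypothetical input: muon transverse momenta (GeV) for four collision events.
# Each event has a different number of muons, so the data is not rectangular.
events = [
    [45.2, 30.1],        # event 0: two muons
    [],                  # event 1: no muons
    [60.0, 22.5, 11.3],  # event 2: three muons
    [18.7],              # event 3: one muon
]


def to_offsets_and_content(nested):
    """Flatten variable-length lists into (offsets, content).

    Event i's values are content[offsets[i]:offsets[i + 1]].
    """
    offsets = [0]
    content = []
    for sublist in nested:
        content.extend(sublist)
        offsets.append(len(content))
    return offsets, content


offsets, muon_pt = to_offsets_and_content(events)

# Per-event muon count comes straight from the offsets.
n_muons = [offsets[i + 1] - offsets[i] for i in range(len(offsets) - 1)]

# A selection applied to the flat column touches every muon in one pass.
passes_cut = [pt > 20.0 for pt in muon_pt]

# Count selected muons per event by summing over each event's offset range.
n_good = [sum(passes_cut[offsets[i]:offsets[i + 1]]) for i in range(len(n_muons))]

print(offsets)
print(muon_pt)
print(n_muons)
print(n_good)
```

Because all muons from all events sit in one flat column, a selection such as the 20 GeV cut runs over the whole data set in a single pass, and per-event results are recovered afterward from the offsets.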

Coffea’s benefits extend beyond faster run times (minutes, compared with the hours or days that equivalent interpreted Python code can take) and more efficient use of computing resources. The software takes mundane coding decisions out of the hands of the scientists, allowing them to work on a more abstract level with fewer chances to make errors.

“Researchers are not here to be programmers,” Smith said. “They’re here to be data scientists.”

Cremonesi, who searches for dark matter at CMS, was among the first researchers to use Coffea with no backup system. At first, he and the rest of the Fermilab team actively sought to persuade other groups to try the tool. Now, researchers frequently approach them asking how to apply Coffea to their own work.

Soon, Coffea’s use will expand beyond CMS. Researchers at the Institute for Research and Innovation in Software for High Energy Physics, supported by the U.S. National Science Foundation, plan to incorporate Coffea into future analysis systems for both CMS and ATLAS, the LHC’s other large general-purpose experimental detector. An upgrade to the LHC known as the High-Luminosity LHC, targeted for completion in the mid-2020s, will record about 100 times as much data, making the efficient data analysis offered by Coffea even more valuable for the LHC experiments’ international collaborators.

In the future, the Fermilab team also plans to break Coffea into several Python packages, allowing researchers to use just the pieces relevant to them. For instance, some scientists use Coffea mainly for its histogram feature, Gray said.

For the Fermilab researchers, the success of Coffea reflects a necessary shift in particle physicists’ mindset.

“Historically, the way we do science focuses a lot on the hardware component of creating an experiment,” Cremonesi said. “But we have reached an era in physics research where handling the software component of our scientific process is just as important.”

Coffea promises to bring high-energy physics into sync with recent advances in big data in other scientific fields. This cross-pollination may prove to be Coffea’s most far-reaching benefit.

“I think it’s important for us as a community in high-energy physics to think about what kind of skills we’re imparting to the people that we’re training,” Gray said. “Making sure that we as a field are pertinent to the rest of the world when it comes to data science is a good thing to do.”

U.S. participation in CMS is supported by the Department of Energy Office of Science.

Fermilab is supported by the Office of Science of the U.S. Department of Energy. The Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, visit science.energy.gov.

The cosmic microwave background, or CMB, is the electromagnetic echo of the Big Bang, radiation that has been traveling through space and time since the very first atoms were born 380,000 years after our universe began. Mapping minuscule variations in the CMB tells scientists about how our universe came to be and what it’s made of.

To capture the ancient, cold light from the CMB, researchers use specialized telescopes equipped with ultrasensitive cameras for detecting millimeter-wavelength signals. The next-generation cameras will contain up to 100,000 superconducting detectors. Fermilab scientist and University of Chicago Associate Professor Jeff McMahon and his team have developed a new type of metamaterials-based antireflection coating for the silicon lenses used in these cameras.

“There are at least half a dozen projects that would not be possible without these,” McMahon said.

Metamaterials are engineered materials with properties that aren’t naturally occurring. The magic is in the microstructure — tiny, repeating features smaller than the wavelength of the light they are designed to interact with. These features bend, block or otherwise manipulate light in unconventional ways.

Left: One of the lenses developed by McMahon’s team is installed in a camera assembly. Top right: This shows a close-up view of the stepped pyramid metamaterial structure responsible for the lens’ antireflective properties. Bottom right: Members of the McMahon lab stand by recently fabricated silicon lenses. Photo courtesy of Jeff McMahon

Generally, antireflection coatings work by arranging for the light waves reflected from each side of the coating to interfere destructively, so that they cancel each other and reflection is suppressed. For McMahon’s metamaterials, the “coating” is a million tiny, precise cuts in each side of each silicon lens. Up close, the features look like stepped pyramids: three layers of square pillars stacked on top of each other. The pillars’ spacing and thickness are fine-tuned to maximize destructive interference between the reflected waves.
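The destructive-interference idea can be made concrete with the textbook thin-film formula for light at normal incidence. This sketch assumes a refractive index of 3.4 for silicon at millimeter wavelengths, a 2-millimeter design wavelength (about 150 GHz) and ideal, lossless layers. It models a single quarter-wave layer, not McMahon’s actual three-step design, which behaves roughly like a multilayer coating whose effective index steps from vacuum toward silicon.

```python
import cmath
import math


def fresnel_r(n1, n2):
    """Amplitude reflection coefficient at normal incidence, from index n1 into n2."""
    return (n1 - n2) / (n1 + n2)


def reflectance(n_outside, layers, n_substrate, wavelength):
    """Power reflectance at normal incidence for a stack of lossless layers.

    layers: list of (refractive_index, thickness) pairs, ordered from the
    outside toward the substrate. Thickness and wavelength share units.
    """
    indices = [n_outside] + [n for n, _ in layers] + [n_substrate]
    # Start at the deepest interface, then fold in one layer at a time outward.
    r = fresnel_r(indices[-2], indices[-1])
    for j in range(len(layers) - 1, -1, -1):
        n_layer, thickness = layers[j]
        delta = 2 * math.pi * n_layer * thickness / wavelength  # one-way phase
        phase = cmath.exp(2j * delta)                           # round-trip phase
        r_top = fresnel_r(indices[j], n_layer)
        r = (r_top + r * phase) / (1 + r_top * r * phase)
    return abs(r) ** 2


N_VACUUM = 1.0
N_SILICON = 3.4        # assumed index of silicon at millimeter wavelengths
WAVELENGTH_MM = 2.0    # assumed design wavelength, about 150 GHz

# Uncoated silicon surface.
bare = reflectance(N_VACUUM, [], N_SILICON, WAVELENGTH_MM)

# Ideal single layer: index sqrt(n_outside * n_substrate), thickness lambda / (4 n).
n_ar = math.sqrt(N_VACUUM * N_SILICON)
quarter_wave = [(n_ar, WAVELENGTH_MM / (4 * n_ar))]
coated = reflectance(N_VACUUM, quarter_wave, N_SILICON, WAVELENGTH_MM)
coated_off_band = reflectance(N_VACUUM, quarter_wave, N_SILICON, 1.5 * WAVELENGTH_MM)

print(f"bare silicon: {bare:.1%} of power reflected per surface")
print(f"quarter-wave layer at design wavelength: {coated:.1e}")
print(f"quarter-wave layer at 1.5x the design wavelength: {coated_off_band:.1%}")
```

With these assumptions, an uncoated silicon surface reflects about 30 percent of incoming power, while the ideal quarter-wave layer cancels the reflection at its design wavelength but lets some through at other wavelengths. Stacking several layers with graded indices, as the three-step pyramids effectively do, is a standard way to keep reflection low across a wider band.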

“Light just goes sailing right through with a tenth of a percent chance of reflecting,” McMahon said.

The single-crystal silicon lenses are transparent to microwaves and ultrapure so that the light passing through the lens won’t be absorbed or scattered by impurities. Silicon has the necessary light-bending properties for getting light from the telescope onto a large array of sensors, and the metamaterial structure takes care of reflection. Because each lens is made from a single pure silicon crystal, the lenses can withstand cryogenic temperatures (the detectors have to operate at 0.1 kelvin) without the risk of cracking or peeling that lenses with antireflective coatings made from a different material face.

All told, these lenses are arguably the best technology available for CMB instruments, McMahon says.

“It’s not exactly that you couldn’t do the experiment otherwise,” McMahon said, but for the performance and durability demanded by current and next-generation CMB surveys, these lenses are the state of the art, and his team members are the only people in the world who make them.

McMahon and his team began developing the technology about 10 years ago when they started working on a new type of detector array and realized that they needed a better, less reflective lens to go with it. The hard part, he says, was figuring out how to make it. Techniques existed for making micrometer-accurate cuts in flat silicon wafers, but nobody had ever applied them to a lens before. The first lens they made, for the Atacama Cosmology Telescope, called ACT, took 12 weeks to fabricate because of the huge number of cuts that needed to be made. Now with improved machines and automation at Fermilab, the process takes just four days per lens, and McMahon hopes they will be able to streamline it even further.

Jeff McMahon and his team have developed new techniques for working with curved lenses instead of flat silicon wafers for CMB telescope lenses. Photo courtesy of Jeff McMahon

Working at the University of Michigan until January 2020, McMahon’s team fabricated about 20 lenses for current CMB experiments including ACTPol, Advanced ACTPol, CLASS, TolTEC and PIPER. They are now producing lenses for the Simons Observatory, which will start collecting data next year. From there, they will begin making additional lenses for CMB-S4 (Cosmic Microwave Background Stage 4), a next-generation project of which Fermilab is a member. CMB-S4 is scheduled to begin collecting data in 2027 using 21 telescopes at observatories in Chile and the South Pole for the most detailed CMB survey yet.

“The second we finish a lens, it’s doing science, and that’s what makes it fun for me,” McMahon said. “All the metamaterial stuff is cool, but at the end of the day I just want to figure out how the universe began and what’s in it.”

McMahon compares CMB-S4 to opening a treasure chest full of gold and jewels. He and the other researchers contributing to it don’t know exactly what they’ll find in the data, but they know it will be valuable. Even if they don’t find primordial gravitational waves — one of the project’s major goals — the experiment will still shed light on cosmic mysteries such as dark energy, dark matter and neutrino masses.

What his team has achieved with their lens technology, McMahon says, is a testament to the outsize effect small efforts can have on big science.

“The endeavor is to begin to understand the beginning of the universe,” he said. “And the way we’re doing it is by figuring out how to machine little features in silicon.”

This work is supported by the Department of Energy Office of Science.
