A national-lab partnership to solve the big questions about our universe

Rob Roser

Rob Roser, head of the Scientific Computing Division, wrote this column.

The universe is a vast and mysterious place. Scientists around the world are starting to use computers to simulate how the big bang generated the seeds that led to the formation of galaxies such as our own Milky Way. A new project sponsored by three of the Energy Department’s national labs will allow scientists to study this vastness in greater detail with a new cosmological simulation analysis toolbox.

Modeling the universe with a computer is very difficult. Scientists use supercomputers to simulate the evolution of galaxies, and the output of those simulations is typically very large. By anyone’s standards, this is “big data,” as each of these data sets can require hundreds of terabytes of storage space. Efficient storage and sharing of these huge data sets among scientists is paramount. Many different scientific analyses and processing sequences are carried out with each data set, making it infeasible to regenerate the data on demand.

This past year, Fermilab began a unique partnership with Argonne and Lawrence Berkeley national laboratories on an ambitious advanced-computing project. Together the three labs are developing a new, state-of-the-art cosmological simulation analysis toolbox that takes advantage of the Energy Department’s investments in supercomputers and specialized high-performance computing codes. Argonne’s team is led by Salman Habib, principal investigator, and Ravi Madduri, system designer. Jim Kowalkowski and Richard Gerber are the team leaders at Fermilab and Berkeley Lab, respectively.

The three labs have embarked on an innovative project to develop an open platform with a web-based front end that will allow the scientific community to download, transfer, manipulate, search and record simulation data. The system will allow scientists to upload and share applications as well as carry out complex computational analyses using the resources available to and assigned by the system.

To achieve these objectives, the team uses and enhances existing high-performance computing, high-energy physics and cosmology-specific software systems. As they modify the existing software so that it can handle the large data sets of galaxy-formation simulations, team members take advantage of the expertise they have gained by working on the big data challenges posed by particle physics experiments at the Large Hadron Collider.

This is an exciting project for Fermilab, Argonne Lab and Berkeley Lab to embark on. Large-scale simulations of cosmological structure formation are key discovery tools in the Energy Department’s Cosmic Frontier program, which is managed by the Office of High Energy Physics. Not only will this new project provide an important toolbox for Cosmic Frontier scientists and the many institutions involved in this research, but it will also serve as a prototype for a successful big-data software project spanning many groups and communities.

The commercial world has taken notice, too. In October, I will have the opportunity to present this project as part of my keynote speech at the Big Data Conference in Chicago.