Preserving the data and legacy of the Tevatron

The recently completed Tevatron Run II Data Preservation Project makes the reams of CDF and DZero data available for future analysis.

Since the Tevatron shut down in 2011, there has been a concerted effort to preserve the data and rich physics legacy of the CDF and DZero experiments. The Run II Data Preservation project, completed in December, enables scientists to perform publishable scientific analysis of Run II Tevatron data through at least 2020. Kenneth Herner and Bo Jayatilaka, co-leaders of the project for DZero and CDF respectively, point out that it allows scientists to revisit a measurement or to test new theoretical calculations long after the original experiments have ended.

“These data sets can potentially verify discoveries made at the Large Hadron Collider,” Jayatilaka said.

“The Tevatron’s unique proton-antiproton collision data set enables physics studies that are complementary to those at the LHC,” Herner added.

In the world of digital science, “data preservation” means preserving not only the data set itself, but also the software that enables future access to that data. The Run II Data Preservation project also addressed documentation and the adoption of the sustainable infrastructure needed to ensure that scientists will be able to analyze Run II data in future computing environments.

The need for sustainable data preservation will continue to increase as science advances, experiments become less replicable and data sets become increasingly specialized. Projects such as Data and Software Preservation for Open Science (DASPOS) and the Study Group for Data Preservation and Long Term Analysis in High Energy Physics (DPHEP) are also working to expand and improve data preservation technology.

Through the Run II Data Preservation project, both CDF and DZero have adapted their data analysis techniques to the long-term computing infrastructure that will support the Fermilab physics program going forward. Herner and Willis Sakumoto, co-leader of the effort at CDF, both emphasize that their users are now able to run their analyses on the long-term supported infrastructure without having to learn new tools.

“The project has accomplished its goal of transitioning CDF analysis infrastructure support so that we can access the data and run the software into 2020 with minimal additional cost to the base program,” Sakumoto said.

DZero users, too, are able to run their analyses using their familiar tools, Herner said.

This two-year project to preserve the long-term value of the Tevatron Run II experiments was a collaborative effort of experts from CDF and DZero, as well as the Data Management and Applications Group, the Storage Services Group, and the Scientific Software Infrastructure Department of the Scientific Computing Division.

The Run II Data Preservation Project Team: Joe Boyd, Project Technical Lead; Ken Herner, DZero; Bo Jayatilaka, CDF; Rob Kennedy, Project Manager; Willis Sakumoto, CDF