Fermilab’s stored data reaches 100 petabytes

Members of the Data Movement and Storage Department stand in the FCC tape robot room. From left: Alex Kulyavtsev, Yujun Wu, Albert Rossi, Dmitri Litvintsev, Gene Oleynik, John Hendry, Stan Naymola, Tim Messer, Terry Jones, Alexander Moibenko, Chih-Hao Huang, Paul Tader. Not pictured are Gerard Bernabeu and George Szmuksta. Photo courtesy of Hannah Ward, OCIO

If you have visited the Grid Computing Center and the computer rooms in the Feynman Computing Center, you may be surprised to hear that these rather small spaces hold 100 million of something. As of January, 100 million gigabytes (100 petabytes) of data is stored in the Fermilab computing centers on magnetic tapes, all maintained by the Scientific Computing Division’s Data Movement and Storage Department.

We have some idea of what a gigabyte is because we know how much data it takes to fill our phones, which typically hold between eight and 64 gigabytes. But 100 million gigabytes, like 100 million of anything, is hard to imagine. For reference, one petabyte is equivalent to 13.3 years of HD-TV video. In 1995, only 20 petabytes of hard drive space were manufactured. Fifty petabytes is equivalent to the entire written works of humankind in all languages since the beginning of recorded history. To summarize: 100 petabytes of data is a lot!
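For readers who want to check the arithmetic, the comparisons above can be reproduced in a few lines of Python. This is only a rough sketch: it assumes decimal (SI) units, where one petabyte is one million gigabytes, and it takes the 13.3-years-of-video figure and the phone sizes from the text as given. The variable and function names are illustrative.

```python
# Assumption: decimal (SI) units, so 1 petabyte = 1,000,000 gigabytes.
GB_PER_PB = 1_000_000

# Figure quoted in the article: one petabyte is about 13.3 years of HD-TV video.
HD_VIDEO_YEARS_PER_PB = 13.3


def gigabytes_to_petabytes(gigabytes: float) -> float:
    """Convert gigabytes to petabytes using decimal units."""
    return gigabytes / GB_PER_PB


stored_gb = 100_000_000  # 100 million gigabytes on tape
stored_pb = gigabytes_to_petabytes(stored_gb)

# How many years of HD-TV video the stored data would correspond to.
hd_video_years = stored_pb * HD_VIDEO_YEARS_PER_PB

# How many completely full phones it would take to hold the same data.
phones_64gb = stored_gb // 64
phones_8gb = stored_gb // 8

print(f"{stored_pb:.0f} petabytes")
print(f"about {hd_video_years:.0f} years of HD-TV video")
print(f"{phones_64gb:,} phones at 64 GB, or {phones_8gb:,} phones at 8 GB")
```

Under these assumptions, the stored data corresponds to well over a thousand years of HD video, or more than a million of even the largest phones mentioned.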

The 100 (and counting) petabytes are collected from experiments at Fermilab and at CERN. It can be difficult to predict how much data the experiments will produce at any moment, particularly when they first start up. The DMS Department must therefore be flexible and able to quickly reorganize and move storage resources, which also requires maintaining a sufficient amount of extra resources to account for any unknowns.

But the data can’t be collected and then left unattended. One of the mantras we frequently hear in the current information age is “technology is always changing.” The changing technology associated with data storage helps maintain the data and improve data transfer processes, but keeping up with the changes requires time and effort. Most data at Fermilab is stored in files on magnetic tapes. The tape drives that read the tapes are not supported indefinitely, which means the data must be moved every five to 10 years to new tapes that work with the new, supported tape drives.

In addition to these periodic upgrades, effort also goes into fixing or replacing tape drives that fail or malfunction. The software to manage these tape systems, enstore, is written at Fermilab and maintained by the DMS Department.

Along with storing and maintaining a colossal amount of data, the group set a new record for the number of files transferred in one day. The group keeps approximately 30 petabytes of data on disk because files on disk can be accessed faster than those stored on tape, and 14 million files were successfully transferred between user applications and the disks in a single 24-hour period. The software for this disk storage, dCache, is the open-source product of an international collaboration between the DESY laboratory, the Nordic Data Grid Facility and Fermilab.
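To put the record in more familiar terms, the daily total can be converted into an average rate. The short Python sketch below assumes the transfers were spread evenly over the 24 hours, which the article does not state; real traffic would have peaks and lulls. The names are illustrative.

```python
# Figures from the article: 14 million files transferred in one 24-hour period.
FILES_TRANSFERRED = 14_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Average rate, assuming (for illustration only) an even spread over the day.
files_per_second = FILES_TRANSFERRED / SECONDS_PER_DAY
files_per_hour = FILES_TRANSFERRED / 24

print(f"about {files_per_second:.0f} files per second on average")
print(f"about {files_per_hour:,.0f} files per hour on average")
```

That works out to roughly 160 files every second, sustained for a full day.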

The DMS Department is currently adding CMS to the list of experiments it supports. As part of that addition, the group is merging with the Distributed Computing Services Operations Department. The newly merged department hopes to fully support CMS by April, which is just in time for CMS to begin taking data – further increasing the amount of data to be stored at Fermilab.