VHS may have vanished, and cassettes are no longer cool, but tape is still on top when it comes to particle physics data: Fermilab stores over 100 petabytes of data, equivalent to 1,300 years of HD TV, on tape cartridges.
But why is tape, which is generally considered obsolete for music or movies, the go-to for storing all this data?
“Tape is the safest medium you can have,” said Stu Fuess, a senior scientist working in computing at Fermilab. “With tape, a machine can’t crash and cause you to lose your data.”
Fermilab’s seven tape libraries — three at the laboratory’s Feynman Computing Center and four at its Grid Computing Center — have the capacity to hold 10,000 tapes each, adding up to 600 petabytes of data storage. That leaves a lot of storage capacity that isn’t in use — yet.
“One challenge of data storage is that we don’t ever really want to throw data away — sometimes experimenters will reanalyze data years later to find something they never thought to look for,” Fuess said.
And now, increasingly large amounts of data are accumulated from more complex particle detectors, causing a growth in data Fuess called “almost exponential.”
Fermilab’s expedition into the intensity frontier — the realm of physics that requires highly intense particle beams to search for new physics — requires detailed detectors and a lot of data, enough to say whether an observation is a fundamental particle or a fluke.
“Particle physics is a statistical science, especially the intensity frontier,” Fuess said. “The more data you can accumulate, the more statistical power you can add to your measurements.”
In addition to 40 petabytes of data from intensity frontier experiments such as Fermilab’s neutrino experiments MicroBooNE and NOvA, the lab stores 40 petabytes of data from the CMS detector at the Large Hadron Collider in Switzerland. The legacy Tevatron experiments, CDF and DZero, each contribute another 10 petabytes. That brings the total to 100 petabytes of active data storage on tapes. A few extra petabytes of data from the Dark Energy Survey are housed in the Fermilab tape repositories, too, along with an unlikely data neighbor: genomic research from the Simons Foundation Autism Research Initiative.
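For readers who want to check the arithmetic, the figures quoted in this article can be tallied in a few lines of Python. This is only a rough sketch: the variable names are made up for illustration, the "few extra petabytes" from the Dark Energy Survey and genomic research are left out because the article gives no exact number, and the per-cartridge figure is merely what the quoted totals imply if every slot is filled and 1 petabyte is taken as 1,000 terabytes.

```python
# Figures quoted in the article, in petabytes (PB).
holdings_pb = {
    "intensity frontier (e.g. MicroBooNE, NOvA)": 40,
    "CMS": 40,
    "CDF": 10,
    "DZero": 10,
}

LIBRARIES = 7               # three at one center, four at the other
TAPES_PER_LIBRARY = 10_000  # quoted capacity of each library
TOTAL_CAPACITY_PB = 600     # quoted total capacity

# Total of the itemized holdings.
active_pb = sum(holdings_pb.values())

# Number of tape slots across all libraries.
total_slots = LIBRARIES * TAPES_PER_LIBRARY

# Capacity not yet used by the itemized holdings.
unused_pb = TOTAL_CAPACITY_PB - active_pb

# Assumption: 1 PB = 1,000 TB and every slot holds a cartridge.
implied_tb_per_tape = TOTAL_CAPACITY_PB * 1000 / total_slots

print(f"Active data:       {active_pb} PB")
print(f"Tape slots:        {total_slots}")
print(f"Unused capacity:   {unused_pb} PB")
print(f"Implied tape size: {implied_tb_per_tape:.1f} TB per cartridge")
```

On these numbers, the itemized holdings fill about one sixth of the quoted capacity.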
One of tape’s biggest drawbacks is speed, even though tape libraries are fully automated and served by robotic retrieval arms.
“When you want to access a file, you have to find a free tape drive — the robot’s got to find the tape and put it in the tape drive,” said Gene Oleynik, a computing services manager in charge of data movement and storage at Fermilab. “All this communication has to happen to access data, and this happens in an order of minutes,” which, when compared to the fractions of a second it takes to retrieve data from disk, is pretty slow going.
To speed up access to high-demand files, copies of about 35 petabytes of the tape data are kept on disks, which offer much faster, although not instantaneous, file access. Disks don’t have to be physically retrieved and put in a drive by robots, and they can be read nonsequentially, more efficiently than a tape.
Eventually disk systems may overtake tape if they become a cheaper data storage option, but disks can be unreliable.
“Tapes usually don’t fail. You should be able to keep a tape for 30 years and it’ll retain the data. But the problem with disks is, they do fail,” Oleynik said.
A disk system would have to store redundant pieces of files across many different disks to prevent data loss, but that redundancy means more data to store, which makes the system less cost-effective.
In the case of storing particle physics data, disks probably can’t make tapes obsolete; even in a disk system, there will probably be tapes as backup, just in case. Until a cost-effective data storage medium arrives that can match the reliability of tapes, it looks like they’re sticking around.