Small files, big storage solutions

The Grid Computing Center houses small files for a number of experiments at Fermilab. Photo: Reidar Hahn

Experts in the Scientific Computing Division have developed a more efficient way to store data, one that better serves the needs of Intensity Frontier experiments.

Last year, SCD developers created a method to efficiently store and access small data files. The method is useful for several neutrino experiments, which normally store many small, sparse files that range from hundreds of kilobytes to hundreds of megabytes in size. That’s a small fraction of what you could store on your flash drive.

For example, NOvA, a neutrino experiment, currently stores about 14 terabytes of data per month. That’s downright tiny compared to the amount of data per month the collider experiment CMS currently stores: over 1,000 terabytes, which can be written to tape efficiently.

“Neutrino experiments are producing more and more of these small files,” said Data Movement and Storage Department Head Gene Oleynik. “There is a growing demand for this new storage feature.”

Fermilab uses in-house software called Enstore to store scientific data on digital tape.

Writing file markers on the tape, which ensure that the whole file is indeed transferred, takes time: a few seconds per file, which can add up to hours for thousands of files, said Enstore project leader Alexander Moibenko.
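
To see how a few seconds per file adds up, here is a back-of-the-envelope sketch. The 3-second marker time, the 5,000-file count and the 10-container figure are illustrative assumptions, not measured Enstore numbers.

```python
def marker_overhead_hours(num_files: int, seconds_per_marker: float) -> float:
    """Total time spent writing file markers, in hours."""
    return num_files * seconds_per_marker / 3600.0


# Assumed values, for illustration only.
small_files = 5000
seconds_per_marker = 3.0

# One file marker per small file.
unpackaged_hours = marker_overhead_hours(small_files, seconds_per_marker)

# Same data packed into 10 larger containers: one marker per container.
packaged_hours = marker_overhead_hours(10, seconds_per_marker)

print(f"{unpackaged_hours:.2f} hours unpackaged")
print(f"{packaged_hours * 3600:.0f} seconds packaged")
```

Under these assumptions the marker overhead drops from roughly four hours to about half a minute, which is the effect the packaging method is after.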

“We resolved this problem by aggregating files into larger containers, which are subsequently written to tape,” Moibenko said. “It could take several hours or even days to write thousands of small files to a tape, but we use much less time by writing files that are bigger and thus optimal for the tape drive.”
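
The idea of aggregating files into a larger container can be sketched in a few lines. This is a minimal illustration, not Enstore's actual code: the tar format, the function name pack_small_files and the file names are all assumptions made for the example.

```python
import tarfile
import tempfile
from pathlib import Path


def pack_small_files(paths, container_path):
    """Bundle many small files into one uncompressed tar container.

    Hypothetical helper: the article does not say which container
    format Enstore uses; tar is assumed here for illustration.
    """
    with tarfile.open(str(container_path), "w") as tar:
        for p in paths:
            # Store each file under its base name inside the container.
            tar.add(str(p), arcname=Path(p).name)
    return Path(container_path)


def demo():
    """Create 1,000 one-kilobyte files and pack them into one container."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        files = []
        for i in range(1000):
            f = tmp / f"event_{i:04d}.dat"
            f.write_bytes(b"x" * 1024)
            files.append(f)
        container = pack_small_files(files, tmp / "container.tar")
        with tarfile.open(str(container)) as tar:
            member_count = len(tar.getnames())
        return member_count, container.stat().st_size


if __name__ == "__main__":
    count, size = demo()
    print(f"{count} small files packed into one container of {size} bytes")
```

The tape drive then sees a single large write instead of a thousand small ones, so only one file marker is needed for the whole batch.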

The method’s efficiency also lessens wear and tear on the tapes and tape drives, increasing their lifetimes.

Additionally, SCD builds fail-safes into its storage method, which is important when storing experiment data for a whole day or longer. Until a file is fully written, catastrophic errors could result in large data loss. A robust file storage system guards against data loss or corruption until files are written to tape, Oleynik said.

The method has been on the minds of SCD developers for years, Oleynik said. Neutrino experiments, which will ramp up their data output in future years, pushed the department to create the method and put it into practice.

When users request to read a packaged file, Enstore can grab the file from its cache instead of the tape, which has a slower transfer rate. Users can also define rules to determine how to best store their experiment’s data.
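
A toy model shows why reading from a cache helps. Everything in it is an assumption made for illustration: the PackageCache class, the fetch_from_tape callback, the tar container format and the policy of caching a whole container on first access are not taken from Enstore.

```python
import io
import tarfile


def make_container(files):
    """Build an in-memory tar container from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


class PackageCache:
    """Toy cache of containers, keyed by container name (hypothetical)."""

    def __init__(self, fetch_from_tape):
        self._fetch_from_tape = fetch_from_tape  # slow path, supplied by caller
        self._containers = {}
        self.tape_reads = 0

    def read_file(self, container_name, member_name):
        # Go to tape only if the container is not already cached.
        if container_name not in self._containers:
            self.tape_reads += 1
            self._containers[container_name] = self._fetch_from_tape(container_name)
        data = self._containers[container_name]
        with tarfile.open(fileobj=io.BytesIO(data)) as tar:
            return tar.extractfile(member_name).read()


# Simulated tape holding one container with two small files.
tape = {"pkg1": make_container({"a.dat": b"alpha", "b.dat": b"beta"})}
cache = PackageCache(lambda name: tape[name])

first = cache.read_file("pkg1", "a.dat")   # container staged from "tape"
second = cache.read_file("pkg1", "b.dat")  # served from the cache
print(first, second, cache.tape_reads)
```

In this sketch the second request never touches the simulated tape, because the container holding both files is already in the cache.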

Users now have less to stress about in terms of potential data loss, Oleynik said.

ArgoNeuT, Lattice QCD and MINERvA are also currently using the new method.

“I think this feature will successfully meet much of our new and growing storage demands,” Moibenko said.

“The Data Movement Development Group has done an excellent job developing the data storage software for our Intensity Frontier experiments,” Oleynik said. “It’s made the storage process that much easier.”

Sarah Khan