big data

A new machine learning technology tested by Fermilab scientists and collaborators can spot specific particle signatures among an ocean of LHC data in the blink of an eye, much faster than standard methods. Sophisticated and swift, its performance gives a glimpse into the game-changing role machine learning will play in making future discoveries in particle physics as data sets get bigger and more complex.

Fermilab operates the world's largest CMS Tier-1 facility. It provides 115 petabytes of data storage, grid-enabled CPU resources and high-capacity networking to other centers. Photo: Reidar Hahn

Data science is one of the world’s fastest-growing industries, and as a consequence, a large ecosystem of software tools has emerged to enable data mining at ever-increasing scales. Data processing campaigns have distilled the more than 100 petabytes of raw data produced by the CMS experiment to around 10 terabytes. Even this reduced data set is still unwieldy for high-energy physics (HEP) researchers to analyze. Fermilab researchers are currently leading an effort using novel approaches to complete two full CMS analyses.