big data

Fast electronics and artificial intelligence are helping physicists working on experiments with massive amounts of data, such as the CMS experiment, decide which data to keep and which to throw away.

The data wranglers

Illustration: Scientists sifting through data

A degree in particle physics or astrophysics can lead to a career in data science. Physicists know how to take enormous amounts of raw data and use it to address a question—often approaching it from multiple angles before finding the answer.

From UKRI, Feb. 22, 2021: UKRI scientists are developing vital software to exploit the large data sets collected by the next-generation experiments in high-energy physics. The new software will have the capability to crunch the masses of data that the LHC at CERN and next-generation neutrino experiments, such as the Fermilab-hosted Deep Underground Neutrino Experiment, will produce this decade.

The prodigious amount of data produced at the Large Hadron Collider presents a major challenge for data analysis. Coffea, a Python package developed by Fermilab researchers, speeds up computation and helps scientists work more efficiently. Around a dozen international LHC research groups now use Coffea, which draws on big data techniques used outside physics.

You are invited to explore today’s most innovative technology products at the Fermilab Virtual Cyber Tech Day (Technology Day) being held on Wednesday, Dec. 16. The exposition will feature: AI Machine Learning, Cloud Computing, Storage, Cybersecurity, Operational Technology, Wireless Technologies, Virtualization, Big Data and much more! Exhibiting companies will include: Apple, Holmans, Adobe, IBM, Google Cloud, Dell Technologies, Red Hat, Tanium and many more! The exhibitors will be available to demonstrate their products and field questions. There will also be opportunities…

You are invited to explore today’s most innovative technology products at the Fermilab Virtual Cyber Tech Day (Technology Day) being held on Wednesday, December 16, from 9:30 a.m. to 1:30 p.m. CST. The exposition will feature: AI Machine Learning, Cloud Computing, Storage, Cybersecurity, Operational Technology, Wireless Technologies, Virtualization, Big Data and much more! Exhibiting companies will include: Apple, Holmans, Adobe, IBM, Google Cloud, Dell Technologies, Red Hat, Tanium and many more! The exhibitors will be available to demonstrate their products…

A new machine learning technology tested by Fermilab scientists and collaborators can spot specific particle signatures among an ocean of LHC data in the blink of an eye, much faster than standard methods. Sophisticated and swift, its performance gives a glimpse into the game-changing role machine learning will play in making future discoveries in particle physics as data sets get bigger and more complex.

Fermilab operates the world's largest CMS Tier-1 facility. It provides 115 petabytes of data storage, grid-enabled CPU resources and high-capacity network to other centers. Photo: Reidar Hahn

Data science is one of the world’s fastest-growing industries, and as a consequence, a large ecosystem of software tools has emerged to enable data mining at ever-increasing scales. Data processing campaigns have distilled the more than 100 petabytes of raw data produced by the CMS experiment to around 10 terabytes. Even this reduced data set is still unwieldy for HEP researchers to analyze. Fermilab researchers are currently leading an effort using novel approaches to complete two full CMS analyses.