Data mining for b-quarks

Tracks from a b-quark (yellow) and an ordinary quark or gluon (purple), overlaid on a photo of the CMS tracker, in approximately the position where these particles were observed.

Of the six known types of quarks, only two can be distinguished in a typical particle physics experiment. The top quark, once produced, has a dramatic signature involving cascades of decays from heavy particles into lighter ones. The bottom (b) quark also decays into lighter particles, but these are hidden in a spray of additional particles that form along with it, called a jet. A jet is essentially random: random particles moving in nearly random directions. The lighter quarks (charm, strange, up, and down) produce only jets, with no such telltale decay to set them apart.

In practice, this means that it’s almost impossible to distinguish an up-quark from a down-quark. Fortunately, most of the questions that scientists want to address do not rely on telling the difference.

The b-quark, however, is interesting for a variety of reasons: It can be part of a signal for new phenomena; it is part of the top-quark decay chain; and it probes fundamental symmetries in the laws of nature. Finding a way to distinguish b-jets from all other jets would help many scientists at once.

Jets from b-quarks are a little different in a lot of ways. Since b-quarks fly a small distance before they decay (about 5 millimeters), some particle trajectories trace back to this decay rather than the collision point. Jets with a b-quark are slightly wider, contain slightly more charged particles, and are more likely to include an electron or muon. No one characteristic is enough to tell us, “This is certainly a b-jet and that is not,” but the confidence adds up with each additional clue, and physicists are able to assign a probability that a given jet is a b-jet. In a recent paper, CMS scientists presented the state of their art: For an 80 percent probability of identifying a real b-jet, they have a 10 percent probability of misidentifying a non-b-jet.
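To make "the confidence adds up with each additional clue" concrete, here is a minimal sketch of one textbook way to combine weak clues: Bayes' rule with the clues treated as independent (a "naive Bayes" combination). It is not the CMS algorithm, and every number in it is invented for illustration: the clue names, the per-clue probabilities, and the 3 percent starting fraction of b-jets are all assumptions.

```python
import math

# Hypothetical numbers, invented for illustration only. Each entry is
# (P(clue is seen | b-jet), P(clue is seen | any other jet)).
CLUES = {
    "displaced_tracks": (0.60, 0.10),   # tracks pointing back to a decay away from the collision
    "contains_lepton": (0.20, 0.05),    # an electron or muon inside the jet
    "jet_shape_matches": (0.55, 0.40),  # width and particle count look b-like
}


def b_jet_probability(observed, prior=0.03):
    """Combine clues into P(b-jet), assuming the clues are independent.

    `observed` maps each clue name to True or False.
    `prior` is the assumed fraction of all jets that are b-jets.
    """
    # Work in log-odds so that each clue simply adds its own contribution.
    log_odds = math.log(prior / (1.0 - prior))
    for name, (p_b, p_other) in CLUES.items():
        if observed[name]:
            log_odds += math.log(p_b / p_other)
        else:
            log_odds += math.log((1.0 - p_b) / (1.0 - p_other))
    # Convert log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    no_clues = {name: False for name in CLUES}
    one_clue = dict(no_clues, displaced_tracks=True)
    all_clues = {name: True for name in CLUES}
    for label, jet in [("no clues", no_clues), ("one clue", one_clue), ("all clues", all_clues)]:
        print(f"{label}: P(b-jet) = {b_jet_probability(jet):.3f}")
```

With these made-up inputs, no single clue pushes the probability very high, but each one that is present raises it, which is the behavior the paragraph above describes. Real clues are correlated with one another, so a production tagger would need something more careful than this independence assumption.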

This is a “big data” analysis, much like the ones for which Internet companies such as Google, Amazon and Facebook are now famous. Among the billions of recorded collisions that produced jets, these scientists found the few percent that are b-jets. Also like Internet data miners, the scientists used sophisticated statistical techniques and machine learning to optimize their search. Unlike those famous commercial analyses, however, this b-jet algorithm has quietly improved scientific understanding across more than 40 analyses, including searches for new physics, detailed studies of the top quark, and measurements of the Higgs boson.
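Because only a few percent of jets are b-jets, even a small misidentification rate matters. The short calculation below is a sketch of that arithmetic, using the 80 percent and 10 percent figures quoted in this article and an assumed b-jet fraction of 3 percent (the article says only "few percent", so that value is a placeholder).

```python
def tagged_purity(b_fraction, efficiency, mistag_rate):
    """Fraction of tagged jets that really are b-jets."""
    true_tags = b_fraction * efficiency            # real b-jets that get tagged
    false_tags = (1.0 - b_fraction) * mistag_rate  # other jets tagged by mistake
    return true_tags / (true_tags + false_tags)


if __name__ == "__main__":
    # b_fraction=0.03 is an assumed stand-in for "few percent".
    purity = tagged_purity(b_fraction=0.03, efficiency=0.80, mistag_rate=0.10)
    print(f"{purity:.1%} of tagged jets are real b-jets")
```

Under these assumptions, roughly one tagged jet in five is a real b-jet, compared with about one in thirty before tagging. Analyses that need a cleaner sample can ask for a stricter tag, accepting a lower chance of catching each real b-jet in exchange for fewer impostors.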

Jim Pivarski

The physicists pictured above were all responsible for key aspects of the b-quark identification effort.

During the 7- and 8-TeV LHC runs, approximately 14 percent of the forward pixel detector was not working. Due to the efforts of the group pictured here, all of the broken channels were fixed in the SX5 clean room during the first summer of LS1. The forward pixel detector is now ready to rejoin global CMS running with all detector channels functional.