Last Sunday, I gave a keynote address on large scientific collaborations at a conference organized by the American Association for Cancer Research in San Francisco. The theme of the conference was how to translate genome discovery into effective biomarkers and therapies, from basic scientific research to drug discovery. It was an unusual talk for me to give and for biologists to hear, but one that was a lot of fun to prepare. Because the projects to translate voluminous genomic data into effective therapies have become very large and complex and require the collaboration of many institutions, the organizers thought it would be useful for me to describe how we successfully manage the very large collaborations in particle physics such as CDF, DZero, ATLAS or CMS. It was also important for me to analyze how projects in particle physics differ from large biological projects in order to understand whether the particle physics experience is at all relevant.
The complexity of some of these biological projects is mind-boggling. Tissue needs to be collected from many patients at different stages of tumor development. Whole-genome sequencing then determines the many mutations at play in the development and treatment of the disease. Those mutations must be tied to biological pathways and functions, their interactions with other pathways understood, and potential targets for new therapies identified. All along the way there is a lot of heterogeneity, a playground for statisticians. It is enough to give anyone a headache, and it makes me glad to be a particle physicist dealing with relatively simple systems. However, the technologies involved in genome sequencing and proteomics have advanced so quickly that one can finally imagine making big strides in the war on cancer.
I was pleased that our field is recognized for the very successful model of our large collaborations. We have had time to evolve this model and build an international perspective with our institutions and funding agencies. Biological research, by contrast, is evolving into large projects at warp speed. In particular, the amazing advances in sequencing technology make the generation of data today comparable to that of the LHC, quickly moving into the tens of petabytes. It is this rapid evolution that leads biologists to look for successful models elsewhere as they tackle their most challenging projects.