
Figure: The number of virtual machines instantiated at Amazon over a period of just over six weeks, starting on Jan. 16. Each color represents a different type of machine, each with its own memory configuration, disk capacity and number of cores. The workload averaged 52,000 compute cores, peaking at 58,000.
Throughout any given year, the HEP community's need for computing resources is not constant. It follows cycles of peaks and valleys driven by holiday schedules, conference dates and other factors. Because of this, the classical method of provisioning these resources at the facilities that provide them has drawbacks, such as potential overprovisioning. Grid federations like the Open Science Grid offer opportunistic access to the excess capacity so that no cycle goes unused. However, as the appetite for computing increases, so does the need to maximize cost efficiency by developing a model for dynamically provisioning resources only when they are needed.
To address this issue, the HEP Cloud project was launched by the Scientific Computing Division in June 2015. Its goal is to develop a virtual facility that provides a common interface to access a variety of physical computing resources, including local clusters, grids, high-performance computers, and community and commercial clouds. Now in its first phase, the project is evaluating the use of the “elastic” provisioning model offered by commercial clouds such as Amazon Web Services. In this model, resources are rented and provisioned dynamically over the Internet as needed.
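As a rough illustration of what renting capacity in this elastic model can look like in practice, the sketch below uses the standard boto3 Python library to request a batch of Amazon EC2 virtual machines and then release them when the workload drains. The image ID, instance type and counts are placeholders rather than the project's actual configuration, and HEP Cloud drives provisioning through its own facility services rather than direct calls like these.

import boto3

# Connect to the EC2 API in one region (region chosen for illustration only).
ec2 = boto3.client("ec2", region_name="us-east-1")

# Rent a batch of worker VMs on demand. The AMI ID, instance type and
# counts below are placeholders, not HEP Cloud's actual configuration.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # hypothetical worker-node image
    InstanceType="c3.2xlarge",
    MinCount=1,
    MaxCount=100,
)
instance_ids = [i["InstanceId"] for i in response["Instances"]]
print("Provisioned %d virtual machines" % len(instance_ids))

# When the workload drains, terminate the instances so billing stops.
ec2.terminate_instances(InstanceIds=instance_ids)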
The HEP Cloud project team successfully demonstrated this elastic model in January and February, when team members used Amazon Web Services to add 58,000 cores to the CMS pool of computing resources, an impressive 25 percent increase. This burst of computing resources was dedicated to CMS experimenters generating and reconstructing Monte Carlo events in preparation for the prestigious Rencontres de Moriond conference, where they will share their findings with their international colleagues.
The project also aims to minimize the cost of computation. The R&D and demonstration effort was bolstered by a temporary pricing arrangement with Amazon. Cost was further contained using the Spot Instance market, a rental model that allows a cloud provider to sell its unused computing capacity at a fraction of the regular price. Adding further value, the HEP Cloud decision engine, the brain of the facility, monitors market price fluctuations, surveys all available resources across the cloud and ensures that resource provisioning is optimal.
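To make the decision engine's role more concrete, here is a minimal sketch, again using boto3, of how one might compare recent Spot prices across a few instance types and availability zones and pick the cheapest option. The instance types, region and one-hour window are illustrative assumptions; the actual decision engine weighs far more than the current price.

import boto3
from datetime import datetime, timedelta, timezone

ec2 = boto3.client("ec2", region_name="us-east-1")

# Fetch Spot prices from the last hour for a few candidate instance types.
# The types listed here are illustrative choices, not the facility's.
history = ec2.describe_spot_price_history(
    InstanceTypes=["c3.2xlarge", "c4.2xlarge", "m4.2xlarge"],
    ProductDescriptions=["Linux/UNIX"],
    StartTime=datetime.now(timezone.utc) - timedelta(hours=1),
)

# Pick the cheapest (instance type, availability zone) pair observed,
# a crude stand-in for one input to the decision engine's optimization.
cheapest = min(history["SpotPriceHistory"], key=lambda p: float(p["SpotPrice"]))
print(cheapest["InstanceType"], cheapest["AvailabilityZone"], cheapest["SpotPrice"])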
On the road to success, the project team had to overcome several challenges, including fine-tuning configurations to work within Amazon's limits on resources such as storage capacity and devising a strategy to distribute the needed auxiliary data across Amazon's data centers to minimize cost and data-access latency.
Most recently, the project team demonstrated HEP Cloud as a viable solution for the Intensity Frontier community. Last week, NOvA ran data-intensive computations on Amazon resources to produce particle identification data. And thanks to the project's integration activities, NOvA used the same familiar services it uses for local computations, such as data handling and job submission.
The project team is planning to transition the HEP Cloud facility into regular use by the HEP community in July.
Gabriele Garzoglio is the head of the Scientific Data Processing Solutions department. Burt Holzman is the assistant division head for facilities coordination in the Scientific Computing Division.