Hopper (Phase 1) Prepares NERSC for Petascale Computing
February 26, 2010
Contact: Linda Vu, firstname.lastname@example.org, (510)495-2402
Credit: Photo by Roy Kaltschmidt
(Left) Hopper (Phase 1) is a Cray XT5 with 664 compute nodes, each containing two 2.4 GHz AMD Opteron quad-core processors. (Right) The Hopper external Lustre file system contains 2 PB of disk space.
After several months of rigorous scientific testing, the Department of Energy's (DOE) National Energy Research Scientific Computing Center (NERSC) has accepted a 5,312-core Cray XT5 machine, called Hopper (Phase 1).
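The 5,312-core figure follows directly from the node configuration given in the photo caption. As a minimal sketch of that arithmetic (the function and constant names are illustrative, not part of any NERSC or Cray tool):

```python
# Figures taken from the photo caption: node count, processors per node,
# and cores per processor. Names here are hypothetical, for illustration only.
COMPUTE_NODES = 664
PROCESSORS_PER_NODE = 2
CORES_PER_PROCESSOR = 4  # quad-core AMD Opteron


def total_cores(nodes: int, processors_per_node: int, cores_per_processor: int) -> int:
    """Return the total number of compute cores in the system."""
    return nodes * processors_per_node * cores_per_processor


hopper_phase1_cores = total_cores(
    COMPUTE_NODES, PROCESSORS_PER_NODE, CORES_PER_PROCESSOR
)
print(f"{hopper_phase1_cores:,} cores")  # prints "5,312 cores"
```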
Innovatively built with external login nodes and an external filesystem, Hopper Phase 1 will help NERSC staff optimize the external node architecture before the second phase of the Hopper system arrives. Phase 2 will be a petascale system composed of 150,000 processor cores and built on next-generation Cray technology.
"Working out the kinks in Phase 1 will ensure a more risk-free deployment when Phase 2 arrives," says Jonathan Carter, who heads NERSC’s User Services Group and led the Hopper system procurement. "Before accepting the Phase 1 Hopper system, we encouraged all 300 science projects computing at NERSC to use the system during the pre-production period to see whether it could withstand the gamut of scientific demands that we typically see."
According to Katie Antypas, a consultant in the NERSC User Services Group, the new external login nodes on Hopper offer users a more responsive environment. Compared with the login nodes on Franklin, a Cray XT4 system, Hopper's external login nodes have more memory and, in aggregate, more computing power. This allows users to compile applications faster and run small post-processing or visualization jobs directly on the login nodes without interference from other users.
Because Hopper has 2 PB of disk space and 25 GB/sec of bandwidth on the external filesystem, users with extreme data demands will see few bottlenecks when they move their data in and out of the machine. Additionally, the availability of dynamically loaded libraries enables even more applications to run on the system and adds support for popular frameworks like Python. This feature helps ensure that the system is optimized for scientific productivity.
"Hopper turned out to be a lot faster than we anticipated, which allowed us to run more simulations at higher resolutions," says Yi-Min Huang, a research scientist in the Space Plasma Theory Group at the University of New Hampshire.
With pre-production computing time on Hopper, Huang and his colleague Amitava Bhattacharjee, professor of physics at the University of New Hampshire, ran extremely detailed simulations of magnetic reconnection, a process by which magnetic field lines break and rejoin, releasing tremendous amounts of energy along the way. This research is vital for understanding solar flares that can disrupt long-range radio communications on Earth, and will help researchers refine the design of magnetic confinement devices for creating zero-emission fusion energy.
"The computing time on Hopper encouraged us to explore magnetic reconnection in the high-Lundquist-number regime in greater detail that we wouldn't have done otherwise. In fact, our high-resolution runs on Hopper showed us that some of our previous simulations were not fully resolved, and thus not as reliable as we believed," says Huang. "The experience we gained from these simulations is invaluable for our future pursuit in this area with our NERSC allocations."
Meanwhile, Professor Artem Masunov of the University of Central Florida's NanoScience Technology Center and his graduate student Workalemahu Mikre used the free pre-production time on Hopper to better understand the force that drives peptide aggregation into amyloid fibrils, a process associated with diseases such as Alzheimer's, Parkinson's and Type II diabetes. In addition to studying peptide aggregation, Mikre is designing small molecules to prevent it.
"With the free pre-production time on Hopper, I could run my simulation on 300 to 400 processor cores for about 12 to 24 hours. As a result, my adviser and I identified which small molecules and mutations led to the fastest disaggregation of decamer assemblies composed of several hexapeptides, including insulin, tau, and Aβ fragments," says Mikre. "This knowledge could contribute to the rational design of drugs for Alzheimer’s treatment and amyloid-specific biomarkers for diagnostic purposes."
"The primary reason for architecting Phase-1 of Hopper differently was to make it more productive and user-friendly for our diverse set of science users," says Carter. "Although the technology that makes up the compute portion of Phase 1 of Hopper is newer, it does not differ significantly from the hardware on Franklin. The increase in usability is largely due to the external node architecture."
The hardware for Phase 2 will be delivered later this year.