NERSC Builds Gateways for Science Sharing
May 26, 2009
Programmers at the Department of Energy's National Energy Research Scientific Computing Center (NERSC) are working with science users to design custom web browser interfaces and analytics tools, a service called “science gateways,” which will make it easier for them to share their data with a larger community of researchers.
“The goal of these science gateway projects is to allow users to access their data, perform interesting computations and interact with the NERSC resources using common web-based interfaces and technologies,” says Shreyas Cholia, of NERSC’s Open Software and Programming Group. “This makes it easier for scientists to use the NERSC center, while creating opportunities to build new collaborative tools to share this data with the rest of the scientific community.”
He notes that NERSC engineers can help the teams with everything from designing a database to building a web-browser interface, developing analytic tools, and deploying the gateway. Because each project is unique, science teams have the option of creating a public portal, which allows anybody to access their data, or an authenticated portal, which restricts access to collaborators only.
Access Streamlines the Scientific Process
“Data-sharing tools create a fertile environment for scientific breakthrough by streamlining the scientific process,” says James Hetrick, a professor of physics at the University of the Pacific in Stockton, Calif., who is currently working on a NERSC science gateway project called the Gauge Connection.
According to Hetrick, the Gauge Connection portal is especially useful for researchers interested in quantum chromodynamics (QCD), the theory that describes the complex interactions between quarks, the constituents of protons, neutrons and certain other subatomic particles. For example, physicists currently do not understand where protons get their mass. Although protons are each made of just three quarks, the quarks' mass accounts for less than two percent of the proton's mass. Scientists now suspect that the mass may come from the “glue,” or the strong force, that binds quarks together inside the proton; however, the exact mechanism is unclear.
“We study quarks and the strong force because they are among the most basic constituents of matter,” says Hetrick. “In the 19th century, nobody knew anything about atoms or the electromagnetic force that holds them together, but once we figured out the basic physics behind this phenomenon, we were able to move into the electric age, and subsequently, with our understanding of quantum physics, the electronic and information ages.”
To learn more about the strong force, physicists use supercomputers to create a series of QCD lattices. These are four-dimensional representations of the quantum fluctuations of quarks and the force fields between them in a very tiny space-time region. Analyzing a large set of such lattices allows researchers to understand the physics of quarks.
According to Hetrick, a series of lattices can take years to generate on a supercomputer, even with tens of thousands of processors dedicated to the project. So far his team has generated over 20 terabytes of data, equivalent to more than 25,600 hours of video, at various computing centers around the country. As part of the Gauge Connection science gateway project, his team will consolidate all of this data at NERSC. When the portal launches, any researcher interested in this information will be able to get it through the Gauge Connection science gateway via any web browser.
“Once you are done with the lattice production stage, there are many different sorts of analysis projects that one might do, and that’s where the science gateways come in. Rather than having to regenerate new lattices with the same kind of quantum fluctuations, which is very costly, other scientists can use existing sets to do the analysis part for their own ideas,” says Hetrick. “These gateways greatly expand the scientific process by allowing us to recycle very valuable data.”
According to Cholia, the science gateways epitomize Metcalfe’s Law, which states that the usefulness of an information source, whether it is a database or a network, increases quadratically with the degree of connectivity to other information sources.
“The Gauge Connection is just one example of how these tools make science accessible to a much wider selection of users, while increasing the scope of the questions being asked,” says Cholia. “Using a web services approach, one can query the underlying data in the form of a simple URL and then combine the results of these queries with other online data sources. This allows for the creation of mashups, federation, and comparison between multiple sources.”
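As a rough illustration of the web services approach Cholia describes, the Python sketch below encodes a query as a simple URL and then joins the returned records with a second data source. Everything specific in it is an assumption made for illustration: the endpoint `gateway.example.org`, the parameter names, and the sample records are hypothetical and do not describe the actual Gauge Connection interface.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the article does not give real gateway URLs.
BASE_URL = "https://gateway.example.org/api/lattices"


def build_query_url(base_url, **params):
    """Encode search parameters into a URL query string (sorted for stability)."""
    return f"{base_url}?{urlencode(sorted(params.items()))}"


def merge_by_key(primary, secondary, key):
    """Join two lists of records on a shared key: a very simple 'mashup'."""
    lookup = {row[key]: row for row in secondary}
    return [{**row, **lookup.get(row[key], {})} for row in primary]


# A query expressed as a plain URL that any HTTP client could fetch.
url = build_query_url(BASE_URL, ensemble="example-a", format="json")
print(url)

# Stand-ins for the gateway's response and for another online data source.
gateway_rows = [{"id": 1, "size_gb": 12}, {"id": 2, "size_gb": 15}]
other_rows = [{"id": 1, "citation": "hypothetical reference"}]

merged = merge_by_key(gateway_rows, other_rows, "id")
print(merged)
```

In a real gateway the two record lists would come from HTTP responses rather than literals; they are hard-coded here so the sketch runs without network access.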
Science Is Not One-Size-Fits-All
Karen Schuchardt, a computational scientist at the Pacific Northwest National Laboratory, believes the NERSC science gateway tools present an opportunity to help researchers remotely manage large datasets. She is currently heading an effort to create a portal that allows researchers to remotely access data generated by the Global Cloud Resolving Model (GCRM) project. Members of this team seek to develop a model that can simulate Earth’s climate at a two to four-kilometer resolution across the entire globe.
“Simulating climate across the entire globe at these resolutions is an extremely complex and resource-intensive process, requiring extensive amounts of computer and human hours,” says Schuchardt. “We cannot easily generate this data every time someone needs it, so we view each dataset as an extremely valuable resource and want to make it available to as many collaborators as possible.”
Because GCRM datasets are massive, she notes that it would take a long time to transfer an entire dataset across a network. Also, most researchers can only analyze a small portion of the data at a time. So in addition to creating a portal that will deliver this information to collaborators, Schuchardt is working with NERSC staff to create a gateway tool that will allow scientists to remotely access the data at the supercomputing centers where it is generated, search the metadata for what they need, and download only that portion for analysis.
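The search-then-download workflow described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the GCRM gateway's design: the `FileRecord` fields, the catalog entries, and the file sizes are all hypothetical and chosen only to show how filtering on metadata first shrinks what has to cross the network.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical metadata entry for one file in a remote dataset."""
    name: str
    variable: str
    year: int
    size_gb: float


def search(catalog, variable, first_year, last_year):
    """Return only the records matching a variable and an inclusive year range."""
    return [
        record
        for record in catalog
        if record.variable == variable and first_year <= record.year <= last_year
    ]


# Made-up catalog; a real one would be served by the gateway.
catalog = [
    FileRecord("t_2000.nc", "temperature", 2000, 40.0),
    FileRecord("t_2001.nc", "temperature", 2001, 42.0),
    FileRecord("t_2002.nc", "temperature", 2002, 44.0),
    FileRecord("p_2001.nc", "pressure", 2001, 38.0),
]

selection = search(catalog, "temperature", 2001, 2002)
full_size = sum(record.size_gb for record in catalog)
subset_size = sum(record.size_gb for record in selection)

print([record.name for record in selection])
print(f"download {subset_size} GB instead of {full_size} GB")
```

Only the small metadata catalog is examined on the user's side; the bulk data stays at the computing center until a specific subset is requested.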
“Management for these volumes of data is a major challenge, and the science gateway tools that we are building will go a long way toward helping our remote researchers get the data they need, which will in turn pave the way for scientific breakthroughs,” says Schuchardt.
While access to the GCRM gateway will primarily be targeted at scientific researchers, Peter Nugent, a staff scientist at the Lawrence Berkeley National Laboratory's Computational Research Division (CRD) and NERSC, is creating a public gateway that will deliver astronomy data to researchers around the world. Called Deep Sky, this portal will give astronomers instant downloadable access to more than 8 million images of the northern sky archived at NERSC. Most of this astronomical data was collected over the past decade by the Nearby Supernova Factory (SNfactory), a project that seeks to measure the accelerating expansion of the universe with Type Ia supernovae.
“This unique collection of data allows astronomers to track how the sky has changed over the past nine years,” says Nugent, who is the project lead for Deep Sky. “It will serve as an invaluable resource for astronomers who are interested in finding cosmic events like supernovae and gamma ray bursts, or tracking the trajectories of asteroids and comets.”
“There is no one-size-fits-all solution for these gateways, but we try to recycle successful approaches into methods that other teams can use,” says Cholia.
He notes that while projects like Deep Sky relied on NERSC resources to develop everything from the database to the web browser interface, other projects like the European Space Agency's Planck Surveyor mission developed gateway portals on their own and are relying on NERSC to provide security, data storage, job management and grid access.
“We realize that most scientists are not computer programmers and may not have the expertise, time, or resources to develop tools that will allow them to fully leverage their rich datasets,” says Cholia. “The NERSC science gateway services present an opportunity for them to team up with engineers and build the technologies that will foster breakthroughs.”
For more information on the NERSC science gateways, please visit: http://www.nersc.gov/nusers/services/Grid/SG/sg.php
The GCRM project is headed by David Randall, professor of atmospheric science at Colorado State University, in Fort Collins, Colo., and is supported by the DOE's Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program. Schuchardt's effort to create tools that will enable an international community of collaborators to access this information is supported by DOE's Scientific Discovery through Advanced Computing (SciDAC) program. The Deep Sky project was partially funded by SciDAC's Computational Astrophysics Consortium, which is led by Stan Woosley of the University of California at Santa Cruz.