Looking at ESnet: An Interview with Bill Johnston
by Alan Beck, Editor-in-Chief, HPCwire
(Reprinted from HPCwire, Vol. 12, No. 48, December 5, 2003)
Bill Johnston, a distinguished networking and computing researcher, was recently named Manager of the Energy Sciences Network (ESnet), a leading-edge, high-bandwidth network funded by the U.S. Department of Energy (DOE), Office of Science. ESnet is managed by DOE's Lawrence Berkeley National Lab.
ESnet is used by thousands of scientists and researchers worldwide, and provides a high-end platform for unclassified scientific collaboration and exploration. The network provides reliable connectivity at up to 10 Gigabits per second. It is used for everything from videoconferencing to climate modeling and is flexible enough to accommodate a wide variety of data-intensive applications and services. Its traffic volume is doubling every year and currently surpasses 200 terabytes (200 million megabytes) per month.
Managing and directing a network such as ESnet requires a unique skillset and knowledge, as well as a vision of where high-end, distributed network computing is headed. In this sense, Johnston is an ideal person for the job. Over the last decade, Johnston has served as the principal investigator of the DOE Science Grid and head of Berkeley Lab's Distributed Systems Department.
Johnston is also a key member of a tight-knit group of people who have pioneered Grid computing over the last five years. Grids are specialized configurations of computing systems, storage systems, networks and experimental facilities for advancing scientific collaborations between geographically dispersed institutions. By integrating these dispersed research sites, computing and data resources in real time, scientists are able to collaborate and run experiments, analyze data and conduct simulations almost immediately.
HPCwire recently caught up with Johnston to get his views on where he thinks ESnet is heading, and how it will get there.
HPCwire: How has your past experience led up to directing ESnet?
Johnston: This is a career change for me. I've spent the last 15 years doing computer science research in distributed high-speed computing, which really led to Grids. There were probably a few dozen of us who formulated the idea of Grids four or five years ago. I see Grids as just another manifestation of high-speed distributed computing and my interest in that field came directly out of the networking world. I was involved in many of the early so-called gigabit test beds, the very early attempts at building high-speed networking in this country.
HPCwire: What are some of the latest developments with ESnet?
Johnston: ESnet is two things. It is the underlying network infrastructure that provides the physical and logical connectivity for virtually all of DOE. It also provides a collection of science collaboration services to the Office of Science community — videoconferencing, teleconferencing, data conferencing, and most recently, digital certificates and a public key infrastructure (PKI). PKI is a system for issuing digital-identity certificates to both humans and services, and it is critical for making Grids function, because this is how you formalize and establish trust between disjointed organizations so they'll allow sharing of their resources among each other. The big milestone there is that after a year of negotiation, we have cross-certification agreements with the European high-energy physics community. ESnet was the first network to achieve this authentication between the European and U.S. physics communities, paving the way for future international collaborations.
HPCwire: How does ESnet impact scientific research?
Johnston: The primary goal of ESnet in supporting the science community is providing the network capacity that is needed to support large-scale scientific collaborations. The science community is finally starting to trust that ESnet is a real and reliable infrastructure just like their laboratories are. So what's happened over the last three or four years is that the science community has actually started to incorporate networking into their experiment planning as an integral part of it. The high-energy physics community, for instance, could not make the big collaborations around the current generation of accelerators work without networking. So their plans for analyzing data are absolutely tied to moving that data from the accelerator sites out to the thousands of physicists around the world who actually do the data analysis.
In that sense, the primary goal of ESnet is to ensure that we have the peering (the relationships with other networks) with enough bandwidth, and also that ESnet has sufficient capacity to serve the sites within the U.S. So the most recent thing we've done in that regard is to put up what is currently a network ring around the country: half OC-48 (2.5 Gigabits/s) and half OC-192 (10 Gb/s). It will be all OC-192 by mid-to-late 2004. And then we will connect the sites into that ring with as high a bandwidth as is feasible.
HPCwire: ESnet runs with the help of some advanced networking technologies. How are they a departure from the technology they are replacing, and what's coming down the pike?
Johnston: From a technological point of view, it's an evolution from an ATM-based infrastructure to the current generation of 'pure IP networks.' So we have these optical rings that directly interconnect IP routers and there is no intervening protocol. It's essentially IP directly on the fibers, or actually on the lambdas (multiple channels of different 'color' light) in the case of this ring. This ring is a Qwest DWDM (Dense Wavelength Division Multiplexing) optical fiber, so on each fiber they carry either 64 or 128 10-Gigabit/second clear communication channels.
In the future, I believe we're likely to see a fundamental shift in the architecture of the network and may see switched optical networks and the re-emergence of a level-2 switching fabric in the wide area, such as the Ethernet switches that everyone uses today in the local area (LANs). Ethernet switches are a level-2 switching fabric, and then the IP routing fabric is separate and on top of that. It's possible that the same thing may happen with lambdas: we may find that we do optical switching of lambdas in addition to IP routing. The other possibility is that we may acquire enough lambdas that we can provision dynamically. So if some labs need to transfer huge volumes of data between each other, we may be able to put into service for a couple of weeks a whole OC-192 network. Then you're not putting these multi-terabyte file transfers onto the same network that everyone else is trying to use, and you can create direct end-to-end high bandwidth paths between the sender and the receiver.
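As a rough, back-of-the-envelope illustration of the figures Johnston quotes, the Python sketch below works out the nominal capacity of one DWDM fiber and the ideal time to move a large data set over a dedicated link. It is an editorial sketch, not an ESnet tool: the function names are hypothetical, the 10-terabyte data-set size is an assumed example, and the rates are the nominal ones given in the interview, with protocol overhead ignored.

```python
# Nominal line rates as quoted in the interview, in gigabits per second.
OC48_GBPS = 2.5
OC192_GBPS = 10.0


def fiber_capacity_gbps(channels, per_channel_gbps=OC192_GBPS):
    """Aggregate nominal capacity of one DWDM fiber carrying `channels` lambdas."""
    return channels * per_channel_gbps


def transfer_hours(terabytes, link_gbps):
    """Ideal hours to move `terabytes` of data over a dedicated link.

    Assumes decimal terabytes (1e12 bytes) and a fully used link,
    with no protocol overhead or contention.
    """
    bits = terabytes * 1e12 * 8
    seconds = bits / (link_gbps * 1e9)
    return seconds / 3600


if __name__ == "__main__":
    for channels in (64, 128):
        print(f"{channels} lambdas: {fiber_capacity_gbps(channels):.0f} Gb/s per fiber")
    # 10 TB is a hypothetical data-set size chosen only for illustration.
    for name, rate in (("OC-48", OC48_GBPS), ("OC-192", OC192_GBPS)):
        print(f"10 TB over {name}: {transfer_hours(10, rate):.1f} hours")
```

Under these assumptions, a fiber with 64 lambdas carries a nominal 640 Gb/s, and a 10-terabyte transfer that would take roughly nine hours on an OC-48 path takes a little over two hours on a dedicated OC-192 path. Real transfers would be slower.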
HPCwire: What's a good metaphor for visualizing these types of networks?
Johnston: You know when you go into a museum and they have audio wands to guide you. They have several sets of those. You have one museum, and that's the infrastructure. But there may be two or three exhibits going on at the same time. Depending on which exhibit you're interested in, they'll give you a different audio wand, which will carry you in a different route through the museum and give you a different story about the pictures. That's an example of overlay networks.
HPCwire: How much data travels on ESnet, and how fast is it growing?
Johnston: We are currently at a level of roughly 200 terabytes of data a month, which is 200 million megabytes per month. Now, there have been spikes that have made it look like it was growing much faster than that. One of these spikes triggered the planning for the new optical ring backbone. It's a fundamental planning charge of ESnet to track this kind of thing, together with the requirements that the DOE program offices can predict, e.g., the ESSC (ESnet Steering Committee) will try to predict their requirements in terms of new major experiments coming online or new science infrastructure. In addition, we predict future needs based on current traffic trends. However, that's a risky business because if a couple of big high-energy physics experiments come online and we don't know about them, that's a very non-linear increase in traffic that may be transient.
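To put the monthly volume in perspective, the short Python sketch below converts 200 terabytes per month into an average bit rate and extrapolates it using the doubling-every-year figure from the introduction. It is an editorial illustration only, not ESnet's forecasting method: the function names are hypothetical, a 30-day month and decimal terabytes are assumed, and steady doubling is exactly the kind of trend-based guess Johnston calls risky.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes a 30-day month


def average_gbps(terabytes_per_month):
    """Average rate in Gb/s implied by a monthly volume in decimal terabytes."""
    bits = terabytes_per_month * 1e12 * 8
    return bits / SECONDS_PER_MONTH / 1e9


def projected_volume(tb_now, years, doubling_years=1.0):
    """Monthly volume after `years`, assuming steady doubling every `doubling_years`."""
    return tb_now * 2 ** (years / doubling_years)


if __name__ == "__main__":
    print(f"200 TB/month averages about {average_gbps(200):.2f} Gb/s")
    for years in (1, 2, 3):
        volume = projected_volume(200, years)
        print(f"after {years} year(s): {volume:.0f} TB/month, "
              f"about {average_gbps(volume):.2f} Gb/s on average")
```

An average hides the spikes described above, so these numbers say nothing about peak load; they only show how quickly a steady doubling would eat into a 10 Gb/s backbone.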
HPCwire: Big science projects can cause huge spikes in traffic on ESnet — which applications require the most bandwidth and processing?
Johnston: Most of the big traffic spikes up to this point have been created by bulk movement of data. Increasingly, we are going to see, for instance, high-energy physics doing distributed processing of data, using Grid-based systems at many different sites to process a single data set. That will generate inter-processor traffic directly across the network. It's a little hard to predict how that will compare with bulk data transfers, but it will be significant. That's sort of the next big use — Grids coordinating mini-clusters operating cooperatively on a single set of data, where the clusters are scattered all over the world.
HPCwire: What types of changes might we expect in ESnet in the future?
Johnston: In August 2002, we got together with representatives of the science community in eight major DOE science areas and asked these folks, "How is your process of science, how you conduct your science, going to change over the next 10 years?" We analyzed the impact on both middleware and networking. There are major changes in store, many of them having to do with things like the climate community. They believe the next generation of climate models has to involve coupling many different models together. And you're not going to bring all of these models to one center to work together, so you'll have to couple them together over the network in order to get a single, integrated model. That's their vision, and it relies on a lot of advanced networking and middleware.
The situation is similar in other areas of science, such as the Spallation Neutron Source at Oak Ridge. They want to be able to aim the output of their detectors at big computers located across the network, so they can do real-time analysis of their data. In many cases, both for the Fusion Energy and Spallation communities, they would like to take their data and analyze it immediately, and then use that to adjust the next phase of the experiment. In Fusion Energy, for example, you have about 15 minutes to look at the data and then change the parameters. In the past, they were doing this largely blind. It's a big deal for them, because one of these experiments may run for just a month or a year, and that may be a scientist's one chance to run these experiments in five years, because other people are lined up to use these facilities.
HPCwire: You are known for your work in the Grids community. How does that come into play with your work at ESnet?
Johnston: The Grids community is attempting to come up with a common and standard set of middleware for the science community. The idea is to come up with standardized middleware that you can use as a platform for building these distributed applications. In many ways, the Grids community and the web services community are merging. That is, the functions that we need in the Grids world are largely being cast in terms of an enhanced view of web services. So the Grids community has actually proposed a half dozen modifications of the web services standards to the W3C, because the web services standards as they exist aren't adequate for what we want to do in Grid computing.
In the web services view of the world, you go to a web server and it performs the service. In the Grids community's view of the world, no web server is ever going to perform the computational or data analysis tasks of science; it's just a launching point to send the actual task to a supercomputer or to a thousand other computers and get the results back. The coordination and management of that in terms of a web service wasn't there, so the Grids community added it and proposed that it be put into the web services standards. Even if they don't adopt it, we already have our own version that has those features.