ESnet's Bill Johnston, Nearing Retirement, Reflects on His Career
From Virtual Frog to Grids
April 11, 2008
(This interview originally appeared in the April 11, 2008 issue of HPCwire.)
In a few months, Bill Johnston of Lawrence Berkeley National Laboratory will step down as head of ESnet, the Department of Energy's international network that provides high bandwidth networking to tens of thousands of researchers around the world. In a career that began in the 1970s and has included seminal work in networking, distributed computing, the Grid, and even crossing paths with Al Gore, Johnston has had a hand in the development of many of the high performance computing and networking resources that are today taken for granted. And as he tells it, it all began with the brain of Berkeley Lab scientist Tom Budinger.
Berkeley Lab is now recruiting for a new head of the ESnet Department at the Lab [see the posting at: http://jobs.lbl.gov/LBNLCareers/details.asp?jid=21495&p=1]. Although he plans to officially retire from LBNL by June 1, Johnston is already planning how he'll spend his time -- doing pretty much what he does now: working, reading for both professional and personal interest, and traveling, but adjusting the ratio somewhat. Berkeley Lab's Jon Bashor managed to get an hour with Johnston to talk about his career, his accomplishments and his future plans.
Question: You've announced your plan to retire this year after 35 years at Department of Energy labs. How did you get started in your career?
Bill Johnston: When I was a graduate student at San Francisco State University, one of my professors spent her summers working on math libraries for the Star 100, which was CDC's supercomputer successor to the CDC 7600. Through this connection, I started taking graduate classes at the Department of Applied Sciences at Lawrence Livermore National Laboratory, then went to work full time in the atmospheric sciences division. There I worked on LIARQ, an air quality model that is still used by the San Francisco Bay Area Air Quality Management District (BAAQMD). Although the code was developed at Livermore, the BAAQMD couldn't run it there. So, I would bring it to LBNL to run on the Lab's CDC 7600 computer.
I began spending more and more time at the Berkeley Lab, and developed data visualization techniques that added a graphical interpretation interface to the code, so that they had dozens of different ways of looking at the data. I went on to turn this work into a data visualization package and made it available to other users of the 7600 that was the main LBNL machine at the time. Through this work I met Harvard Holmes, then head of the graphics group. I also knew the head of the systems group and was offered jobs in each group. Something Harvard said led me to join the graphics group, which was a good decision because five years later the systems group had tanked because there was no new funding to replace the 7600 when it was retired.
Over the years, I took over the graphics group, and was also getting more involved in visualization of science data. As a result, we were often focused on large data sets. These data sets were often stored at remote sites, and accessing them led me into networking. In fact, as a result of some of this work, we set up the first remote, high performance network-based visualization demonstration at the Supercomputing conference in 1991. Working with the Pittsburgh Supercomputing Center (PSC), we combined the Cray Y-MP at PSC with a Thinking Machines CM2 in order to do the rendering -- the conversion of the data into a graphical representation -- fast enough for interactive manipulation. We -- mostly David Robertson -- split the code up to run part on the massively parallel CM2 and do the vector processing part on the Cray. The idea was to have the graphics workstation at SC91 in Albuquerque getting data from the supercomputers at PSC. Because high performance TCP/IP implementations weren't available, we partnered with Van Jacobson of LBNL and Dave Borman from Cray to provide a high-speed, wide area version of TCP for a Sun workstation at SC91 and for the Cray at PSC. I remember Van working on the Sun for 48 hours in order to get the two TCP stacks to work together. NSF ran a connection from the 45 Mb/s NSF network backbone (which effectively was the Internet at the time) into the conference for the first time.
The demo was a volume visualization of Tom Budinger's brain, with the data from some of Tom's high-resolution MRI work. This was real-time visualization -- you could take it, grab it, rotate it. It all started with Tom's brain. [Note: Budinger is a physician and physicist who helped develop MRI and previously headed LBNL's Center for Functional Imaging.]
For myself and Brian Tierney and David Robertson from LBNL, this was our introduction to high performance wide area networking. We got involved more and more with networking and graphics, and were even involved with ESnet on several projects.
One of our projects was with the DARPA MAGIC gigabit network testbed, which included LBNL, SRI, University of Kansas, the Minnesota Supercomputer Center, the USGS EROS Data Center, and Sprint. We worked with Sprint to build the country's first 2.5 gigabit ATM network (ATM is a technology that is not used much any more), linking Minneapolis with sites in Sioux Falls and Overland Park and Lawrence, Kansas. Together with Brian Tierney and Jason Lee (both students of mine), we developed the Distributed-Parallel Storage System (DPSS) to drive an SRI-developed visualization application over the network with high-speed parallel data streams. This experiment made it clear that in order to get end-to-end high performance you had to address every component in the distributed system -- the applications, the operating system and network software, and the network devices -- all at the same time in order to make things run fast. This led directly to my interest in Grids. Interestingly, the ideas behind our work in DPSS also fed into the development of GridFTP, which is one of the most enduring Grid applications, and heavily used by the LHC [Large Hadron Collider] community to move the massive data of the CMS and ATLAS detectors around the world for analysis by the collaborating physics community.
Question: Can you elaborate more on your work with Grids?
Johnston: In the late 1990s Ian Foster organized a workshop on distributed computing, and the focus was on writing a book on the component-based, end-to-end approach that was emerging in the research and education community. Bill Feiereisen of NASA (head of the NAS center at the time) participated and he's the one who suggested the name "Grid" for the book that ended up popularizing the subject, saying it reminded him of the power grid with the big computers on the Grid akin to power generators.
At this workshop, we sketched out the outline of a book, covering the basic concepts. The time was right -- we had a group of people interested in a common development. This led Ian and me to establish the Grid Forum. But it never would have turned into a viable organization had Charlie Catlett not attended the workshop. He's the consummate organizational guy and he got the Forum organized and ran it for several years.
And then the Europeans got interested, especially with the planned experiments at CERN. Charlie spent a lot of time traveling to Europe and working with the different Grid organizations over there. This led to combining the U.S. and European efforts to produce the Global Grid Forum (now called the Open Grid Forum -- the result of much more industry participation).
At this time, I was working on assignment at NASA's Ames Research Center, helping build NASA's Grid -- the Information Power Grid. Grids were becoming more well-known, and in 2000 Bill McCurdy from Berkeley Lab talked me into coming back to LBNL full time to establish the Distributed Systems Department. After a few years, I was invited to take over leadership of ESnet.
Question: In the early 1990s, you worked on a number of pioneering projects. One of the best known is the virtual frog, which still gets thousands of hits a month [http://froggy.lbl.gov/virtual/]. Can you talk about how these projects were done and the effects they have today?
Johnston: The frog was really a side activity, but it came out of my belief that with the Web you ought to be able to do interactive graphics. David Robertson and I launched it in 1994 and it's still being used -- tens of thousands of hits a day. If the server goes down, we get email from science teachers around the world.
During the time of the MAGIC testbed, we developed BAGNET, the Bay Area Gigabit Testbed Network. This was when Bob Kahn's Strategic Computing Program, Gigabit Network Testbeds project, was part of the federal budget, and then Sen. Al Gore became interested in what he dubbed the Information Superhighway. (See "Creating a Giant Computer Highway" by John Markoff, New York Times, September 2, 1990.) Gore was head of the Senate Committee on Commerce, Science and Transportation, and he called together the heads of Sun, Thinking Machines, Cray and DARPA to talk about high speed networking and supercomputing. LBNL, because of its work in the DARPA MAGIC testbed project, was asked to create a demo to show what bandwidth was -- with the possible exception of Gore, the senators on the committee did not know.
We wanted to bring in a live network connection to the Senate building, but Craig Fields, then head of DARPA, said "no way" -- it was too risky. So, we used inter-packet arrival times from measurements on the Internet backbone that Van provided to realistically simulate an Internet connection and produced a movie to show the equivalent of a remote connection at different speeds, from 9600 bits/sec to 45 megabits/sec. The data we used was a fluid flow over a back-facing step -- from research done by James Sethian.
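[Note: The range of speeds in that demo can be illustrated with simple arithmetic. The following minimal Python sketch computes idealized transfer times at the two endpoint speeds mentioned above. The 10 MB payload size and the names `transfer_seconds` and `PAYLOAD_BYTES` are assumptions chosen for illustration, not details of the original demo, and protocol overhead and latency are ignored.]

```python
# Illustrative only: idealized transfer time = bits to send / link speed.
# The payload size is a hypothetical value, not a figure from the 1990s demo.

LINK_SPEEDS_BPS = {
    "9600 bits/sec": 9_600,
    "45 megabits/sec": 45_000_000,
}

PAYLOAD_BYTES = 10_000_000  # assumed 10 MB data set


def transfer_seconds(payload_bytes: int, link_bps: int) -> float:
    """Return the ideal time in seconds to move payload_bytes over a link
    of link_bps bits per second, ignoring overhead and latency."""
    return payload_bytes * 8 / link_bps


if __name__ == "__main__":
    for name, bps in LINK_SPEEDS_BPS.items():
        seconds = transfer_seconds(PAYLOAD_BYTES, bps)
        print(f"{name}: {seconds:,.1f} s")
```

[Under these assumptions the same data set takes more than two hours at the slower speed and under two seconds at the faster one.]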
Two funny things happened after the demo. When we were all finished, this old senator piped up and said, "All I want to know is what's this going to do for North Dakota?" Then John Rollwagen was talking about the next-generation supercomputer and how they were going to reach gigaflops. Well, Gore just started laughing -- he said "That's what my (1988) presidential campaign was -- a gigaflop!" He was very warm and funny, not at all like he seemed as vice president.
Question: About five years ago, you were named head of ESnet, DOE's network supporting scientists around the world. How does the ESnet of today compare to the 2003 version?
Johnston: When I joined ESnet, the organization was totally focused on ESnet as a production network, with the leadership deciding the directions and the needs of the users. When I came in, I decided to make a fundamental change. There was nothing we could say as a network organization that wouldn't appear self-serving, such as seeking a budget increase. We needed to make a solid connection between the network and the science needs of the Office of Science (SC), and if they needed a bigger, higher speed network, they could help make the case for it.
At the time, Mary Ann Scott was our DOE program manager and an enthusiastic backer of ESnet. We organized a workshop for our user community to look at how the SC science community did their science now and how the process of doing science had to change over the next five to 10 years to make significant advances. At the workshop about two-thirds of the people were from the applications side and the rest were computer and network scientists.
In talking about how science would change, we were able to show that network capabilities would have to evolve to support distributed collaborations, many of them sharing massive amounts of data. It was very clear that there would soon be a deluge of science data on the network. This led the DOE Office of Science to see that a larger network was needed and to fund a substantially expanded ESnet with a new architecture known as ESnet4.
The second change was that ESnet was an insular organization focused on the network. We needed to become intimately involved with the community. For example, none of the end-to-end data flows were entirely within DOE. We had to become more outward looking and work with the intervening networks. We created working groups and committees in the U.S. and international R&E communities to determine how to provide better services.
I spent a lot of time on the road talking with the research and education networks that enabled the science collaborations between the DOE Labs and the universities: Internet2 (U.S. R&E backbone network), the Quilt (U.S. regional networks), DANTE (which operates GÉANT, the European R&E network), and two or three of the key European research and education networks. We set up working groups to build close cooperation in developing next-generation services. One example is OSCARS, the On-Demand Secure Circuits and Advance Reservation System developed in partnership with Internet2 and GÉANT. That put us on the path to where we are today -- very close to end-to-end network service guarantees such as guaranteed bandwidth.
The first two workshops were so successful that ASCR (Advanced Scientific Computing Research Office) of the Office of Science -- the DOE program that funds ESnet and NERSC -- continued to organize workshops for gathering the networking requirements of the science program offices in the Office of Science. We're lucky to have Eli Dart organizing these workshops, which will survey each of the SC science programs about once every 2.5 years. Eli came to us from NERSC, where he was used to working with users to learn about their requirements.
Question: One of the more significant changes has been the partnership with Internet2. Can you elaborate on this?
Johnston: The partnership really started on a bus ride between Hong Kong and Shenzhen in China. Shenzhen was China's first "special economic zone" and is a "manufactured" city about 75 miles from Hong Kong. It went from being a village to a city of 10 million in about 30 years. There are two research and education networks in China -- and there is considerable rivalry between them. We could not hold a common meeting with them, so after the meeting in Hong Kong, we took a 1½ hour bus ride from Hong Kong for a second meeting with the other group. Doug Van Houweling, CEO of Internet2, and I got to talking about our vision of what a next-generation R&E network should look like, and it turned out we had very common visions.
At the time, Internet2 was looking at using a dedicated fiber network for their next generation network, but were not completely sure they could swing it financially. We both had commercial contracts for our networks and both contracts would end within a year. What we really ought to do, we agreed, was leverage our efforts to get a good deal. When we described this idea to Dan Hitchcock, now the head of ASCR's facilities division, he was also enthusiastic about using DOE funding to strengthen the U.S. R&E community networking while at the same time getting a good deal for the bandwidth that ESnet needed.
Question: Last year, ESnet completed the coast-to-coast links for ESnet4, the new network architecture. Can you describe the thinking behind that architecture and talk about plans for the year ahead?
Johnston: This is something that came out of the science requirements workshop. One thing that turned the light on for me was a talk by Cees de Laat of the University of Amsterdam. The Netherlands is one of the most fibered countries in the world, and Cees is involved in NetherLight -- the major R&E network exchange point in the Netherlands. He gave a talk on Internet traffic that they saw through NetherLight. Cees observed that there were three easily identified peaks in traffic when you plot source traffic versus the amount of data sent per connection.
The first peak shows a lot of data traveling over a lot of connections, which is how most people think of Internet traffic -- Web, email, "typical" user data transfers, etc. To handle this traffic you need expensive, high performance IP routers capable of routing millions of packets per second.
The second peak shows more data being moved between a smaller number of connections with fewer end points. These patterns are typical of university researchers and some commercial content suppliers -- think high-definition video sent from movie studios to theaters. This traffic is better handled with Ethernet switches, which cost about one-fifth the price of a high-speed router.
The third peak consists of long-lived, data-intensive paths over relatively static optical links. One example of this would be the Large Hadron Collider at CERN, which will produce terabytes of data for months on end and send that data to about 15 national datacenters (two of which are ESnet sites in the U.S. -- Fermilab and Brookhaven).
I realized that this is exactly the nature of the traffic that the evolving process of doing science was going to increasingly be putting onto ESnet and that we should build separate networks for the different types of traffic. The general IP network can manage the 3–4 billion connections per month of general traffic -- email, Web, smaller science data transfers, etc. A second, more specialized network -- the Science Data Network (SDN) -- we would build as a switched circuit network to handle the very large data flows of SC science, and that is where most of our new bandwidth is. The rollout of the IP network is essentially complete and consists of five interconnected 10 Gb/s rings that cover the country. The SDN has several 10 Gb/s paths across the northern half of the country, and these will be closed and formed into rings by this summer. We will add one complete set of 10 Gb/s rings for SDN each year for the next four to five years, resulting in five 50 Gb/s rings covering the country within five years. This is how ESnet4 is designed and is being built.
To support the third case, the petabytes of data being sent from CERN to Tier 1 national datacenter sites, there is a dedicated optical network with several 10 gigabit channels delivering data to Fermilab near Chicago and Brookhaven National Laboratory on Long Island essentially continuously -- 24 hours a day, seven days a week, for about nine months out of the year.
Question: In 2007, ESnet also completed its third Metropolitan Area Network. Can you discuss the idea behind these?
Johnston: One thing that became clear when we looked at the science requirements was that the big science national labs need to be connected directly to the high-speed ESnet core, but we couldn't do this with the old, commercial tail circuit architecture because these circuits are prone to mechanical damage and can be out of service for days. We needed to build redundant optical networks to connect the labs to the ESnet core network. One evening at a meeting in Columbus, Ohio, Wes Kaplow, then CTO of Qwest, and I sketched out a plan for the Bay Area Metropolitan Area Network (BAMAN) on the back of a napkin over beer. The BAMAN links LBNL, LLNL, NERSC, the Joint Genome Institute, Sandia California and SLAC with a redundant ring, providing multiple 10 Gb/s channels connected to the ESnet backbone. This approach has proven cost effective and reliable.
Given this success, we pushed this metro area architecture forward, next to Long Island, then to Chicago. In the Chicago area, Linda Winkler of Argonne had a number of the network elements in place to Chicago, but no connection from Fermilab to Argonne. To bridge this gap, we got a special grant from the Office of Science and installed new fiber between the labs. Among other things, this involved tunneling under Interstate 55, which turned out to be easier than getting the permits to go through a couple of little villages along the path.
Question: OK. How do you plan to keep busy after you retire?
Johnston: Well, I plan to work part-time for ESnet, helping my successor with the transition. The next person may not have the combination of the science experience, the DOE experience, and the lab experience that I did, so I will be around to mentor and assist.
On the personal side, over the past decade I have developed an intense curiosity about the world's transition to modernity -- the historical and cultural events between about 1870 and 1945 that shaped the world as we know it today. What led to the situation of modern Europe, to the blooming of Fascism in the early-mid 20th century? What role did religion play in the transition? This is an extension of my original interest in German expressionist art, which was an outgrowth of World War I. What caused this sea change in how people create and perceive art? Why did the Weimar Republic -- the German experiment in parliamentary democracy -- fail? How did Hitler get elected? I think that all of this is part and parcel of the transition to modernity and I plan to spend a lot of time reading about these interrelated parts of 19th and 20th century Western culture.
Also, with my wife Nancy's interest in birding, we plan to spend a lot of time exploring California in detail and photographing birds. She is getting quite good at it -- look at nejohnston.org. We will also be travelling more to some of our favorite areas -- Seattle, Hawaii, and New Mexico.