Metadata Workshop Seeks to Make Mountains of Data More Accessible

July 11, 1997

By Jon Bashor

The information age has not only spawned a never-ending torrent of data flooding our lives, it has also led to huge libraries of electronic information stored in computers everywhere.

Although such information represents a valuable resource, the sheer volume of data stacking up is making it increasingly difficult for anyone to retrieve needle-sized files of useful information from these virtual haystacks. Today, experts in the field will wrap up a four-day workshop organized by the Lab's Computing Sciences Division and held at UC Berkeley's Clark Kerr Campus. Participants will make recommendations for standards and practices to improve access to the data and make it easier for various organizations to share electronic information.

The issue is so large that it has generated its own terminology. The data created to describe large piles of information is known as "metadata," and one of the techniques used to find valuable nuggets is called "data mining."

"There are not only mountains of data to be conquered, but those mountains come in different varieties," said workshop chairman John McCarthy of the Lab's Computing Sciences Division. "The problem common to all of these vast libraries is that it is very difficult to find exactly what you're looking for and to relate one data set to another. Many organizations still haven't come to grips with the extent of the problem." McCarthy is one of the researchers credited with coining the term metadata some 25 years ago.

According to program committee chairman Frank Olken of Berkeley Lab, metadata can facilitate access, use and sharing of data across cyberspace and time by systematically describing the content, structure and semantics of data residing in information systems, databases or files.

The main sponsor of the workshop is the U.S. Environmental Protection Agency (EPA), which has amassed volumes of environmental data, usually collected on one specific component--such as air, water or solid waste--making it difficult to draw together the full picture of environmental conditions for any specific place. To make and defend policies today, the EPA needs to access data from many sources, ensure its validity, and integrate many perspectives, such as air quality, land use, water quality and chemical toxicity.

The workshop is being held under the auspices of the International Organization for Standardization's Joint Technical Committee on Information Standards. The wide range of organizations participating in the workshop illustrates the scope and importance of this issue: the U.S. Census Bureau, Boeing, Xerox, AT&T Laboratories, the National Institute of Standards and Technology, UC Berkeley, Stanford University, University of Michigan, Rutgers University, the University of Maryland, and Lawrence Berkeley and Los Alamos national laboratories.

