The value of metadata

The advantages of retrieving astronomical data using a grid model based on interoperable standards are clear: instead of visiting a number of different archives, explains Plante, an astronomer will be able to send a single query to many different servers, which will then respond with the results of that search.

It may sound like a typical Google search, but astronomy researchers' needs are different from those of the average Internet user. “The NVO's interfaces must be intelligent enough to have some understanding of astronomical content,” says Plante. For example, a Google-like search for infrared images of a specific quasar would return text documents containing the keyword “infrared,” and possibly a source name, but the results probably would not contain other valuable information, such as the name of a specific kind of telescope or the location in the sky where the data was originally recorded.


As chair of the NVO Metadata Working Group, Plante is developing a framework that would organize astronomical data in ways that allow astronomers to zero in on the information they need. This means cataloguing data retrieved from a broad variety of repository sites using a universal, machine-readable standard, such as XML.

XML, short for Extensible Markup Language, is perhaps best known as a hypertext markup language both more universal and more customizable than HTML. However, it is also an industry standard for marking up metadata, or “data about data.” This information can document how and when datasets are collected and describe the format in which they are encoded. NVO metadata, for example, may include such information as telescope names, frequency ranges, and sky positions.
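To make the idea concrete, here is a minimal sketch in Python of what reading such a metadata record might look like. The element and attribute names (dataset, telescope, band, position) and all the values are invented for illustration; they are assumptions, not an actual NVO schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata record. The tag names, attribute names, and values
# are illustrative assumptions, not taken from any real NVO schema.
RECORD = """
<dataset id="example-001">
  <telescope>Example Telescope</telescope>
  <band min_hz="1.0e13" max_hz="3.0e14">infrared</band>
  <position ra_deg="150.0" dec_deg="2.5"/>
  <format>FITS</format>
</dataset>
"""


def parse_record(xml_text):
    """Turn one XML metadata record into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    band = root.find("band")
    position = root.find("position")
    return {
        "id": root.get("id"),
        "telescope": root.findtext("telescope"),
        "band": band.text,
        "freq_range_hz": (float(band.get("min_hz")), float(band.get("max_hz"))),
        "ra_deg": float(position.get("ra_deg")),
        "dec_deg": float(position.get("dec_deg")),
        "format": root.findtext("format"),
    }


metadata = parse_record(RECORD)
print(metadata["telescope"], metadata["band"], metadata["format"])
```

The point of the sketch is that the record is small and structured: software can read the telescope name, frequency range, and sky position directly, without opening the dataset the record describes.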



Most astronomy data is stored in a format called FITS, developed in the 1980s and used widely throughout the astronomy community. XML won't replace FITS as the standard format for NVO data. Instead, it will provide important information about data encoded in FITS to speed the query and retrieval process, a development that is becoming more and more necessary as data files grow from gigabytes to terabytes to petabytes.

“Moving data around physically is one of the hardest problems” facing researchers and NVO architects, says Brunner. “It can be archived, but getting it out of archives easily and quickly is a serious challenge.” The NVO, however, would permit an astronomer to access the metadata describing an image of half a gigabyte or more and decide whether downloading the entire image would be relevant to her research.
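The following Python sketch illustrates that metadata-first workflow under stated assumptions: the ImageMetadata fields, the candidate records, and the search radius are all hypothetical, and the selection rule (matching band plus angular distance from a target position) is just one plausible criterion, not the NVO's actual method.

```python
import math
from dataclasses import dataclass


@dataclass
class ImageMetadata:
    # Hypothetical fields; a real metadata record would carry many more.
    dataset_id: str
    band: str
    ra_deg: float
    dec_deg: float
    size_bytes: int


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Angle on the sky between two positions, all values in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    cos_sep = (math.sin(dec1) * math.sin(dec2)
               + math.cos(dec1) * math.cos(dec2) * math.cos(ra1 - ra2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_sep))))


def worth_downloading(meta, band, ra_deg, dec_deg, radius_deg):
    """Decide from metadata alone whether an image is relevant."""
    if meta.band != band:
        return False
    sep = angular_separation_deg(meta.ra_deg, meta.dec_deg, ra_deg, dec_deg)
    return sep <= radius_deg


# Invented candidate records, as if returned by a metadata query.
candidates = [
    ImageMetadata("img-a", "infrared", 150.01, 2.49, 600_000_000),
    ImageMetadata("img-b", "infrared", 10.0, -45.0, 750_000_000),
    ImageMetadata("img-c", "radio", 150.0, 2.5, 500_000_000),
]

selected = [m for m in candidates
            if worth_downloading(m, "infrared", 150.0, 2.5, radius_deg=0.1)]
bytes_avoided = sum(m.size_bytes for m in candidates if m not in selected)
print([m.dataset_id for m in selected], bytes_avoided)
```

Only the small metadata records cross the network while the decision is being made; the large image files are transferred only for the records that pass the test.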