About Metadata




Metadata is information about resources. In this context, it is information about language resources: lexicons, audiotapes, transcribed texts, language descriptions, etc. It is analogous to card catalog information about library resources, in that it enables the discovery and retrieval of resources through standardized, machine-readable information. Metadata is becoming very important to the linguistics community, for it gives us the ability to find language resources in the vast and rapidly expanding realm of the Internet.

Metadata not only provides for the discovery of data, but also helps ensure its long-term intelligibility: since it gives a constrained description of the content of a body of material, it can often provide insight into that material's relationship with other bodies of data. Indeed, linguists often generate metadata for the purpose of organizing their own material. Metadata can initially be written in any digital form: a Word file, an Excel spreadsheet, a Shoebox database or an IMDI Corpus Browser. This section of the school explains how metadata should be formatted, presented, collected and uploaded.

Preferred formats

XML is the preferred format for metadata because it is plain-text markup based on an open standard. It is therefore much less likely to become unintelligible in the future than formats based on proprietary technology. This is important, since we have come to realize how essential it is not to lose data the way we have in the past. E-MELD recommends that metadata conform to recognized metadata guidelines (for example, the OLAC or IMDI standards).

More on preferred formats

OLAC/IMDI metadata comparison table


The true purpose of metadata is to catalog resources in a way that allows for better searches. The computer-readable formats that must be used to enable such searches are, however, cumbersome for a person to read. Fortunately, through the use of XSL stylesheets, metadata that is captured in an archival XML format can also be displayed in a variety of formats that are easier on the human eye than raw XML.
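
The separation between the archived record and its display can be illustrated with a small sketch. It does not use XSL: it is a plain Python stand-in that reads an XML record and prints one readable line per element, which is the same idea a stylesheet applies on a larger scale. The record contents and the `record` wrapper element are invented for illustration and are not taken from any OLAC or IMDI file.

```python
import xml.etree.ElementTree as ET

# Invented sample record; only the Dublin Core namespace URI is real.
RAW = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Field recordings, session 1</dc:title>
  <dc:language>en</dc:language>
</record>"""


def render(xml_text):
    """Turn each child element of the record into a 'Name: value' line."""
    root = ET.fromstring(xml_text)
    lines = []
    for child in root:
        # ElementTree tags look like "{namespace}name"; keep the local name.
        name = child.tag.split("}", 1)[-1]
        value = (child.text or "").strip()
        lines.append(f"{name.capitalize()}: {value}")
    return "\n".join(lines)


print(render(RAW))
# Title: Field recordings, session 1
# Language: en
```

The archived XML is left untouched; only the view changes, so the same record could be rendered as a table, a web page or a citation without being edited.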

More on metadata presentation

Basic Requirements (OLAC)

Metadata should have a structured, unified and standard format so that it can be easily retrieved by automated, Internet-based search services like the OLAC harvester. Currently, Open Language Archives Community (OLAC) metadata is based on the fifteen elements of the Dublin Core metadata element set and is expressed in XML (Extensible Markup Language). Initiatives that attempt to standardize linguistic metadata, like OLAC, are important to the preservation of data.
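
As a rough illustration of what "fifteen Dublin Core elements expressed in XML" means in practice, the sketch below builds a small record with Python's standard library. It is not the OLAC format itself: the `record` wrapper element and the sample values are assumptions made for the example, and a real OLAC record uses its own container element and extensions. Only the element names and the Dublin Core namespace URI come from the Dublin Core element set.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen Dublin Core elements.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def build_record(fields):
    """Return an element with one Dublin Core child per (name, value) pair."""
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")  # hypothetical wrapper, not the OLAC container
    for name, value in fields:
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


# Invented sample values, for illustration only.
record = build_record([
    ("title", "Field recordings, session 1"),
    ("language", "en"),
    ("type", "Sound"),
])
print(ET.tostring(record, encoding="unicode"))
```

Rejecting names outside the fifteen elements is a design choice for the sketch: it catches misspelled fields early, which is the kind of consistency a harvester depends on.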

More on OLAC metadata standards

Example of OLAC metadata

Tools for Creating Metadata

The overall purpose of metadata is to allow a server to find information about your data quickly and easily. To make your data available to OLAC search engines, create metadata that complies with OLAC standards, and register it with OLAC. If you create metadata using the OLAC Repository Editor (ORE), your metadata will automatically comply with OLAC standards and be available to the OLAC Harvester. IMDI also provides numerous tools for creating and using metadata.

More on metadata tools


All materials should be consistently labeled, and the labels should be used as keys in your logbook so that metadata records are associated with the things they describe.
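
The idea of using labels as keys can be sketched as a simple lookup table. The label scheme (`TAPE-001`, `NB-001`), the field names and the `unlabeled` helper are all hypothetical, chosen only to show how a label ties a physical item to its metadata record and how missing records can be spotted.

```python
# Logbook keyed by the label written on each item (invented entries).
logbook = {
    "TAPE-001": {"type": "audiotape", "description": "Word list elicitation"},
    "NB-001": {"type": "notebook", "description": "Transcription of TAPE-001"},
}


def unlabeled(materials, logbook):
    """Return, in sorted order, the labels that have no logbook record."""
    return sorted(set(materials) - set(logbook))


# Labels read off the items on the shelf.
on_shelf = ["TAPE-001", "TAPE-002", "NB-001"]
print(unlabeled(on_shelf, logbook))
# ['TAPE-002']
```

Because each label appears exactly once as a key, a record can never be attached to two different items, and an item without a record shows up immediately.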

More on Labeling

The content of this page was developed using resources listed in our Annotated Bibliography.
