Data Management: Metadata

Metadata

Metadata is information about the context, content, quality, provenance, and/or accessibility of a dataset. In order for your data to be accessible to you, your colleagues, and other researchers, it must be properly documented. Put simply, metadata is data about your data. It is:

  • Frequently required for depositing datasets in disciplinary repositories, or for publishing in a research journal;
  • Necessary for the longevity and reproducibility of research data;
  • Useful for analyzing the data in data files.

"Data often have a longer lifespan than the research project that creates them. Researchers may continue to work on data after funding has ceased, follow-up projects may analyse or add to the data, and data may be re-used by other researchers. Well organised, well documented, preserved and shared data are invaluable to advance scientific inquiry and to increase opportunities for learning and innovation."
--"Create and Manage Data," the UK Data Archive

To Do While Collecting or Creating Data

  • Make a note of all file names and formats associated with the project, how the data are organized, how the data were generated (including any equipment or software used), and information about how the data have been altered or processed.
  • Include an explanation of codes, abbreviations, or variables used in the data or in the file naming structure.
  • Keep notes about where you got the data so that you and others can find it.
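
The first item above, noting every file name and format in a project, can be partly automated. The following is a minimal Python sketch rather than a prescribed tool. The function names, the column names, and the choice of CSV output are all assumptions made for illustration. It records each file's name, format (taken from the file extension), size, and last-modified time. How the data were generated or processed still has to be written down by hand.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path


def inventory_files(project_dir):
    """Return one record per file under project_dir, sorted by path."""
    root = Path(project_dir)
    records = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        stat = path.stat()
        records.append({
            # Path relative to the project folder, with forward slashes
            "file_name": path.relative_to(root).as_posix(),
            # Format guessed from the extension only (an assumption)
            "format": path.suffix.lstrip(".").lower() or "unknown",
            "size_bytes": stat.st_size,
            "last_modified_utc": datetime.fromtimestamp(
                stat.st_mtime, tz=timezone.utc
            ).isoformat(timespec="seconds"),
        })
    return records


def write_inventory(records, csv_path):
    """Write the inventory records to a CSV file with a header row."""
    fieldnames = ["file_name", "format", "size_bytes", "last_modified_utc"]
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
```

Calling write_inventory(inventory_files("my_project"), "file_inventory.csv"), where both names are hypothetical, would produce a file list that can be kept alongside the data and regenerated whenever files are added or renamed.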

Metadata Elements - Things to Document

Elements of metadata include:

Title - Name of the dataset or research project that produced it

Creator - Names and addresses of the organization or people who created the data

Identifier - Number used to identify the data, even if it is just an internal project reference number

Subject - Keywords or phrases describing the subject or content of the data

Funders - Organizations or agencies who funded the research

Rights - Any known intellectual property rights held for the data

Access information - Where and how your data can be accessed by other researchers

Language - Language(s) of the intellectual content of the resource, when applicable

Dates - Key dates associated with the data, including: project start and end date; release date; time period covered by the data; and other dates associated with the data lifespan, e.g., maintenance cycle or update schedule

Location - Where the data relate to a physical location, record information about their spatial coverage

Methodology - How the data were generated, including equipment or software used, experimental protocol, and anything else one might include in a lab notebook

Data processing - Along the way, record any information on how the data have been altered or processed

Sources - Citations to material for data derived from other sources, including details of where the source data are held and how they were accessed

List of file names - List of all data files associated with the project, with their names and file extensions (e.g. 'NWPalaceTR.WRL', 'stone.mov')

File formats - Format(s) of the data, e.g. FITS, SPSS, HTML, JPEG, and any software required to read the data

File structure - Organization of the data file(s) and the layout of the variables, when applicable
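
One way to keep these elements together is a small machine-readable record stored next to the data files. The following Python sketch is an illustration only: the field names simply mirror the element list above and do not follow any formal metadata standard, the helper missing_elements is hypothetical, and every value is a placeholder apart from the two file names reused from the example above.

```python
import json

# Element names follow the list above; this is not a formal metadata standard.
ELEMENTS = [
    "title", "creator", "identifier", "subject", "funders", "rights",
    "access_information", "language", "dates", "location", "methodology",
    "data_processing", "sources", "file_names", "file_formats",
    "file_structure",
]


def missing_elements(record):
    """Return the elements that are absent or left empty in a record."""
    return [name for name in ELEMENTS if not record.get(name)]


# Placeholder values only; replace them with details of a real dataset.
record = {
    "title": "Example dataset title",
    "creator": ["Example Researcher, Example University"],
    "identifier": "PROJECT-0001",
    "subject": ["example keyword"],
    "file_names": ["NWPalaceTR.WRL", "stone.mov"],
    "file_formats": ["VRML (.wrl)", "QuickTime (.mov)"],
}

# Print the record as JSON, then list what has not been documented yet.
print(json.dumps(record, indent=2))
print("Still to document:", missing_elements(record))
```

A plain-text README covering the same elements works just as well. The benefit of a structured record is that gaps, such as a missing rights statement, are easy to check for before the data are deposited.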

 

Tools for Creating Metadata

Metadata Standards

A directory of disciplinary metadata standards is available from the UK's Digital Curation Centre. This site is browsable by discipline. For a visualization of the wider metadata universe most often used in the humanities, see Jenn Riley's Seeing Standards. Each of the 105 standards listed there is evaluated on its strength of application to defined categories in each of four axes: community, domain, function, and purpose. The strength of a standard in a given category is determined by a mixture of its adoption in that category, its design intent, and its overall appropriateness for use in that category. The accompanying Metadata Standard Glossary describes each of the listed standards in more detail.

Additional Information

The Digital Library Federation has a wiki for best practices for shareable metadata. It includes general recommendations for best practices, recommendations for classes of data elements, and best practices for technical aspects of metadata.

An NSF-funded project, DataONEpedia, is a database of best practices for data management. The goals are to provide a place where data can be collected, managed, and updated by appropriate individuals, and to ensure that collected data can be reused and potentially presented in multiple ways.