
Metadata and dataset documentation




Here you will find tips on how to make research data understandable and reusable in the future with the help of metadata and other documentation.

What is metadata?

In short, metadata is "data about data": information that describes the data and gives them meaning. Without good metadata, it is often difficult to understand a dataset and how the data can be used. Metadata also makes datasets searchable and retrievable, even when the dataset itself cannot be published openly. This makes metadata an essential element of FAIR data (Findable, Accessible, Interoperable and Reusable).

Typical metadata includes information about:

  • who has produced or is responsible for the dataset
  • the subject area in question
  • the type of data in question
  • the formats the data are stored in

The metadata may also include information about the equipment or software used.
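Details about the software and hardware used can often be recorded automatically instead of being typed in by hand. The following is a minimal Python sketch, assuming the data are processed with Python. The field names are illustrative and are not taken from any metadata standard.

```python
import json
import platform
from datetime import datetime, timezone


def capture_environment() -> dict:
    """Collect basic software/hardware details for technical metadata.

    Field names are hypothetical; map them to your chosen standard.
    """
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "operating_system": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        # UTC timestamp of when the environment was recorded
        "recorded_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }


if __name__ == "__main__":
    print(json.dumps(capture_environment(), indent=2))
```

The output can be saved next to the data, or copied into the technical section of the metadata or the ReadMe file.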

Different types of metadata

A distinction is often made between the following types of metadata. The list is not exhaustive, but it can serve as a starting point for deciding which metadata to include for a given dataset:

  • Descriptive - e.g. name of author(s)/researcher(s), title of the document, etc.
  • Administrative - e.g. creation date, versioning, licenses
  • Structural - e.g. relationship between files, meaning of variables
  • Technical - e.g. information about format and any software or hardware
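As an illustration, the four types above can be used to organize a single metadata record. The minimal Python sketch below uses hypothetical field names and placeholder values. The grouping is only for illustration, because established standards define their own element names.

```python
import json

# A hypothetical dataset record, grouped by the four metadata types.
# All values are placeholders for illustration only.
record = {
    "descriptive": {
        "title": "Example field measurements",
        "creators": ["Researcher A", "Researcher B"],
        "keywords": ["example", "placeholder"],
    },
    "administrative": {
        "created": "2024-01-15",  # ISO 8601 date
        "version": "1.0",
        "license": "CC BY 4.0",
    },
    "structural": {
        "files": {"measurements.csv": "raw data", "codebook.csv": "variable descriptions"},
        "variables": {"temp_c": "air temperature in degrees Celsius"},
    },
    "technical": {
        "formats": ["text/csv"],
        "software": "Python 3.12",
    },
}


def missing_types(rec: dict) -> list[str]:
    """Return the metadata types that are absent or empty in a record."""
    required = ["descriptive", "administrative", "structural", "technical"]
    return [t for t in required if not rec.get(t)]


print(json.dumps(record, indent=2))
print("Missing types:", missing_types(record))
```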

In addition, it can be useful to consider different levels of metadata. At the project level, metadata should describe the overall context of the data collection, who carried out the work and how the project was funded. A description of the dataset, keywords and the license also belong at this overall level. Metadata at the dataset level is more specific and detailed, for example information about file types, the measuring instruments used, and descriptions of variables and units.
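The two levels can be kept apart by linking each dataset-level record to the project it belongs to. The following minimal Python sketch uses dataclasses. The class names, fields and values are hypothetical.

```python
from dataclasses import asdict, dataclass


@dataclass
class ProjectMetadata:
    """Overall context: who did the work, how it was funded, what it covers."""
    title: str
    participants: list[str]
    funding: str
    description: str
    keywords: list[str]
    license: str


@dataclass
class DatasetMetadata:
    """More specific details about one dataset within the project."""
    file_types: list[str]
    instruments: list[str]
    variables: dict[str, str]  # variable name -> unit
    project: ProjectMetadata


project = ProjectMetadata(
    title="Hypothetical monitoring project",
    participants=["Researcher A"],
    funding="Funder X (placeholder)",
    description="Placeholder description of the data collection.",
    keywords=["monitoring"],
    license="CC BY 4.0",
)
dataset = DatasetMetadata(
    file_types=["csv"],
    instruments=["Thermometer model Y (placeholder)"],
    variables={"temp": "degC", "depth": "m"},
    project=project,
)

# asdict() converts nested dataclasses into plain dictionaries
print(asdict(dataset))
```

Because several datasets can reference the same `ProjectMetadata` object, the project-level information only needs to be written once.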

Metadata standards

For metadata to be machine-readable, it should be as structured and standardized as possible. We recommend using the standardized terms, taxonomies/ontologies and vocabularies that are available within your field.
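A simple way to keep metadata consistent is to check entries against a controlled vocabulary and a standard date format before publishing. The following minimal Python sketch shows this. The vocabulary is a hypothetical stand-in for whatever vocabulary or ontology is established in your field, and ISO 8601 (YYYY-MM-DD) is assumed as the date format.

```python
from datetime import date

# Hypothetical controlled vocabulary; in practice, use the vocabulary or
# ontology that is established in your own field.
ALLOWED_KEYWORDS = {"air temperature", "precipitation", "wind speed"}


def check_record(keywords: list[str], created: str) -> list[str]:
    """Return problems that make the record less machine-readable."""
    problems = []
    for kw in keywords:
        if kw.lower() not in ALLOWED_KEYWORDS:
            problems.append(f"keyword not in vocabulary: {kw!r}")
    try:
        date.fromisoformat(created)  # expects an ISO 8601 date such as 2024-01-15
    except ValueError:
        problems.append(f"date is not ISO 8601 (YYYY-MM-DD): {created!r}")
    return problems


print(check_record(["Air temperature", "temps"], "15.01.2024"))
```

Free-text keywords such as "temps" and local date formats such as "15.01.2024" are exactly what makes metadata hard to search and combine across datasets.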
There are a number of metadata standards. Some are generic and can be used for all disciplines, while others are adapted to specific subjects and disciplines.
Dublin Core is a generic metadata standard consisting of a set of elements for describing a dataset or other digital object, and many open data archives use it. Darwin Core, a related standard for biodiversity data, builds on Dublin Core. In the social sciences, DDI (Data Documentation Initiative) is often used. Overviews of different standards are available from the Research Data Alliance, FAIRsharing.org and the Digital Curation Centre.
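To show what a Dublin Core record can look like in machine-readable form, the minimal Python sketch below serializes a few elements as XML using the Dublin Core element namespace. The wrapper element `metadata` and the example values are assumptions. Archives typically expect a specific container format (for example OAI-PMH `oai_dc`), so check what your repository requires.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}
ET.register_namespace("dc", DC_NS)


def dublin_core_xml(fields: dict[str, list[str]]) -> str:
    """Serialize Dublin Core elements as XML; each element may repeat."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


print(dublin_core_xml({
    "title": ["Hypothetical example dataset"],
    "creator": ["Researcher A", "Researcher B"],
    "date": ["2024-01-15"],
    "type": ["Dataset"],
    "format": ["text/csv"],
    "rights": ["CC BY 4.0"],
}))
```

Rejecting unknown element names keeps the record within the standard instead of silently mixing in local field names.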

ReadMe file

A good way to make relevant additional information about a dataset available is a so-called ReadMe file. The ReadMe file is intended to ensure that the data can be understood by yourself at a later date, or by others when the dataset is shared and published.

It is recommended to create the ReadMe file at an early stage and let it accompany the dataset. Much of the content of a ReadMe file will overlap with the metadata, but the ReadMe file can also contain detailed method descriptions, an overview of the files and an explanation of their content.
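Part of the file overview can be generated instead of written by hand. The following minimal Python sketch assumes the dataset is stored in a single local directory. The function names are hypothetical, and you still need to write the description of each file yourself.

```python
from pathlib import Path


def scan_directory(data_dir: str) -> list[tuple[str, int]]:
    """Return (relative path, size in bytes) for every file under data_dir."""
    root = Path(data_dir)
    return [
        (p.relative_to(root).as_posix(), p.stat().st_size)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    ]


def format_overview(entries: list[tuple[str, int]]) -> str:
    """Format entries as tab-separated lines with a placeholder description."""
    return "\n".join(
        f"{name}\t{size / 1024:.1f} KB\t<describe content here>"
        for name, size in entries
    )


if __name__ == "__main__":
    print(format_overview(scan_directory(".")))
```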

The ReadMe file should be in plain text (.txt).

NTNU's institutional repository in DataverseNO has the following minimum requirements for the content of the ReadMe file:

  • Title of the dataset, DOI, contact information
  • Methodology
  • Data and file overview
  • File-specific information
  • Terms of reuse
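A ReadMe file can be started from a simple plain-text skeleton that mirrors these minimum requirements. The following Python sketch is an illustration only and is not DataverseNO's own template. The section headings are paraphrased from the list above, and the layout and the output file name `README.txt` are assumptions.

```python
from pathlib import Path

# Section headings paraphrase the minimum requirements listed above;
# DataverseNO's own template may use different wording and layout.
SECTIONS = [
    "GENERAL INFORMATION (title of the dataset, DOI, contact information)",
    "METHODOLOGY",
    "DATA AND FILE OVERVIEW",
    "FILE-SPECIFIC INFORMATION",
    "TERMS OF REUSE",
]


def readme_skeleton(title: str) -> str:
    """Build a plain-text ReadMe skeleton with numbered, empty sections."""
    parts = [f"README for: {title}", ""]
    for number, heading in enumerate(SECTIONS, start=1):
        parts += [f"{number}. {heading}", "   <fill in>", ""]
    return "\n".join(parts)


if __name__ == "__main__":
    Path("README.txt").write_text(
        readme_skeleton("Hypothetical example dataset"), encoding="utf-8"
    )
```

Saving with an explicit UTF-8 encoding keeps non-ASCII characters, such as Norwegian letters, readable on other systems.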

DataverseNO also provides a general ReadMe template that can be downloaded. This template can also be used for datasets that contain only software code or code-based data.

Examples of other relevant documentation:

  • Descriptions, instructions and protocols
  • Configuration files and log files
  • Glossaries, code books
  • Variable lists
  • Information letters and consent forms
  • Notification form and pre-assessment from Sikt, and any ethical approvals
  • Questionnaire and interview guide
  • Permits and licenses from any rights holders

See also: Making a research project understandable: Guide for data documentation (Siiri Fuchs and Mari Elisa Kuusniemi, 2018)

Contact