
Research Data Management

Library curated guides for every stage of the research data management lifecycle

Organising & Documenting Research Data


Overview of Best Practices

Well-organised and well-documented data helps align your data with the FAIR principles (Findable, Accessible, Interoperable, and Reusable) and supports the CARE principles for Indigenous Data Governance (Collective Benefit, Authority to Control, Responsibility, and Ethics).
 

  • Start early: use a Data Management Plan (DMP) to outline how you plan to organise and document your data, and update it throughout your research project.

  • Dedicate time throughout your research project to stay organised and update data documentation

  • Organise data in line with any discipline-specific best practices (e.g. for systematic reviews)

There is no single right way to organise data. Choose a method, or mix of methods, that works best for you and your discipline, and apply it consistently.

Organise your data by:

  • Creating a folder structure and file and document naming conventions

  • Versioning your data

Documenting your research data provides context for your data and shows how it supports your research findings. You should document your data throughout your research process. Certain data documentation may be required when publishing your research.

Common forms of data documentation:

  • A Data Management Plan (DMP)

  • Metadata fields

  • READMEs

  • Data Availability Statement for published research

File Structures & Naming Conventions

Files can be organized according to:

  • Research activity (interviews, surveys, etc.)
  • Data type (images, text, databases)
  • Kind of material (raw data, cleaned data, published data)

Structure files in a hierarchical system, using either a deep or a shallow hierarchy, and apply these principles:

  • The Single-Question Principle: At each level of your hierarchy, strive to make all folder names answer the same question.
  • The Separation Principle: Whenever possible, limit each folder to containing only files or only other folders.
  • The Domain Principle: Organize files in different domains differently
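The Separation Principle is easy to check automatically. The Python sketch below walks a project folder and lists any folder that contains both files and sub-folders. The demo layout (kind of material at the first level, research activity at the second, so that each level answers a single question) and the function name `mixed_folders` are hypothetical illustrations, not a required structure.

```python
from pathlib import Path
import tempfile


def mixed_folders(root: Path) -> list[Path]:
    """Return folders under `root` (relative paths) that contain both files
    and sub-folders, i.e. folders that break the Separation Principle."""
    offenders = []
    for folder in [root, *sorted(p for p in root.rglob("*") if p.is_dir())]:
        children = list(folder.iterdir())
        has_files = any(child.is_file() for child in children)
        has_dirs = any(child.is_dir() for child in children)
        if has_files and has_dirs:
            offenders.append(folder.relative_to(root))
    return offenders


# Demo: a hypothetical project organised first by kind of material,
# then by research activity.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for sub in ["raw_data/interviews", "raw_data/surveys", "cleaned_data/surveys"]:
        (root / sub).mkdir(parents=True)
    (root / "raw_data/interviews/interview_01.txt").write_text("...")
    (root / "cleaned_data/surveys/survey_2024.csv").write_text("id,answer\n")
    # A stray file placed next to sub-folders breaks the Separation Principle.
    (root / "raw_data/notes.txt").write_text("...")
    print([str(p) for p in mixed_folders(root)])  # → ['raw_data']
```

Moving `notes.txt` into its own sub-folder (or into a README for `raw_data`) would clear the warning.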

Choose file and document naming conventions and apply them consistently:

  • Use camelCase or snake_case instead of spaces, which makes working with data files in command-line interfaces (CLIs) and IDEs easier
  • Use a consistent date format, e.g. YYYYMMDD or YYYY-MM-DD (ISO 8601), so that files sort chronologically

 

 

Image source: L. Bishop et al. (2014), Managing and Sharing Research Data: A Guide to Good Practice, p. 69.
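To illustrate why the date order in file names matters, the Python sketch below builds file names from a hypothetical convention (date first, then project, then description, all in snake_case). With YYYYMMDD dates, an ordinary alphabetical sort also puts files in chronological order; a YYYYDDMM date would not sort this way. The function name `make_file_name` and the order of the name parts are assumptions for illustration only.

```python
import re
from datetime import date


def make_file_name(project: str, description: str, day: date, ext: str) -> str:
    """Build a snake_case file name with a YYYYMMDD date prefix (hypothetical convention)."""
    def to_snake(text: str) -> str:
        # Lower-case, then replace each run of non-alphanumerics with one underscore.
        return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")
    return f"{day:%Y%m%d}_{to_snake(project)}_{to_snake(description)}.{ext}"


names = [
    make_file_name("Coastal Survey", "raw responses", date(2024, 11, 3), "csv"),
    make_file_name("Coastal Survey", "raw responses", date(2024, 2, 15), "csv"),
    make_file_name("Coastal Survey", "raw responses", date(2023, 12, 1), "csv"),
]
for name in sorted(names):  # alphabetical order equals chronological order
    print(name)
# 20231201_coastal_survey_raw_responses.csv
# 20240215_coastal_survey_raw_responses.csv
# 20241103_coastal_survey_raw_responses.csv
```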

Data Versioning

Keeping track of dataset versions helps make datasets trustworthy. A dataset may be used by one or several data contributors or data consumers, so it is vital to make clear which version is the latest and to keep a version history that explains the changes made to earlier versions.

  • Simple methods: use folder structures and naming conventions (e.g. a version number in the file name) to version data
  • More powerful, collaborative methods: use open-source version control systems such as Git and DVC (Data Version Control)
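A naming convention only helps if it can be applied mechanically. The Python sketch below assumes a hypothetical `<name>_v<NN>.<ext>` convention (e.g. `survey_v03.csv`) and picks the highest version of each dataset from a list of file names. Comparing version numbers as integers means `v10` correctly outranks `v02`.

```python
import re

# Hypothetical convention: <name>_v<NN>.<ext>, e.g. survey_v03.csv
VERSION_PATTERN = re.compile(r"^(?P<stem>.+)_v(?P<version>\d+)\.(?P<ext>\w+)$")


def latest_versions(file_names: list[str]) -> dict[str, str]:
    """Map each dataset name to the file holding its highest version number."""
    latest: dict[str, tuple[int, str]] = {}
    for name in file_names:
        match = VERSION_PATTERN.match(name)
        if match is None:
            continue  # skip files that do not follow the convention
        stem, version = match["stem"], int(match["version"])
        if stem not in latest or version > latest[stem][0]:
            latest[stem] = (version, name)
    return {stem: name for stem, (_, name) in latest.items()}


files = ["survey_v01.csv", "survey_v02.csv", "survey_v10.csv",
         "interviews_v01.txt", "README.txt"]
print(latest_versions(files))
# {'survey': 'survey_v10.csv', 'interviews': 'interviews_v01.txt'}
```

This identifies the latest file but does not record why it changed; a change log, or a version control system such as Git or DVC, is still needed for the version history.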

 

There is no consensus in the data community on when changes to a dataset make it a different dataset rather than a new version, and there is no single right way to version data and track changes. Follow your discipline's best practices.

For more information on tools for version control, see our Digital Tools for Research Guide.

Data Management Plan (DMP) 

 

A data management plan (DMP) is a document used to describe how you plan to manage your data throughout your research project. Many funders require that a DMP be created, maintained, and submitted.
A DMP should include:

  • Information about data and data format

  • Metadata content and format

  • Policies for access, sharing and re-use

  • Long-term storage and data management

  • Budget

For more information on DMPs, refer to our Data Management Plans Guide.

Metadata

Metadata provide standardised, structured information that describes a data collection in a machine-readable way. They enable data discovery, reproducibility, and preservation. A key component of metadata is the metadata schema, which defines a common set of elements (such as identifier, creator, and date) that records should use.

 

There are different metadata standards that you can apply to your research depending on your discipline, the type of data you are working with, and where you plan to publish and preserve your data. Metadata is often divided into three categories:

  • Descriptive metadata includes author, title, keywords and abstract, and enables users to find resources online.

  • Administrative metadata includes information about when and how a resource was created, as well as file type, technical information and access rights. It encompasses technical metadata, rights management metadata, and preservation metadata.

  • Structural metadata provides information about the relationship between components in an object e.g. relating articles, issues and volumes of serial publications, or the pages and chapters of a book.

A metadata standard provides a structure to describe data with:

  • Common terms for consistency between records
  • Common definitions for easier interpretation
  • Common language for ease of communication
  • Common structure to quickly locate information
  • Standards provide a uniform summary description of a dataset. 
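A minimal illustration of how shared terms and definitions support consistency: the Python sketch below defines a tiny, hypothetical schema (four terms, each with a definition) and reports which schema terms a record is missing and which keys fall outside the schema. Real standards such as DataCite or DDI are far richer; this only shows the principle.

```python
# A hypothetical mini-schema: each common term has one shared definition.
SCHEMA = {
    "identifier": "Persistent identifier of the dataset, e.g. a DOI",
    "creator": "Person or organisation responsible for creating the data",
    "title": "Name given to the dataset",
    "date": "Date of creation, in YYYY-MM-DD format",
}


def check_record(record: dict[str, str]) -> dict[str, list[str]]:
    """Report schema terms missing from a record and keys the schema does not define."""
    return {
        "missing": [term for term in SCHEMA if not record.get(term)],
        "unknown": [key for key in record if key not in SCHEMA],
    }


# "Author" is flagged because the shared term in this schema is "creator".
record = {"title": "Coastal survey responses", "creator": "Example, A.", "Author": "Example, A."}
print(check_record(record))
# {'missing': ['identifier', 'date'], 'unknown': ['Author']}
```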

 

Common Metadata Standards for Research Data

  • BioSharing (now FAIRsharing): an educational resource on inter-related data standards, databases and policies in the life, environmental and biomedical sciences.
  • CEDAR metadata tools: CEDAR (Center for Expanded Data Annotation and Retrieval) is a repository of community-defined metadata templates. Its goal is to improve metadata and its use in the biomedical sciences. The templates can be used to create, annotate, analyse, validate and search metadata based on the fields and relations they define.
  • Darwin Core: a standard maintained by the Darwin Core Maintenance Group. It includes a glossary of terms (in other contexts these might be called properties, elements, fields, columns, attributes, or concepts) intended to facilitate the sharing of information about biological diversity by providing identifiers, labels, and definitions. Darwin Core is primarily based on taxa and their occurrence in nature, as documented by observations, specimens, samples, and related information.
  • DataCite: the DataCite Metadata Schema is a list of core metadata properties chosen for the accurate and consistent identification of a resource for citation and retrieval purposes, along with recommended use instructions. The resource being identified can be of any kind, but it is typically a dataset.
  • DCC Disciplinary Metadata: the Digital Curation Centre provides links to information about discipline-specific metadata standards, including profiles, tools to implement the standards, and use cases of data repositories currently implementing them.
  • DDI: the Data Documentation Initiative (DDI) is an international standard for describing the data produced by surveys and other observational methods in the social, behavioural, economic, and health sciences. DDI is a free standard that can be used to document and manage different stages of the research data lifecycle, such as conceptualisation, collection, processing, distribution, discovery, and archiving. Documenting data with DDI facilitates understanding, interpretation, and use by people, software systems, and computer networks.
  • FAIRsharing.org: a good place to start when looking for metadata standards for your discipline.
  • Research Data Alliance Standards Directory: contains widely used metadata standards in the Arts and Humanities, Engineering, Life Sciences, Physical Sciences and Mathematics, Social and Behavioural Sciences, and General Research Data.

Sources: Coffey, A. M., Joy, C., Clarke, C. R., Hayes, A., McCarney, E., Madden, F., Quinn, C., Dalton, M., Stokes, D., McCabe, G., Noonan, E., & O'Dwyer, L. (2025). Navigating Open Research - A guide for early career researchers. Zenodo.

DataCite Metadata

If publishing research data to the University of Galway Community on Zenodo, DataCite is the metadata standard required for deposit into the Zenodo repository (see Guide to Publishing Research Data for more information on research data publishing).

"While DataCite’s Metadata Schema has been expanded with each new version [it is] intended to be generic to the broadest range of research datasets, rather than customized to the needs of any particular discipline. DataCite metadata primarily supports citation and discovery of data; it is not intended to supplant or replace the discipline or community specific metadata that fully describes the data, and that is vital for understanding and reuse."
 

Mandatory metadata properties

ID  Property             Obligation
1   Identifier           M
2   Creator              M
3   Title                M
4   Publisher            M
5   PublicationYear      M
10  ResourceType         M

Optional and recommended metadata properties

ID  Property             Obligation
6   Subject              R
7   Contributor          R
8   Date                 R
9   Language             O
11  AlternateIdentifier  O
12  RelatedIdentifier    R
13  Size                 O
14  Format               O
15  Version              O
16  Rights               O
17  Description          R
18  GeoLocation          R
19  FundingReference     O
20  RelatedItem          O

Obligation: M = mandatory, R = recommended, O = optional.

Sources: https://datacite-metadata-schema.readthedocs.io/en/4.6/properties/overview/#mandatory-properties; https://datacite-metadata-schema.readthedocs.io/en/4.5/introduction/about-schema/
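Before depositing, it can help to check a draft record against the obligations in the tables above. The Python sketch below is a simplified, hypothetical check: the record is a flat dictionary keyed by the DataCite property names, not the official DataCite XML or JSON serialisation, and the identifier is a placeholder rather than a real DOI.

```python
# Property names and obligations follow the DataCite Metadata Schema 4.6 tables
# above; the flat dict record format is a simplification for illustration.
MANDATORY = ["Identifier", "Creator", "Title", "Publisher", "PublicationYear", "ResourceType"]
RECOMMENDED = ["Subject", "Contributor", "Date", "RelatedIdentifier", "Description", "GeoLocation"]


def review_datacite_record(record: dict) -> tuple[list[str], list[str]]:
    """Return (missing mandatory properties, missing recommended properties)."""
    def missing(properties: list[str]) -> list[str]:
        return [p for p in properties if not record.get(p)]
    return missing(MANDATORY), missing(RECOMMENDED)


record = {
    "Identifier": "10.1234/example-doi",  # placeholder, not a real DOI
    "Creator": ["Example, Alex"],
    "Title": "Coastal survey responses",
    "Publisher": "Zenodo",
    "PublicationYear": "2025",
    "Description": "Anonymised survey responses.",
}
mandatory_gaps, recommended_gaps = review_datacite_record(record)
print("Missing mandatory:", mandatory_gaps)
print("Missing recommended:", recommended_gaps)
# Missing mandatory: ['ResourceType']
# Missing recommended: ['Subject', 'Contributor', 'Date', 'RelatedIdentifier', 'GeoLocation']
```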

Technical Metadata

Technical metadata will be specific to your discipline and your research project. It can describe what a variable is in your dataset and how it was collected, e.g. research instrument settings, software versions, patient IDs, image resolutions, etc. These metadata fields are vital for data processing and for data reusability. When collecting data and adding metadata fields to your research data, ask yourself the following questions:

  • Could someone not familiar with my data understand it?

  • Does my data have a unique identifier for each data record (i.e. row of data)?

  • If someone wanted to recreate my research, does my data include all the information they would need to do so?

  • Do the column names in my dataset accurately describe the variables in that column?
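Two of the questions above (a unique identifier for each record, and column names that describe their variables) can be partly checked automatically. The Python sketch below uses only the standard library `csv` module; the column name `record_id`, the sample rows, and the three-character threshold for a "too short" column name are hypothetical choices, and no script can judge whether a name is truly descriptive.

```python
import csv
import io


def check_table(csv_text: str, id_column: str) -> list[str]:
    """Return warnings about record identifiers and column names in a CSV table."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = reader.fieldnames or []
    warnings = []
    if id_column not in columns:
        warnings.append(f"no '{id_column}' column")
    else:
        ids = [row[id_column] for row in rows]
        if len(ids) != len(set(ids)):
            warnings.append(f"duplicate values in '{id_column}'")
        if any(not value for value in ids):
            warnings.append(f"empty values in '{id_column}'")
    for name in columns:
        # Arbitrary heuristic: very short names such as 'x' rarely describe a variable.
        if len(name) < 3:
            warnings.append(f"column '{name}' may not describe its variable")
    return warnings


data = """record_id,site_name,temp_c,x
R001,Site A,12.4,1
R002,Site A,12.9,0
R002,Site B,11.7,1
"""
print(check_table(data, "record_id"))
# ["duplicate values in 'record_id'", "column 'x' may not describe its variable"]
```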

README

A README file is a simple way to direct consumers of your data to your data documentation. A README file includes details about your data such as:

  • Description of the data
  • The research project it is associated with
  • The version of the data and when the data was last updated
  • The licence that applies to the dataset (for published datasets)

 

When creating a README file:

  • Create one README for each logical cluster of related files or data
  • Write it as a plain text file (.txt), avoiding proprietary file types (e.g. .docx)
  • Use consistent formatting for all README files
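Using the same template for every README keeps formatting consistent across a project. The Python sketch below writes a plain-text `README.txt` from a fixed, hypothetical list of fields based on the items above (description, project, version, last-updated date, licence); missing fields are flagged rather than silently dropped. The field names and the `NOT PROVIDED` marker are illustrative assumptions.

```python
from pathlib import Path
import tempfile

# Hypothetical house template: the same fields, in the same order, for every README.
README_FIELDS = ("Title", "Description", "Project", "Version", "Last updated", "Licence")


def render_readme(values: dict[str, str]) -> str:
    """Render plain-text README content; missing fields are flagged, not dropped."""
    return "".join(f"{field}: {values.get(field, 'NOT PROVIDED')}\n" for field in README_FIELDS)


def write_readme(folder: Path, values: dict[str, str]) -> Path:
    """Write README.txt (plain text, UTF-8) into the folder it documents."""
    path = folder / "README.txt"
    path.write_text(render_readme(values), encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    readme = write_readme(Path(tmp), {
        "Title": "Coastal survey responses",
        "Description": "Cleaned survey responses, one row per respondent.",
        "Project": "Example coastal research project",
        "Version": "v02",
        "Last updated": "2025-01-31",
        "Licence": "CC BY 4.0",
    })
    print(readme.read_text(encoding="utf-8"))
```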

 

Source: https://data.research.cornell.edu/data-management/sharing/readme/

Data Availability Statement

A data availability statement describes the datasets associated with your published work and includes a link to the relevant repository and each dataset’s persistent identifier. It can be added to the end of your published work, before the reference list. Note that a data availability statement may be required by some open access publishers. See here for examples of data availability statements.