
Well-organised and well-documented data aligns with the FAIR principles of data integrity (Findable, Accessible, Interoperable, and Reusable) and supports the CARE principles (Collective Benefit, Authority to Control, Responsibility, and Ethics).
Start early: use a Data Management Plan (DMP) to outline how you plan to organise and document your data, and update it throughout your research project.
Dedicate time throughout your research project to stay organised and keep your data documentation up to date.
Organise data in line with any discipline-specific best practices (e.g. Systematic Review).
There is no single right way to organise data. Choose a method or mix of methods that works best for you and for your discipline, and apply it consistently.
Organise your data by:
Creating a file structure system and file and document naming conventions
Versioning your data
Documenting your research data provides context for your data and shows how it supports your research findings. You should document your data throughout your research process. Certain data documentation may be required when publishing your research.
Common forms of data documentation:
A Data Management Plan (DMP)
Metadata fields
READMEs
Data Availability Statement for published datasets
File Structures & Naming Conventions
Files can be organised according to:
Structure files in a hierarchical system, with either a deep or a shallow hierarchy, using:
Choose file and document naming conventions and apply them consistently.
Image source: L. Bishop et al. (2014) Managing and Sharing Research Data – A Guide to Good Practice; p. 69
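As an illustration of a naming convention applied consistently, here is a minimal Python sketch. The pattern it uses (project, description, ISO date, two-digit version) and the function name `build_filename` are assumptions chosen for the example, not a convention prescribed by this guide; adapt the elements to your discipline.

```python
import re
from datetime import date


def build_filename(project: str, description: str, created: date,
                   version: int, extension: str) -> str:
    """Build a name such as 'soilstudy_field-notes_2024-03-05_v02.csv'.

    The pattern project_description_YYYY-MM-DD_vNN.ext is a hypothetical
    convention used only for illustration.
    """
    def slug(text: str) -> str:
        # Lowercase the text and replace runs of other characters with a hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    ext = extension.lstrip(".").lower()
    return f"{slug(project)}_{slug(description)}_{created.isoformat()}_v{version:02d}.{ext}"


print(build_filename("SoilStudy", "Field notes", date(2024, 3, 5), 2, ".CSV"))
# → soilstudy_field-notes_2024-03-05_v02.csv
```

Because the date is written as YYYY-MM-DD and the version is zero-padded, files that share a project and description sort chronologically in an ordinary file listing.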
Keeping track of dataset versions makes datasets trustworthy. A given dataset may be in use by one or several data contributors or data consumers, so it is vital to make clear which version is the latest and to keep a version history that explains the changes made to prior versions.
The data community has not reached consensus on when changes to a dataset make it a different dataset rather than a new version, and there is no single right way to version data and track changes. Follow your discipline's best practices.
For more information on tools for version control, see our Digital Tools for Research Guide.
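One lightweight way to keep a version history is a small log file stored beside the data. The sketch below is an assumption-laden example rather than a recommended tool: the function names, the JSON layout, and the use of a SHA-256 checksum to identify each version are all choices made for illustration, and it does not replace a dedicated version control system.

```python
import hashlib
import json
from datetime import date
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_version(data_file: Path, log_file: Path, version: str, note: str) -> dict:
    """Append one entry to a JSON version history and return that entry."""
    if log_file.exists():
        history = json.loads(log_file.read_text(encoding="utf-8"))
    else:
        history = []
    entry = {
        "version": version,                 # label chosen by the researcher
        "file": data_file.name,
        "sha256": sha256_of(data_file),     # changes whenever the content changes
        "date": date.today().isoformat(),
        "note": note,                       # what changed since the prior version
    }
    history.append(entry)
    log_file.write_text(json.dumps(history, indent=2), encoding="utf-8")
    return entry
```

Calling `record_version(Path("survey.csv"), Path("versions.json"), "1.1", "Corrected units in column depth_cm")` (hypothetical file names) after each change gives readers both the latest version, as the last entry, and an explanation of every earlier change.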
A data management plan (DMP) is a document used to describe how you plan to manage your data throughout your research project. Many funders require that a DMP be created, maintained, and submitted.
A DMP should include:
Information about data and data format
Metadata content and format
Policies for access, sharing and re-use
Long-term storage and data management
Budget
For more information on DMPs, refer to our Data Management Plans Guide.
Metadata provides standardised, structured information that describes a data collection in a machine-readable way. It enables data discovery, reproducibility, and preservation. A key component of metadata is a metadata schema, which defines the standard set of common metadata components, such as identifier, creator, and date.
There are different metadata standards that you can apply to your research depending on your discipline, the type of data you are working with, and the location where you plan to publish and preserve your data. Metadata is often grouped into three categories:
Descriptive metadata includes author, title, keywords and abstract, and enables users to find resources online.
Administrative metadata includes information about when and how a resource was created, as well as file type, technical information and access rights. It includes technical metadata, rights management metadata, and preservation metadata.
Structural metadata provides information about the relationship between components in an object e.g. relating articles, issues and volumes of serial publications, or the pages and chapters of a book.
A metadata standard provides a structure to describe data with:
Common Metadata Standards for Research Data
| Metadata Standard | Description |
|---|---|
| Biosharing | Biosharing is an educational resource on inter-related data standards, databases and policies in the life, environmental and biomedical sciences. |
| CEDAR metadata tools | CEDAR (Center for Expanded Data Annotation and Retrieval) is a repository of community-defined metadata templates. Its goal is to improve metadata and its use in the biomedical sciences. The tools can be used to create, annotate, analyze, validate and search metadata based on the fields and relations defined in the metadata templates. |
| Darwin Core | Darwin Core is a standard maintained by the Darwin Core Maintenance Group. It includes a glossary of terms (in other contexts these might be called properties, elements, fields, columns, attributes, or concepts) intended to facilitate the sharing of information about biological diversity by providing identifiers, labels, and definitions. Darwin Core is primarily based on taxa, their occurrence in nature as documented by observations, specimens, samples, and related information. |
| DataCite | The DataCite Metadata Schema is a list of core metadata properties chosen for the accurate and consistent identification of a resource for citation and retrieval purposes, along with recommended use instructions. The resource that is being identified can be of any kind, but it is typically a dataset. |
| DCC Disciplinary Metadata | The Digital Curation Centre provides links to information about discipline specific metadata standards, including profiles, tools to implement the standards, and use cases of data repositories currently implementing them. |
| DDI | The Data Documentation Initiative (DDI) is an international standard for describing the data produced by surveys and other observational methods in the social, behavioural, economic, and health sciences. DDI is a free standard that can be used to document and manage different stages in the research data lifecycle, such as conceptualization, collection, processing, distribution, discovery, and archiving. Documenting data with DDI facilitates understanding, interpretation, and use by people, software systems, and computer networks. |
| Fairsharing.org | A good place to start to find metadata standards for your discipline. |
| Research Data Alliance Standards Directory | The Research Data Alliance Standards Directory contains widely used metadata standards in the Arts and Humanities, Engineering, Life Sciences, Physical Sciences and Mathematics, Social and Behavioural Sciences, and General Research Data. |
Sources: Coffey, A. M., Joy, C., Clarke, C. R., Hayes, A., McCarney, E., Madden, F., Quinn, C., Dalton, M., Stokes, D., McCabe, G., Noonan, E., & O'Dwyer, L. (2025). Navigating Open Research - A guide for early career researchers. Zenodo.
If you publish research data to the University of Galway Community on Zenodo, DataCite is the metadata standard required for deposit into the Zenodo repository (see Guide to Publishing Research Data for more information on research data publishing).
"While DataCite’s Metadata Schema has been expanded with each new version [it is] intended to be generic to the broadest range of research datasets, rather than customized to the needs of any particular discipline. DataCite metadata primarily supports citation and discovery of data; it is not intended to supplant or replace the discipline or community specific metadata that fully describes the data, and that is vital for understanding and reuse."
Mandatory metadata properties
| ID | Property | Obligation |
|---|---|---|
| 1 | Identifier | M |
| 2 | Creator | M |
| 3 | Title | M |
| 4 | Publisher | M |
| 5 | PublicationYear | M |
| 10 | ResourceType | M |
Optional and recommended metadata properties
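| ID | Property | Obligation |
|---|---|---|
| 6 | Subject | R |
| 7 | Contributor | R |
| 8 | Date | R |
| 9 | Language | O |
| 11 | AlternateIdentifier | O |
| 12 | RelatedIdentifier | R |
| 13 | Size | O |
| 14 | Format | O |
| 15 | Version | O |
| 16 | Rights | O |
| 17 | Description | R |
| 18 | GeoLocation | R |
| 19 | FundingReference | O |
| 20 | RelatedItem | O |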
Sources: https://datacite-metadata-schema.readthedocs.io/en/4.6/properties/overview/#mandatory-properties; https://datacite-metadata-schema.readthedocs.io/en/4.5/introduction/about-schema/
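Before depositing, it can help to confirm that a draft record fills in every mandatory DataCite property (Identifier, Creator, Title, Publisher, PublicationYear, and ResourceType). The following Python sketch shows one way to do that. It assumes a simplified flat dictionary keyed by property name, which is not the nested XML or JSON structure that DataCite or Zenodo actually exchange, and every value in the draft record is a placeholder.

```python
# Mandatory DataCite properties, keyed by their ID in the schema.
MANDATORY_PROPERTIES = {
    1: "Identifier",
    2: "Creator",
    3: "Title",
    4: "Publisher",
    5: "PublicationYear",
    10: "ResourceType",
}


def missing_mandatory(record: dict) -> list:
    """Return the names of mandatory properties that are absent or empty."""
    return [name for name in MANDATORY_PROPERTIES.values() if not record.get(name)]


# Hypothetical draft record: a flat dict used only for this illustration.
draft = {
    "Identifier": "10.1234/placeholder",   # placeholder, not a real DOI
    "Creator": ["Surname, Given Name"],
    "Title": "Example dataset title",
    "Publisher": "",                        # left empty on purpose
    "PublicationYear": 2025,
}

print(missing_mandatory(draft))
# → ['Publisher', 'ResourceType']
```

A check like this only confirms that the fields are present; it does not validate their content against the schema's controlled vocabularies.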
Technical metadata will be specific to your discipline and your research project. It can describe what a variable in your dataset is and how it was collected, for example research instrument settings, software versions, patient IDs, and image resolutions. These metadata fields are vital for data processing and for data reusability. When collecting data and adding metadata fields to your research data, ask yourself the following questions:
Could someone not familiar with my data understand it?
Does my data have a unique identifier for each data record (i.e. row of data)?
If someone wanted to recreate my research, does my data include all the information they would need to do so?
Do the column names in my dataset accurately describe the variables in that column?
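Some of the questions above can be partly checked by a script. The sketch below, using only Python's standard library, reports rows that lack an identifier, identifiers that occur more than once, and columns with blank names. The function name `check_dataset` and the sample column names are hypothetical, and whether a column name accurately describes its variable still needs human judgement.

```python
import csv
import io
from collections import Counter


def check_dataset(csv_text: str, id_column: str) -> dict:
    """Report blank column names, rows missing an ID, and duplicated IDs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = reader.fieldnames or []
    if id_column not in columns:
        raise ValueError(f"No column named {id_column!r}")
    # Short rows yield None for missing fields, so fall back to "".
    ids = [(row[id_column] or "").strip() for row in rows]
    counts = Counter(value for value in ids if value)
    return {
        "blank_column_positions": [i for i, name in enumerate(columns) if not name.strip()],
        "rows_missing_id": sum(1 for value in ids if not value),
        "duplicate_ids": sorted(value for value, n in counts.items() if n > 1),
    }


# Hypothetical sample data: S002 appears twice and the last row has no ID.
sample = "sample_id,depth_cm,ph\nS001,10,6.5\nS002,20,6.9\nS002,30,7.1\n,40,7.0\n"
print(check_dataset(sample, "sample_id"))
# → {'blank_column_positions': [], 'rows_missing_id': 1, 'duplicate_ids': ['S002']}
```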

A README file is a simple way to direct consumers of your data to your data documentation. A README file includes details about your data such as:
When creating a README file:
Source: https://data.research.cornell.edu/data-management/sharing/readme/
A data availability statement describes the datasets associated with your published work and includes a link to the relevant repository and the dataset’s persistent identifier. It can be added to the end of your published work, before the reference list. Note that a Data Availability Statement may be required by some open access publishers.