
Research Data Management

Data Organization

A clear, logical structure helps you and others find and understand your data easily. Consistency is key for maintaining order as your dataset grows.


File Naming

‣ Elements to Consider in Your File Naming Scheme
  • Project name/acronym

  • File status

  • Date (e.g. YYYYMMDD)

  • Short description of content

  • Data type information

  • Creator name/initials

  • Version number

‣ File name best practices:
  • Establish an 'order of elements', e.g. YukonClimate_20240103_TemperatureData_v3.csv (ProjectName_Date_Description_Version#)

  • Use dates in ISO 8601 format (YYYYMMDD)

  • Avoid spaces or special characters (e.g. ! @ $ % * () ‘;<>,[]{}”); use underscores (e.g. file_name)  or camel case (e.g. FileName)

  • Keep file names short, ideally under 30 characters

  • Include version control elements: version numbers (e.g., v1, v2) or dates

  • When sequentially numbering files, use leading zeros so that files sort properly; e.g. 0001, 0002 … 1001 vs. 1, 2, … 1001

It is also a good idea to create a "README.TXT" file that explains your naming convention and abbreviations.
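
The naming practices above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the function names `build_file_name` and `numbered_name`, the element order, and the choice to strip every character other than letters, digits, and hyphens are all assumptions made for the example.

```python
import re
from datetime import date


def build_file_name(project, when, description, version, extension="csv"):
    """Assemble ProjectName_Date_Description_Version.ext (illustrative order)."""
    parts = [project, when.strftime("%Y%m%d"), description, f"v{version}"]
    # Drop spaces and special characters: keep only letters, digits, hyphens.
    cleaned = [re.sub(r"[^A-Za-z0-9-]", "", part) for part in parts]
    return "_".join(cleaned) + "." + extension


def numbered_name(stem, index, width=4, extension="csv"):
    """Zero-pad a sequence number so that names sort in numeric order."""
    return f"{stem}_{index:0{width}d}.{extension}"


print(build_file_name("YukonClimate", date(2024, 1, 3), "Temperature Data", 3))
# YukonClimate_20240103_TemperatureData_v3.csv
print(sorted(numbered_name("scan", i) for i in (10, 2, 1001)))
# ['scan_0002.csv', 'scan_0010.csv', 'scan_1001.csv']
```

Generating names from a single function, rather than typing them by hand, is one way to keep the order of elements consistent as the dataset grows.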


File Formats

For long-term storage and management, try to select non-proprietary, uncompressed formats. Below are some preferred file formats.

  • Text: XML, TXT, PDF/A, HTML, ASCII, UTF-8 (not Word)

  • Tabular Data: CSV (not Excel)

  • Still Images: TIFF, JPEG 2000, PDF, PNG, BMP (not GIF or JPG)

  • Moving Images: MOV, MPEG, AVI, MXF (not Quicktime)

  • Sounds: WAVE, AIFF, MP3, MXF

  • Databases: XML, CSV

  • Statistics: ASCII, DTA, POR, SAS, SAV

  • Containers: TAR, GZIP, ZIP

  • Geospatial: SHP, DBF, GeoTIFF, NetCDF

  • Web Archive: WARC
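
As a rough aid, a short script could flag files whose extension is not on a preferred list. The sketch below assumes a hypothetical mapping from file extensions to a few of the categories above; the mapping is an illustration and should be adjusted to your repository's actual requirements.

```python
from pathlib import Path

# Hypothetical extension-to-category mapping based on part of the list above;
# extend or trim it to match your repository's requirements.
PREFERRED_EXTENSIONS = {
    ".txt": "Text", ".xml": "Text", ".html": "Text",
    ".csv": "Tabular Data",
    ".tif": "Still Images", ".tiff": "Still Images", ".png": "Still Images",
    ".wav": "Sounds", ".aiff": "Sounds",
    ".warc": "Web Archive",
}


def check_format(file_name):
    """Return the category for a preferred extension, or None if not listed."""
    return PREFERRED_EXTENSIONS.get(Path(file_name).suffix.lower())


for name in ["results.csv", "results.xlsx", "site_photo.TIFF"]:
    category = check_format(name)
    status = f"preferred ({category})" if category else "consider converting"
    print(f"{name}: {status}")
```

An extension check only looks at the file name; it does not confirm that the file contents are valid for that format.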

Data Documentation

Data documentation is essential for ensuring that your research data is discoverable, understandable, reusable, and reproducible. 

Common documentation files:

  • README File - A README file provides a quick overview of your project. It answers basic questions about the data’s purpose, origin, structure, and how to use it, making it the first place users should look to understand your data.
  • Data Dictionary - A data dictionary is a detailed list of the variables in your dataset, explaining what each variable represents, its data type (e.g., number, text), and any relevant units or coding schemes. It ensures that users can interpret your dataset accurately.
  • Codebook - A codebook is a comprehensive guide that describes the coding scheme used for a dataset. It provides essential information about each variable, including descriptions, values, units of measurement, etc. 
  • Laboratory Notebook - A laboratory notebook records the day-to-day process of your research, including experiment dates, methods, changes, and observations. This helps users understand how the data was collected and any modifications that may impact results.
  • Metadata - Metadata is information that describes data, such as the title, creator, keywords, and license information. It makes your data more searchable, discoverable, and easier to cite in other research.
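
A first draft of a data dictionary can be generated from the dataset itself and then completed by hand. The following sketch assumes a CSV file with a header row; the helper names `guess_type` and `draft_data_dictionary`, the sample columns, and the simple "number" versus "text" guess are all illustrative assumptions.

```python
import csv
import io


def guess_type(values):
    """Guess a simple type label ("number" or "text") from a column's values."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "unknown"
    try:
        for value in non_empty:
            float(value)
        return "number"
    except ValueError:
        return "text"


def draft_data_dictionary(csv_text):
    """Return one entry per variable, with blanks left for a person to fill in."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    variables = list(rows[0].keys()) if rows else []
    return [
        {
            "variable": name,
            "type": guess_type([row[name] or "" for row in rows]),
            "units": "",        # fill in by hand
            "description": "",  # fill in by hand
        }
        for name in variables
    ]


# Hypothetical sample data, used only to show the output shape.
sample = "station_id,temp_c,notes\nYK01,-12.5,clear\nYK02,-9.0,\n"
for entry in draft_data_dictionary(sample):
    print(entry)
```

The guessed types are only a starting point: units, coding schemes, and descriptions still have to come from someone who knows the data.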

Organizing Data Variables with READMEs

README File

A README file is a plain text file containing descriptive information; READMEs are commonly distributed with software, games, and code. It is a supplementary document in which the creator explains the contents to the user. When working with data, it is useful to include a README file with your dataset so that future users can understand the data and any terms used in it.

What Should a README Include?

There is no single standard for writing a README text file, but it is recommended to include:

  • Title

  • Principal Investigator(s)

  • Dates/Locations of data collection

  • Keywords

  • Language

  • Funding

  • Descriptions of every folder, file, format, data collection method, instruments, etc.

  • Definitions

  • People involved

  • Recommended citation

If creating a README file for a dataset, check that:

  • Abbreviations and acronyms are defined

  • Variables/parameters and units are described

  • Data treatment and methodology are described

  • Headings are explained

  • Known limitations of or problems with the data are mentioned and explained (for example, missing data, negative values for parameters where they are not expected, and no reads)

  • Reference to papers describing methodology, if applicable

  • Related datasets are properly cited

How to Write a README File?
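
One possible starting point is to generate a plain-text skeleton with a heading for each recommended item and fill it in over time. The sketch below is an assumption-laden illustration: the section list paraphrases the recommendations above, and the function name `readme_skeleton`, the uppercase heading style, and the sample title are hypothetical.

```python
README_SECTIONS = [
    "Title",
    "Principal Investigator(s)",
    "Dates/Locations of data collection",
    "Keywords",
    "Language",
    "Funding",
    "Folder and file descriptions",
    "Definitions",
    "People involved",
    "Recommended citation",
]


def readme_skeleton(known=None):
    """Build README text: one heading per section, TODO where nothing is known."""
    known = known or {}
    lines = []
    for section in README_SECTIONS:
        lines.append(section.upper())
        lines.append(known.get(section, "TODO"))
        lines.append("")  # blank line between sections
    return "\n".join(lines)


# Hypothetical values, used only to show the output shape.
text = readme_skeleton({"Title": "Yukon Climate Temperature Data", "Language": "English"})
print(text)
```

The resulting text can be saved as a plain text file alongside the data; any remaining TODO markers show which sections still need attention.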

Metadata & Describing Data

Metadata

Metadata is information about data. It provides the context and details by addressing the who, what, when, where, why, and how of the dataset, making it easier to find, access, and use. Good metadata aligns with the FAIR data principles, ensuring data is Findable, Accessible, Interoperable, and Reusable.

Key Metadata Elements

  • Title: full title by which the dataset is known

  • Creator: the name(s) of the person or organization responsible for creating the work

  • Contact Information: name and email address for the main contact for the dataset

  • Description: summary of purpose, nature, and scope of the dataset

  • Subject: broad domain-specific subject category

  • Date(s): consider including collection date, production date, deposit date, distribution date, publication date, etc.

  • Keywords: relevant terms that help users search for the dataset

  • Location: for geospatial data 

  • Licensing Information: details on how the data can be used, such as a Creative Commons license.

  • Funding or granting agency
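
A metadata record built from these elements can be kept in a simple machine-readable form and checked for gaps before deposit. In the sketch below, the field names, the choice of which fields count as required, and the sample values are all assumptions for illustration; a repository's own schema takes precedence.

```python
import json

# Illustrative field names, loosely following the elements listed above.
# Which fields are "required" is an assumption made for this example.
REQUIRED_FIELDS = ["title", "creator", "contact", "description", "date", "license"]


def missing_fields(record):
    """Return required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


record = {
    "title": "Yukon Climate Temperature Data",  # hypothetical dataset
    "creator": "Example Research Group",
    "contact": "data-contact@example.org",
    "description": "Daily air temperature readings.",
    "keywords": ["climate", "temperature"],
    "date": "2024-01-03",
    "license": "",  # left empty on purpose to show the check
}

print(missing_fields(record))
# ['license']
print(json.dumps(record, indent=2))
```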

Choosing a Metadata Standard

Metadata standards or schemas consist of specific elements used to describe or document your data. Many data repositories, disciplines and organizations have established specific metadata standards.

To find an appropriate metadata standard for your data:

Some commonly used standards:

  • Dublin Core - a basic, widely used, domain-agnostic metadata standard

  • DDI (Data Documentation Initiative) - commonly used in social, behavioral, economic, and health sciences

  • EML (Ecological Metadata Language) - designed for ecological disciplines

  • ISO 19115 - a standard for geographic information
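
To give a sense of what a standard looks like in practice, the sketch below serializes a few Dublin Core elements as XML using Python's standard library. The element names (title, creator, subject, date, rights) and the namespace URI come from the Dublin Core element set; the wrapper element `metadata`, the function name `to_dublin_core`, and the sample values are assumptions for illustration, and a given repository may expect a different wrapper or profile.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def to_dublin_core(fields):
    """Serialize a dict of Dublin Core element names and values as XML."""
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in fields.items():
        # Repeatable elements (e.g. several subjects) are passed as lists.
        values = value if isinstance(value, list) else [value]
        for item in values:
            element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            element.text = item
    return ET.tostring(root, encoding="unicode")


xml_text = to_dublin_core({
    "title": "Yukon Climate Temperature Data",  # hypothetical dataset
    "creator": "Example Research Group",
    "subject": ["climate", "temperature"],
    "date": "2024-01-03",
    "rights": "CC BY 4.0",
})
print(xml_text)
```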