Need assistance with Research Data Management or interested in learning more?
Reach out to us at
Jia Wu, Metadata/Systems Librarian: jwu@yukonu.ca
Anna Krangle-Long, Grant Facilitator and Research Engagement Coordinator:
A clear, logical structure helps you and others find and understand your data easily. Consistency is key for maintaining order as your dataset grows.
Project name/acronym
File status
Date (e.g. YYYYMMDD)
Short description of content
Data type information
Creator name/initials
Version number
Establish an 'order of elements', e.g. YukonClimate_20240103_TemperatureData_v3.csv (ProjectName_Date_Description_Version#)
Use dates in ISO 8601 format (YYYYMMDD)
Avoid spaces or special characters (e.g. ! @ $ % * () ‘;<>,[]{}”); use underscores (e.g. file_name) or camel case (e.g. FileName)
Make your file names less than 30 characters
Include version control elements: version numbers (e.g., v1, v2) or dates
When numbering files sequentially, use leading zeros so that files sort properly, e.g. 0001, 0002 … 1001 rather than 1, 2 … 1001
It is also a good idea to design a "README.TXT" file that explains your naming convention and abbreviations.
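The naming rules above can be sketched in a few lines of Python. This is only an illustration, assuming the ProjectName_Date_Description_Version order from the example; the function names `build_file_name` and `sequence_label` are hypothetical, and the length limit is not checked here.

```python
import re
from datetime import date

# Allowed characters: letters, digits, underscores and hyphens.
_SAFE = re.compile(r"^[A-Za-z0-9_-]+$")


def build_file_name(project, day, description, version, extension):
    """Join the elements in a fixed order: Project_Date_Description_Version."""
    parts = [project, day.strftime("%Y%m%d"), description, f"v{version}"]
    for part in parts:
        if not _SAFE.match(part):
            raise ValueError(f"unsafe characters in {part!r}")
    return "_".join(parts) + "." + extension


def sequence_label(number, width=4):
    """Zero-pad a sequence number so that names sort in numeric order."""
    return str(number).zfill(width)


name = build_file_name("YukonClimate", date(2024, 1, 3), "TemperatureData", 3, "csv")
print(name)  # YukonClimate_20240103_TemperatureData_v3.csv

# Without padding, text sorting puts 10 before 2.
print(sorted(["10", "2", "1"]))  # ['1', '10', '2']
print(sorted(sequence_label(n) for n in (10, 2, 1)))  # ['0001', '0002', '0010']
```

Generating names from one function, rather than typing them by hand, is one way to keep the order of elements consistent across a project.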
For long-term storage and management, choose non-proprietary, uncompressed formats where possible. Some preferred file formats are listed below.
Text: XML, TXT, PDF/A, HTML, ASCII, UTF-8 (not Word)
Tabular Data: CSV (not Excel)
Still Images: TIFF, JPEG 2000, PDF, PNG, BMP (not GIF or JPG)
Moving Images: MOV, MPEG, AVI, MXF (not Quicktime)
Sounds: WAVE, AIFF, MP3, MXF
Databases: XML, CSV
Statistics: ASCII, DTA, POR, SAS, SAV
Containers: TAR, GZIP, ZIP
Geospatial: SHP, DBF, GeoTIFF, NetCDF
Web Archive: WARC
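As a rough sketch, a short script can flag files in the formats the list above advises against and suggest a preferred alternative. The extension-to-format mapping below is an illustrative assumption and covers only the Word, Excel, GIF and JPG cases; `review_formats` is a hypothetical name.

```python
from pathlib import Path

# Suggested replacements for formats the list advises against.
# This mapping is an assumption made for illustration, not a complete list.
DISCOURAGED = {
    ".doc": "TXT or PDF/A",
    ".docx": "TXT or PDF/A",
    ".xls": "CSV",
    ".xlsx": "CSV",
    ".gif": "TIFF or PNG",
    ".jpg": "TIFF or PNG",
    ".jpeg": "TIFF or PNG",
}


def review_formats(file_names):
    """Return {file name: suggested format} for discouraged file types."""
    suggestions = {}
    for file_name in file_names:
        suffix = Path(file_name).suffix.lower()
        if suffix in DISCOURAGED:
            suggestions[file_name] = DISCOURAGED[suffix]
    return suggestions


files = ["survey_v1.xlsx", "site_photo.JPG", "readings.csv", "notes.docx"]
for file_name, suggestion in review_formats(files).items():
    print(f"{file_name}: consider {suggestion}")
```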
Digital File and Folder Management (Thompson Rivers University)
Naming and organizing your files and folders worksheet (MIT)
Sustainability of Digital Formats (Library of Congress)
Data documentation is essential for ensuring that your research data is discoverable, understandable, reusable, and reproducible.
Common documentation files:
A README file is a plain text file containing descriptive information, commonly distributed with software, games, and code. It is a supplementary document in which the creator explains the contents to the user. When working with data, it is useful to create a README file and include it with your data, so that future users can understand the data, the terms used, and how the files are organized.
There are no standards for writing a README text file, but it is recommended to include:
Title
Principal Investigator(s)
Dates/Locations of data collection
Keywords
Language
Funding
Descriptions of every folder, file, format, data collection method, instrument, etc.
Definitions
People involved
Recommended citation
If creating a README file for a dataset, be mindful of the following:
Abbreviations and acronyms are defined
Variables/parameters and units are described
Data treatment and methodology are described
Headings are explained
Known limits of or problems with the data are mentioned (this should include explanations of missing data, negative values for parameters where they are not expected, no reads, etc.)
Reference to papers describing methodology, if applicable
Related datasets are properly cited
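One way to make sure none of the recommended elements is forgotten is to generate a README skeleton and fill it in. The following is a minimal sketch, not a standard: the field list mirrors the recommendations above, and `readme_skeleton` and the sample values are hypothetical.

```python
# Recommended README elements, in the order listed in this guide.
README_FIELDS = [
    "Title",
    "Principal Investigator(s)",
    "Dates/Locations of data collection",
    "Keywords",
    "Language",
    "Funding",
    "Descriptions of folders, files, formats, methods, instruments",
    "Definitions",
    "People involved",
    "Recommended citation",
]


def readme_skeleton(values):
    """Build README text; any field without a value is marked TODO."""
    lines = []
    for field in README_FIELDS:
        lines.append(f"{field}: {values.get(field, 'TODO')}")
    return "\n".join(lines) + "\n"


# Placeholder values for illustration only.
text = readme_skeleton({"Title": "Example dataset", "Language": "English"})
print(text)
```

The resulting text can be saved as a plain text file next to the data, and the remaining TODO entries show at a glance what still needs to be documented.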
Guide to writing "readme" style metadata (Cornell University)
A template README for social science replication packages (Social Science Data Editors)
Metadata is information about data. It provides the context and details by addressing the who, what, when, where, why, and how of the dataset, making it easier to find, access, and use. Good metadata aligns with the FAIR data principles, ensuring data is Findable, Accessible, Interoperable, and Reusable.
Title: full title by which the dataset is known
Creator: the name(s) of the person or organization responsible for creating the work
Contact Information: name and email address for the main contact for the dataset
Description: summary of purpose, nature, and scope of the dataset
Subject: broad domain-specific subject category
Date(s): consider including collection date, production date, deposit date, distribution date, publication date, etc.
Keywords: relevant terms that help users search for the dataset
Location: for geospatial data
Licensing Information: details on how the data can be used, such as a Creative Commons license.
Funding or granting agency
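The elements above can be captured in a simple machine-readable record and checked for completeness before deposit. This is a sketch under assumptions: the key names, the choice of which elements to treat as required, and all sample values are hypothetical, and a repository may define its own fields.

```python
import json

# Elements treated as required in this sketch (an assumption, not a standard).
REQUIRED = [
    "title", "creator", "contact", "description",
    "subject", "dates", "keywords", "license",
]


def missing_elements(record):
    """Return the required elements that are absent or empty."""
    return [key for key in REQUIRED if not record.get(key)]


# Placeholder record; the licence has deliberately been left out.
record = {
    "title": "Example dataset",
    "creator": "Example Researcher",
    "contact": "researcher@example.org",
    "description": "Summary of purpose, nature, and scope.",
    "subject": "Climate science",
    "dates": {"collected": "2024-01-03"},
    "keywords": ["temperature", "climate"],
    "funding": "Example granting agency",
}

print(missing_elements(record))  # ['license']
print(json.dumps(record, indent=2))
```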
Metadata standards or schemas consist of specific elements used to describe or document your data. Many data repositories, disciplines and organizations have established specific metadata standards.
To find an appropriate metadata standard for your data:
Disciplinary Metadata (Digital Curation Centre)
The RDA Metadata Standards Catalog (Research Data Alliance for the international academic community)
The RDA Metadata Standards Directory (Research Data Alliance)
Examples of some commonly used ones:
Dublin Core - a basic, widely used, domain-agnostic metadata standard
DDI (Data Documentation Initiative) - commonly used in social, behavioral, economic, and health sciences
EML (Ecological Metadata Language) - specific to ecological disciplines
ISO 19115 - a standard for geographic information
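To give a feel for what a standard looks like in practice, here is a minimal sketch that writes a few Dublin Core elements as XML using Python's standard library. The `metadata` wrapper element, the function name and the sample values are assumptions made for illustration; repositories usually prescribe their own container format.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Wrap Dublin Core elements in a generic (hypothetical) root element."""
    root = ET.Element("metadata")
    for name, value in fields.items():
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return root


# Placeholder values for illustration only.
record = dublin_core_record({
    "title": "Example dataset",
    "creator": "Example Researcher",
    "date": "2024-01-03",
    "subject": "Climate",
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```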