
Best Practices for Metadata

What is Metadata?

Metadata is data about data. It provides the context that makes research data understandable and reusable.
It describes the who, what, where, when, how, and why of data collection.

Metadata also informs users about:

  • Constraints - limitations on use or sharing
  • Update frequency - how often data is refreshed
  • Interoperability - standardized terms that make data discoverable across repositories

Metadata CanWIN Collects

CanWIN Metadata

  • Title
  • Summary
  • Location
  • Date
  • Authors and affiliations
  • Keywords
  • Licensing information and terms of use/access
  • Data status and versions
  • Update and maintenance frequency
  • Data type (dataset, report, model, etc.)
  • Sampling and analytical methods (steps or methods used to collect and process data)
  • Instruments & deployment details (instrument type, sensors, deployment dates and locations, etc.)
  • Related resources
  • Awards & funding information
  • Website
  • Theme (marine, atmospheric, freshwater, cryosphere, remote sensing)
  • Variable descriptions (names, units, media)
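The following Python sketch shows how the fields above can be kept in a consistent template. It stores a draft record as a dictionary and reports which fields are still empty. The field names (`title`, `update_frequency`, and so on) and the `check_record` helper are hypothetical and do not reflect CanWIN's actual submission schema. Only the theme values come from the list above.

```python
# A minimal sketch of a metadata template check.
# Field names are hypothetical; they loosely mirror the list above,
# not CanWIN's actual schema.

REQUIRED_FIELDS = [
    "title", "summary", "location", "date", "authors", "keywords",
    "license", "status", "update_frequency", "data_type",
    "methods", "theme", "variables",
]

# Theme values taken from the list above.
ALLOWED_THEMES = {"marine", "atmospheric", "freshwater", "cryosphere", "remote sensing"}


def check_record(record: dict) -> list[str]:
    """Return a list of problems found in a metadata record (empty if none)."""
    # Any required field that is absent or empty counts as missing.
    problems = [f"missing: {name}" for name in REQUIRED_FIELDS if not record.get(name)]
    # Flag themes outside the allowed set.
    theme = record.get("theme")
    if theme and theme not in ALLOWED_THEMES:
        problems.append(f"unknown theme: {theme}")
    return problems


if __name__ == "__main__":
    # Illustrative draft record; the values are placeholders.
    draft = {
        "title": "Example lake temperature profiles",
        "summary": "Illustrative placeholder summary.",
        "theme": "freshwater",
        "keywords": ["lake", "temperature"],
    }
    for problem in check_record(draft):
        print(problem)
```

Running a check like this while data are still being collected makes it easier to spot gaps early, in keeping with the tip below.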

Tip

Start metadata collection early in your research.
Use consistent tools (e.g., lab notebooks, GitLab README files, or metadata templates) to avoid gaps later.


Best Practices for Metadata Submitters

  • Write a meaningful dataset description - Ensure your description:
    • Is more than 50 characters long
    • Is written in plain language (accessible to non-specialists)
    • Includes, if possible, the scope, purpose, and context (what the data represent and why they were collected)
    • Avoids jargon and unexplained acronyms
  • Use precise keywords - 4-6 well‑chosen terms improve findability.
  • Include provenance - Record who collected the data, when, and where.
  • Record data collection & processing steps - Document methods for transparency and reuse.
  • Create a data dictionary - Define variables (units and descriptions).
  • Keep metadata current - Update when methods, instruments, or contributors change.
  • Respect Indigenous Data Sovereignty - Apply CARE and OCAP principles when relevant.
  • Provide licensing & access rights - Specify Creative Commons or institutional licenses.
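Some of the guidelines above can be checked automatically before submission. The Python sketch below tests the description length, the keyword count, and a rough acronym heuristic. The `check_submission` function is hypothetical, not a CanWIN tool. The acronym rule is an assumption as well: it treats an all-capital word as "explained" only if it also appears in parentheses somewhere in the description.

```python
import re


def check_submission(description: str, keywords: list[str]) -> list[str]:
    """Apply a few of the submitter guidelines as simple automated checks."""
    warnings = []

    # Guideline: description longer than 50 characters.
    if len(description.strip()) <= 50:
        warnings.append("description is 50 characters or fewer")

    # Guideline: 4-6 well-chosen keywords (blank and duplicate terms ignored, case-insensitive).
    unique = {k.strip().lower() for k in keywords if k.strip()}
    if not 4 <= len(unique) <= 6:
        warnings.append(f"expected 4-6 keywords, got {len(unique)}")

    # Rough heuristic for unexplained acronyms: all-caps words of 2+ letters
    # that never appear inside parentheses, e.g. "(PSU)".
    acronyms = set(re.findall(r"\b[A-Z]{2,}\b", description))
    explained = set(re.findall(r"\(([A-Z]{2,})\)", description))
    for acronym in sorted(acronyms - explained):
        warnings.append(f"acronym may need explaining: {acronym}")

    return warnings


if __name__ == "__main__":
    print(check_submission("Lake data", ["lake", "water"]))
```

A check like this cannot judge whether a description is clear or whether keywords are well chosen. It only catches the mechanical problems, so a human review is still needed.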

Data Dictionaries, Codebooks, and Cookbooks

Beyond metadata fields, three documentation tools strengthen the understandability and reproducibility of your datasets:


Data Dictionary

A data dictionary defines the terms in your data files and applies common names to variables so that your data is understandable to others.
It should include variable names, units, and clear descriptions.

Example structure:

Variable name | Common name      | Units | Description
T_C           | Temperature      | °C    | Water temperature measured at depth
Salinity      | Salinity         | PSU   | Salinity on the practical salinity scale
O2_mgL        | Dissolved oxygen | mg/L  | Oxygen concentration in water
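A data dictionary is most useful when it stays in step with the data files it describes. The Python sketch below stores the example dictionary as a Python structure. It can write the dictionary out as a CSV table and list any column in a data file that has no entry. The helper names `undocumented_columns` and `write_dictionary` and the `pH` column in the usage example are hypothetical.

```python
import csv
import io

# The example data dictionary, stored as a Python structure.
DATA_DICTIONARY = {
    "T_C": {"common_name": "Temperature", "units": "°C",
            "description": "Water temperature measured at depth"},
    "Salinity": {"common_name": "Salinity", "units": "PSU",
                 "description": "Practical salinity"},
    "O2_mgL": {"common_name": "Dissolved oxygen", "units": "mg/L",
               "description": "Oxygen concentration in water"},
}


def undocumented_columns(csv_text: str) -> list[str]:
    """Return column headers in CSV text that have no data dictionary entry."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [col for col in header if col not in DATA_DICTIONARY]


def write_dictionary(out) -> None:
    """Write the data dictionary as a CSV table to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(["variable_name", "common_name", "units", "description"])
    for name, entry in DATA_DICTIONARY.items():
        writer.writerow([name, entry["common_name"], entry["units"], entry["description"]])


if __name__ == "__main__":
    # Hypothetical data file with one column ("pH") missing from the dictionary.
    sample = "T_C,Salinity,O2_mgL,pH\n4.2,0.3,9.1,7.8\n"
    print(undocumented_columns(sample))  # ['pH']
```

Keeping the dictionary as a plain CSV file lets it be submitted alongside the data and read by people and software alike.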

Cookbook

A cookbook documents, step by step, how the data were retrieved and processed in a workflow.
It’s essentially a recipe for reproducing your dataset preparation.
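One way to keep a cookbook accurate is to structure the processing script itself as numbered steps that log what they do. The log can then be copied into the cookbook. The Python sketch below illustrates this under stated assumptions: the input data, the `flag` column, and the three steps are all hypothetical.

```python
import csv
import io
import statistics

# Hypothetical raw data: hourly temperatures with a quality flag.
RAW = """timestamp,T_C,flag
2021-06-01T00:00,12.1,ok
2021-06-01T01:00,99.9,bad
2021-06-01T02:00,12.5,ok
"""

# Each step appends a human-readable line; together they form the recipe.
log: list[str] = []


def step1_load(text: str) -> list[dict]:
    """Step 1: parse the raw CSV text into rows."""
    rows = list(csv.DictReader(io.StringIO(text)))
    log.append(f"Step 1: loaded {len(rows)} raw rows")
    return rows


def step2_drop_flagged(rows: list[dict]) -> list[dict]:
    """Step 2: keep only rows whose quality flag is 'ok'."""
    kept = [r for r in rows if r["flag"] == "ok"]
    log.append(f"Step 2: removed {len(rows) - len(kept)} row(s) flagged as bad")
    return kept


def step3_mean_temperature(rows: list[dict]) -> float:
    """Step 3: compute the mean water temperature of the remaining rows."""
    mean = statistics.mean(float(r["T_C"]) for r in rows)
    log.append(f"Step 3: computed mean T_C over {len(rows)} rows")
    return mean


def run() -> float:
    """Run all steps in order on the raw data."""
    return step3_mean_temperature(step2_drop_flagged(step1_load(RAW)))


if __name__ == "__main__":
    result = run()
    print("\n".join(log))
    print(f"Mean T_C: {result:.2f}")
```

Because each step is a separate function with a docstring, the same script also feeds naturally into a codebook.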


Codebook

A codebook describes the key functions, modules, or scripts used to process the data.
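If processing functions carry docstrings, a first draft of a codebook can be generated from the code itself. The Python sketch below uses the standard-library `inspect` module to list each function's name, signature, and one-line summary. The `despike` and `to_kelvin` functions and the `codebook_entries` helper are hypothetical examples, not part of any CanWIN workflow.

```python
import inspect


def despike(values: list[float], limit: float) -> list[float]:
    """Remove readings whose absolute value exceeds `limit`."""
    return [v for v in values if abs(v) <= limit]


def to_kelvin(celsius: float) -> float:
    """Convert a temperature from degrees Celsius to kelvin."""
    return celsius + 273.15


def codebook_entries(*functions) -> list[str]:
    """Build one codebook line per function: name, signature, and summary."""
    entries = []
    for fn in functions:
        # getdoc cleans up indentation; fall back if there is no docstring.
        doc = inspect.getdoc(fn) or "(no docstring)"
        summary = doc.splitlines()[0]
        entries.append(f"{fn.__name__}{inspect.signature(fn)}: {summary}")
    return entries


if __name__ == "__main__":
    for line in codebook_entries(despike, to_kelvin):
        print(line)
```

A generated list like this is only a starting point. The codebook should still explain how the functions fit together and why particular parameters were chosen.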


Tip

Think of these three tools as complementary:

  • Data Dictionary → defines your variables
  • Codebook → explains your scripts
  • Cookbook → documents your workflow
    Together, they help make your dataset FAIR (Findable, Accessible, Interoperable, Reusable) and reproducible.

References & Extra Sources