Best Practices for Metadata
What is Metadata?
Metadata is data about data. It provides the context that makes research data understandable and reusable.
It describes the who, what, where, when, how, and why of data collection.

Metadata also informs users about:
- Constraints - limitations on use or sharing
- Update frequency - how often data is refreshed
- Interoperability - standardized terms that make data discoverable across repositories
Metadata CanWIN Collects
CanWIN Metadata
- Title
- Summary
- Location
- Date
- Authors and affiliations
- Keywords
- Licensing information and terms of use/access
- Data status and versions
- Update and maintenance frequency
- Data type (dataset, report, model, etc.)
- Sampling and analytical methods (steps used to collect and process data)
- Instruments and deployment details (instrument type, sensors, deployment dates and locations, etc.)
- Related resources
- Awards & Funding information
- Website
- Theme (marine, atmospheric, freshwater, cryosphere, remote sensing)
- Variable descriptions (names, units, media)
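
As a rough illustration, a few of the fields above could be tracked in a simple record while a project is under way, which makes gaps easy to spot before submission. The sketch below uses a plain Python dictionary; the key names, the example values, and the `missing_fields` helper are hypothetical and do not represent CanWIN's actual schema or tooling.

```python
# Hypothetical metadata record. Key names and values are illustrative only;
# they are NOT CanWIN's actual schema.
record = {
    "title": "Example lake water temperature profiles",
    "summary": "",
    "location": "Example Lake",
    "date": "2023-07-15",
    "authors": ["A. Researcher (Example University)"],
    "keywords": ["freshwater", "temperature"],
    "license": "",
    "theme": "freshwater",
}


def missing_fields(rec):
    """Return the names of fields that are still empty, in sorted order."""
    return sorted(name for name, value in rec.items() if not value)


# Lists the fields that still need to be filled in.
print(missing_fields(record))
```

Running a check like this periodically is one way to notice unfilled fields early rather than at submission time.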
Tip
Start metadata collection early in your research.
Use consistent tools (e.g., lab notebooks, GitLab README files, or metadata templates) to avoid gaps later.
Best Practices for Metadata Submitters
- Write a meaningful dataset description - Ensure your description:
  - Is more than 50 characters long
  - Is written in plain language (accessible to non‑specialists)
  - Includes, where possible, scope, purpose, and context (what the data represents, why it was collected)
  - Avoids jargon and unexplained acronyms
- Use precise keywords - 4-6 well‑chosen terms improve findability.
- Include provenance - Who collected the data, when, and where.
- Record data collection & processing steps - Document methods for transparency and reuse.
- Create a data dictionary - Define each variable (name, units, and description).
- Keep metadata current - Update when methods, instruments, or contributors change.
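- Respect Indigenous Data Sovereignty - Apply the CARE (Collective benefit, Authority to control, Responsibility, Ethics) and OCAP (Ownership, Control, Access, Possession) principles when relevant.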
- Provide licensing & access rights - Specify Creative Commons or institutional licenses.
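
Two of the practices above are easy to check automatically: the description length and the number of keywords. The following is a minimal sketch, assuming the description is a string and the keywords are a list. The function name `check_submission` and the warning messages are hypothetical, and this is not a CanWIN validation tool.

```python
def check_submission(description, keywords):
    """Return a list of warnings for a dataset description and keyword list.

    Hypothetical helper: it only checks the two measurable guidelines
    (description longer than 50 characters, 4 to 6 keywords).
    """
    warnings = []
    if len(description.strip()) <= 50:
        warnings.append("description should be longer than 50 characters")
    if not 4 <= len(keywords) <= 6:
        warnings.append("use 4 to 6 keywords")
    return warnings


# Example with a description that is too short and too few keywords.
for message in check_submission("Lake data", ["ice"]):
    print(message)
```

The remaining guidelines, such as plain language and clear provenance, still need a human reviewer.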
Data Dictionary, Codebooks, and Cookbooks
Beyond metadata fields, three documentation tools strengthen the understandability and reproducibility of your datasets:
Data Dictionary
A data dictionary defines the terms in your data files and applies common names to variables so that your data is understandable to others.
It should include variable names, units, and clear descriptions.
Example structure:
| Variable name | Common name | Units | Description |
|---|---|---|---|
| T_C | Temperature | °C | Water temperature measured at depth |
| Salinity | Salinity | PSU | Water salinity, reported in practical salinity units |
| O2_mgL | Dissolved Oxygen | mg/L | Oxygen concentration in water |
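
One practical use of a data dictionary is confirming that every column in a data file is documented. The sketch below assumes the dictionary from the example table is held as a Python mapping and the data file is CSV text. The `undocumented_columns` helper, the sample values, and the extra `Depth_m` column are hypothetical and are included only to show a mismatch.

```python
import csv
import io

# Data dictionary mirroring the example table above.
data_dictionary = {
    "T_C": {"common_name": "Temperature", "units": "°C"},
    "Salinity": {"common_name": "Salinity", "units": "PSU"},
    "O2_mgL": {"common_name": "Dissolved Oxygen", "units": "mg/L"},
}


def undocumented_columns(csv_text, dictionary):
    """Return header names in the CSV text that the dictionary does not define."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    return [name for name in header if name not in dictionary]


# Hypothetical sample file: Depth_m is not in the dictionary.
sample = "T_C,Salinity,O2_mgL,Depth_m\n4.2,31.5,9.8,10\n"
print(undocumented_columns(sample, data_dictionary))
```

Any names reported by such a check would need an entry in the data dictionary before the dataset is shared.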
Cookbook
A cookbook walks through the data retrieval and processing workflow, step by step.
It is essentially a recipe for reproducing how your dataset was prepared.
Codebook
A codebook describes the key functions, modules, or scripts used to process the data.
Tip
Think of these three tools as complementary:
- Data Dictionary → defines your variables
- Codebook → explains your scripts
- Cookbook → documents your workflow
Together, they help make your dataset FAIR (findable, accessible, interoperable, reusable) and reproducible.