Best practices for Metadata
What is Metadata?
Metadata is often described as data about your data. It gives research data context so that the data can be easily understood and reused. Metadata describes who collected the data, and what, where, when, how, and why it was collected. It also tells users about any limitations on the use of the data and how often the data is updated. Metadata becomes essential when datasets are moved to online platforms for publishing or storage. Using standardized metadata terminology keeps data interoperable across different systems and repositories, which makes the data easier to discover and reuse.
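As an illustration only, the who, what, where, when, how, and why of a dataset can be captured in a machine-readable record. The sketch below builds a minimal JSON-LD record from common Schema.org `Dataset` properties (`creator`, `name`, `description`, `spatialCoverage`, `temporalCoverage`, `measurementTechnique`, `license`). The function `build_dataset_record` and every value passed to it are hypothetical placeholders; this is not CanWIN's required template.

```python
import json


def build_dataset_record(title, purpose, creator_name, latitude, longitude,
                         start_date, end_date, method, license_url):
    """Return a minimal Schema.org Dataset record (JSON-LD) as a dict."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": title,                                         # what
        "description": purpose,                                # why
        "creator": {"@type": "Person", "name": creator_name},  # who
        "spatialCoverage": {                                   # where
            "@type": "Place",
            "geo": {
                "@type": "GeoCoordinates",
                "latitude": latitude,
                "longitude": longitude,
            },
        },
        "temporalCoverage": f"{start_date}/{end_date}",        # when (ISO 8601 interval)
        "measurementTechnique": method,                        # how
        "license": license_url,                                # conditions of use
    }


# All values below are hypothetical placeholders.
record = build_dataset_record(
    title="Example river water temperature series",
    purpose="Collected to examine seasonal changes in river temperature.",
    creator_name="Jane Doe",
    latitude=49.9,
    longitude=-97.1,
    start_date="2021-06-01",
    end_date="2021-08-31",
    method="Moored temperature loggers sampling every 15 minutes",
    license_url="https://creativecommons.org/licenses/by/4.0/",
)
print(json.dumps(record, indent=2))
```

Keeping each question in its own named property, rather than burying everything in a free-text description, is what lets catalogues and search engines index the record.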
Some of our core metadata fields are:
Metadata standards (or schemas) are rules that define and structure metadata elements for a particular area of research. Using defined, established terms (controlled vocabularies) to describe your data ensures consistency and integrity when the data is publicly shared. Standards also support interoperability, so that information can be exchanged and read across different online platforms. CanWIN has its own controlled vocabulary lists, which integrate several key schemas. CanWIN follows the ESIP Science on Schema.org and DataCite 4.4 Metadata Schema standards. To maximize the discoverability of your data, CanWIN works with CCADI to let users convert (crosswalk) metadata of interest to additional standards (e.g., Dublin Core, ISO 19115-3), which supports interoperability and therefore reuse of your data by other systems. These standardized terms are reflected in our metadata templates, which collect metadata from users before data is uploaded to CanWIN's Data Catalogue.
Keywords are specific words or short phrases that describe your data, allowing it to be found through searches in the CanWIN Data Catalogue and through internet search engines. Below are the keyword vocabularies used by CanWIN.
| Vocabulary | Description |
|---|---|
| PDC | The Polar Data Catalogue (PDC) vocabulary specializes in terms that describe data from the polar regions. |
| GCMD | The Global Change Master Directory (GCMD) provides terms for describing datasets within the Earth sciences. |
| GOC | The Government of Canada (GOC) maintains a controlled vocabulary for describing datasets available through its Open Government portal. |
Metadata Best Practices
Start metadata collection early. The more metadata, the better!
Metadata is required when the data is ready to be published to an online repository; however, metadata collection should begin as soon as a project starts and continue until the data is ready to be shared. This prevents data collectors and providers from forgetting key metadata and allows them to provide complete contextual information for their data.
Choose a resource that will encourage consistent documentation, whether it is a logbook, a note-taking app, a README file in GitLab, or something else.
As you collect and store your data, identify the key areas of metadata that you should document. For example:
- The context of data (why and how data was collected)
- The structure of data (including how multiple files relate to each other)
- Quality assurance: confirmation that the data is complete and uncorrupted
- Information on data confidentiality, access, and use conditions (where applicable)
- Identification and tracking of different versions of datasets
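One low-effort way to act on this list is to start each project with a README that already has a heading for every area. The sketch below generates such a skeleton; the section titles mirror the bullets above, while the function names `readme_skeleton` and `write_readme` and the file name `README.md` are illustrative conventions, not CanWIN requirements.

```python
from pathlib import Path

# Section titles and prompts mirror the key metadata areas listed above.
SECTIONS = [
    ("Context", "Why and how the data was collected."),
    ("Structure", "File layout and how multiple files relate to each other."),
    ("Quality assurance", "Checks confirming the data is complete and uncorrupted."),
    ("Access and use", "Confidentiality, access, and use conditions (where applicable)."),
    ("Versions", "Dataset versions and what changed between them."),
]


def readme_skeleton(project_name):
    """Return Markdown text with one heading and a prompt per metadata area."""
    lines = [f"# {project_name}", ""]
    for title, prompt in SECTIONS:
        lines += [f"## {title}", "", f"<!-- {prompt} -->", ""]
    return "\n".join(lines)


def write_readme(project_dir, project_name):
    """Create README.md in project_dir unless one already exists."""
    path = Path(project_dir) / "README.md"
    if not path.exists():
        path.write_text(readme_skeleton(project_name), encoding="utf-8")
    return path


print(readme_skeleton("Example monitoring project"))
```

`write_readme` deliberately refuses to overwrite an existing file, so it is safe to run on a project that already has notes.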
Choose the keywords that best describe your data. Good keywords improve the findability of your data, but too many can dilute their effectiveness. CanWIN recommends 4 to 6 keywords.
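The 4 to 6 recommendation is easy to check before submission. The sketch below tidies a proposed keyword list (trimming whitespace and dropping case-insensitive duplicates) and reports whether the count falls in the recommended range. The function `check_keywords` is hypothetical, and it does not verify terms against the PDC, GCMD, or GOC vocabularies.

```python
def check_keywords(keywords, minimum=4, maximum=6):
    """Return (cleaned_keywords, message) for a proposed keyword list."""
    cleaned = []
    seen = set()
    for keyword in keywords:
        term = " ".join(keyword.split())  # trim and collapse internal whitespace
        if term and term.lower() not in seen:
            seen.add(term.lower())
            cleaned.append(term)
    count = len(cleaned)
    if count < minimum:
        message = f"Only {count} keyword(s); consider adding {minimum - count} more."
    elif count > maximum:
        message = f"{count} keywords; consider removing {count - maximum}."
    else:
        message = f"{count} keywords: within the recommended range."
    return cleaned, message


cleaned, message = check_keywords(["Sea ice", "sea ice ", "Hudson Bay", "salinity"])
print(cleaned)  # ['Sea ice', 'Hudson Bay', 'salinity']
print(message)  # Only 3 keyword(s); consider adding 1 more.
```

Deduplicating first matters: a list that looks like four keywords may contain only three distinct terms.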
Ensure that all your metadata is accurate and up to date when you publish your data. This means keeping track of your metadata throughout your research. For example, if the analytical method changes, make sure the change is recorded.
To get an idea of the kinds of metadata CanWIN requires before publishing your data, take a look at our online metadata forms. These forms can guide your documentation as you go along, so that you collect as much applicable metadata as possible. Note that not all fields may be relevant to your research data or data collection process.
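A lightweight way to capture such changes as they happen is an append-only change log kept alongside the data. The sketch below appends one JSON object per change to a file such as a hypothetical `metadata_changes.jsonl`; the function names and the fields (`date`, `field`, `old`, `new`, `reason`) are assumptions, so adapt them to whatever your team already records.

```python
import json
from datetime import date
from pathlib import Path


def log_metadata_change(log_path, field, old, new, reason, when=None):
    """Append one metadata change as a JSON line and return the entry."""
    entry = {
        "date": (when or date.today()).isoformat(),
        "field": field,
        "old": old,
        "new": new,
        "reason": reason,
    }
    with Path(log_path).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry


def read_changes(log_path):
    """Return all logged changes in the order they were recorded."""
    path = Path(log_path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Because entries are only ever appended, the log doubles as a history you can consult when writing the final metadata record.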
| Metadata form | Link |
|---|---|
| Campaign, deployment, instrument, platform metadata | https://form.jotform.com/222166104125241 |