
Code sharing on Kaggle

Kaggle is a cloud environment for sharing code written in Python and R. Kaggle is built on Jupyter Notebook infrastructure; however, code can be shared either as notebooks or as plain scripts. The code shared by CanWIN is typically useful for data manipulation during the cleaning stage.

Two kinds of notebooks are shared:

  1. Scripts

    These are plain source scripts in Python or R. Before running one, the user must edit it (for example, by changing the values of specific variables) so that the script is tailored to their own data. Scripts can be downloaded directly from Kaggle and are suited to people who are comfortable editing code and running scripts on their own computer.

  2. Interactive notebooks

    These notebooks are for users who wish to process their data in the cloud without downloading any scripts, packages, or software. They include interactive widgets, so a user can make selections or enter information without editing the source code. Users can upload their data files directly to Kaggle, run the notebook, and download their cleaned files. No muss, no fuss.
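To illustrate what "editing the value of specific variables" usually looks like in a downloadable script, here is a minimal Python sketch in which all user-editable settings sit together at the top of the file. It is a hypothetical example, not one of the CanWIN scripts: the variable names (`INPUT_PATH`, `OUTPUT_PATH`, `MISSING_VALUE_CODES`) and the cleaning step (blanking out missing-value codes) are assumptions chosen for illustration.

```python
import csv

# --- User settings: edit these values before running (names are hypothetical) ---
INPUT_PATH = "my_data.csv"                   # raw data file to clean
OUTPUT_PATH = "my_data_cleaned.csv"          # cleaned file to write
MISSING_VALUE_CODES = {"-999", "NaN", "NA"}  # codes your instrument uses for "no data"
# ---------------------------------------------------------------------------------


def blank_missing_values(rows, codes):
    """Replace any text cell whose stripped value is a missing-value code with ''."""
    return [
        {key: ("" if isinstance(value, str) and value.strip() in codes else value)
         for key, value in row.items()}
        for row in rows
    ]


def run(input_path=INPUT_PATH, output_path=OUTPUT_PATH, codes=MISSING_VALUE_CODES):
    """Read the input CSV, blank out missing-value codes, and write the result."""
    with open(input_path, newline="") as src:
        reader = csv.DictReader(src)
        rows = blank_missing_values(reader, codes)
        fieldnames = reader.fieldnames or []
    with open(output_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

After editing the settings block, calling `run()` processes the file. Keeping every setting in one place means users never have to read or change the processing logic itself.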

Accessing Kaggle

If you are interested in using or modifying a script, contact a data curator for the CanWIN Kaggle credentials: portalco@umanitoba.ca.

Once you have received the credentials and logged in, click the button below to be taken to a list of shared notebooks. For a quick guide to finding and running these shared notebooks, go to the Using interactive notebooks page.

Go to CanWIN's Kaggle


Note

For data curators, our internal Kaggle scripts can be found here.


Interactive Notebooks

| Interactive Notebook | Description |
| --- | --- |
| Basic file cleaning | Reorder, rename, add, and delete columns, and merge multiple files into one. |
| Decimal or Julian day to date | Converts a column of decimal days of the year (sometimes called Julian days) to UTC dates. |
| Seconds since reference time or epoch | Converts a column of seconds elapsed since a reference time to UTC times. Some GPS sensors, for example, record time in this format. |
| Thermosalinograph (TSG) CNV to CSV conversion | Converts TSG CNV files to CSV. |
| Provincial Chemistry data processing | Merges the row of units and the row of VMV codes into the column headers, producing a single cleaned header row without spaces or special characters. |
| Date-Time conversion | Converts date-time column values to a different date format. |
| Date-Time merge | Merges separate date and time columns into one date-time column. |
| ERDDAP Metadata profile generation | Generates a partial ERDDAP metadata profile to be added to the datasets.xml file, using applicable metadata taken from the dataset’s CKAN page. |
| Castaway CTD processing | Processes Castaway CTD files so that they can be ingested into the ODV (Ocean Data View) software. |
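The Basic file cleaning operations amount to simple column manipulations. The sketch below shows one way to express them with Python's standard `csv` module; it is not the CanWIN notebook's code. It assumes that "merge multiple files" means stacking files with the same (or overlapping) columns row by row, and the function names `clean_table`, `merge_csv_texts`, and `to_csv_text` are hypothetical.

```python
import csv
import io


def merge_csv_texts(texts):
    """Stack several CSV texts row-wise into one list of row dicts."""
    rows = []
    for text in texts:
        rows.extend(csv.DictReader(io.StringIO(text)))
    return rows


def clean_table(rows, rename=None, drop=(), add=None, order=None):
    """Rename, delete, add, and reorder columns.

    rename: {old_name: new_name}; drop: column names to delete (after renaming);
    add: {new_name: constant_value}; order: final column order (optional).
    """
    rename = rename or {}
    add = add or {}
    cleaned = []
    for row in rows:
        new = {rename.get(k, k): v for k, v in row.items() if rename.get(k, k) not in drop}
        for name, value in add.items():
            new.setdefault(name, value)  # never overwrite an existing column
        if order is not None:
            new = {name: new.get(name, "") for name in order}
        cleaned.append(new)
    return cleaned


def to_csv_text(rows):
    """Serialize row dicts to CSV, using the union of all column names."""
    fieldnames = []
    for row in rows:
        for name in row:
            if name not in fieldnames:
                fieldnames.append(name)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

For example, `clean_table(merge_csv_texts([a, b]), rename={"Temp": "temperature"}, drop={"Junk"})` merges two files and tidies their columns in one pass.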
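Converting a decimal day of the year to a date comes down to adding a fractional number of days to 1 January of the sampling year. The minimal sketch below assumes the common 1-based convention, in which day 1.0 is 00:00 UTC on 1 January; some instruments count from 0, so the hypothetical `first_day` parameter lets you switch. The year must be supplied separately, because a day-of-year value does not encode it.

```python
from datetime import datetime, timedelta, timezone


def decimal_doy_to_utc(day_of_year, year, first_day=1.0):
    """Convert a decimal day of year to a timezone-aware UTC datetime.

    Assumes `first_day` corresponds to 00:00 UTC on 1 January
    (1.0 for 1-based data, 0.0 for 0-based data).
    """
    start = datetime(year, 1, 1, tzinfo=timezone.utc)
    return start + timedelta(days=day_of_year - first_day)
```

With the 1-based convention, `decimal_doy_to_utc(32.5, 2023)` gives 12:00 UTC on 1 February 2023, since 31.5 days have elapsed since the start of the year.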
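Seconds since a reference time follow the same idea with a different unit and origin. This sketch defaults to the Unix epoch (1970-01-01T00:00:00Z) as an assumption; check your sensor's documentation for its actual reference time and pass it as `reference`. Plain addition ignores leap seconds, so counts kept in GPS time (which has no leap seconds) will be offset from UTC by the accumulated leap seconds unless you correct for them.

```python
from datetime import datetime, timedelta, timezone

# Default reference time: the Unix epoch. Replace with your sensor's epoch
# (for example, the GPS epoch 1980-01-06T00:00:00Z) if it differs.
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def seconds_to_utc(seconds, reference=UNIX_EPOCH):
    """Convert elapsed seconds since `reference` to a UTC datetime (no leap-second handling)."""
    return reference + timedelta(seconds=seconds)
```

For example, `seconds_to_utc(1_000_000_000)` returns 2001-09-09 01:46:40 UTC.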
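The CNV-to-CSV conversion can be sketched as follows, assuming the TSG files follow the usual Sea-Bird CNV layout: header lines start with `*` or `#`, each column is declared as `# name N = short_name: description`, and whitespace-separated data rows follow a `*END*` line. This is an illustration under those assumptions, not the CanWIN notebook; `cnv_to_csv_text` is a hypothetical name, and the CSV header uses the CNV short names.

```python
import csv
import io
import re

# Matches column declarations such as "# name 0 = t090C: Temperature [ITS-90, deg C]"
NAME_LINE = re.compile(r"^# name (\d+) = ([^:]+):")


def cnv_to_csv_text(cnv_text):
    """Convert Sea-Bird style CNV text to CSV text with short names as the header."""
    names = {}
    rows = []
    in_data = False
    for line in cnv_text.splitlines():
        if in_data:
            if line.strip():
                rows.append(line.split())  # data columns are whitespace-separated
        elif line.startswith("*END*"):
            in_data = True  # everything after *END* is data
        else:
            match = NAME_LINE.match(line)
            if match:
                names[int(match.group(1))] = match.group(2).strip()
    header = [names[i] for i in sorted(names)]
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()
```

Reading the column names from the `# name` lines, rather than hard-coding them, lets the same function handle files whose sensor configuration differs.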
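The Provincial Chemistry header merge can be pictured as joining three header rows (parameter names, units, and VMV codes) column by column and then replacing spaces and special characters. The sketch below is a hypothetical illustration: the joining order, the underscore separator, and the mapping of the micro sign to `u` are assumptions, not the notebook's documented behaviour.

```python
import re
from itertools import zip_longest


def clean_token(text):
    """Map micro signs to 'u', replace runs of non-alphanumerics with '_', trim the ends."""
    text = text.replace("\u00b5", "u").replace("\u03bc", "u")  # micro sign and Greek mu
    return re.sub(r"[^0-9A-Za-z]+", "_", text).strip("_")


def merge_header_rows(names, units, vmv_codes):
    """Combine name, unit, and VMV code rows into one cleaned header row."""
    merged = []
    for name, unit, code in zip_longest(names, units, vmv_codes, fillvalue=""):
        parts = [clean_token(part) for part in (name, unit, code)]
        merged.append("_".join(part for part in parts if part))  # skip empty pieces
    return merged
```

Columns without units or codes (such as a sample date) keep just their cleaned name, so the result stays a single, machine-friendly header row.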
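Both Date-Time conversion and Date-Time merge reduce to parsing text with `datetime.strptime` and writing it back out with `strftime`. In this sketch the function names are hypothetical and ISO 8601 (`YYYY-MM-DDThh:mm:ss`) is assumed as the target format. The results carry no time zone, so you still need to know which zone the source times were recorded in.

```python
from datetime import datetime

ISO_FORMAT = "%Y-%m-%dT%H:%M:%S"


def convert_datetime(value, in_format, out_format=ISO_FORMAT):
    """Date-Time conversion: re-express one date-time string in another format."""
    return datetime.strptime(value.strip(), in_format).strftime(out_format)


def merge_date_time(date_value, time_value, date_format="%Y-%m-%d",
                    time_format="%H:%M:%S", out_format=ISO_FORMAT):
    """Date-Time merge: combine separate date and time strings into one value."""
    date_part = datetime.strptime(date_value.strip(), date_format).date()
    time_part = datetime.strptime(time_value.strip(), time_format).time()
    return datetime.combine(date_part, time_part).strftime(out_format)
```

For example, `convert_datetime("07/15/2022 13:05", "%m/%d/%Y %H:%M")` returns `"2022-07-15T13:05:00"`, and `merge_date_time("15/07/2022", "13:05", date_format="%d/%m/%Y", time_format="%H:%M")` returns the same value.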
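The ERDDAP profile generation pulls metadata from CKAN and writes it as `<att>` entries inside an `<addAttributes>` element of datasets.xml. The sketch below uses CKAN's Action API `package_show` endpoint to fetch a dataset record. The mapping from CKAN fields (`title`, `notes`, `license_title`, `author`, `author_email`, `tags`) to ACDD attribute names is an illustrative assumption, not the CanWIN notebook's actual mapping, and `base_url` stands in for the CanWIN CKAN address.

```python
import json
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET


def fetch_ckan_package(base_url, dataset_id):
    """Fetch one dataset's metadata through CKAN's package_show action (network call)."""
    query = urllib.parse.urlencode({"id": dataset_id})
    url = f"{base_url.rstrip('/')}/api/3/action/package_show?{query}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)["result"]


def erddap_add_attributes(package):
    """Build an ERDDAP <addAttributes> snippet from a CKAN package dict.

    The CKAN-field -> attribute mapping below is an assumption for illustration.
    """
    mapping = {
        "title": package.get("title"),
        "summary": package.get("notes"),
        "license": package.get("license_title"),
        "creator_name": package.get("author"),
        "creator_email": package.get("author_email"),
        "keywords": ", ".join(tag["name"] for tag in package.get("tags", [])),
    }
    root = ET.Element("addAttributes")
    for name, value in mapping.items():
        if value:  # skip fields that are empty or missing on the CKAN page
            ET.SubElement(root, "att", name=name).text = value
    return ET.tostring(root, encoding="unicode")
```

Because the output is only a partial profile, the generated snippet still needs to be pasted into the dataset's entry in datasets.xml and reviewed alongside the attributes ERDDAP derives from the data itself.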
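Preparing Castaway CTD casts for ODV involves two steps: separating the cast metadata from the profile data, then writing rows that carry station metadata alongside each measurement. This sketch assumes that the Castaway CSV export begins with `% key,value` metadata lines (such as `Cast time (UTC)`, `Start latitude`, `Start longitude`) followed by a header row. It also assumes ODV-style generic spreadsheet metadata columns and type code `C` for CTD data. All of these are assumptions to check against your own files and the ODV user guide. The function names, and the `cruise` and `station` arguments, are hypothetical.

```python
import csv
import io


def parse_castaway_csv(text):
    """Split a Castaway-style CSV export into (metadata dict, list of data row dicts)."""
    metadata = {}
    data_lines = []
    for line in text.splitlines():
        if line.startswith("%"):
            key, _, value = line.lstrip("% ").partition(",")
            if key.strip():
                metadata[key.strip()] = value.strip()
        elif line.strip():
            data_lines.append(line)  # header row followed by data rows
    rows = list(csv.DictReader(io.StringIO("\n".join(data_lines))))
    return metadata, rows


def to_odv_spreadsheet(metadata, rows, cruise, station):
    """Write tab-separated rows: assumed ODV station metadata columns, then data columns."""
    meta_cols = ["Cruise", "Station", "Type", "yyyy-mm-ddThh:mm:ss.sss",
                 "Longitude [degrees_east]", "Latitude [degrees_north]"]
    data_cols = list(rows[0].keys()) if rows else []
    station_meta = [cruise, station, "C",
                    metadata.get("Cast time (UTC)", "").replace(" ", "T"),
                    metadata.get("Start longitude", ""),
                    metadata.get("Start latitude", "")]
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(meta_cols + data_cols)
    for row in rows:
        writer.writerow(station_meta + [row[col] for col in data_cols])
    return buf.getvalue()
```

Repeating the station metadata on every row keeps each line self-describing, which makes it easy to concatenate several casts into one import file.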