API requests using R

The scripts below demonstrate a few ways to load a CSV file into your R environment or download it directly onto your hard drive using the ckanr library. Only CSV, XLS, XLSX, XML, HTML, JSON, SHP, GeoJSON, and TXT files can be downloaded using these scripts. See pages 15-17 of the documentation for additional details on fetching the other data types.

In CanWIN we allow users to download a dataset in a compressed folder (.zip); however, these are not downloadable using R. Refer to the Downloading Data help page for information on how to download data packages.

Loading Library and Working Environment

library(tidyverse)
library(ckanr)

For your code to extract data from the CanWIN site, rather than from the default site built into ckanr, you need to point ckanr at the correct server. ckanr can query several different CKAN servers, so make sure your script is directed to the CanWIN one before requesting any data.

ckanr_setup("https://canwin-datahub.ad.umanitoba.ca/data") 
get_default_url() #prints in console the URL you are querying
servers() #provides list of ckan servers that you can use to access data
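
Because every later call depends on the default URL, it can help to confirm the setup before requesting data. The following is a minimal sketch, not part of the CanWIN scripts: `same_url()` and `check_server()` are hypothetical helper names, and the sketch assumes that `ping()` in ckanr returns `TRUE` when the server responds.

```r
# Hypothetical helper: compare two URLs, ignoring any trailing slashes
same_url <- function(a, b) {
  identical(sub("/+$", "", a), sub("/+$", "", b))
}

# Hypothetical helper: stop early if ckanr is pointed at a different
# server, or if the expected server does not respond
check_server <- function(expected_url) {
  current <- ckanr::get_default_url()
  if (!same_url(current, expected_url)) {
    stop("ckanr is querying ", current, ", not ", expected_url)
  }
  if (!isTRUE(ckanr::ping(url = expected_url))) {
    stop("No response from ", expected_url)
  }
  invisible(TRUE)
}

# Usage (requires a network connection):
# check_server("https://canwin-datahub.ad.umanitoba.ca/data")
```
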

Viewing Data Categories

In ckanr, themes, datasets, and keywords are known as groups, packages, and tags. View them using the following code:

group_list(as = "table")   # Lists the themes
package_list(as = "table")   # Lists the datasets
tag_list(as = "table")   # Lists the keywords
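
The dataset list can be long, so it may be convenient to narrow it down by name. The sketch below is one possible approach using base R only; `find_datasets()` is a hypothetical helper, and it assumes that `package_list(as = "table")` returns a character vector of dataset names containing recognizable words.

```r
# Hypothetical helper: keep the dataset names that match a search pattern
# (case-insensitive)
find_datasets <- function(dataset_names, pattern) {
  grep(pattern, dataset_names, ignore.case = TRUE, value = TRUE)
}

# Usage (requires a network connection and the ckanr setup shown earlier):
# datasets <- ckanr::package_list(as = "table")
# find_datasets(datasets, "waterhen")
```
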

Importing a Dataset File into RStudio

The majority of the data resources on CanWIN are CSV files. Importing a CSV into R requires you to know the resource ID of the dataset you want to access, which can be found in the metadata section at the bottom of a dataset page.
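
Resource IDs can also be looked up from R rather than from the dataset page. The following is a rough sketch under stated assumptions: `summarise_resources()` and `list_dataset_resources()` are hypothetical helpers, and the sketch assumes that `package_show(..., as = "table")` returns a `resources` data frame with `id`, `name`, and `format` columns.

```r
canwin_url <- "https://canwin-datahub.ad.umanitoba.ca/data"

# Hypothetical helper: keep only the columns that identify each resource
summarise_resources <- function(resources) {
  wanted <- intersect(c("id", "name", "format"), names(resources))
  resources[, wanted, drop = FALSE]
}

# Hypothetical helper: list the resources attached to one dataset (package)
list_dataset_resources <- function(dataset, url = canwin_url) {
  pkg <- ckanr::package_show(dataset, url = url, as = "table")
  summarise_resources(pkg$resources)
}

# Usage (requires a network connection):
# datasets <- ckanr::package_list(url = canwin_url, as = "table")
# list_dataset_resources(datasets[1])
```

The `id` column of the result holds the resource IDs that the examples in this guide expect.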

Here we present an example using the dplyr package provided in the tidyverse library.

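# Resource ID found on CKAN site
data_id <- "c07482a5-c8e2-403c-9eaa-94153fc3659c"

# Connect to the CanWIN server (creates the ckan$con connection used below)
ckan <- src_ckan("https://canwin-datahub.ad.umanitoba.ca/data")

# Add data to R environment
dataset_original <- dplyr::tbl(src = ckan$con, from = data_id) %>%
  as_tibble()      # Time to display will vary with the size of the dataset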

# Filter and select specific parameters from data using dplyr
tbl(src = ckan$con, from = data_id) %>%
  select(project_name, station_id) %>%    # Select specific columns
  filter(station_id =="GL_LWH_M") %>%     # Select specific station
  as_tibble()

Downloading a Dataset Directly to Hard Drive

To download a dataset from CanWIN onto your hard drive (called "disk" in ckanr), you need the URL of the resource you would like to download. ckanr provides a function, resource_show(), that requires only the resource ID to retrieve this URL, which can then be passed to ckan_fetch(). Set your working directory to the location where you would like to save the dataset (this is generally done at the beginning of the script). The example below uses our 2016 Lake Waterhen Ecotriplet dataset.

# Resource function to store ID information
res <- resource_show(id = "c07482a5-c8e2-403c-9eaa-94153fc3659c",
                     as = "table") # saving resource information in variable "res"

# Fetching the first rows of data, with column names, for a preview using the resource URL
head(ckan_fetch(res$url)) # wait a few moments for a preview of the data to display in your console


# Setting the working directory in RStudio
wd <- "D:/R/Ckan" # replace with the hard drive path on your computer; R uses forward slashes in paths
setwd(wd) # sets the current working directory to the path specified in "wd"
getwd() # prints the current working directory in your console


# Downloads file to your working directory
ckan_fetch(res$url, store = "disk", path = "file_name.csv") # download will be saved to your hard drive as file_name.csv
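
Since only certain file types can be fetched this way, a download script may want to check the resource format first. The following is a minimal sketch, not part of the CanWIN scripts: `is_fetchable()` and `download_resource()` are hypothetical helpers, the format list is copied from the introduction of this page, and the sketch assumes that the resource record returned by `resource_show()` carries a `format` field.

```r
# Formats listed at the top of this page as downloadable with ckan_fetch()
fetchable_formats <- c("CSV", "XLS", "XLSX", "XML", "HTML", "JSON",
                       "SHP", "GEOJSON", "TXT")

# Hypothetical helper: TRUE when a single format label is in the list above
is_fetchable <- function(format) {
  length(format) == 1 && !is.na(format) &&
    toupper(trimws(format)) %in% fetchable_formats
}

# Hypothetical helper: download a resource to disk only if its format
# is supported (assumes the resource record has a "format" field)
download_resource <- function(resource_id, file_name) {
  res <- ckanr::resource_show(id = resource_id, as = "table")
  if (!is_fetchable(res$format)) {
    stop("Format '", res$format, "' cannot be fetched with ckan_fetch()")
  }
  ckanr::ckan_fetch(res$url, store = "disk", path = file_name)
}

# Usage (requires a network connection and the ckanr setup shown earlier):
# download_resource("c07482a5-c8e2-403c-9eaa-94153fc3659c", "file_name.csv")
```

Compressed data packages (.zip) fail this check, which matches the note above that they cannot be downloaded using R.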