Using Python to Access the CKAN DataStore API
Python is a powerful way to interact with CKAN’s DataStore API.
Instead of downloading entire files, you can query, filter, and retrieve only the data you need.
Basic Approaches
Two common libraries for making HTTP requests in Python are:
- requests → simple, widely used, integrates well with pandas
- urllib.request → part of Python’s standard library, no extra installation needed
Examples
# Example 1: requests with pandas
import pandas as pd
import requests
# API endpoint
url = "https://canwin-datahub.ad.umanitoba.ca/data/api/3/action/datastore_search?resource_id=c5c16064-e2b3-4618-9b27-0dbf5c1388c2"
# Make a GET request and fail early on an HTTP error status
response = requests.get(url, timeout=30)
response.raise_for_status()
# Convert response to JSON dictionary
response_data = response.json()
# Extract records
data = response_data["result"]["records"]
# Inspect column headers
col_headers = data[0].keys()
# Filter rows where Cast == "3"
filtered_data = [row for row in data if row["Cast"] == "3"]
# Convert to DataFrames
df_all = pd.DataFrame(data)
df_filtered = pd.DataFrame(filtered_data)
# Save to CSV
df_all.to_csv("output.csv", index=False)
# Example 2: urllib.request from the standard library
import urllib.request
import json
import pandas as pd
url = "https://canwin-datahub.ad.umanitoba.ca/data/api/3/action/datastore_search?resource_id=ea474f80-dcbe-4647-a28d-7fdce1293e09"
# Make request
http_response = urllib.request.urlopen(url)
# Read raw bytes
raw_data = http_response.read()
# Decode using the response encoding, falling back to UTF-8 if none is declared
encoding = http_response.info().get_content_charset() or "utf-8"
data_dict = json.loads(raw_data.decode(encoding))
# Extract records
data = data_dict["result"]["records"]
# Convert to DataFrame
df = pd.DataFrame(data)
# Save to CSV
df.to_csv("output.csv", index=False)
Adding Parameters with datastore_search
You can refine queries by appending parameters to the URL after the resource_id.
Use & between each key=value pair.
- Limit results → &limit=2
- Filter records → &filters={"key":"value"}
See the full parameter list in the DataStore API reference.
Examples
Format: &filters={"key":"value"}
url = "https://canwin-datahub.ad.umanitoba.ca/data/api/3/action/datastore_search?resource_id=c5c16064-e2b3-4618-9b27-0dbf5c1388c2&filters={\"Cast\":\"3\",\"sample_date\":\"2016-06-09T00:00:00\"}"
response = requests.get(url)
data = response.json()["result"]["records"]
df = pd.DataFrame(data)
df
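Writing the filters JSON directly into the URL string means escaping every quote by hand. As an alternative, the query string can be assembled from a dictionary. The sketch below uses only the standard library; the helper name `build_search_url` and the `BASE_URL` constant are illustrative and not part of CKAN or this tutorial. It only builds and prints the URL, so no request is sent.

```python
import json
from urllib.parse import urlencode

# Same datastore_search endpoint used in the examples above.
BASE_URL = "https://canwin-datahub.ad.umanitoba.ca/data/api/3/action/datastore_search"


def build_search_url(resource_id, filters=None, limit=None):
    """Return a datastore_search URL with percent-encoded query parameters.

    Hypothetical helper for illustration only.
    """
    params = {"resource_id": resource_id}
    if filters is not None:
        # The filters parameter is a JSON object, so serialize the dict first.
        params["filters"] = json.dumps(filters, separators=(",", ":"))
    if limit is not None:
        params["limit"] = limit
    # urlencode joins the pairs with & and escapes braces, quotes, and colons.
    return f"{BASE_URL}?{urlencode(params)}"


url = build_search_url(
    "c5c16064-e2b3-4618-9b27-0dbf5c1388c2",
    filters={"Cast": "3", "sample_date": "2016-06-09T00:00:00"},
    limit=2,
)
print(url)
```

The resulting `url` can be passed to `requests.get` or `urllib.request.urlopen` exactly as in the earlier examples.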
Authentication
Some endpoints require an API key.
Include it in your request headers:
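A minimal sketch of attaching a key with the standard library follows. It assumes the key is sent in an `Authorization` header, which is how CKAN's API documentation describes it; confirm the header name against the instance you are using. `API_KEY` is a placeholder, and `build_authorized_request` and `fetch_records` are hypothetical helper names, not part of any library.

```python
import json
import urllib.request

# Placeholder: substitute the API key from your own CKAN account.
API_KEY = "YOUR-API-KEY"

url = "https://canwin-datahub.ad.umanitoba.ca/data/api/3/action/datastore_search?resource_id=c5c16064-e2b3-4618-9b27-0dbf5c1388c2"


def build_authorized_request(url, api_key):
    """Return a Request that carries the key in the Authorization header (assumed header name)."""
    return urllib.request.Request(url, headers={"Authorization": api_key})


def fetch_records(request):
    """Send the request and return the list of records from the JSON response."""
    with urllib.request.urlopen(request) as http_response:
        encoding = http_response.info().get_content_charset() or "utf-8"
        payload = json.loads(http_response.read().decode(encoding))
    return payload["result"]["records"]


request = build_authorized_request(url, API_KEY)
# Confirm the header is attached before sending anything.
print(request.get_header("Authorization"))
```

Calling `fetch_records(request)` sends the request. With the `requests` library, the equivalent is to pass the same dictionary as `headers=` to `requests.get`.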