Using Python to Access the CKAN DataStore API

Python offers a convenient way to work with CKAN’s DataStore API.
Instead of downloading entire files, you can query, filter, and retrieve only the data you need.


Basic Approaches

Two common libraries for making HTTP requests in Python are:

  • requests → simple, widely used, integrates well with pandas
  • urllib.request → part of Python’s standard library, no extra installation needed

Examples

Using requests:

import pandas as pd
import requests

# API endpoint
url = "https://canwin-datahub.ad.umanitoba.ca/data/api/3/action/datastore_search?resource_id=c5c16064-e2b3-4618-9b27-0dbf5c1388c2"

# Make a GET request
response = requests.get(url)

# Convert response to JSON dictionary
response_data = response.json()

# Extract records
data = response_data["result"]["records"]

# Inspect column headers
col_headers = data[0].keys()

# Filter rows where Cast == "3"
filtered_data = [row for row in data if row["Cast"] == "3"]

# Convert to DataFrames
df_all = pd.DataFrame(data)
df_filtered = pd.DataFrame(filtered_data)

# Save to CSV
df_all.to_csv("output.csv", index=False)

Using urllib.request:

import urllib.request
import json
import pandas as pd

url = "https://canwin-datahub.ad.umanitoba.ca/data/api/3/action/datastore_search?resource_id=ea474f80-dcbe-4647-a28d-7fdce1293e09"

# Make request
http_response = urllib.request.urlopen(url)

# Read raw bytes
raw_data = http_response.read()

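# Decode using the charset declared by the response; fall back to UTF-8
# because get_content_charset() returns None when no charset is declared
encoding = http_response.info().get_content_charset() or "utf-8"
data_dict = json.loads(raw_data.decode(encoding))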

# Extract records
data = data_dict["result"]["records"]

# Convert to DataFrame
df = pd.DataFrame(data)

# Save to CSV
df.to_csv("output.csv", index=False)
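
Both examples above read response["result"]["records"] directly, which assumes the request succeeded. CKAN action API responses also carry a top-level "success" flag, with an "error" object when it is false. The following is a minimal sketch of a check built on that; the helper name extract_records and the sample payloads are hypothetical and stand in for a live response.

```python
def extract_records(payload):
    """Return the records from a decoded datastore_search response dictionary."""
    if not payload.get("success"):
        # On failure CKAN places the details in an "error" object
        raise RuntimeError(f"CKAN request failed: {payload.get('error')}")
    return payload["result"]["records"]


# Hypothetical payloads shaped like CKAN responses, used instead of a live call
ok_payload = {"success": True, "result": {"records": [{"_id": 1, "Cast": "3"}]}}
bad_payload = {"success": False, "error": {"message": "Not found"}}

records = extract_records(ok_payload)
print(records)
```

With a real request, the argument would be response.json() (requests) or the dictionary returned by json.loads (urllib.request).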

You can refine queries by appending parameters to the URL after the resource_id.
Use & between each key=value pair.

  • Limit results: &limit=2
  • Filter records: &filters={"key":"value"}

See the full parameter list in the DataStore API reference.

Examples

Format: &filters={"key":"value"}

import pandas as pd
import requests

url = "https://canwin-datahub.ad.umanitoba.ca/data/api/3/action/datastore_search?resource_id=c5c16064-e2b3-4618-9b27-0dbf5c1388c2&filters={\"Cast\":\"3\",\"sample_date\":\"2016-06-09T00:00:00\"}"

response = requests.get(url)
data = response.json()["result"]["records"]

df = pd.DataFrame(data)
df

Format: &limit=2

import pandas as pd
import requests

url = "https://canwin-datahub.ad.umanitoba.ca/data/api/3/action/datastore_search?resource_id=c5c16064-e2b3-4618-9b27-0dbf5c1388c2&limit=2"

response = requests.get(url)
data = response.json()["result"]["records"]

df = pd.DataFrame(data)
df
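
The query examples above write the parameters straight into the URL string, which gets awkward once the filters contain quotes. As one possible alternative, the sketch below assembles the same kind of URL from a dictionary with the standard library, so the JSON filter is serialized and percent-encoded automatically. The helper name build_search_url is hypothetical; the endpoint, resource_id, and parameter names are the ones used above.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://canwin-datahub.ad.umanitoba.ca/data/api/3/action/datastore_search"


def build_search_url(resource_id, filters=None, limit=None):
    """Assemble a datastore_search URL from optional filters and limit."""
    params = {"resource_id": resource_id}
    if filters is not None:
        # The filters parameter is a JSON object, so serialize the dict first
        params["filters"] = json.dumps(filters)
    if limit is not None:
        params["limit"] = limit
    # urlencode joins the pairs with & and percent-encodes special characters
    return f"{BASE_URL}?{urlencode(params)}"


url = build_search_url(
    "c5c16064-e2b3-4618-9b27-0dbf5c1388c2",
    filters={"Cast": "3"},
    limit=2,
)
print(url)
```

The resulting url can be passed to requests.get or urllib.request.urlopen exactly as in the earlier examples.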


Authentication

Some endpoints require an API key.
Include it in your request headers:

import requests

url = "https://canwin-datahub.ad.umanitoba.ca/data/api/3/action/package_list"
headers = {"Authorization": "YOUR-API-KEY"}

response = requests.get(url, headers=headers)
print(response.json())
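
The same header can be attached when using urllib.request. The sketch below also reads the key from an environment variable rather than hard-coding it in the script; the variable name CKAN_API_KEY is an assumption made for illustration, not something CKAN defines. The request is only constructed here, not sent.

```python
import os
import urllib.request

url = "https://canwin-datahub.ad.umanitoba.ca/data/api/3/action/package_list"

# CKAN_API_KEY is a hypothetical variable name; the fallback mirrors the placeholder above
api_key = os.environ.get("CKAN_API_KEY", "YOUR-API-KEY")

# Attach the key as an Authorization header, as in the requests example
request = urllib.request.Request(url, headers={"Authorization": api_key})

# Passing `request` to urllib.request.urlopen() would send it; omitted so the sketch runs offline
print(request.get_header("Authorization"))
```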