Add Your Own Data Source

The current sources are starter templates. To add another official/open source, copy the closest block in configs/sources.yml, change the ID and access details, then run the ingest script.

The Basic Workflow

  1. Find an official/public source URL, API endpoint, STAC collection, SPARQL endpoint, or BFS PX-Web table.
  2. Choose the closest connector type from the table below.
  3. Add a new block under sources: in configs/sources.yml.
  4. Run python scripts/ingest.py --list to check that the source is registered.
  5. Run python scripts/ingest.py --source your_source_id.
  6. Inspect data/raw/your_source_id/<timestamp>/payload.* and metadata.json.
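Step 6 can be scripted. The following is a minimal sketch, assuming only the layout named above (data/raw/your_source_id/<timestamp>/ with payload.* and metadata.json). The helper names latest_snapshot and describe_snapshot are hypothetical, and the assumption that timestamp directory names sort chronologically as strings should be checked against what the ingest script actually writes.

```python
import json
from pathlib import Path


def latest_snapshot(source_id: str, raw_root: Path = Path("data/raw")) -> Path:
    """Return the newest <timestamp> directory for a source.

    Assumption: timestamp directory names sort chronologically as plain
    strings. Adjust the sort key if the ingest script names them differently.
    """
    source_dir = raw_root / source_id
    snapshots = sorted(p for p in source_dir.iterdir() if p.is_dir())
    if not snapshots:
        raise FileNotFoundError(f"No snapshots found under {source_dir}")
    return snapshots[-1]


def describe_snapshot(snapshot: Path) -> dict:
    """Load metadata.json and list the payload files stored next to it."""
    metadata = json.loads((snapshot / "metadata.json").read_text(encoding="utf-8"))
    payloads = sorted(p.name for p in snapshot.glob("payload.*"))
    return {"metadata": metadata, "payloads": payloads}
```

Calling describe_snapshot(latest_snapshot("your_source_id")) from the repository root returns the parsed metadata and the payload file names for the most recent run.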

Connector Types

| Use case | Connector type | Copy this starter |
| --- | --- | --- |
| Direct CSV, HTML, PDF, or other file URL | http_file | sfoe_energy_balance_csv or armasuisse_st_publications |
| Simple JSON API with query parameters | http_json | parliament_affairs |
| JSON API that needs a POST body | http_post_json | aramis_armasuisse_research_projects |
| opendata.swiss CKAN package metadata | ckan_package_show | opendata_swiss |
| opendata.swiss CKAN search query | ckan_package_search | Use the same base_url, add params |
| geo.admin.ch or MeteoSwiss STAC collection | stac_collection | meteo_swiss_smn or any geoadmin_* source |
| SPARQL endpoint returning JSON | sparql | fedlex |
| SPARQL endpoint returning CSV | sparql_csv | lindas |
| BFS / STAT-TAB PX-Web table | pxweb_query | bfs_pxweb |

Minimal Examples

Direct CSV or File

my_csv_source:
  source_name: My official CSV source
  enabled: true
  type: http_file
  format: CSV
  url: https://example.admin.ch/data.csv
  suffix: csv
  documentation:
    url: https://example.admin.ch
    access_path: Direct CSV download
    license_or_terms: Check official terms
    geographic_reference: Switzerland / dataset dependent
    update_logic: Dataset dependent

JSON API

my_json_api:
  source_name: My JSON API source
  enabled: true
  type: http_json
  format: JSON
  url: https://example.admin.ch/api/items
  params:
    limit: 100
    language: en
  documentation:
    url: https://example.admin.ch/api
    access_path: /api/items?limit=100&language=en
    license_or_terms: Check official terms
    geographic_reference: Switzerland / dataset dependent
    update_logic: API dependent
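The url and params keys together determine the request that is recorded in access_path. As a small sketch of that relationship (the function name build_request_url is hypothetical, and the real connector may assemble the URL differently), the standard library produces the same query string:

```python
from urllib.parse import urlencode


def build_request_url(url: str, params: dict) -> str:
    """Append URL-encoded params to the base URL, as an HTTP GET would."""
    query = urlencode(params)
    return f"{url}?{query}" if query else url


print(build_request_url(
    "https://example.admin.ch/api/items",
    {"limit": 100, "language": "en"},
))
# → https://example.admin.ch/api/items?limit=100&language=en
```

Keeping access_path in sync with params this way makes the documented path something a reviewer can paste and rerun.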

STAC Collection

my_geodata_layer:
  source_name: My geo.admin.ch layer
  enabled: true
  type: stac_collection
  format: STAC / JSON metadata / geodata assets
  collection_url: https://data.geo.admin.ch/api/stac/v1/collections/ch.example.layer
  documentation:
    url: https://data.geo.admin.ch
    access_path: /api/stac/v1/collections/ch.example.layer
    license_or_terms: Per geo.admin.ch / dataset metadata
    geographic_reference: Switzerland / geospatial layer
    update_logic: Dataset dependent
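A STAC collection document is JSON metadata that points to the actual geodata through its links and, optionally, its assets. The sketch below shows one way to pull those pointers out of an already downloaded collection payload. The helper names and the sample dictionary are hypothetical; only the "links", "rel", "href", and "assets" keys come from the STAC specification.

```python
def item_links(collection: dict) -> list:
    """Return the hrefs of links with rel == "items" (the item listing)."""
    return [
        link["href"]
        for link in collection.get("links", [])
        if link.get("rel") == "items" and "href" in link
    ]


def asset_hrefs(collection: dict) -> dict:
    """Map collection-level asset names to their download hrefs, if any."""
    return {
        name: asset["href"]
        for name, asset in collection.get("assets", {}).items()
        if "href" in asset
    }


# Hypothetical, heavily trimmed collection payload for illustration only.
sample = {
    "id": "ch.example.layer",
    "links": [
        {"rel": "self", "href": "https://data.geo.admin.ch/api/stac/v1/collections/ch.example.layer"},
        {"rel": "items", "href": "https://data.geo.admin.ch/api/stac/v1/collections/ch.example.layer/items"},
    ],
}
print(item_links(sample))
```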

SPARQL Query

my_sparql_source:
  source_name: My SPARQL source
  enabled: true
  type: sparql_csv
  format: SPARQL CSV results
  endpoint: https://ld.admin.ch/query
  query: |
    SELECT ?s ?p ?o
    WHERE {
      ?s ?p ?o .
    }
    LIMIT 100
  documentation:
    url: https://ld.admin.ch
    access_path: /query?format=csv&query=<your query>
    license_or_terms: Dataset dependent
    geographic_reference: Switzerland / dataset dependent
    update_logic: Dataset dependent
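To test a query before committing it to configs/sources.yml, it can help to build the request by hand. This is a sketch under assumptions: it uses the SPARQL 1.1 Protocol form-encoded POST with an Accept: text/csv header, whereas the sparql_csv connector may instead pass a format parameter as shown in access_path. The function name build_sparql_csv_request is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_csv_request(endpoint: str, query: str) -> Request:
    """Prepare (but do not send) a form-encoded SPARQL POST asking for CSV."""
    body = urlencode({"query": query}).encode("utf-8")
    return Request(
        endpoint,
        data=body,  # supplying data makes urllib use POST
        headers={
            "Accept": "text/csv",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


request = build_sparql_csv_request(
    "https://ld.admin.ch/query",
    "SELECT ?s ?p ?o WHERE { ?s ?p ?o . } LIMIT 100",
)
# To actually run it: urllib.request.urlopen(request).read().decode("utf-8")
```

Keeping a LIMIT in the query while experimenting avoids pulling an entire triple store by accident.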

BFS PX-Web

For BFS, first open the exact STAT-TAB table and use the table-specific API address. Do not guess the endpoint.

my_bfs_table:
  source_name: My BFS table
  enabled: true
  type: pxweb_query
  format: PX-Web API / CSV
  endpoint: https://www.pxweb.bfs.admin.ch/api/v1/en/path/to/table.px
  query:
    - code: Jahr
      selection:
        filter: item
        values: ["2024"]
  response:
    format: CSV
  documentation:
    url: https://www.pxweb.bfs.admin.ch
    access_path: Table-specific PX-Web endpoint
    license_or_terms: BFS terms
    geographic_reference: Switzerland / dataset dependent
    update_logic: Dataset dependent
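The query and response keys mirror the JSON body that a PX-Web v1 table endpoint accepts via POST. The sketch below builds that body from the same values; the function name build_pxweb_body is hypothetical, and lowercasing the format to "csv" is an assumption about what the endpoint expects rather than something taken from the connector.

```python
import json


def build_pxweb_body(query: list, response_format: str = "csv") -> str:
    """Serialize a PX-Web v1 query body: variable selections plus output format."""
    body = {"query": query, "response": {"format": response_format}}
    return json.dumps(body, ensure_ascii=False)


# Same selection as the YAML block above: the variable "Jahr", value "2024".
selection = [
    {"code": "Jahr", "selection": {"filter": "item", "values": ["2024"]}},
]
print(build_pxweb_body(selection))
```

The variable codes and allowed values differ per table, so they should be read from the table's own API metadata rather than copied from another table.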

Run It

python scripts/ingest.py --list
python scripts/ingest.py --source my_csv_source

In the local virtual environment on Windows, use:

.\.venv\Scripts\python.exe scripts\ingest.py --source my_csv_source
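Before running the commands above, a quick pre-flight check can catch a block that is missing its access details. This sketch takes an already parsed source block as a dictionary. The EXPECTED_KEYS mapping is an assumption inferred from the minimal examples on this page, not read from scripts/ingest.py, and the function name preflight is hypothetical.

```python
# Assumed mapping: the access keys each connector type uses in the
# minimal examples on this page. Extend it for other connector types.
EXPECTED_KEYS = {
    "http_file": ("url",),
    "http_json": ("url",),
    "stac_collection": ("collection_url",),
    "sparql_csv": ("endpoint", "query"),
    "pxweb_query": ("endpoint", "query"),
}


def preflight(source_id: str, source: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    connector_type = source.get("type")
    if connector_type not in EXPECTED_KEYS:
        return [f"{source_id}: type {connector_type!r} is not covered by this sketch"]
    return [
        f"{source_id}: missing {key!r} for type {connector_type}"
        for key in EXPECTED_KEYS[connector_type]
        if not source.get(key)
    ]


print(preflight("my_csv_source", {"type": "http_file", "url": "https://example.admin.ch/data.csv"}))
```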

What To Document

For every new source, fill out the documentation block. The most important fields are:

| Field | Why it matters |
| --- | --- |
| url | Official documentation or landing page. |
| access_path | Exact API path, file URL, query, or table route. |
| license_or_terms | Reuse conditions for the hackathon result. |
| geographic_reference | Switzerland, canton, municipality, station, polygon, etc. |
| update_logic | How often or under what process the source changes. |

This is the difference between "we downloaded a file" and "we can reproduce and defend the data lineage."
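A completeness check for the documentation block is easy to automate. The sketch below flags any of the five fields that are absent or blank in a parsed source block; the function name missing_documentation_fields is hypothetical, and nothing here is taken from the ingest script itself.

```python
REQUIRED_DOC_FIELDS = (
    "url",
    "access_path",
    "license_or_terms",
    "geographic_reference",
    "update_logic",
)


def missing_documentation_fields(source: dict) -> list:
    """Return the documentation fields that are absent or blank."""
    documentation = source.get("documentation") or {}
    return [
        field
        for field in REQUIRED_DOC_FIELDS
        if not str(documentation.get(field) or "").strip()
    ]


example = {
    "type": "http_file",
    "documentation": {"url": "https://example.admin.ch", "access_path": " "},
}
print(missing_documentation_fields(example))
```

A check like this only confirms that the fields are filled in; whether the license and update logic are stated correctly still needs a human read of the official source.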