# Add Your Own Data Source
The current sources are starter templates. To add another official or open data source, copy the closest existing block in `configs/sources.yml`, change its source ID (the block's top-level key) and its access details, then run the ingest script.
## The Basic Workflow
- Find an official/public source URL, API endpoint, STAC collection, SPARQL endpoint, or BFS PX-Web table.
- Choose the closest connector type from the table below.
- Add a new block under `sources:` in `configs/sources.yml`.
- Run `python scripts/ingest.py --list` to check that the source is registered.
- Run `python scripts/ingest.py --source your_source_id`.
- Inspect `data/raw/your_source_id/<timestamp>/payload.*` and `metadata.json`.
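To check the last step without browsing folders by hand, here is a small Python sketch that picks the newest run folder for a source and reads its payload file names and `metadata.json`. It assumes that the `<timestamp>` folder names sort chronologically as plain strings (true for ISO-style stamps such as `20240131T120000Z`). The helpers `latest_run` and `describe_run` are hypothetical and not part of `scripts/ingest.py`.

```python
import json
from pathlib import Path


def latest_run(raw_root: Path, source_id: str) -> Path:
    """Return the newest <timestamp> folder under <raw_root>/<source_id>/.

    Assumption: timestamp folder names sort chronologically as strings.
    """
    runs = sorted(p for p in (raw_root / source_id).iterdir() if p.is_dir())
    if not runs:
        raise FileNotFoundError(f"no runs found for {source_id!r}")
    return runs[-1]


def describe_run(run_dir: Path) -> dict:
    """Collect the payload file names and the parsed metadata.json of one run."""
    payloads = sorted(p.name for p in run_dir.glob("payload.*"))
    metadata = json.loads((run_dir / "metadata.json").read_text(encoding="utf-8"))
    return {"run": run_dir.name, "payloads": payloads, "metadata": metadata}
```

For example, `describe_run(latest_run(Path("data/raw"), "your_source_id"))` returns a dict you can print or compare between two runs.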
## Connector Types
| Use case | Connector type | Copy this starter |
|---|---|---|
| Direct CSV, HTML, PDF, or other file URL | `http_file` | `sfoe_energy_balance_csv` or `armasuisse_st_publications` |
| Simple JSON API with query parameters | `http_json` | `parliament_affairs` |
| JSON API that needs a POST body | `http_post_json` | `aramis_armasuisse_research_projects` |
| opendata.swiss CKAN package metadata | `ckan_package_show` | `opendata_swiss` |
| opendata.swiss CKAN search query | `ckan_package_search` | Use the same `base_url`, add `params` |
| geo.admin.ch or MeteoSwiss STAC collection | `stac_collection` | `meteo_swiss_smn` or any `geoadmin_*` source |
| SPARQL endpoint returning JSON | `sparql` | `fedlex` |
| SPARQL endpoint returning CSV | `sparql_csv` | `lindas` |
| BFS / STAT-TAB PX-Web table | `pxweb_query` | `bfs_pxweb` |
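Before running the ingest script, a quick pre-flight check can catch a mistyped `type` or a missing access key. The sketch below is hypothetical (it is not what `scripts/ingest.py` does internally): the set of types comes from the table above, and the required keys come only from the minimal examples on this page, so types without an example (`http_post_json`, `ckan_*`, `sparql`) are only checked for a valid name.

```python
# Connector types listed in the table above.
CONNECTOR_TYPES = {
    "http_file", "http_json", "http_post_json",
    "ckan_package_show", "ckan_package_search",
    "stac_collection", "sparql", "sparql_csv", "pxweb_query",
}

# Access keys used by the minimal examples on this page (assumed, not exhaustive).
REQUIRED_ACCESS_KEYS = {
    "http_file": ["url"],
    "http_json": ["url"],
    "stac_collection": ["collection_url"],
    "sparql_csv": ["endpoint", "query"],
    "pxweb_query": ["endpoint", "query"],
}


def check_source(source_id: str, block: dict) -> list:
    """Return human-readable problems for one entry under `sources:` (empty list = OK)."""
    kind = block.get("type")
    if kind not in CONNECTOR_TYPES:
        return [f"{source_id}: unknown connector type {kind!r}"]
    return [
        f"{source_id}: '{kind}' needs '{key}'"
        for key in REQUIRED_ACCESS_KEYS.get(kind, [])
        if not block.get(key)
    ]
```

To run it over the whole file you could load `configs/sources.yml` with PyYAML (`yaml.safe_load`) and call `check_source` for each item of the `sources` mapping.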
## Minimal Examples

### Direct CSV or File
```yaml
my_csv_source:
  source_name: My official CSV source
  enabled: true
  type: http_file
  format: CSV
  url: https://example.admin.ch/data.csv
  suffix: csv
  documentation:
    url: https://example.admin.ch
    access_path: Direct CSV download
    license_or_terms: Check official terms
    geographic_reference: Switzerland / dataset dependent
    update_logic: Dataset dependent
```
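Conceptually, an `http_file` source boils down to "download `url` and store it as `payload.<suffix>` next to a `metadata.json`". The sketch below illustrates that idea with the standard library only. It is not the project's connector: the timestamp format and the metadata fields (`source_id`, `url`, `retrieved_at`, `documentation`) are assumptions chosen for illustration.

```python
import json
import shutil
import urllib.request
from datetime import datetime, timezone
from pathlib import Path


def fetch_file(source_id: str, block: dict, raw_root: Path) -> Path:
    """Download block['url'] into raw_root/<source_id>/<timestamp>/payload.<suffix>."""
    # Assumed timestamp format; ISO-style stamps keep run folders sortable.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    run_dir = raw_root / source_id / stamp
    run_dir.mkdir(parents=True, exist_ok=True)
    payload = run_dir / f"payload.{block.get('suffix', 'bin')}"
    with urllib.request.urlopen(block["url"], timeout=60) as resp, payload.open("wb") as out:
        shutil.copyfileobj(resp, out)  # stream to disk instead of reading into memory
    # Hypothetical metadata layout: keep the documentation block with the payload.
    metadata = {
        "source_id": source_id,
        "url": block["url"],
        "retrieved_at": stamp,
        "documentation": block.get("documentation", {}),
    }
    (run_dir / "metadata.json").write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return payload
```

Copying the `documentation` block into the run folder is one way to keep the lineage next to the data it describes.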
### JSON API
```yaml
my_json_api:
  source_name: My JSON API source
  enabled: true
  type: http_json
  format: JSON
  url: https://example.admin.ch/api/items
  params:
    limit: 100
    language: en
  documentation:
    url: https://example.admin.ch/api
    access_path: /api/items?limit=100&language=en
    license_or_terms: Check official terms
    geographic_reference: Switzerland / dataset dependent
    update_logic: API dependent
```
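The `params` mapping ends up as the query string, which is why `access_path` above reads `/api/items?limit=100&language=en`. A minimal sketch of that mapping, assuming the connector simply URL-encodes `params` in their YAML order (`build_url` is a hypothetical helper):

```python
from __future__ import annotations

from urllib.parse import urlencode, urlsplit, urlunsplit


def build_url(url: str, params: dict | None = None) -> str:
    """Append `params` to `url` as a query string, keeping any query already present."""
    if not params:
        return url
    parts = urlsplit(url)
    query = "&".join(q for q in (parts.query, urlencode(params)) if q)
    return urlunsplit(parts._replace(query=query))
```

Printing `build_url(url, params)` for a new source is a cheap way to fill in `access_path` exactly.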
### STAC Collection
```yaml
my_geodata_layer:
  source_name: My geo.admin.ch layer
  enabled: true
  type: stac_collection
  format: STAC / JSON metadata / geodata assets
  collection_url: https://data.geo.admin.ch/api/stac/v1/collections/ch.example.layer
  documentation:
    url: https://data.geo.admin.ch
    access_path: /api/stac/v1/collections/ch.example.layer
    license_or_terms: Per geo.admin.ch / dataset metadata
    geographic_reference: Switzerland / geospatial layer
    update_logic: Dataset dependent
```
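A STAC collection only describes the layer; the actual files are listed as `assets` on the collection's items, which a STAC API serves under `<collection_url>/items`. The sketch below shows how to get from the collection URL to the asset download links. It is an illustration of the STAC API layout, not of how the `stac_collection` connector is implemented, and the helper names are hypothetical.

```python
from __future__ import annotations

import json
import urllib.request


def items_url(collection_url: str, limit: int = 10) -> str:
    """STAC API items endpoint for one collection, with a page size limit."""
    return f"{collection_url.rstrip('/')}/items?limit={limit}"


def asset_hrefs(feature_collection: dict) -> list[str]:
    """Flatten the download links of every asset of every item in one page."""
    return [
        asset["href"]
        for feature in feature_collection.get("features", [])
        for asset in feature.get("assets", {}).values()
        if "href" in asset
    ]


def fetch_json(url: str) -> dict:
    """GET a URL and parse the JSON response."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return json.load(resp)
```

For example, `asset_hrefs(fetch_json(items_url(collection_url)))` lists the files of the first page of items; larger collections are paginated through `next` links in the response.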
### SPARQL Query
```yaml
my_sparql_source:
  source_name: My SPARQL source
  enabled: true
  type: sparql_csv
  format: SPARQL CSV results
  endpoint: https://ld.admin.ch/query
  query: |
    SELECT ?s ?p ?o
    WHERE {
      ?s ?p ?o .
    }
    LIMIT 100
  documentation:
    url: https://ld.admin.ch
    access_path: /query?format=csv&query=<your query>
    license_or_terms: Dataset dependent
    geographic_reference: Switzerland / dataset dependent
    update_logic: Dataset dependent
```
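To test a query before adding it, it helps to know what a SPARQL CSV request looks like. Per the SPARQL 1.1 Protocol, the query goes in a `query` parameter and CSV results can be requested with `Accept: text/csv`; the result is a header row of variable names followed by one row per solution. The sketch below uses that standard form. The `access_path` above uses a `format=csv` parameter instead, which the endpoint may also accept; treat both helper names as hypothetical.

```python
import csv
import io
import urllib.request
from urllib.parse import urlencode


def sparql_csv_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for CSV results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return urllib.request.Request(url, headers={"Accept": "text/csv"})


def parse_sparql_csv(text: str) -> list:
    """Parse SPARQL CSV results into one dict per solution, keyed by variable name."""
    return list(csv.DictReader(io.StringIO(text)))
```

Sending it is then `urllib.request.urlopen(sparql_csv_request(endpoint, query))`, decoding the body as UTF-8 before parsing.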
### BFS PX-Web
For BFS, first open the exact STAT-TAB table and use the table-specific API address. Do not guess the endpoint.
```yaml
my_bfs_table:
  source_name: My BFS table
  enabled: true
  type: pxweb_query
  format: PX-Web API / CSV
  endpoint: https://www.pxweb.bfs.admin.ch/api/v1/en/path/to/table.px
  query:
    - code: Jahr
      selection:
        filter: item
        values: ["2024"]
  response:
    format: CSV
  documentation:
    url: https://www.pxweb.bfs.admin.ch
    access_path: Table-specific PX-Web endpoint
    license_or_terms: BFS terms
    geographic_reference: Switzerland / dataset dependent
    update_logic: Dataset dependent
```
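The `query` and `response` keys mirror the JSON body that the PX-Web API v1 expects in a POST to the table URL. The sketch below assumes the connector forwards both keys unchanged; `pxweb_body` and `pxweb_request` are hypothetical names. In PX-Web v1, a plain GET on the same table URL usually returns the table's variables and their value codes, which is where codes such as `Jahr` come from.

```python
import json
import urllib.request


def pxweb_body(block: dict) -> dict:
    """PX-Web API v1 POST body: {"query": [...], "response": {"format": ...}}."""
    return {"query": block["query"], "response": block.get("response", {"format": "csv"})}


def pxweb_request(block: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for one PX-Web table."""
    data = json.dumps(pxweb_body(block)).encode("utf-8")
    return urllib.request.Request(
        block["endpoint"],
        data=data,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```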
## Run It
```bash
python scripts/ingest.py --list
python scripts/ingest.py --source my_csv_source
```

On Windows, using the project's local virtual environment, run:

```powershell
.\.venv\Scripts\python.exe scripts\ingest.py --source my_csv_source
```
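If you script several ingest runs, the only platform difference is where the virtual environment keeps its interpreter: `.venv\Scripts\python.exe` on Windows and `.venv/bin/python` elsewhere (the standard `venv` layout). This sketch builds the matching command line; `venv_python` and `ingest_command` are hypothetical helpers, and the venv is assumed to live in `.venv` at the project root.

```python
from __future__ import annotations

import os
from pathlib import PurePosixPath, PureWindowsPath


def venv_python(windows: bool | None = None):
    """Interpreter path inside a local .venv, following the standard venv layout."""
    if windows is None:
        windows = os.name == "nt"
    if windows:
        return PureWindowsPath(".venv", "Scripts", "python.exe")
    return PurePosixPath(".venv", "bin", "python")


def ingest_command(source_id: str, windows: bool | None = None) -> list[str]:
    """Argument list for one `ingest.py --source` run (built, not executed)."""
    if windows is None:
        windows = os.name == "nt"
    path_cls = PureWindowsPath if windows else PurePosixPath
    script = path_cls("scripts", "ingest.py")
    return [str(venv_python(windows)), str(script), "--source", source_id]
```

Pass the result to `subprocess.run(..., check=True)` from the project root so that a failing source stops the loop.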
## What To Document
For every new source, fill out the `documentation` block. The most important fields are:

| Field | Why it matters |
|---|---|
| `url` | Official documentation or landing page. |
| `access_path` | Exact API path, file URL, query, or table route. |
| `license_or_terms` | Reuse conditions for the hackathon result. |
| `geographic_reference` | Switzerland, canton, municipality, station, polygon, etc. |
| `update_logic` | How often or under what process the source changes. |
This is the difference between "we downloaded a file" and "we can reproduce and defend the data lineage."
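A quick way to enforce this is to flag any of the five fields that is missing or blank before a source is committed. The sketch below is a hypothetical check, not part of `scripts/ingest.py`; it treats an empty YAML value (parsed as `None`) and whitespace-only strings as missing.

```python
from __future__ import annotations

DOC_FIELDS = ("url", "access_path", "license_or_terms", "geographic_reference", "update_logic")


def missing_doc_fields(block: dict) -> list[str]:
    """Return the documentation fields that are absent or blank for one source block."""
    doc = block.get("documentation") or {}
    missing = []
    for field in DOC_FIELDS:
        value = doc.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing
```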