Data Ingestion With RenkuLab
This project uses RenkuLab for reproducible sessions and persistent data storage. Public APIs are accessed by code in this repository; RenkuLab Data Connectors provide the mounted folders for raw and curated data.
Repository Structure
configs/sources.yml # source registry
scripts/ingest.py # first ingestion runner
data/raw/ # local raw snapshots, ignored by Git
data/curated/ # local cleaned MVP outputs, ignored by Git
environment.yml # Python/Jupyter environment for RenkuLab
data/raw/ and data/curated/ are not committed to Git. In local development,
the ingestion script writes there directly. In RenkuLab, data connectors are
mounted as sibling folders next to the repository, for example ../govtech-raw
and ../govtech-curated. The script and notebook automatically use those
Renku mounts when they exist.
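The mount detection described above could look roughly like the following. This is a minimal sketch, not the code in scripts/ingest.py: the function name `resolve_data_dir` and its parameters are hypothetical, and it assumes that "when they exist" means a plain directory check on the sibling folder.

```python
from pathlib import Path


def resolve_data_dir(repo_root: Path, mount_name: str, local_subdir: str) -> Path:
    """Pick the Renku mount next to the repository if present, else the local folder.

    Hypothetical helper: the real script may detect mounts differently.
    """
    mount = repo_root.parent / mount_name  # e.g. ../govtech-raw
    if mount.is_dir():
        return mount
    return repo_root / local_subdir  # e.g. data/raw


# Example call for the raw data folder, relative to the current working directory.
raw_dir = resolve_data_dir(Path.cwd(), "govtech-raw", "data/raw")
```

Keeping the check in one function means the script and the notebook can share the same rule for raw and curated data by passing different folder names.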
Local Test
python scripts/ingest.py --list
python scripts/ingest.py --source opendata_swiss
Each successful run writes two files:
data/raw/<source_id>/<timestamp>/payload.json
data/raw/<source_id>/<timestamp>/metadata.json
metadata.json records the source, access path, format, and exact retrieval
timestamp.
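To illustrate the snapshot layout, here is a hedged sketch of how one run might write both files. The function name `write_snapshot`, the metadata key names, and the timestamp format used for the folder name are all assumptions for illustration; the actual keys and format in scripts/ingest.py may differ.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_snapshot(raw_dir: Path, source_id: str, payload, access_path: str, fmt: str) -> Path:
    """Write payload.json and metadata.json under <raw_dir>/<source_id>/<timestamp>/."""
    retrieved_at = datetime.now(timezone.utc)
    stamp = retrieved_at.strftime("%Y%m%dT%H%M%SZ")  # assumed folder-name format
    target = raw_dir / source_id / stamp
    target.mkdir(parents=True, exist_ok=True)

    # The raw response is stored unchanged so that it can be re-processed later.
    (target / "payload.json").write_text(
        json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8"
    )

    # Key names below are illustrative, not the project's confirmed schema.
    metadata = {
        "source": source_id,
        "access_path": access_path,
        "format": fmt,
        "retrieved_at": retrieved_at.isoformat(),
    }
    (target / "metadata.json").write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return target
```

A call such as `write_snapshot(Path("data/raw"), "opendata_swiss", {"result": []}, "https://example.org/api", "json")` would create one timestamped folder; the URL here is a placeholder, not the real endpoint.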
RenkuLab Steps
- Create or open the project in RenkuLab.
- Connect the GitHub repository https://github.com/allsparkswiss-hub/GovTECH.
- Select the branch renku-data-connectors.
- Create a Data Connector govtech-raw from the Google Drive folder GovTECH-Renku/raw.
- Create a Data Connector govtech-curated from the Google Drive folder GovTECH-Renku/curated.
- Create a session launcher from the repository and environment.yml.
- Start a session and test ingestion:
python scripts/ingest.py --list
python scripts/ingest.py --all
find ../govtech-raw -maxdepth 3 -type f | head
The same python scripts/ingest.py --all command works locally and in RenkuLab.
Locally it writes to data/raw; in RenkuLab it writes to the mounted
../govtech-raw folder when that mount exists.
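As a complement to the find command, a short Python check can confirm that every snapshot folder holds both expected files. This is a sketch under the assumption that snapshots sit exactly two levels below the raw folder (<source_id>/<timestamp>/); the function name `incomplete_snapshots` is hypothetical.

```python
from pathlib import Path

REQUIRED_FILES = ("payload.json", "metadata.json")


def incomplete_snapshots(raw_dir: Path) -> list[Path]:
    """Return snapshot folders (<source_id>/<timestamp>) that lack a required file."""
    missing = []
    for snapshot in sorted(raw_dir.glob("*/*")):
        if not snapshot.is_dir():
            continue  # ignore stray files at the snapshot level
        if not all((snapshot / name).is_file() for name in REQUIRED_FILES):
            missing.append(snapshot)
    return missing
```

Running `incomplete_snapshots(Path("../govtech-raw"))` in a session returns an empty list when every run completed; any folder it lists points at a source worth re-running.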
MVP Sources
The first MVP run uses thirteen enabled sources:
- opendata_swiss
- sfoe_energy_balance_csv
- meteo_swiss_smn
- geoadmin_army_nature_landscape
- geoadmin_civil_protection_meeting_points
- geoadmin_surface_runoff_hazard
- geoadmin_nuclear_emergency_zones
- armasuisse_st_publications
- aramis_armasuisse_research_projects
- parliament_affairs
- lindas
- fedlex
- bfs_pxweb
The sources cover energy, climate/weather, population exposure, public geodata,
armasuisse technology publications, federal research projects, parliamentary
signals, and regulatory signals. Data is first stored as raw snapshots under
data/raw/; curated tables and indicators can be written to data/curated/ later.
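Since the MVP run is limited to enabled sources, the registry presumably carries some on/off marker per source. The sketch below shows one way such a filter could work once configs/sources.yml has been loaded into a dictionary. The `enabled` key, the registry shape, and the entry `example_disabled_source` are assumptions for illustration, not the project's confirmed schema.

```python
def enabled_source_ids(registry: dict) -> list[str]:
    """Return the IDs of registry entries whose (assumed) 'enabled' flag is true."""
    return [
        source_id
        for source_id, entry in registry.items()
        if entry.get("enabled", False)
    ]


# Hypothetical registry excerpt, as it might look after parsing the YAML file.
registry = {
    "opendata_swiss": {"enabled": True},
    "fedlex": {"enabled": True},
    "example_disabled_source": {"enabled": False},
}

print(enabled_source_ids(registry))  # ['opendata_swiss', 'fedlex']
```

Treating a missing flag as disabled keeps a half-configured source out of an --all run by default; the real script may choose the opposite default.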