Architecture
Target picture
The challenge lends itself to a modular, layered architecture that separates source connection, processing, storage, analytics, delivery, and documentation. This separation improves traceability, makes components easier to exchange, and supports further development after the hackathon.
Layered architecture at a glance
| Layer | Purpose | Typical outputs |
|---|---|---|
| Source Layer | connect public data sources, documents, and services | raw data, metadata, source references |
| Ingestion Layer | retrieval, download logic, and technical capture | API responses, files, snapshots |
| Raw Data Layer | store source artifacts unchanged | versioned originals and retrieval logs |
| Processing / ETL / ELT Layer | parsing, cleaning, normalization, and mapping | harmonized datasets and derived features |
| Storage Layer | structured storage for analysis and reuse | relational tables, geodata, optional document or vector stores |
| Analytics & Intelligence Layer | indicators, trend cards, weak signals, and evidence logic | calculated metrics, signals, assessments |
| API Layer | standardized delivery for frontend or other consumers | REST endpoints, OpenAPI, Swagger UI |
| Presentation Layer | dashboard, web app, and visualization | maps, timelines, indicators, filters |
| Documentation Layer | technical and analytical traceability | source inventory, architecture, reproducibility |
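The hand-off between layers can be illustrated with a minimal sketch. It assumes a toy CSV payload with `region,year,value` rows; the names `RawRecord`, `Observation`, `ingest`, `process`, and `average_by_region` are hypothetical and chosen only for illustration. The point is that each layer is a small, separately testable function and that the source reference travels with the data up to the calculated metric.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class RawRecord:
    """Raw Data Layer: a source artifact kept unchanged, with its origin."""
    source_id: str
    payload: str


@dataclass(frozen=True)
class Observation:
    """Processing Layer output: one harmonized value tied to its source."""
    source_id: str
    region: str
    year: int
    value: float


def ingest(source_id: str, csv_text: str) -> RawRecord:
    # Ingestion Layer: a real connector would call an API or download a file.
    return RawRecord(source_id=source_id, payload=csv_text)


def process(raw: RawRecord) -> list[Observation]:
    # Processing Layer: parse "region,year,value" rows; skip headers and malformed rows.
    observations = []
    for line in raw.payload.strip().splitlines():
        parts = [part.strip() for part in line.split(",")]
        if len(parts) != 3:
            continue
        region, year, value = parts
        try:
            observations.append(
                Observation(raw.source_id, region, int(year), float(value))
            )
        except ValueError:
            continue
    return observations


def average_by_region(observations: list[Observation]) -> dict[str, dict]:
    # Analytics Layer: a metric per region plus the sources that support it.
    grouped: dict[str, list[Observation]] = {}
    for obs in observations:
        grouped.setdefault(obs.region, []).append(obs)
    return {
        region: {
            "mean": mean(o.value for o in items),
            "sources": sorted({o.source_id for o in items}),
        }
        for region, items in grouped.items()
    }


# Usage with made-up demo data (not taken from any real source).
demo_csv = "region,year,value\nNorth,2023,10\nNorth,2024,14\nSouth,2024,7\nbroken line\n"
demo_result = average_by_region(process(ingest("demo-source", demo_csv)))
```

Because every function takes plain data in and returns plain data out, a layer can be replaced (for example, a different parser or a database-backed store) without touching the others.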
Architecture principles
- Modularity: every layer should remain replaceable and testable on its own
- Source transparency: every result should remain tied back to concrete sources
- Versioning: raw data, transformation logic, and indicators should be reproducibly versioned
- Reproducibility: documentation, schema, and access logic should let others rerun and reuse the work after the hackathon
- Small core: prefer a few reliable components over a broad but fragile stack
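One possible way to approach source transparency and versioning for raw data is content-addressed storage with a retrieval log. The sketch below is an assumption, not a prescribed implementation: the function name `store_raw`, the `.bin` suffix, and the `retrieval_log.jsonl` file name are hypothetical. Each artifact is stored unchanged under its SHA-256 hash, and every retrieval is appended to a JSONL log together with the source URL and a UTC timestamp.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def store_raw(payload: bytes, source_url: str, raw_dir: Path) -> dict:
    """Store a source artifact unchanged and append an entry to a retrieval log."""
    raw_dir.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(payload).hexdigest()
    artifact = raw_dir / f"{digest}.bin"
    # Identical content maps to the same file name, so it is written only once.
    if not artifact.exists():
        artifact.write_bytes(payload)
    entry = {
        "source_url": source_url,
        "sha256": digest,
        "file": artifact.name,
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
    }
    # The log records every retrieval, including repeats of unchanged content.
    with (raw_dir / "retrieval_log.jsonl").open("a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry
```

A changed source produces a new hash and therefore a new file, so earlier versions stay available, while the log shows when each version was seen and where it came from.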
Technology guidance
| Area | Future hosted/server option | File-based MVP option |
|---|---|---|
| SQL database | PostgreSQL | DuckDB or SQLite |
| Geospatial database | PostGIS | GeoParquet + DuckDB, or later PostGIS |
| Document or NoSQL storage | MongoDB | JSONL/Parquet + DuckDB, SQLite JSON, or TinyDB |
| Vector store | ChromaDB server/cloud | ChromaDB persistent folder |
| Search / full text | OpenSearch or Elasticsearch | SQLite FTS5, DuckDB FTS, Whoosh, or Tantivy |
| Backend / API | FastAPI | FastAPI |
| Frontend | React or Streamlit | React or Streamlit |
| Documentation | MkDocs | MkDocs |
| API documentation | Swagger UI / OpenAPI | Swagger UI / OpenAPI |
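For the file-based MVP column, SQLite from the Python standard library is enough to sketch the Storage Layer. The schema, the table names `sources` and `observations`, and the helper names `open_store` and `latest_by_region` are hypothetical; the example URL is a placeholder. The query returns the latest value per region together with the source URL, so a delivered indicator stays tied to its origin.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a file-based store with a minimal, assumed schema."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the source reference
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS sources (
            source_id TEXT PRIMARY KEY,
            url       TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS observations (
            source_id TEXT    NOT NULL REFERENCES sources(source_id),
            region    TEXT    NOT NULL,
            year      INTEGER NOT NULL,
            value     REAL    NOT NULL
        );
        """
    )
    return conn


def latest_by_region(conn: sqlite3.Connection) -> list[dict]:
    """Return the most recent observation per region, including its source URL."""
    rows = conn.execute(
        """
        SELECT o.region, o.year, o.value, s.url
        FROM observations AS o
        JOIN sources AS s USING (source_id)
        WHERE o.year = (
            SELECT MAX(year) FROM observations WHERE region = o.region
        )
        ORDER BY o.region
        """
    ).fetchall()
    keys = ("region", "year", "value", "source_url")
    return [dict(zip(keys, row)) for row in rows]


# Usage with made-up demo rows; pass a file path instead of ":memory:" to persist.
store = open_store()
with store:
    store.execute(
        "INSERT INTO sources VALUES (?, ?)",
        ("demo", "https://example.org/data.csv"),
    )
    store.executemany(
        "INSERT INTO observations VALUES (?, ?, ?, ?)",
        [("demo", "North", 2023, 10.0), ("demo", "North", 2024, 14.0),
         ("demo", "South", 2024, 7.0)],
    )
latest = latest_by_region(store)
```

The SQL stays close to standard syntax, which should keep a later move to PostgreSQL manageable, although details such as the `PRAGMA` statement are SQLite-specific.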
Minimal technical cut for an MVP
A pragmatic MVP can already work with the following building blocks:
- connectors for four public sources
- raw data storage plus simple versioning
- DuckDB, SQLite, Parquet, and/or ChromaDB as zero-cost, file-based stores kept in the curated data folder
- PostgreSQL/PostGIS, MongoDB, or OpenSearch only if the team later moves to hosted/server infrastructure
- FastAPI for data and indicator delivery
- React or Streamlit frontend with a map and a timeline
- MkDocs for technical and analytical documentation
Pragmatic focus
Not every optional component needs to be implemented during the hackathon. What matters is that the end-to-end flow from source to visualization works in a traceable way.