Run/Walk Data Pipeline (GitHub)
Batch pipeline: CSV → Parquet → DuckDB, with SQL transformations and materialized tables. Orchestrated with Apache Airflow; results are exposed through a Streamlit dashboard and a Flask API for validation and reporting.
Python, Airflow, DuckDB, Streamlit, Flask