API to CSV Pipeline (Kafka/MinIO)
An end-to-end data pipeline using Apache Kafka for real-time streaming and MinIO for durable object storage.
Technical Overview
This project tackles reliable, high-frequency data ingestion from REST APIs. I designed a Kafka-based producer/consumer pipeline that decouples ingestion from storage: Kafka buffers events while the storage side catches up, so events are not dropped under load.
A Python producer streams data into Kafka topics while a consumer group handles transformation into partitioned CSV files. The files are persisted in MinIO (an S3-compatible, self-hosted object store), ready for downstream workloads.
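The consumer-side transformation into partitioned CSV files could look roughly like the sketch below. It is not the project's actual code: the function name `records_to_partitioned_csv`, the `event_date` partition field, the `events` prefix and the `part-0000.csv` file name are all assumptions chosen for illustration. It uses only the standard library and returns a mapping from object key to CSV bytes, leaving the Kafka consumption and the MinIO upload out of scope.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def records_to_partitioned_csv(
    records: Iterable[Mapping[str, str]],
    partition_field: str = "event_date",  # hypothetical field name
    prefix: str = "events",               # hypothetical object-key prefix
) -> Dict[str, bytes]:
    """Group records by `partition_field` and render one CSV per group.

    Returns a dict mapping an object key such as
    'events/event_date=2024-01-01/part-0000.csv' to the CSV content as bytes.
    """
    # Bucket the records by the value of the partition field.
    groups: Dict[str, List[Mapping[str, str]]] = defaultdict(list)
    for record in records:
        groups[str(record[partition_field])].append(record)

    objects: Dict[str, bytes] = {}
    for partition_value, rows in groups.items():
        # Use the union of all keys so rows with missing fields still fit;
        # DictWriter fills absent values with an empty string.
        fieldnames = sorted({name for row in rows for name in row})
        buffer = io.StringIO(newline="")
        writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)

        key = f"{prefix}/{partition_field}={partition_value}/part-0000.csv"
        objects[key] = buffer.getvalue().encode("utf-8")
    return objects
```

Each value can then be wrapped in `io.BytesIO` and uploaded to a bucket, for example with the MinIO Python client's `put_object`, which takes a readable stream and its length. The `field=value` key layout is one common convention for partitioned data, not something this project prescribes.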
Project Outcomes
- Decoupled producer/consumer architecture handles high-concurrency data streams without data loss.
- Custom retry logic and rate limiting absorb third-party API instability.
- Partitioned MinIO storage improves query performance by letting downstream consumers read only the partitions they need.
- Back-pressure handling ensures pipeline resilience during traffic spikes.
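The retry and rate-limiting outcome listed above can be illustrated with a small standard-library sketch. This is an assumed design, not the project's implementation: `MinIntervalLimiter`, `call_with_retry` and all of their parameters are hypothetical names, and exponential backoff is simply one common retry strategy swapped in for illustration.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class MinIntervalLimiter:
    """Hypothetical rate limiter: at most one call per `min_interval` seconds."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0  # earliest time the next call may proceed

    def wait(self) -> None:
        """Block until the next call is allowed, then reserve the slot."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval


def call_with_retry(
    fetch: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch`, retrying with exponential backoff on `retry_on` errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt: base, 2*base, 4*base, ... capped.
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
    raise ValueError("max_attempts must be at least 1")
```

In a producer loop, `limiter.wait()` would be called before each API request and the request itself wrapped in `call_with_retry`. Narrowing `retry_on` to transient errors (timeouts, connection resets) and adding random jitter to the delay are both reasonable refinements that the sketch omits for brevity.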