API to CSV Pipeline (Kafka/MinIO)

An end-to-end data pipeline using Apache Kafka for real-time streaming and MinIO for durable object storage.

This project tackles reliable, high-frequency data ingestion from REST APIs. I designed a Kafka-based producer/consumer pipeline that decouples ingestion from storage, so events are buffered in Kafka rather than dropped when the storage side falls behind under load.

A Python producer streams data into Kafka topics, while a consumer group transforms the messages into partitioned CSV files. The files are persisted in MinIO, an S3-compatible, self-hosted object store, ready for downstream workloads.
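
As a rough illustration of the consumer-side transformation described above, the sketch below turns a batch of decoded records into CSV bytes and builds a partitioned object key for it. The field names, the `date=`/`hour=` key layout, and the function names `records_to_csv` and `partitioned_key` are assumptions made for illustration, not the project's actual code. Reading from Kafka and uploading to MinIO are left out so the example runs with the standard library alone.

```python
import csv
import io
from datetime import datetime, timezone


def records_to_csv(records, fieldnames):
    """Serialise a batch of dict records to CSV bytes with a header row.

    Keys not listed in `fieldnames` are ignored; missing keys become
    empty cells.
    """
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue().encode("utf-8")


def partitioned_key(topic, event_time, batch_id):
    """Build a date/hour-partitioned object key (hypothetical layout)."""
    ts = event_time.astimezone(timezone.utc)
    return f"{topic}/date={ts:%Y-%m-%d}/hour={ts:%H}/batch-{batch_id:06d}.csv"


if __name__ == "__main__":
    batch = [{"id": 1, "price": 9.5}, {"id": 2, "price": 10.0}]
    payload = records_to_csv(batch, ["id", "price"])
    key = partitioned_key("prices", datetime.now(timezone.utc), 1)
    print(key, len(payload), "bytes")
```

In a real consumer, the returned bytes and key would be handed to an S3-compatible client for upload. Grouping objects under date and hour prefixes means a downstream job can list just the prefix it needs.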

  • Decoupled producer/consumer architecture handles high-concurrency data streams without data loss.
  • Custom retry logic and rate limiting absorb third-party API instability.
  • Partitioned MinIO storage lets downstream consumers read only the partitions they need, which improves query performance.
  • Back-pressure handling ensures pipeline resilience during traffic spikes.
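
The resilience features listed above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the project's implementation: `call_with_retry`, `MinIntervalLimiter`, and `offer` are hypothetical names, the retry policy is plain exponential backoff, the rate limiter simply spaces calls a fixed interval apart, and back-pressure is modelled with a bounded in-memory queue rather than Kafka itself.

```python
import queue
import time


def call_with_retry(fetch, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fetch(); on any exception, retry with exponential backoff.

    Delays are base_delay, 2*base_delay, 4*base_delay, ...
    The last failure is re-raised once max_attempts is reached.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


class MinIntervalLimiter:
    """Rate limiter that spaces calls at least `interval` seconds apart."""

    def __init__(self, interval, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self._next_allowed = None  # no call made yet

    def wait(self):
        """Block until the next call is allowed, then reserve the slot."""
        now = self.clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self.sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval


def offer(buffer, item, timeout=1.0):
    """Try to enqueue an item into a bounded queue.

    Returns False if the queue stays full for `timeout` seconds, which
    signals the caller to slow down instead of dropping the item silently.
    """
    try:
        buffer.put(item, timeout=timeout)
        return True
    except queue.Full:
        return False


if __name__ == "__main__":
    limiter = MinIntervalLimiter(interval=0.2)
    buffer = queue.Queue(maxsize=100)  # bounded: a full queue pushes back
    for i in range(3):
        limiter.wait()
        record = call_with_retry(lambda: {"id": i})  # stand-in for an API call
        if not offer(buffer, record):
            print("buffer full, pausing ingestion")
    print(buffer.qsize(), "records buffered")
```

The `sleep` and `clock` parameters are injectable so the timing behaviour can be tested without real delays.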