API to CSV Pipeline (Kafka/MinIO)
An end-to-end data pipeline that uses Apache Kafka for real-time streaming and MinIO for durable object storage.
Technical Overview
This project explores the challenges of real-time data ingestion from high-frequency REST APIs. I designed a Kafka-based producer/consumer pipeline that decouples data ingestion from storage processing, so events are buffered durably in Kafka rather than dropped when traffic spikes or the storage layer slows down.
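A minimal sketch of what the producer side of such a pipeline might look like, using the kafka-python client. The endpoint URL, topic name, broker address, and polling interval are all hypothetical placeholders, not the project's actual values:

```python
import json
import time

import requests
from kafka import KafkaProducer  # kafka-python client

# Hypothetical endpoint and topic; the real project's names are not shown.
API_URL = "https://api.example.com/events"
TOPIC = "api.events"

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",  # wait for full broker acknowledgement so events are not lost
)

def poll_and_publish():
    """Fetch one batch from the REST API and publish each record to Kafka."""
    resp = requests.get(API_URL, timeout=10)
    resp.raise_for_status()
    for record in resp.json():
        producer.send(TOPIC, value=record)
    producer.flush()  # block until the whole batch is acknowledged

if __name__ == "__main__":
    while True:
        poll_and_publish()
        time.sleep(1)  # polling interval; tune to the API's update frequency
```

Setting `acks="all"` trades a little producer latency for the no-loss guarantee described above: the broker only acknowledges a record once it is replicated.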
The architecture uses a custom Python producer to stream data into Kafka topics, while a consumer group transforms the messages into partitioned CSV files. These files are then persisted in MinIO, an S3-compatible, fault-tolerant sink ready for downstream analytical workloads.
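A sketch of the consumer side under the same assumptions, using kafka-python and the official MinIO Python SDK. The topic, bucket, credentials, record schema, and batch size here are illustrative stand-ins:

```python
import csv
import io
import json
from datetime import datetime, timezone

from kafka import KafkaConsumer  # kafka-python client
from minio import Minio          # official MinIO Python SDK

# Hypothetical names; the real topic, bucket, and schema are not shown.
TOPIC = "api.events"
BUCKET = "pipeline-output"
FIELDS = ["id", "timestamp", "value"]
BATCH_SIZE = 1000

consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers="localhost:9092",
    group_id="csv-writers",  # a consumer group shares partitions across workers
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
minio_client = Minio("localhost:9000", access_key="minioadmin",
                     secret_key="minioadmin", secure=False)
if not minio_client.bucket_exists(BUCKET):
    minio_client.make_bucket(BUCKET)

def flush_batch(rows):
    """Serialize a batch to CSV and persist it under a date-partitioned key."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    data = buf.getvalue().encode("utf-8")
    now = datetime.now(timezone.utc)
    # Hive-style date partitions so downstream engines can prune by day.
    key = f"events/dt={now:%Y-%m-%d}/batch-{now:%H%M%S%f}.csv"
    minio_client.put_object(BUCKET, key, io.BytesIO(data), length=len(data))

rows = []
for message in consumer:
    rows.append(message.value)
    if len(rows) >= BATCH_SIZE:
        flush_batch(rows)
        rows.clear()
```

Writing one object per batch under a `dt=YYYY-MM-DD` prefix is what lets query engines skip irrelevant partitions when scanning the sink.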
Project Outcomes
- Architected a decoupled producer/consumer system to handle high-concurrency data streams.
- Developed custom retry logic and rate-limit handling for third-party API stability (see the sketch after this list).
- Implemented partitioned storage in MinIO to optimize query performance for downstream analytics.
- Configured the pipeline to handle back-pressure gracefully, ensuring system resilience during traffic spikes.
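One way the retry and rate-limit handling mentioned above could look: exponential backoff with jitter that honors the server's `Retry-After` header on HTTP 429. This is a sketch of the general technique, not the project's exact implementation; the function name and parameters are hypothetical:

```python
import random
import time

import requests

def fetch_with_retry(url, max_attempts=5, base_delay=1.0):
    """GET with exponential backoff; honors Retry-After on HTTP 429."""
    for attempt in range(max_attempts):
        try:
            resp = requests.get(url, timeout=10)
        except requests.RequestException:
            resp = None  # network error: treat as a retryable failure
        if resp is not None:
            if resp.status_code == 429:
                # Respect the server's rate-limit hint when it provides one.
                delay = float(resp.headers.get("Retry-After",
                                               base_delay * 2 ** attempt))
            elif resp.status_code >= 500:
                delay = base_delay * 2 ** attempt  # transient server error
            else:
                resp.raise_for_status()  # fail fast on other 4xx errors
                return resp
        else:
            delay = base_delay * 2 ** attempt
        time.sleep(delay + random.uniform(0, 0.5))  # jitter avoids retry storms
    raise RuntimeError(f"{url} still failing after {max_attempts} attempts")
```

The random jitter matters in a multi-worker producer: without it, all workers that hit a rate limit at once would retry at the same instant and trip the limit again.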