production
high
Uptime 99.9%
10,000 req/s

Distributed Twitter Analytics Platform

Real-time distributed analytics engine processing Twitter firehose data. Built for horizontal scalability with sub-millisecond query latency across billions of data points.

Go
gRPC
Redis
Apache Kafka
ClickHouse
Docker

Overview

A high-performance distributed analytics platform designed to ingest, process, and query Twitter streaming data in real time. The system handles the full data lifecycle: ingestion through Kafka consumers, real-time processing by Go workers, and low-latency analytical queries through ClickHouse.

Architecture

The platform follows an event-driven microservices architecture:

  • Ingestion Layer: Kafka consumers processing Twitter firehose data with at-least-once delivery guarantees
  • Processing Layer: Go-based worker pool with gRPC inter-service communication
  • Storage Layer: ClickHouse for analytical queries, Redis for hot data caching
  • Query Layer: Sub-millisecond query API with automatic query optimization
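The processing layer's worker pool can be illustrated with a minimal sketch. It uses only the Go standard library: a channel stands in for the Kafka consumer, and the Event type, the CountHashtags function, and the hashtag-counting workload are all hypothetical, chosen for illustration rather than taken from the project's code.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Event is a hypothetical stand-in for one decoded firehose message.
type Event struct {
	ID   string
	Text string
}

// CountHashtags fans events out to a fixed number of workers and merges
// the per-worker hashtag counts into a single map.
func CountHashtags(events []Event, workers int) map[string]int {
	if workers < 1 {
		workers = 1
	}
	in := make(chan Event)
	// Buffered so every worker can hand off its result without blocking.
	out := make(chan map[string]int, workers)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each worker keeps a private map, so no locking is needed.
			local := make(map[string]int)
			for ev := range in {
				for _, word := range strings.Fields(ev.Text) {
					if strings.HasPrefix(word, "#") && len(word) > 1 {
						local[strings.ToLower(word)]++
					}
				}
			}
			out <- local
		}()
	}

	// In the real system this loop would be a Kafka consumer (assumption).
	for _, ev := range events {
		in <- ev
	}
	close(in)
	wg.Wait()
	close(out)

	total := make(map[string]int)
	for local := range out {
		for tag, n := range local {
			total[tag] += n
		}
	}
	return total
}

func main() {
	events := []Event{
		{ID: "1", Text: "shipping #golang services"},
		{ID: "2", Text: "#Golang and #kafka"},
		{ID: "3", Text: "no tags here"},
	}
	counts := CountHashtags(events, 4)
	fmt.Println(counts["#golang"], counts["#kafka"]) // prints "2 1"
}
```

Giving each worker its own map and merging once at the end avoids lock contention on a shared map, which tends to matter at high message rates.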

Key Achievements

  • Achieved 10,000+ requests/sec sustained throughput
  • Reduced end-to-end latency by 85% compared to batch processing
  • Processes 2.5 TB of data daily with 99.9% uptime
  • Zero data loss through at-least-once delivery, with idempotent processing so that redelivered messages are not applied twice
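Idempotent processing under at-least-once delivery can be sketched as follows. This is an illustrative example, not the project's implementation: the Processor type, its methods, and the message IDs are hypothetical, and the in-memory set of seen IDs is a simplification (a production system would need a bounded or persistent store, and the source does not say which one is used).

```go
package main

import (
	"fmt"
	"sync"
)

// Processor applies each message at most once, keyed by message ID.
// Hypothetical type for illustration only.
type Processor struct {
	mu    sync.Mutex
	seen  map[string]struct{}
	total int
}

// NewProcessor returns a Processor with an empty seen-set.
func NewProcessor() *Processor {
	return &Processor{seen: make(map[string]struct{})}
}

// Apply adds value to the running total unless id was already applied.
// It reports whether the message changed the state.
func (p *Processor) Apply(id string, value int) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if _, dup := p.seen[id]; dup {
		return false // redelivery: safe to acknowledge and skip
	}
	p.seen[id] = struct{}{}
	p.total += value
	return true
}

// Total returns the sum of all values applied so far.
func (p *Processor) Total() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.total
}

func main() {
	p := NewProcessor()
	// "m2" arrives twice, as can happen under at-least-once delivery.
	deliveries := []struct {
		id    string
		value int
	}{{"m1", 10}, {"m2", 5}, {"m2", 5}, {"m3", 1}}

	for _, d := range deliveries {
		p.Apply(d.id, d.value)
	}
	fmt.Println(p.Total()) // prints 16, not 21
}
```

Because the duplicate check and the state update happen under the same lock, a message redelivered concurrently by two consumers is still applied only once.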