Skip to content
Back to all projects
completed

Distributed Twitter Analytics Platform

Real-time distributed analytics engine for streaming social data with scalable processing and query support.

Go
gRPC
Redis
Apache Kafka
ClickHouse
Docker
View source

System Evidence

  • high throughput workload planning
  • Go, gRPC, Redis implementation surface
  • 85% latency-focused optimization target
Traffic Target
10,000 req/sec
Latency Gain
85% lower
Reliability
99.9% uptime
Daily Data
2.5TB

Overview

A distributed analytics platform designed to ingest, process, and query Twitter streaming data in real time. The system handles the full data lifecycle — from ingestion through Kafka consumers, real-time processing via Go workers, to analytical queries through ClickHouse.

Problem

Social stream analytics needs high write throughput, low-latency query paths, and recovery behavior that keeps duplicated events from corrupting downstream analysis.

Architecture

The platform follows an event-driven microservices architecture:

  • Ingestion Layer: Kafka consumers processing Twitter firehose data with at-least-once delivery guarantees
  • Processing Layer: Go-based worker pool with gRPC inter-service communication
  • Storage Layer: ClickHouse for analytical queries, Redis for hot data caching
  • Query Layer: Low-latency query API with automatic query optimization

Engineering Tradeoffs

  • Used Kafka as the durability boundary between ingestion and processing
  • Kept hot aggregations in Redis while ClickHouse handled analytical scans
  • Designed workers around idempotent processing so retries would not inflate metrics

Key Achievements

  • Designed for sustained real-time query traffic
  • Reduced end-to-end latency compared to batch processing
  • Supports large-volume daily data processing
  • Uses idempotent processing to reduce data loss risk

Validation Focus

  • Measure consumer lag and worker throughput during stream bursts
  • Compare cached query paths against ClickHouse analytical queries
  • Test replay scenarios to confirm duplicate handling remains stable