Design a scalable, fault‑tolerant notification system that supports email, SMS, and push notifications for a global user base. How would you architect it to handle millions of messages per day while ensuring eventual consistency and low latency?
onsite · 3-5 minutes
How to structure your answer
- Context: Define scope and constraints (channels, daily volume, regions).
- Identify: List the key requirements: scalability, fault tolerance, eventual consistency, low latency.
- Recommend: Propose a microservices architecture with a message broker (Kafka or SQS), a separate delivery service per channel, and a retry/compensation layer.
- Clarify: State the consistency model (eventual) and the latency targets.
- List: Detail the supporting components: load balancer, API gateway, service registry, monitoring stack, and the platform push gateways (APNs/FCM) for push delivery.
- Evaluate: Discuss trade‑offs (CAP theorem, latency vs. consistency).
- Summarize: Highlight how the design meets the throughput, resilience, and observability goals.
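For the Context step, it can help to turn "millions of messages per day" into a per-second rate before choosing components. The sketch below is only a back-of-the-envelope estimate; the 10 million/day volume and the 10x peak factor are illustrative assumptions, not figures from the question.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def required_throughput(messages_per_day: int, peak_factor: float) -> tuple[float, float]:
    """Return (average, peak) messages per second.

    peak_factor is an assumed ratio of peak to average traffic.
    """
    average = messages_per_day / SECONDS_PER_DAY
    return average, average * peak_factor


# Assumed inputs: 10 million messages/day, peaks at 10x the average.
avg, peak = required_throughput(10_000_000, peak_factor=10)
print(f"average ~ {avg:.0f} msg/s, peak ~ {peak:.0f} msg/s")
# prints: average ~ 116 msg/s, peak ~ 1157 msg/s
```

Under these assumptions the average load is modest, so sizing is driven mostly by bursts and by downstream provider rate limits.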
Sample answer
To scale a notification system to millions of messages per day, I would start with a microservices approach. A stateless API gateway fronts the system and routes requests to a notification orchestrator, which publishes events to a Kafka cluster. Separate delivery services consume these events: EmailService, SmsService, and PushService, each with idempotent consumers and retry queues. For eventual consistency, each service writes to its own durable store and publishes a status event; a monitoring service aggregates these events to provide a global delivery view. Load balancing across service instances keeps latency low, and PushService hands messages to the platform push gateways (APNs and FCM) rather than delivering to devices directly. Observability comes from distributed tracing (Jaeger), metrics (Prometheus), and dashboards and alerting (Grafana). In CAP terms, the design favors availability over strong consistency: the delivery view may lag briefly, but sending continues when an individual component fails. That gives scalability, fault tolerance, and low latency at the cost of a short window of stale status.
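The "idempotent consumers and retry queues" in the answer above can be sketched as follows. This is a minimal in-memory illustration, not a production implementation: the `Notification` fields, `DeliveryConsumer`, `TransientError`, and the `send` callable are all hypothetical names, and the set and list stand in for a durable deduplication store and a dead-letter queue.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Notification:
    message_id: str  # unique ID, assumed to be assigned by the orchestrator
    channel: str     # "email", "sms", or "push"
    recipient: str
    body: str


class TransientError(Exception):
    """Provider failure worth retrying (for example a timeout)."""


@dataclass
class DeliveryConsumer:
    send: Callable[[Notification], None]  # hypothetical provider call
    max_attempts: int = 3
    delivered: set = field(default_factory=set)       # stand-in for a durable dedup store
    dead_letters: list = field(default_factory=list)  # stand-in for a dead-letter queue

    def handle(self, note: Notification) -> str:
        # Idempotency: a redelivered event is acknowledged without re-sending.
        if note.message_id in self.delivered:
            return "duplicate"
        for _ in range(self.max_attempts):
            try:
                self.send(note)
            except TransientError:
                continue  # a real service would back off before retrying
            self.delivered.add(note.message_id)
            return "delivered"
        # Retries exhausted: park the message for inspection or replay.
        self.dead_letters.append(note)
        return "dead_lettered"


calls = []


def flaky_send(note: Notification) -> None:
    """Fails on the first call only, to exercise the retry path."""
    calls.append(note.message_id)
    if len(calls) == 1:
        raise TransientError("provider timeout")


consumer = DeliveryConsumer(send=flaky_send)
note = Notification("m-1", "email", "user@example.com", "Welcome")
print(consumer.handle(note))  # delivered (on the second attempt)
print(consumer.handle(note))  # duplicate
```

The deduplication check matters because a broker with at-least-once delivery can hand the same event to a consumer more than once; without it, a retry or a rebalance could send the user the same message twice.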
Key points to mention
- Scalability via partitioned message broker
- Fault tolerance with retry and idempotency
- Eventual consistency and CAP trade‑offs
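To make the first key point concrete, here is a rough sketch of key-based partitioning: hashing a stable key such as the user ID spreads load across partitions while keeping each user's events in order on one partition. The function name `partition_for` and the use of SHA-256 are assumptions for illustration; Kafka's own default partitioner uses a different hash.

```python
import hashlib


def partition_for(user_id: str, num_partitions: int) -> int:
    """Map a user ID to a partition index in [0, num_partitions).

    A cryptographic digest is used because Python's built-in hash()
    is randomized per process and would not be stable across services.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# The same user always maps to the same partition.
print(partition_for("user-42", 12) == partition_for("user-42", 12))  # True
```

Adding partitions later changes the mapping for existing keys, so the partition count is worth sizing with headroom up front.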
Common mistakes to avoid
- ✗ Ignoring idempotency in message consumers
- ✗ Overloading a single service with all notification types
- ✗ Neglecting monitoring and alerting