Designing and Scaling Notifications

Learn how to design a notification service that scales to millions of users. Understand notification templates, asynchronous processing with message queues, bulk notification patterns, priority queues for avoiding starvation, and deduplication using Bloom filters.

Pulkit
26 min read

Every application needs to notify users. Order confirmations. Appointment reminders. Marketing campaigns. The mechanics seem simple: send a message to a user. But at scale, this becomes a genuinely interesting engineering problem.

What happens when you need to notify a million users about a flash sale? How do you ensure an appointment reminder isn't delayed behind 50 million marketing messages? How do you avoid sending the same notification twice when your iterator crashes and restarts?

This is where thoughtful system design separates mediocre notification systems from great ones.

TL;DR

  • Notification templates store reusable message formats with variables (like {{user.name}}) in a meta database
  • Control service handles template management and initial request processing, but doesn't send notifications directly
  • Asynchronous processing with message queues decouples request handling from notification emission
  • Workers are stateless and dumb: they receive fully-formed messages and just send them via provider SDKs (Twilio, Mailgun, OneSignal)
  • Bulk notifications use a separate iterator service that reads from the users database and enqueues individual messages
  • Priority queues (P1, P2, P3) prevent marketing campaigns from starving transactional notifications
  • Deduplication uses Bloom filters in Redis to prevent sending the same notification twice after iterator restarts
  • Horizontal scaling is achieved at every layer: SQS queues, stateless workers, sharded tracking database, user database replicas

The Problem Space

Design a notification service that sends notifications to users across multiple channels: email, SMS, Android push, Apple push. The system needs to be horizontally scalable and support high fan-out. You might need to notify a million users about a product launch, while simultaneously sending individual transaction confirmations.

Instead of jumping straight to "send millions of notifications," let's build incrementally. Start simple, understand the constraints, and evolve the architecture.

Notification Templates

Before sending any notification, you need to define what the notification says. But notifications aren't static. They contain personalized data: usernames, order numbers, discount percentages.

This is where notification templates come in. A template looks like:

CODE
Hello {{user.name}},

Your order from {{restaurant.name}} will arrive in {{eta}} minutes.
Get {{discount}}% off your next order!

The template is stored once. When sending, you inject the actual values. This separation of structure and content is fundamental.
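The snippets later in this article call a populateTemplate helper. Its implementation isn't shown in the original, so here is a minimal sketch of what it might look like, assuming placeholders follow the {{path.to.value}} form above and variables arrive as a flat map:

```typescript
// Hypothetical populateTemplate helper (not from the original article).
// Replaces {{path.to.value}} placeholders with values from a flat map;
// unknown placeholders are left intact so they're easy to spot in tests.
function populateTemplate(
  template: string,
  variables: Record<string, string>,
): string {
  return template.replace(
    /\{\{\s*([\w.]+)\s*\}\}/g,
    (match: string, key: string) => variables[key] ?? match,
  );
}
```

For example, `populateTemplate("Hello {{user.name}}", { "user.name": "Asha" })` yields `"Hello Asha"`, while an unbound placeholder passes through unchanged.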

Template Storage

Templates live in a notification meta database. This database is small. Even with ten thousand distinct notification types (which is a lot), at roughly 500 bytes per template that's only about 5 MB total. A single relational database handles this easily.

The notification control service provides APIs for internal teams to create and manage templates. A product manager defines a template, notes its ID, and later triggers notifications using that ID.

TYPESCRIPT
interface NotificationTemplate {
  id: string;
  channel: "email" | "sms" | "push_android" | "push_ios";
  subject?: string;
  body: string;
  variables: string[];
}

async function createTemplate(
  template: Omit<NotificationTemplate, "id">,
): Promise<NotificationTemplate> {
  const id = generateId();
  await db.template.create({
    data: { id, ...template },
  });
  return { id, ...template };
}

Notification Channels

Users can be notified through multiple channels:

  • Email: Providers like Mailgun, SES, SendGrid
  • SMS: Twilio, Message91
  • Android Push: Firebase Cloud Messaging, OneSignal
  • iOS Push: Apple Push Notification Service, OneSignal

Each channel has providers that expose APIs. You don't build email infrastructure from scratch. You integrate Mailgun's SDK, pass the email address and body, and they handle delivery.

The key insight: these API calls are expensive network operations. The servers making these calls need high network bandwidth. One machine cannot make millions of concurrent network calls. This shapes our architecture.

Day Zero Flow

Let's start with the simplest possible flow: one user, one notification.

A product manager wants to notify user U1 with notification N1. The flow:

  1. PM calls the control service: "Send notification N1 to user U1 with these variables"
  2. Control service fetches template N1 from meta database
  3. Control service populates the template with user-specific values
  4. Control service calls Twilio/Mailgun/OneSignal to send the notification
  5. User receives the notification

TYPESCRIPT
async function sendNotification(request: {
  userId: string;
  templateId: string;
  variables: Record<string, string>;
  channel: string;
}): Promise<void> {
  const template = await db.template.findUnique({
    where: { id: request.templateId },
  });

  // Fetch the recipient so we know where to deliver.
  const user = await db.user.findUnique({
    where: { id: request.userId },
  });

  const body = populateTemplate(
    template.body,
    request.variables,
  );

  switch (request.channel) {
    case "email":
      await mailgun.send({ to: user.email, body });
      break;
    case "sms":
      await twilio.send({ to: user.phone, body });
      break;
  }
}

This works for one user. But what happens when you need to notify thousands?

The Bottleneck

Two problems emerge immediately:

  1. Triggering one notification per user is painful. A PM isn't going to make 100,000 API calls manually
  2. The control service becomes a bottleneck. Network calls to providers take time. Provider outages cause retries. The control service gets overwhelmed

The control service should control things, not do heavy lifting. Making synchronous calls to external providers is the wrong responsibility for this component.

Making It Asynchronous

The classic solution: introduce a message queue between the control service and the notification-sending logic.

When a notification request arrives, the control service:

  1. Fetches the template from meta database
  2. Populates it with user data
  3. Creates a complete notification message
  4. Pushes the message to a queue
  5. Returns immediately

Workers consume messages from the queue and send actual notifications. The message contains everything a worker needs: user contact info, final notification body, channel. Workers are dumb. They don't need database connections. They just pick up a message and emit the notification.

TYPESCRIPT
interface NotificationMessage {
  userId: string;
  channel: "email" | "sms" | "push_android" | "push_ios";
  body: string;
  subject?: string;
  contactInfo: string;
  campaignId?: string; // set for bulk/marketing sends
}

async function handleNotificationRequest(
  request: NotificationRequest,
) {
  const template = await db.template.findUnique({
    where: { id: request.templateId },
  });

  const body = populateTemplate(
    template.body,
    request.variables,
  );
  const user = await db.user.findUnique({
    where: { id: request.userId },
  });

  const message: NotificationMessage = {
    userId: request.userId,
    channel: request.channel,
    body,
    subject: template.subject,
    contactInfo: getContactInfo(user, request.channel),
  };

  await sqs.sendMessage({
    QueueUrl: NOTIFICATION_QUEUE_URL,
    MessageBody: JSON.stringify(message),
  });
}

Workers are simple:

TYPESCRIPT
async function worker(): Promise<void> {
  while (true) {
    const response = await sqs.receiveMessage({
      QueueUrl: NOTIFICATION_QUEUE_URL,
      MaxNumberOfMessages: 10,
      WaitTimeSeconds: 20,
    });

    for (const msg of response.Messages ?? []) {
      const notification: NotificationMessage = JSON.parse(
        msg.Body ?? "{}",
      );

      try {
        await sendViaProvider(notification);

        // Delete only after a successful send. A failed message is not
        // deleted, so it reappears after the visibility timeout.
        await sqs.deleteMessage({
          QueueUrl: NOTIFICATION_QUEUE_URL,
          ReceiptHandle: msg.ReceiptHandle,
        });
      } catch (err) {
        console.error("send failed, message will be retried", err);
      }
    }
  }
}

async function sendViaProvider(
  notification: NotificationMessage,
) {
  switch (notification.channel) {
    case "email":
      await mailgun.send({
        to: notification.contactInfo,
        body: notification.body,
      });
      break;
    case "sms":
      await twilio.send({
        to: notification.contactInfo,
        body: notification.body,
      });
      break;
  }
}

Why This Architecture Wins

Retries are automatic. If Twilio is down, the worker fails to send. The message isn't deleted. After the visibility timeout, it reappears in the queue. Another worker picks it up and retries. The control service doesn't manage retries. The queue does.

The control service stays responsive. It accepts requests, creates messages, and returns. No waiting for provider responses. No getting hogged by retries.

Workers are stateless. Any worker can process any message. Scale horizontally by adding more workers.

Bulk Notifications

The architecture above works for moderate traffic where notifications are triggered one at a time. But consider the use case: "Notify everyone."

A PM submits a job: send this marketing notification to all users. If you have a million users, someone needs to iterate through the users table and create a million notification messages. Who does this?

Not the control service. If the control service iterates over a million users, it's blocked from accepting other requests. It becomes unavailable for transactional notifications that need immediate processing.

The solution: separate the iteration logic into its own service.

Iterator Architecture

Introduce a second queue for bulk notification requests. When a PM triggers a bulk notification:

  1. Control service creates a bulk job message with the template ID and filter criteria
  2. This message goes to the bulk queue, not the main notification queue
  3. Iterator workers consume from the bulk queue
  4. For each matching user, the iterator creates a notification message and enqueues it in the main notification queue
  5. Regular notification workers process these messages and send notifications

TYPESCRIPT
interface BulkNotificationJob {
  campaignId: string;
  templateId: string;
  priority: "P1" | "P2" | "P3";
  filters: {
    city?: string;
    ageRange?: [number, number];
    lastLoginBefore?: Date;
    platform?: "android" | "ios";
  };
  channel: string;
  variables: Record<string, string>;
}

async function processBulkJob(job: BulkNotificationJob) {
  const template = await metaDb.template.findUnique({
    where: { id: job.templateId },
  });

  // In production, page through the replica with a cursor instead of
  // loading every matching user into memory at once.
  const users = await usersDb.user.findMany({
    where: buildWhereClause(job.filters),
  });

  for (const user of users) {
    const body = populateTemplate(template.body, {
      ...job.variables,
      "user.name": user.name,
    });

    const message: NotificationMessage = {
      userId: user.id,
      channel: job.channel,
      body,
      contactInfo: getContactInfo(user, job.channel),
    };

    await sqs.sendMessage({
      QueueUrl: NOTIFICATION_QUEUE_URL,
      MessageBody: JSON.stringify(message),
    });
  }
}
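At a million users, a single findMany can't hold every row in memory; a real iterator pages through the table with a cursor. Here is a hedged sketch of that batching loop, where fetchBatch stands in for the actual replica query (e.g. a findMany with cursor and take) and handle stands in for "build the message and enqueue it":

```typescript
// Cursor-based batching sketch (names are illustrative, not from the
// article). Repeatedly fetches up to `batchSize` users with id greater
// than the last id seen; a short page signals the end of the table.
interface UserRow {
  id: string;
  name: string;
}

async function forEachUserBatch(
  fetchBatch: (afterId: string | null, batchSize: number) => Promise<UserRow[]>,
  batchSize: number,
  handle: (user: UserRow) => Promise<void>,
): Promise<void> {
  let afterId: string | null = null;
  while (true) {
    const batch = await fetchBatch(afterId, batchSize);
    for (const user of batch) {
      await handle(user); // e.g. populate the template and enqueue
    }
    if (batch.length < batchSize) break; // short page: we're done
    afterId = batch[batch.length - 1].id;
  }
}
```

Because progress is tracked by the last id seen, memory stays bounded regardless of how many users match the filter.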

Users Database Isolation

The iterator reads from the users table intensively. To avoid affecting production traffic on your main users database, use a read replica. The iterator queries the replica, not the primary. Your actual user-facing operations remain unaffected.

The Starvation Problem

Here's a scenario that breaks our current design.

A PM launches a massive marketing campaign. A million notifications are enqueued. The workers start processing them. Meanwhile, a user completes a payment and needs a transaction confirmation. That confirmation message goes to the same queue, behind a million marketing messages.

When does the user receive their transaction confirmation? After all the marketing messages are processed. That could be hours.

This is starvation. Low-priority notifications are blocking high-priority ones.

Priority Queues

The solution: multiple queues with different priorities.

Priority      Use Case        Examples
P1 (High)     Transactional   Payment confirmations, OTPs, appointment reminders
P2 (Medium)   Default         Order updates, account notifications
P3 (Low)      Marketing       Campaigns, promotions, newsletters

Each priority level has its own SQS queue and its own set of workers.

TYPESCRIPT
type Priority = "P1" | "P2" | "P3";

const QUEUE_URLS: Record<Priority, string> = {
  P1: process.env.SQS_P1_URL!,
  P2: process.env.SQS_P2_URL!,
  P3: process.env.SQS_P3_URL!,
};

async function enqueueNotification(
  message: NotificationMessage,
  priority: Priority,
) {
  await sqs.sendMessage({
    QueueUrl: QUEUE_URLS[priority],
    MessageBody: JSON.stringify(message),
  });
}

When triggering a notification, you specify the priority:

TYPESCRIPT
await controlService.sendNotification({
  templateId: "payment_confirmation",
  userId: "user_123",
  variables: { amount: "$50.00" },
  channel: "email",
  priority: "P1",
});

P1 workers are never blocked by P3 messages. Your payment confirmations go out immediately, even during a massive marketing campaign.

You can also tune worker counts per priority. Maybe you run more P3 workers during off-peak hours when marketing campaigns typically execute, and scale them down during peak transaction times.
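Sizing those worker pools is a back-of-envelope calculation: workers spend most of their time waiting on provider I/O, so per-worker throughput is roughly in-flight concurrency divided by provider latency. A sketch, with purely illustrative numbers (none of these figures come from the article):

```typescript
// Back-of-envelope worker sizing. Assumes each worker keeps
// `concurrencyPerWorker` provider calls in flight and each call takes
// `providerLatencySeconds` on average — both are assumptions you'd
// replace with measured values.
function workersNeeded(
  targetMessagesPerSecond: number,
  providerLatencySeconds: number,
  concurrencyPerWorker: number,
): number {
  const perWorkerThroughput = concurrencyPerWorker / providerLatencySeconds;
  return Math.ceil(targetMessagesPerSecond / perWorkerThroughput);
}

// Example: drain a 1M-message campaign in about an hour (~278 msg/s)
// with 200 ms provider latency and 10 in-flight calls per worker.
const p3Workers = workersNeeded(278, 0.2, 10);
```

With those assumed numbers the answer is a handful of P3 workers; the point is that worker count scales linearly with target throughput, so each priority tier can be tuned independently.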

The Deduplication Problem

Another failure mode. The iterator is processing a million users. It's enqueued 500,000 messages. Then it crashes.

When the iterator restarts, it starts from the beginning. It iterates through all users again. Those 500,000 users who already received the notification? They're about to receive it again.

Duplicate marketing notifications are a bad user experience. Users complain. They unsubscribe. They mark you as spam.

Tracking Sent Notifications

We need to track which users have already received a notification from a specific campaign. Before enqueuing a message, check if it was already sent.

A naive approach: store (user_id, campaign_id) pairs in a database.

TYPESCRIPT
async function shouldSendNotification(
  userId: string,
  campaignId: string,
): Promise<boolean> {
  const existing = await redis.get(
    `sent:${campaignId}:${userId}`,
  );
  return !existing;
}

async function markAsSent(
  userId: string,
  campaignId: string,
): Promise<void> {
  await redis.set(
    `sent:${campaignId}:${userId}`,
    "1",
    "EX",
    86400 * 7,
  );
}

Let's compute the raw storage. User ID: 4 bytes. Campaign ID: 4 bytes. That's 8 bytes of payload per entry (Redis key strings add overhead on top, so treat this as a lower bound).

With 100 million users and 5 concurrent marketing campaigns: 100M × 5 × 8 bytes = 4 GB.

That's manageable, but we can do better.

Bloom Filters for Deduplication

For marketing notifications, we don't need 100% accuracy. If we occasionally skip sending to a user who hasn't received the notification, it's acceptable. Marketing has some tolerance for imprecision.

Bloom filters are perfect here. They're probabilistic data structures that tell you with 100% certainty when something doesn't exist, but can have false positives when saying something exists.

When the Bloom filter says "no": The user definitely hasn't received this notification. Send it.

When the Bloom filter says "yes": The user might have received this notification. Skip it to be safe.

False positives mean some users don't receive the marketing notification. That's acceptable. False negatives (sending duplicates) don't happen with Bloom filters.
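To build intuition for those guarantees, here is a toy in-memory Bloom filter (a teaching sketch, not the Redis implementation): any member that was added always tests positive, so duplicates are never sent, while a negative answer proves the member was never added.

```typescript
// Toy Bloom filter: k salted hash functions over an m-bit array.
// After add(x), mightContain(x) is guaranteed true (no false
// negatives); mightContain may return true for items never added
// (false positives).
class BloomFilter {
  private bits: Uint8Array;

  constructor(private m: number, private k: number) {
    this.bits = new Uint8Array(Math.ceil(m / 8));
  }

  // FNV-1a-style hash, salted per hash-function index.
  private hash(item: string, seed: number): number {
    let h = 2166136261 ^ seed;
    for (let i = 0; i < item.length; i++) {
      h ^= item.charCodeAt(i);
      h = Math.imul(h, 16777619);
    }
    return (h >>> 0) % this.m;
  }

  add(item: string): void {
    for (let s = 0; s < this.k; s++) {
      const bit = this.hash(item, s);
      this.bits[bit >> 3] |= 1 << (bit & 7);
    }
  }

  mightContain(item: string): boolean {
    for (let s = 0; s < this.k; s++) {
      const bit = this.hash(item, s);
      if ((this.bits[bit >> 3] & (1 << (bit & 7))) === 0) return false;
    }
    return true;
  }
}
```

Setting a bit is irreversible, which is exactly why false negatives are impossible: every bit for an added item stays set.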

Redis supports Bloom filters natively:

TYPESCRIPT
async function shouldSendNotification(
  userId: string,
  campaignId: string,
): Promise<boolean> {
  const exists = await redis.call(
    "BF.EXISTS",
    `campaign:${campaignId}`,
    userId,
  );
  return exists === 0;
}

async function markAsSent(
  userId: string,
  campaignId: string,
): Promise<void> {
  await redis.call(
    "BF.ADD",
    `campaign:${campaignId}`,
    userId,
  );
}

The storage savings are significant. A Bloom filter for 100 million users with 1% false positive rate requires about 114 MB. Compare that to 4 GB for explicit storage. That's a 35x reduction.
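That figure falls out of the standard Bloom filter sizing formulas: optimal bit count m = -n·ln(p)/(ln 2)² and optimal hash count k = (m/n)·ln 2. A quick sketch to reproduce it:

```typescript
// Standard Bloom filter sizing: for n items and target false-positive
// rate p, optimal bits m = -n * ln(p) / (ln 2)^2 and optimal hash
// count k = (m / n) * ln 2.
function bloomSize(n: number, p: number) {
  const mBits = Math.ceil((-n * Math.log(p)) / Math.log(2) ** 2);
  const k = Math.round((mBits / n) * Math.log(2));
  return { mBits, mMegabytes: mBits / 8 / 1e6, k };
}

const sizing = bloomSize(100_000_000, 0.01);
// ~9.59 bits per item: roughly 120 MB (about 114 MiB), with k = 7
```

So "114 MB" is this ~120-megabyte figure expressed in binary megabytes; either way it is a ~35x reduction versus 4 GB of explicit pairs.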

Where to Deduplicate

You could deduplicate at the worker level: worker receives message, checks Bloom filter, skips if already sent. But this means you've already enqueued the message, transmitted it over the network, and a worker has consumed it.

Better: deduplicate at the iterator level. Before enqueuing a message, the iterator checks the Bloom filter. If the notification was already sent, it doesn't enqueue. This saves queue capacity, worker time, and network bandwidth.

TYPESCRIPT
async function processBulkJob(job: BulkNotificationJob) {
  const template = await metaDb.template.findUnique({
    where: { id: job.templateId },
  });

  // As before, a production iterator would page through the replica
  // with a cursor rather than loading all matching users at once.
  const users = await usersDb.user.findMany({
    where: buildWhereClause(job.filters),
  });

  for (const user of users) {
    const shouldSend = await shouldSendNotification(
      user.id,
      job.campaignId,
    );
    if (!shouldSend) {
      continue;
    }

    const body = populateTemplate(template.body, {
      ...job.variables,
      "user.name": user.name,
    });

    const message: NotificationMessage = {
      userId: user.id,
      channel: job.channel,
      body,
      contactInfo: getContactInfo(user, job.channel),
      campaignId: job.campaignId,
    };

    await sqs.sendMessage({
      QueueUrl: QUEUE_URLS[job.priority],
      MessageBody: JSON.stringify(message),
    });

    await markAsSent(user.id, job.campaignId);
  }
}

The iterator marks each user in the Bloom filter as soon as the message is enqueued. If the iterator restarts, it skips users already in the filter, so the 500,000 messages enqueued before the crash are never enqueued again.

Note: for transactional notifications (P1), duplicates are generally acceptable. Getting two "payment successful" notifications is fine. The deduplication logic is primarily for marketing campaigns.

Final Architecture

Putting it all together:

Control Service: Accepts notification requests from internal services and PMs. For single-user notifications, fetches the template, populates it, and enqueues directly. For bulk notifications, creates a bulk job and enqueues to the bulk queue.

Meta Database: Stores notification templates. Small, single relational database. Read-heavy, rarely written.

Bulk Queue (SQS): Holds bulk notification jobs with filter criteria.

Iterator Workers: Consume bulk jobs. Iterate over users database (using a replica). For each matching user, check Bloom filter, create notification message, enqueue to the appropriate priority queue, update Bloom filter.

Priority Queues (SQS P1, P2, P3): Three queues for different priorities. P1 for transactional, P2 for default, P3 for marketing.

Notification Workers: Consume from priority queues. Send notifications via provider SDKs (Twilio, Mailgun, OneSignal). Stateless. Scale horizontally.

Notification Tracker (Redis): Bloom filters keyed by campaign ID. Used for deduplication of marketing notifications.

Users Database Replica: Read replica of main users database. Iterator reads from this to avoid affecting production traffic.

Scaling Properties

Every component scales horizontally:

Component              Scaling Approach
Control Service        Stateless, add more instances behind load balancer
Meta Database          Read replicas (write traffic is minimal)
Iterator Workers       Add more workers to process bulk jobs faster
Priority Queues        SQS is managed, scales automatically
Notification Workers   Add workers per priority queue as needed
Notification Tracker   Shard Redis by campaign ID
Users Replica          Add replicas for read throughput

Design Principles Applied

Start simple, evolve incrementally. We started with synchronous single-user notifications, identified bottlenecks, and added complexity only where needed.

Separate concerns. The control service manages templates and requests. Iterators handle bulk expansion. Workers handle emission. Each component has one job.

Make workers dumb. Workers receive complete messages. No database calls, no business logic. They just send. This makes them stateless and trivially scalable.

Use queues for decoupling. Queues absorb load spikes, enable retries, and let producers and consumers operate independently.

Trade accuracy for efficiency where acceptable. Bloom filters trade some marketing accuracy for massive storage savings. For marketing notifications, this trade-off makes sense.

Protect critical paths from bulk operations. Priority queues ensure transactional notifications aren't blocked by marketing campaigns.

Conclusion

Building a notification service that scales requires thinking beyond "send message to user." The architecture needs to handle:

  • Template management for reusable, personalized messages
  • Asynchronous processing to avoid blocking on slow provider calls
  • Bulk iteration without overwhelming the control service
  • Priority separation to prevent starvation of critical notifications
  • Deduplication to avoid annoying users with duplicate marketing messages

The final system is horizontally scalable at every layer. Add more workers to increase throughput. Add more Redis shards to handle more campaigns. Use database replicas to protect production traffic.

Most importantly, the architecture reveals itself incrementally. You don't design all of this upfront. You start simple, identify the bottleneck, solve it, and repeat. Each constraint you hit teaches you something about the system you're building.

Last updated on May 3


© 2026 Pulkit. All rights reserved
