0015. Notification Service Fire-and-Forget / Asynchronous Messaging Pattern
Status: Accepted
Date: 2026-01-23
Context: Centralized Notification Service design: how requests are acknowledged vs delivered asynchronously at scale.
Context
The Notification Service is being designed as a centralized, multi-channel notification delivery service
(email, SMS, push notifications).
A key architectural decision is how the service handles notification requests and delivery.
Options Considered:
- Synchronous Delivery - Client waits for notification to be delivered before receiving response
- Fire-and-Forget / Asynchronous Messaging - Client receives immediate acknowledgment, delivery happens asynchronously
- Hybrid - Synchronous for critical, asynchronous for bulk
Requirements:
- High throughput (thousands of notifications per minute)
- Priority-based processing (critical notifications must not be blocked)
- Graceful degradation under load
- Stateless service design (following ADR-0011)
- Eventually-consistent delivery guarantees
Decision
The Notification Service will follow the Fire-and-Forget (also known as Asynchronous Messaging) pattern with eventual consistency.
Core Principles:
- Immediate Acknowledgment
- Client sends notification request
- Service immediately returns
201 Created with notification ID
- Client does not wait for delivery confirmation
- Client only receives confirmation that the notification was accepted by the system
- Asynchronous Processing
- Notifications are queued and processed asynchronously
- Processing happens at a later time (decoupled from client request)
- Delivery time depends on system load, priority, and provider availability
- Eventually-Consistent
- Notifications are guaranteed to be delivered eventually, but not immediately
- No immediate delivery guarantee
- Delivery status must be checked via status endpoint
- No synchronous waiting for delivery confirmation
- Priority-Based Processing
- High-priority notifications (e.g., password resets) processed before low-priority (e.g., marketing)
- When system approaches capacity, low-priority notifications may be delayed
- Priority ensures critical notifications are not blocked by bulk operations
Priority Levels:
- CRITICAL - Security-critical (password resets, account lockouts) - processed immediately
- HIGH - Transactional (order confirmations, activation emails) - processed within seconds
- NORMAL - Standard notifications (welcome emails, updates) - processed within minutes
- LOW - Marketing, newsletters - processed when capacity available
Rationale
Why Fire-and-Forget:
- Scalability - Decouples client from delivery, enabling high throughput
- Fault Tolerance - Client doesn’t block on provider failures
- Resource Efficiency - Better utilization of system resources
- Priority Support - Enables priority-based processing without blocking clients
Why Eventually-Consistent:
- High Load Handling - System can handle spikes without blocking
- Graceful Degradation - Low-priority notifications can be delayed during high load
- Provider Constraints - External providers (SES, Twilio) have rate limits that require queuing
- Retry Logic - Failed notifications can be retried without client involvement
Why Not Synchronous:
- Blocking - Clients would wait for provider responses (SES, Twilio), increasing latency
- Coupling - Client availability depends on provider availability
- No Priority - Cannot prioritize critical notifications over bulk operations
- Poor Scalability - Limited by slowest provider response time
Consequences
Positive:
- High Throughput - System can handle thousands of notifications per minute
- Better Scalability - Horizontal scaling without client coordination
- Fault Tolerance - Provider failures don’t block clients
- Priority Support - Critical notifications always processed first
- Graceful Degradation - System degrades gracefully under load
- Stateless Design - Aligns with ADR-0011 (stateless services)
Negative / Tradeoffs:
- No Immediate Delivery Guarantee - Clients cannot assume immediate delivery
- Status Checking Required - Clients must check delivery status if confirmation needed
- Potential Delays - Low-priority notifications may experience delays during high load
- Complexity - Requires delivery tracking, retry logic, and status endpoints
Mitigations:
- Status Endpoint -
GET /notifications/{notificationId} for delivery status
- Priority System - Ensures critical notifications are not delayed
- Retry Logic - Failed notifications are automatically retried
- Monitoring - Metrics and dashboards for delivery tracking
Implementation
API Design:
Request Flow:
- Client sends
POST /notifications with notification details
- Service validates request, creates notification record (status: QUEUED)
- Service immediately returns
201 Created with notification ID
- Service processes notification asynchronously (renders template, calls provider)
- Service updates notification status (SENT, DELIVERED, FAILED, etc.)
Status Checking:
- Client can check delivery status via
GET /notifications/{notificationId}
- Status endpoint returns current delivery status and events
Processing Flow:
- Queue - Notification created with status QUEUED
- Process - Asynchronous processor picks up notification (priority-ordered)
- Render - Template retrieved and rendered with variables
- Send - Provider called (SES, Twilio, etc.)
- Track - Delivery status updated based on provider response/webhooks
Priority-Based Processing:
Notifications are processed in priority order:
- CRITICAL → HIGH → NORMAL → LOW
- Within same priority, FIFO (first-in-first-out)
- Low-priority notifications may be throttled/paused during high load
- ADR-0008: REST vs SQS (synchronous REST API, async-ready design)
- ADR-0011: Stateless JWT Authentication (stateless service design)
- ADR-0010: REST API Design Standards (RESTful endpoint design)
Future Considerations
SQS Integration (Phase 2):
- When SQS is implemented (per ADR-0008), notification queuing can move to SQS
- Service remains fire-and-forget, but queue moves from database to SQS
- Enables better scalability and retry handling
Real-Time Delivery Status:
- Future: WebSocket or Server-Sent Events for real-time delivery status
- Current: Polling via status endpoint