0014. Application Caching Strategy
Status: Accepted
Date: 2025-01-27
Summary: Application-level caching to improve performance and cut load and cost across single-instance dev and multi-instance production.
Context
The Forge platform repeats several expensive operations on every request or on every read of frequently accessed data:
- Cognito Token Validation - JWT parsing and signature validation via Cognito JWKS on every authenticated request
- User Profile Lookups - PostgreSQL queries for candidate profiles on profile page loads and header username display
- Document Retrieval - DynamoDB reads for parsed documents
There is currently no caching layer, which results in:
- High database query volume (PostgreSQL and DynamoDB)
- Repeated Cognito JWKS lookups and JWT parsing overhead
- Increased latency for frequently accessed data
- Higher operational costs (DynamoDB read capacity units, database connection overhead)
We must choose between:
- No caching - Continue with direct database/external service calls
- Local caching only - In-memory cache (Caffeine) for single-instance deployments
- Distributed caching - Redis/ElastiCache for multi-instance deployments
- Hybrid approach - Start with local caching, migrate to distributed for production
Additionally, we need to consider integration with planned distributed rate limiting infrastructure (Redis backend).
Decision
Implement a three-phase caching strategy using Quarkus Cache:
- Phase 1: Local Caching (Caffeine) - Use Quarkus Cache with default Caffeine backend for development and single-instance deployments
- Phase 2: Metrics and Monitoring - Add cache metrics (hit/miss rates) to Grafana dashboards before production migration
- Phase 3: Redis Backend (Production) - Migrate to quarkus-redis-cache with AWS ElastiCache for distributed caching across multiple service instances
Primary Use Cases:
- Cache Cognito token validation results (TTL = token expiration)
- Cache candidate profile data (TTL = 10 minutes, invalidate on updates)
- Cache parsed document results (TTL = 1 hour; parsed output is immutable until the document is re-uploaded)
Cache Implementation:
- Use Quarkus Cache annotations (@CacheResult, @CacheInvalidate)
- Cache-aside pattern (invalidate on writes, populate on reads)
- Fail-open strategy (cache failures don’t break requests)
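A minimal sketch of the cache-aside pattern with these annotations, assuming a hypothetical CandidateProfileService and repository calls (none of these names are from the codebase; the cache name matches the candidate-profiles cache defined under Implementation Details):

```java
import io.quarkus.cache.CacheInvalidate;
import io.quarkus.cache.CacheKey;
import io.quarkus.cache.CacheResult;
import jakarta.enterprise.context.ApplicationScoped;

@ApplicationScoped
public class CandidateProfileService {

    public record CandidateProfile(String candidateId, String displayName) {}

    // Read path: on a cache miss the method body runs and its return value
    // is stored under the candidateId key; on a hit the body is skipped.
    @CacheResult(cacheName = "candidate-profiles")
    public CandidateProfile getProfile(@CacheKey String candidateId) {
        return loadFromPostgres(candidateId);
    }

    // Write path: the interceptor removes the cached entry for this key and
    // the method persists the update; the next read repopulates from the
    // database (cache-aside).
    @CacheInvalidate(cacheName = "candidate-profiles")
    public void updateProfile(@CacheKey String candidateId, CandidateProfile updated) {
        persistToPostgres(candidateId, updated);
    }

    private CandidateProfile loadFromPostgres(String candidateId) {
        // Hypothetical PostgreSQL lookup, e.g. via a Panache repository.
        return new CandidateProfile(candidateId, "placeholder");
    }

    private void persistToPostgres(String candidateId, CandidateProfile updated) {
        // Hypothetical PostgreSQL write.
    }
}
```

Note that the @CacheKey on the invalidating method must compute the same key as the @CacheResult method, otherwise stale entries survive writes.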
Infrastructure:
- Phase 1: Local Caffeine cache (in-memory, per-instance)
- Phase 3: Shared ElastiCache Redis cluster for caching + distributed rate limiting
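Switching backends between phases is mostly configuration. A hedged sketch of the Phase 3 side, assuming the extension swap described above and an illustrative ElastiCache endpoint (the hostname is a placeholder):

```properties
# Phase 3: add the io.quarkus:quarkus-redis-cache extension so the
# named caches are backed by Redis instead of Caffeine.
# The rediss:// scheme (double s) enables TLS in transit.
quarkus.redis.hosts=rediss://forge-cache.example.cache.amazonaws.com:6379

# Per-cache settings move from quarkus.cache.caffeine.* to quarkus.cache.redis.*:
quarkus.cache.redis."candidate-profiles".expire-after-write=10M
```

Phase 1 needs no extra dependency: the quarkus-cache extension defaults to Caffeine, configured under quarkus.cache.caffeine.* (full sketch under Implementation Details below).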
Rationale
Why Quarkus Cache
- Framework Integration - Native Quarkus support with CDI annotations
- Multiple Backends - Supports Caffeine (local) and Redis (distributed) via configuration
- Simple API - @CacheResult and @CacheInvalidate annotations reduce boilerplate
- Metrics - Automatic Micrometer integration for cache metrics
- Production Ready - Battle-tested in Quarkus ecosystem
Why Three-Phase Approach
- Incremental Risk - Start simple (local cache), add metrics, then migrate to distributed
- Development First - Local caching works immediately without infrastructure dependencies
- Metrics Before Production - Understand cache performance before distributed migration
- Production Hardening - Redis backend provides consistency across instances
Why Cache-Aside Pattern
- Simplicity - Clear separation: reads populate cache, writes invalidate cache
- Consistency - Database remains source of truth
- Flexibility - Easy to add/remove caching without changing business logic
Why Shared Redis Infrastructure
- Cost Optimization - Single ElastiCache cluster serves caching + rate limiting
- Operational Simplicity - One infrastructure component to manage
- Synergy - Lays the groundwork for the planned distributed rate limiting (tracked on the private product roadmap in the main repository)
Why These Use Cases
- Token Validation - Highest frequency (every authenticated request), high cryptographic overhead
- Candidate Profiles - Frequently accessed, low change rate, database query overhead
- Parsed Documents - Immutable once parsed (safe to cache), DynamoDB read cost reduction
Architecture Overview
Phase 1: Local Caching (Caffeine)
┌─────────────────┐
│ Service │
│ Instance │
│ │
│ ┌───────────┐ │
│ │ Caffeine │ │
│ │ Cache │ │
│ └───────────┘ │
│ │ │
└───────┼─────────┘
│
├──> PostgreSQL (candidate profiles)
├──> DynamoDB (parsed documents)
└──> Cognito (token validation)
Phase 3: Distributed Caching (Redis)
┌─────────────────┐ ┌─────────────────┐
│ Service │ │ Service │
│ Instance 1 │ │ Instance 2 │
│ │ │ │
│ ┌───────────┐ │ │ ┌───────────┐ │
│ │ Quarkus │ │ │ │ Quarkus │ │
│ │ Cache │ │ │ │ Cache │ │
│ └─────┬─────┘ │ │ └─────┬─────┘ │
└────────┼────────┘ └────────┼────────┘
│ │
└────────────┬──────────┘
│
┌───────▼───────┐
│ ElastiCache │
│ Redis Cluster │
│ │
│ cache:* │
│ ratelimit:* │
└───────┬───────┘
│
┌─────────────┼─────────────┐
│ │ │
▼ ▼ ▼
PostgreSQL DynamoDB Cognito
Implementation Details
Cache Configuration
Token Validation Cache:
- Cache name: token-validation
- Key: Composite of hashed token reference plus expiry, so entries align with JWT validity windows; never use the raw bearer token as any part of the key
- TTL: Token expiration time (automatic)
- Max size: 10,000 entries
Candidate Profile Cache:
- Cache name: candidate-profiles
- Key: Composite of profile namespace plus stable candidate identifier
- TTL: 10 minutes
- Max size: 5,000 entries
- Invalidation: On profile registration/updates
Document Cache:
- Cache name: parsed-resumes, parsed-jobspecs
- Key: Composite of document-type namespace plus candidate identifier (resumes) or job-spec transaction identifier (job specs)
- TTL: 1 hour
- Max size: 10,000 entries
- Invalidation: On document re-upload
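A hedged Phase 1 sketch of these settings in application.properties, assuming Quarkus duration syntax and that the 10,000-entry budget applies to each document cache. Per-entry TTLs cannot be expressed in Caffeine config, so the token cache uses an assumed upper-bound expire-after-write, with the expiry-bearing key keeping entries aligned to token lifetime:

```properties
# Token validation: expiry-in-key handles per-token lifetime; this
# expire-after-write is only a conservative upper bound (assumed value).
quarkus.cache.caffeine."token-validation".maximum-size=10000
quarkus.cache.caffeine."token-validation".expire-after-write=1H

quarkus.cache.caffeine."candidate-profiles".maximum-size=5000
quarkus.cache.caffeine."candidate-profiles".expire-after-write=10M

quarkus.cache.caffeine."parsed-resumes".maximum-size=10000
quarkus.cache.caffeine."parsed-resumes".expire-after-write=1H

quarkus.cache.caffeine."parsed-jobspecs".maximum-size=10000
quarkus.cache.caffeine."parsed-jobspecs".expire-after-write=1H
```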
Error Handling
- Fail-Open Strategy - Cache failures don’t break requests
- Negative Caching - Do not cache null/empty results, except for token validation (caching a failed validation avoids repeatedly re-verifying a known-bad token)
- Exception Handling - Quarkus Cache handles exceptions gracefully, falls back to underlying operation
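Where fail-open behavior needs to be explicit rather than delegated to the interceptor, the programmatic Cache API allows a fallback. A sketch, reusing the hypothetical CandidateProfile type from the earlier example:

```java
import io.quarkus.cache.Cache;
import io.quarkus.cache.CacheName;
import io.smallrye.mutiny.Uni;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

@ApplicationScoped
public class FailOpenProfileLookup {

    @Inject
    @CacheName("candidate-profiles")
    Cache cache;

    public Uni<CandidateProfileService.CandidateProfile> getProfile(String candidateId) {
        // Happy path: return the cached value, or run the loader on a miss.
        return cache.get(candidateId, this::loadFromPostgres)
                // Fail-open: if the cache backend itself errors (e.g. Redis
                // unreachable in Phase 3), serve the request from the database.
                .onFailure().recoverWithItem(() -> loadFromPostgres(candidateId));
    }

    private CandidateProfileService.CandidateProfile loadFromPostgres(String candidateId) {
        // Hypothetical PostgreSQL lookup.
        return new CandidateProfileService.CandidateProfile(candidateId, "placeholder");
    }
}
```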
Security Considerations
- Token Caching - Cache keys use token hashes (not full tokens)
- PII in Cache - Ensure Redis encryption at rest and in transit
- Key Namespacing - Use prefixes (cache:token:, cache:candidate:) to prevent collisions
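A minimal sketch of a hashed token key consistent with these constraints; the helper name and exact key shape are assumptions, not an existing utility:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

public final class TokenCacheKeys {

    private TokenCacheKeys() {
    }

    // Key shape (assumed): cache:token:<sha256-of-token>:<expiry-epoch-seconds>.
    // Hashing keeps the raw bearer token out of cache storage; the expiry
    // component aligns the entry with the JWT validity window.
    public static String tokenValidationKey(String rawBearerToken, long expiryEpochSeconds) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(rawBearerToken.getBytes(StandardCharsets.UTF_8));
            return "cache:token:" + HexFormat.of().formatHex(hash) + ":" + expiryEpochSeconds;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available", e);
        }
    }
}
```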
Consequences
Positive
- Performance - Reduced latency for cached operations (20-50ms improvement)
- Cost Reduction - 20-30% reduction in DynamoDB read capacity, 30-40% reduction in PostgreSQL queries
- Scalability - Redis backend enables horizontal scaling with shared cache
- Operational Efficiency - Shared Redis infrastructure for caching + rate limiting
- Developer Experience - Simple annotations, no boilerplate cache management code
Negative
- Complexity - Additional infrastructure component (Redis) in production
- Cache Invalidation - Must ensure cache invalidation on writes (cache-aside pattern)
- Memory Usage - Local cache (Caffeine) consumes JVM heap memory
- Cache Stampede - Risk of thundering herd if cache expires simultaneously (mitigated by TTL variance)
Risks and Mitigations
- Cache Inconsistency - Mitigated by cache-aside pattern (database is source of truth)
- Cache Failures - Mitigated by fail-open strategy (requests continue without cache)
- Memory Pressure - Mitigated by size limits and TTL-based eviction
- Redis Availability - Mitigated by ElastiCache high availability configuration
Success Metrics
- Cache hit rate > 80% for token validation
- Cache hit rate > 60% for candidate profiles
- Cache hit rate > 70% for documents
- P95 latency reduction of 20-50ms for cached operations
- 20-30% reduction in DynamoDB read capacity units
- 30-40% reduction in PostgreSQL query volume