Performance Tuning Guide

This guide helps you configure async-cache for optimal performance in production.

Choosing maxsize

Workload	Recommended maxsize	Reasoning
API response cache	1,000 – 10,000	Bounded memory, LRU evicts cold keys
ML embeddings	50,000 – 500,000	Large but finite corpus; disk backend for persistence
Session store	Active sessions × 1.2	Size to active user count + headroom
Config / feature flags	100 – 1,000	Small, rarely evicted

Rule of thumb: Start with maxsize = expected_hot_keys * 1.5. Monitor get_metrics()['size'] vs maxsize — if size is consistently at maxsize, increase it.
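The rule of thumb above can be sketched as a pair of small helpers. The names suggest_maxsize and is_saturated and the 0.95 saturation threshold are illustrative assumptions, not part of async-cache; the size samples are assumed to come from repeated reads of get_metrics()['size'].

```python
import math


def suggest_maxsize(expected_hot_keys: int, headroom: float = 1.5) -> int:
    """Starting point for maxsize: hot key count times a headroom factor."""
    if expected_hot_keys < 0:
        raise ValueError("expected_hot_keys must be non-negative")
    return math.ceil(expected_hot_keys * headroom)


def is_saturated(size_samples: list[int], maxsize: int, threshold: float = 0.95) -> bool:
    """True if every sampled size is at or above threshold * maxsize."""
    if not size_samples:
        return False
    return all(size >= threshold * maxsize for size in size_samples)
```

If is_saturated stays true over a representative period, that matches the "consistently at maxsize" condition and is a signal to raise maxsize.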

Choosing TTL

Data type	Recommended TTL	Notes
User profiles	60 – 300s	Balance freshness vs DB load
Product catalog	300 – 3600s	Changes infrequently
Config / flags	60 – 120s	Short TTL for fast rollout
ML inference	3600s – None	Immutable inputs → long/infinite TTL
Session data	Match session timeout	Prevent stale sessions

Batch Loader Tuning

batch_window_ms (default 5): Increase for a higher batching ratio at the cost of added latency per request. Typical values: 5–10 ms for GraphQL resolvers, 20–50 ms for background jobs.
max_batch_size (default 100): Match your database’s optimal batch size.
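To make the trade-off between the two settings concrete, here is a minimal sketch of window-based batching. It is not async-cache's implementation: the WindowedBatcher class and the batch_fn contract (an async function taking a list of keys and returning a dict) are assumptions made for illustration. Only the parameter names batch_window_ms and max_batch_size come from this guide.

```python
import asyncio


class WindowedBatcher:
    """Illustrative only: collect keys for a short window, then load them together."""

    def __init__(self, batch_fn, batch_window_ms=5, max_batch_size=100):
        self._batch_fn = batch_fn            # async fn: list of keys -> dict of key -> value
        self._window_s = batch_window_ms / 1000
        self._max_batch_size = max_batch_size
        self._pending = {}                   # key -> Future awaiting its value
        self._timer = None                   # handle for the scheduled flush
        self._tasks = set()                  # keep references to running batch tasks

    async def load(self, key):
        loop = asyncio.get_running_loop()
        fut = self._pending.get(key)
        if fut is None:
            fut = loop.create_future()
            self._pending[key] = fut
            if len(self._pending) >= self._max_batch_size:
                self._flush()                # batch is full: do not wait for the window
            elif self._timer is None:
                self._timer = loop.call_later(self._window_s, self._flush)
        return await fut

    def _flush(self):
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None
        batch, self._pending = self._pending, {}
        if batch:
            task = asyncio.get_running_loop().create_task(self._run(batch))
            self._tasks.add(task)
            task.add_done_callback(self._tasks.discard)

    async def _run(self, batch):
        try:
            results = await self._batch_fn(list(batch))
        except Exception as exc:
            # Propagate a failed batch to every caller waiting on it
            for fut in batch.values():
                if not fut.done():
                    fut.set_exception(exc)
            return
        for key, fut in batch.items():
            if not fut.done():
                fut.set_result(results.get(key))
```

In this sketch a longer window lets more keys join each batch, but the first caller in a batch waits up to the full window before its load starts. A smaller max_batch_size flushes early under load, which caps the size of each backend query.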

Disk Backend Tuning

Save frequency: Call save_to_backend() on shutdown (for example, from an atexit hook or a signal handler). For critical data, also save periodically (e.g., every 5 minutes).
File location: Use fast local storage (SSD). Avoid network mounts.
Cache size: Pickle file size scales linearly with the number of entries. 100K entries ≈ 10–100 MB, depending on value size.
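The save-frequency advice could be wired up roughly as follows. This is a sketch under assumptions: periodic_save and register_shutdown_save are hypothetical helper names, and the only cache method used is save_to_backend(), which this guide describes as synchronous.

```python
import asyncio
import atexit


async def periodic_save(cache, interval_s: float = 300.0) -> None:
    """Save the cache every interval_s seconds until the task is cancelled."""
    while True:
        await asyncio.sleep(interval_s)
        # save_to_backend() is synchronous, so this blocks the event loop for
        # the duration of the write. Moving it to a worker thread would first
        # require confirming that the cache is safe to use across threads.
        cache.save_to_backend()


def register_shutdown_save(cache) -> None:
    """Save once more when the interpreter exits normally."""
    atexit.register(cache.save_to_backend)
```

Start periodic_save with asyncio.create_task() at application startup and cancel the task on shutdown. Note that atexit hooks do not run when the process is killed by a signal Python does not handle, so a SIGTERM handler is still needed in most deployments.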

Monitoring

Expose get_metrics() to your monitoring system:

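# Prometheus example
from prometheus_client import Gauge

# Gauges hold the most recent values reported by get_metrics()
hit_rate = Gauge('cache_hit_rate', 'Cache hit rate')
cache_size = Gauge('cache_size', 'Cache entries')

# Call this periodically, e.g. from a background task
async def update_metrics():
    # `cache` is your application's existing async-cache instance
    m = cache.get_metrics()
    hit_rate.set(m['hit_rate'])
    cache_size.set(m['size'])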

Key metrics to watch:

hit_rate < 0.5: maxsize too small or TTL too short
hit_rate > 0.99: TTL may be too long (stale data risk)
size == maxsize consistently: increase maxsize
misses growing faster than hits: review access patterns
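The first three thresholds above can be turned into an automated check. The diagnose function below is a hypothetical helper, not part of async-cache; it assumes the metrics dict has the 'hit_rate' and 'size' keys used in the Prometheus example, and that maxsize is passed in by the caller.

```python
def diagnose(metrics: dict, maxsize: int) -> list[str]:
    """Return human-readable findings based on the thresholds in this guide."""
    findings = []
    hit_rate = metrics["hit_rate"]
    if hit_rate < 0.5:
        findings.append("low hit rate: maxsize too small or TTL too short")
    elif hit_rate > 0.99:
        findings.append("very high hit rate: TTL may be too long (stale data risk)")
    if metrics["size"] >= maxsize:
        findings.append("cache full: consider increasing maxsize")
    return findings
```

A single full-cache reading is not conclusive on its own; run the check on a schedule and act only on findings that persist.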

Performance Characteristics

Operation	Complexity	Notes
`get()` (hit)	O(1)	~0.8µs median
`get()` (miss + loader)	O(1) + loader time	~3µs overhead + loader
`set()`	O(1)	Includes LRU eviction if needed
`delete()`	O(1)
Thundering herd	O(1) loader calls	One loader call per key, regardless of the number of concurrent requests
Batch loader	O(1) per batch	Amortized across batch window
`save_to_backend()`	O(n)	n = cache size; runs synchronously
`load_from_backend()`	O(n)	n = persisted entries
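The thundering-herd row describes request coalescing: concurrent misses on the same key share one loader call. The sketch below shows the general technique, not async-cache's internals; the SingleFlight class and its get(key, loader) signature are assumptions for illustration.

```python
import asyncio


class SingleFlight:
    """Illustrative request coalescing: one in-flight loader call per key."""

    def __init__(self):
        self._inflight = {}  # key -> Task running the loader for that key

    async def get(self, key, loader):
        task = self._inflight.get(key)
        if task is None:
            # First caller for this key starts the loader
            task = asyncio.ensure_future(loader(key))
            self._inflight[key] = task
            # Forget the task once it finishes so later calls reload
            task.add_done_callback(lambda _t: self._inflight.pop(key, None))
        # shield() keeps one cancelled caller from cancelling the shared load
        return await asyncio.shield(task)
```

With 50 concurrent callers asking for the same key, the loader in this sketch runs once and all callers receive the same result.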