Performance Tuning Guide

This guide helps you configure async-cache for optimal performance in production.

Choosing maxsize

Workload                Recommended maxsize     Reasoning
----------------------  ----------------------  -----------------------------------------------------
API response cache      1,000 – 10,000          Bounded memory, LRU evicts cold keys
ML embeddings           50,000 – 500,000        Large but finite corpus; disk backend for persistence
Session store           Active sessions × 1.2   Size to active user count + headroom
Config / feature flags  100 – 1,000             Small, rarely evicted

Rule of thumb: Start with maxsize = expected_hot_keys * 1.5. Monitor get_metrics()['size'] against maxsize; if size is consistently at maxsize, live keys are probably being evicted, so increase it.
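The rule of thumb above can be sketched as a pair of small helpers. This is only an illustration: suggest_maxsize and is_pinned_at_capacity are hypothetical names, not part of async-cache, and the size samples are assumed to come from repeated reads of get_metrics()['size'].

```python
import math


def suggest_maxsize(expected_hot_keys: int, headroom: float = 1.5) -> int:
    """Starting maxsize: the hot-key estimate times a headroom factor."""
    return math.ceil(expected_hot_keys * headroom)


def is_pinned_at_capacity(size_samples: list[int], maxsize: int) -> bool:
    """True if every sampled size reading sits at maxsize."""
    return bool(size_samples) and all(s >= maxsize for s in size_samples)


print(suggest_maxsize(2_000))                            # 3000
print(is_pinned_at_capacity([3000, 3000, 3000], 3000))   # True
print(is_pinned_at_capacity([2100, 3000, 2800], 3000))   # False
```

A single reading at maxsize says little on its own; checking several samples taken over time is closer to what "consistently" means here.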

Choosing TTL

Data type        Recommended TTL        Notes
---------------  ---------------------  ------------------------------------
User profiles    60 – 300s              Balance freshness vs DB load
Product catalog  300 – 3600s            Changes infrequently
Config / flags   60 – 120s              Short TTL for fast rollout
ML inference     3600s – None           Immutable inputs → long/infinite TTL
Session data     Match session timeout  Prevent stale sessions
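One way to keep these choices in one place is a small preset table in application code. The sketch below is an assumption-laden illustration: TTL_PRESETS_S and ttl_for are hypothetical names, and each preset is just one value picked from the corresponding range above, not a recommendation from the library.

```python
from typing import Optional

# Hypothetical presets: one value picked from each range in the table.
# None means "no expiry", for results computed from immutable inputs.
TTL_PRESETS_S: dict[str, Optional[int]] = {
    "user_profile": 120,
    "product_catalog": 1800,
    "config_flags": 60,
    "ml_inference": None,
}


def ttl_for(data_type: str, session_timeout_s: Optional[int] = None) -> Optional[int]:
    """Return the TTL in seconds for a data type (None = never expires)."""
    if data_type == "session":
        # Session entries should expire together with the session itself.
        if session_timeout_s is None:
            raise ValueError("session TTL must match the session timeout")
        return session_timeout_s
    return TTL_PRESETS_S[data_type]


print(ttl_for("config_flags"))   # 60
print(ttl_for("session", 900))   # 900
```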

Batch Loader Tuning

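  • batch_window_ms (default 5): How long the loader waits to collect keys before dispatching a batch. Raising it puts more keys in each batch but adds up to that much latency to every request. GraphQL resolvers: 5–10ms. Background jobs: 20–50ms.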

  • max_batch_size (default 100): Match your database’s optimal batch size.
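To make the interaction between the two settings concrete, here is a minimal micro-batching sketch in plain asyncio. It is not async-cache's implementation: MicroBatcher, load_many, and the way the window timer is started are all assumptions chosen for illustration. A batch is dispatched when the window expires or when it reaches max_batch_size, whichever comes first.

```python
import asyncio


class MicroBatcher:
    """Hypothetical batcher: flushes after window_ms or at max_batch_size."""

    def __init__(self, load_many, window_ms=5, max_batch_size=100):
        self._load_many = load_many      # async fn: list of keys -> dict
        self._window_s = window_ms / 1000
        self._max = max_batch_size
        self._pending = {}               # key -> Future awaiting its value
        self._timer = None               # handle for the window timeout
        self._tasks = set()              # keep dispatch tasks referenced

    async def load(self, key):
        loop = asyncio.get_running_loop()
        fut = self._pending.get(key)
        if fut is None:
            fut = loop.create_future()
            self._pending[key] = fut
            if len(self._pending) >= self._max:
                self._flush()            # batch is full: dispatch now
            elif self._timer is None:
                # First key of a new batch starts the window timer.
                self._timer = loop.call_later(self._window_s, self._flush)
        return await fut

    def _flush(self):
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None
        batch, self._pending = self._pending, {}
        if batch:
            task = asyncio.get_running_loop().create_task(self._dispatch(batch))
            self._tasks.add(task)
            task.add_done_callback(self._tasks.discard)

    async def _dispatch(self, batch):
        try:
            values = await self._load_many(list(batch))
        except Exception as exc:
            for fut in batch.values():
                if not fut.done():
                    fut.set_exception(exc)
            return
        for key, fut in batch.items():
            if not fut.done():
                fut.set_result(values.get(key))


async def demo():
    batch_sizes = []

    async def load_many(keys):
        batch_sizes.append(len(keys))
        return {k: k * 2 for k in keys}

    batcher = MicroBatcher(load_many, window_ms=5, max_batch_size=4)
    results = await asyncio.gather(*(batcher.load(i) for i in range(10)))
    return results, batch_sizes


results, batch_sizes = asyncio.run(demo())
print(batch_sizes)   # [4, 4, 2]
```

With ten concurrent loads and max_batch_size=4, two batches fill up and dispatch immediately, and the last two keys wait out the 5 ms window. That wait is the latency cost the batch_window_ms bullet describes.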

Disk Backend Tuning

  • Save frequency: Call save_to_backend() on shutdown (atexit, signal handler). For critical data, save periodically (e.g., every 5 minutes).

  • File location: Use fast local storage (SSD). Avoid network mounts.

  • Cache size: Pickle file size scales linearly with the number of entries. 100K entries ≈ 10–100 MB depending on value size (roughly 100 B to 1 KB per entry).
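The save-frequency advice above might be wired up roughly as follows. This is a sketch under assumptions: install_shutdown_save and periodic_save are hypothetical helpers, and cache is assumed to be any object with a synchronous, no-argument save_to_backend() method.

```python
import asyncio
import atexit
import signal
import sys


def install_shutdown_save(cache) -> None:
    """Save once on normal interpreter exit and on SIGTERM."""
    atexit.register(cache.save_to_backend)

    def _on_sigterm(signum, frame):
        # sys.exit raises SystemExit, which lets the atexit handlers run.
        sys.exit(0)

    # signal.signal must be called from the main thread.
    signal.signal(signal.SIGTERM, _on_sigterm)


async def periodic_save(cache, interval_s: float = 300.0) -> None:
    """Save every interval_s seconds until the task is cancelled."""
    while True:
        await asyncio.sleep(interval_s)
        # save_to_backend() is synchronous and O(n), so the event loop
        # is blocked for the duration of the save.
        cache.save_to_backend()
```

Typical use would be to call install_shutdown_save(cache) once at startup and start asyncio.create_task(periodic_save(cache)) inside the running event loop. For large caches, the blocking save is a reason to prefer a longer interval.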

Monitoring

Expose get_metrics() to your monitoring system:

# Prometheus example
# Assumes `cache` is your existing async-cache instance.
from prometheus_client import Gauge

hit_rate = Gauge('cache_hit_rate', 'Cache hit rate')
cache_size = Gauge('cache_size', 'Cache entries')

async def update_metrics():
    # Copy the latest cache metrics into the Prometheus gauges.
    m = cache.get_metrics()
    hit_rate.set(m['hit_rate'])
    cache_size.set(m['size'])

Key metrics to watch:

  • hit_rate < 0.5: maxsize too small or TTL too short

  • hit_rate > 0.99: TTL may be too long (stale data risk)

  • size == maxsize consistently: increase maxsize

  • misses growing faster than hits: review access patterns
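The thresholds above can be turned into a simple health check. The sketch below is illustrative only: diagnose is a hypothetical helper, the 'hit_rate' and 'size' keys match the example earlier in this section, and the 'hits' and 'misses' counter keys are an assumption about what get_metrics() returns.

```python
def diagnose(metrics, maxsize, previous=None):
    """Return a list of findings based on the thresholds in this guide."""
    findings = []
    if metrics["hit_rate"] < 0.5:
        findings.append("low hit rate: maxsize too small or TTL too short")
    if metrics["hit_rate"] > 0.99:
        findings.append("very high hit rate: TTL may be too long")
    if metrics["size"] >= maxsize:
        findings.append("cache full: increase maxsize if this persists")
    if previous is not None:
        # Compare growth between two snapshots (assumed counter keys).
        new_hits = metrics["hits"] - previous["hits"]
        new_misses = metrics["misses"] - previous["misses"]
        if new_misses > new_hits:
            findings.append("misses growing faster than hits")
    return findings


before = {"hit_rate": 0.50, "size": 90, "hits": 30, "misses": 30}
after = {"hit_rate": 0.40, "size": 100, "hits": 40, "misses": 60}
for finding in diagnose(after, maxsize=100, previous=before):
    print(finding)
```

The findings are prompts for investigation, not automatic tuning decisions; a cache that is full but has a healthy hit rate may need no change.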

Performance Characteristics

Operation              Complexity          Notes
---------------------  ------------------  ----------------------------------
get() (hit)            O(1)                ~0.8µs median
get() (miss + loader)  O(1) + loader time  ~3µs overhead + loader
set()                  O(1)                Includes LRU eviction if needed
delete()               O(1)
Thundering herd        O(1) loader calls   Regardless of concurrent requests
Batch loader           O(1) per batch      Amortized across batch window
save_to_backend()      O(n)                n = cache size; runs synchronously
load_from_backend()    O(n)                n = persisted entries
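The thundering-herd row can be illustrated with a generic single-flight pattern: concurrent requests for the same missing key share one in-flight loader call. This is a minimal sketch of the general technique, not async-cache's actual code; SingleFlight and slow_loader are hypothetical names.

```python
import asyncio


class SingleFlight:
    """Hypothetical helper: at most one loader call per key at a time."""

    def __init__(self, loader):
        self._loader = loader        # async fn: key -> value
        self._in_flight = {}         # key -> Task running the loader
        self.loader_calls = 0

    async def get(self, key):
        task = self._in_flight.get(key)
        if task is None:
            self.loader_calls += 1
            task = asyncio.ensure_future(self._loader(key))
            self._in_flight[key] = task
            # Forget the task once it finishes so later misses reload.
            task.add_done_callback(lambda _: self._in_flight.pop(key, None))
        # shield() keeps one cancelled caller from cancelling the shared load.
        return await asyncio.shield(task)


async def demo():
    async def slow_loader(key):
        await asyncio.sleep(0.01)    # stand-in for a database query
        return key.upper()

    flight = SingleFlight(slow_loader)
    values = await asyncio.gather(*(flight.get("user:1") for _ in range(50)))
    return values, flight.loader_calls


values, loader_calls = asyncio.run(demo())
print(loader_calls)   # 1
```

Fifty concurrent requests result in one loader call, which is what "O(1) loader calls regardless of concurrent requests" means in the table.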