Performance Tuning Guide
========================

This guide helps you configure async-cache for optimal performance in production.

Choosing maxsize
----------------

.. list-table::
   :header-rows: 1
   :widths: 25 25 50

   * - Workload
     - Recommended maxsize
     - Reasoning
   * - API response cache
     - 1,000 – 10,000
     - Bounded memory, LRU evicts cold keys
   * - ML embeddings
     - 50,000 – 500,000
     - Large but finite corpus; disk backend for persistence
   * - Session store
     - Active sessions × 1.2
     - Size to active user count + headroom
   * - Config / feature flags
     - 100 – 1,000
     - Small, rarely evicted

**Rule of thumb**: Start with ``maxsize = expected_hot_keys * 1.5``. Then compare
``get_metrics()['size']`` with ``maxsize``. A full cache is normal under LRU eviction,
so increase ``maxsize`` when size stays at the limit and the hit rate is lower than you need.
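
The sizing rules above reduce to a little arithmetic. The sketch below is
illustrative only: ``suggest_maxsize`` and ``session_store_maxsize`` are
hypothetical helper names, not part of async-cache. The 1.5 and 1.2 factors
come from the rule of thumb and the session store row of the table.

.. code-block:: python

    import math

    def suggest_maxsize(expected_hot_keys: int, headroom: float = 1.5) -> int:
        """Starting maxsize: expected hot keys times a headroom factor."""
        # Round up so the cache is never sized below the estimate.
        return math.ceil(expected_hot_keys * headroom)

    def session_store_maxsize(active_sessions: int) -> int:
        """Session store sizing from the table: active sessions x 1.2."""
        return math.ceil(active_sessions * 1.2)

Treat the result as a starting point and adjust it once real metrics are available.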

Choosing TTL
------------

.. list-table::
   :header-rows: 1
   :widths: 25 25 50

   * - Data type
     - Recommended TTL
     - Notes
   * - User profiles
     - 60 – 300s
     - Balance freshness vs DB load
   * - Product catalog
     - 300 – 3600s
     - Changes infrequently
   * - Config / flags
     - 60 – 120s
     - Short TTL for fast rollout
   * - ML inference
     - 3600s – None
     - Immutable inputs → long/infinite TTL
   * - Session data
     - Match session timeout
     - Prevent stale sessions
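
The "freshness vs DB load" trade-off can be estimated before deploying. The
sketch below assumes each hot key is reloaded at most once per TTL period,
which holds when concurrent misses on a key are coalesced into one loader
call, and it ignores reloads caused by LRU eviction. ``max_reload_rate`` is a
hypothetical helper, not part of async-cache.

.. code-block:: python

    from typing import Optional

    def max_reload_rate(hot_keys: int, ttl_s: Optional[float]) -> float:
        """Upper bound on backend loads per second caused by TTL expiry."""
        if ttl_s is None:
            # No expiry: entries are loaded once and never refreshed.
            return 0.0
        if ttl_s <= 0:
            raise ValueError("ttl_s must be positive or None")
        # Each key expires, and is reloaded, at most once every ttl_s seconds.
        return hot_keys / ttl_s

For 10,000 hot user profiles, moving the TTL from 60 s to 300 s lowers the
bound from about 167 to about 33 loads per second.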

Batch Loader Tuning
-------------------

- **batch_window_ms** (default 5): A longer window collects more keys per batch but
  adds up to that much latency to each request.
  Typical values: 5–10 ms for GraphQL resolvers, 20–50 ms for background jobs.
- **max_batch_size** (default 100): Match your database's optimal batch size.
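
A rough way to reason about these two settings is to estimate how many keys
arrive during one window. The sketch below assumes a steady arrival rate and
assumes the loader flushes a batch when the window elapses or when
``max_batch_size`` is reached; that flush behaviour is an assumption, not
something confirmed here. ``estimate_batching`` is a hypothetical helper.

.. code-block:: python

    from typing import Tuple

    def estimate_batching(
        arrival_rate_per_s: float,
        batch_window_ms: float = 5,
        max_batch_size: int = 100,
    ) -> Tuple[float, float]:
        """Return (estimated keys per batch, worst-case added latency in ms)."""
        keys_per_window = arrival_rate_per_s * batch_window_ms / 1000.0
        # A batch holds at least the key that triggered it and at most
        # max_batch_size keys.
        batch_size = max(1.0, min(float(max_batch_size), keys_per_window))
        # A key arriving at the start of the window waits the whole window.
        return batch_size, float(batch_window_ms)

If the estimate stays near 1 key per batch, a longer window buys little
batching for the added latency. If it hits ``max_batch_size``, widening the
window further has no effect.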

Disk Backend Tuning
-------------------

- **Save frequency**: Call ``save_to_backend()`` on shutdown, for example from an ``atexit`` hook or a signal handler.
  For critical data, also save periodically (e.g., every 5 minutes) so a crash loses at most one interval of updates.
- **File location**: Use fast local storage (SSD). Avoid network mounts.
- **Cache size**: Pickle file size scales linearly with the number of entries. 100K entries ≈ 10–100 MB depending on value size (roughly 100 bytes to 1 KB per entry).
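
The periodic save described above could be scheduled roughly as follows. This
is a sketch under assumptions: ``save_to_backend()`` is treated as a plain
synchronous method, and ``periodic_save`` and the ``stop`` event are
hypothetical names rather than part of async-cache. Because the save is
synchronous, it blocks the event loop while it runs.

.. code-block:: python

    import asyncio

    async def periodic_save(cache, interval_s: float, stop: asyncio.Event) -> int:
        """Save every interval_s seconds, and once more when stop is set.

        Returns the number of saves performed.
        """
        saves = 0
        while not stop.is_set():
            try:
                # Returns early if stop is set, otherwise times out
                # after interval_s seconds.
                await asyncio.wait_for(stop.wait(), timeout=interval_s)
            except asyncio.TimeoutError:
                pass
            cache.save_to_backend()  # assumed synchronous
            saves += 1
        return saves

Start it with ``asyncio.create_task(periodic_save(cache, 300, stop))`` and call
``stop.set()`` during shutdown. Registering ``atexit.register(cache.save_to_backend)``
as well covers exits that bypass the event loop.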

Monitoring
----------

Expose ``get_metrics()`` to your monitoring system:

.. code-block:: python

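    # Prometheus example. `cache` is your existing async-cache instance.
    from prometheus_client import Gauge

    hit_rate = Gauge('cache_hit_rate', 'Cache hit rate')
    cache_size = Gauge('cache_size', 'Cache entries')

    async def update_metrics():
        # Copy the latest cache metrics into the Prometheus gauges.
        # Call this periodically so the gauges stay current.
        m = cache.get_metrics()
        hit_rate.set(m['hit_rate'])
        cache_size.set(m['size'])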

**Key metrics to watch**:

- **hit_rate < 0.5**: ``maxsize`` is likely too small or the TTL too short
- **hit_rate > 0.99**: TTL may be too long (stale data risk)
- **size == maxsize consistently**: consider increasing ``maxsize``, especially if the hit rate is also low
- **misses growing faster than hits**: review access patterns
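
These thresholds can be turned into an automated check. The sketch below
evaluates a single metrics snapshot using only the ``hit_rate`` and ``size``
keys shown earlier. ``diagnose`` is a hypothetical helper, and ``maxsize`` is
passed in explicitly because this guide does not show how to read it from the cache.

.. code-block:: python

    from typing import Dict, List

    def diagnose(metrics: Dict[str, float], maxsize: int) -> List[str]:
        """Map one get_metrics() snapshot to the warnings listed above."""
        findings: List[str] = []
        if metrics["hit_rate"] < 0.5:
            findings.append("low hit rate: maxsize may be too small or TTL too short")
        elif metrics["hit_rate"] > 0.99:
            findings.append("very high hit rate: TTL may be too long (stale data risk)")
        if metrics["size"] >= maxsize:
            findings.append("cache is full: consider increasing maxsize")
        return findings

A single snapshot says nothing about "consistently", so alert only when the
same finding repeats across several samples.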

Performance Characteristics
---------------------------

.. list-table::
   :header-rows: 1

   * - Operation
     - Complexity
     - Notes
   * - ``get()`` (hit)
     - O(1)
     - ~0.8µs median
   * - ``get()`` (miss + loader)
     - O(1) + loader time
     - ~3µs overhead + loader
   * - ``set()``
     - O(1)
     - Includes LRU eviction if needed
   * - ``delete()``
     - O(1)
     -
   * - Thundering herd protection
     - O(1) loader calls per key
     - Regardless of how many concurrent requests miss on that key
   * - Batch loader
     - O(1) loader calls per batch
     - Cost amortized across the keys collected in one batch window
   * - ``save_to_backend()``
     - O(n)
     - n = cache size; runs synchronously
   * - ``load_from_backend()``
     - O(n)
     - n = persisted entries
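
The thundering herd row is easiest to understand through request coalescing:
concurrent misses on the same key share one in-flight loader call. The sketch
below illustrates that general technique with plain ``asyncio``. It is not
async-cache's actual implementation, and ``SingleFlight`` is a hypothetical name.

.. code-block:: python

    import asyncio
    from typing import Any, Awaitable, Callable, Dict

    class SingleFlight:
        """Coalesce concurrent loads of the same key into one loader call."""

        def __init__(self) -> None:
            self._inflight: Dict[str, "asyncio.Future[Any]"] = {}

        async def do(self, key: str, loader: Callable[[str], Awaitable[Any]]) -> Any:
            existing = self._inflight.get(key)
            if existing is not None:
                # Another caller is already loading this key: wait for its result.
                return await existing
            # First caller for this key: start the load and publish the task.
            task = asyncio.ensure_future(loader(key))
            self._inflight[key] = task
            try:
                return await task
            finally:
                # Later requests for this key start a fresh load.
                self._inflight.pop(key, None)

With ten concurrent ``do("a", loader)`` calls, the loader runs once and all ten
callers receive the same result. A production version would also need to
decide how cancellation of one waiter affects the others.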