Scaling to millions

Notes for bots serving hundreds of thousands to millions of guilds, run as many cluster processes. This page builds on "Gateway and sharding" and "Caching and memory".

The core problem: cross-cluster duplication

Each cluster is a separate Node process with its own heap. A user who is in guilds spread across many clusters is cached once per cluster, so a popular user can sit in many heaps at once. JavaScript heaps cannot be shared between processes, so the library cannot dedupe this on its own. Only a shared external store, such as Redis, can hold one copy fleet-wide.

The catch: Athena's cache is synchronous (gateway dispatch, permission math, and event construction all call .get() and cannot await). So Redis cannot transparently back .get(). The supported pattern keeps a bounded hot set in heap and resolves misses asynchronously.

The pattern

  1. Cap each cluster's heap with a bounded hot set.
  2. Write every seen record through to a shared Redis, keyed so all clusters share one copy.
  3. Resolve on demand with async accessors: hot set, then Redis, then REST.

import Redis from 'ioredis';
import { Client, RedisCacheStore, RemoteBackedCollection, User, Member } from 'athena';
 
const store = new RedisCacheStore(new Redis(process.env.REDIS_URL), {
  prefix: 'zira:',     // share the SAME prefix across all clusters to dedupe
  ttlSeconds: 86_400,  // optional expiry
  updateTTLOnGet: true // optional sliding expiration
});
 
const client = new Client(token, {
  cache: {
    remoteStore: store,
    users: () => new RemoteBackedCollection(User, 50_000, store, 'user'),
    members: (guild) => new RemoteBackedCollection(Member, 50_000, store, `member:${guild.id}`)
  }
});

Then resolve through the tiers instead of synchronous .get():

const user = await client.fetchUser(userID);     // hot set -> Redis -> REST
const member = await guild.fetchMember(userID);   // hot set -> Redis -> REST

RemoteBackedCollection writes the raw payload through to the store on add (fire-and-forget, so it never blocks dispatch) and reconstructs the object on fetch. client.users.get() and guild.members.get() still work, but they see only the local hot set.

Net effect: per-process heap drops from "everything" to a bounded hot set, with one shared copy in Redis. For a reaction-role workload, the access pattern (a reaction arrives, look up the member, toggle a role) is already async, so awaiting a fetch fits naturally.
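To make the three tiers concrete, here is a minimal standalone sketch of the lookup order. It is not Athena's implementation: BoundedHotSet, RemoteReader, RestFetcher, and resolveTiered are hypothetical names invented for illustration, the eviction policy (least recently used) is an assumption, and the remote value is assumed to be JSON.

```typescript
// Illustrative only: none of these types are Athena classes.
interface RemoteReader {
  getEntity(namespace: string, id: string): Promise<string | null>;
}

type RestFetcher<T> = (id: string) => Promise<T>;

// A bounded in-heap cache with synchronous reads and LRU eviction (assumed policy).
class BoundedHotSet<T> {
  private readonly items = new Map<string, T>();
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Synchronous read; re-inserting refreshes recency so active entries stay resident.
  get(id: string): T | undefined {
    const value = this.items.get(id);
    if (value !== undefined) {
      this.items.delete(id);
      this.items.set(id, value);
    }
    return value;
  }

  set(id: string, value: T): void {
    this.items.delete(id);
    this.items.set(id, value);
    if (this.items.size > this.limit) {
      // A Map iterates in insertion order, so the first key is the least recently used.
      const oldest = this.items.keys().next().value;
      if (oldest !== undefined) this.items.delete(oldest);
    }
  }

  get size(): number {
    return this.items.size;
  }
}

// Hot set first, then the shared store, then REST; the result is promoted into the hot set.
async function resolveTiered<T>(
  id: string,
  hot: BoundedHotSet<T>,
  remote: RemoteReader,
  namespace: string,
  fetchFromRest: RestFetcher<T>
): Promise<T> {
  const cached = hot.get(id);
  if (cached !== undefined) return cached;

  const raw = await remote.getEntity(namespace, id);
  const value = raw !== null ? (JSON.parse(raw) as T) : await fetchFromRest(id);
  hot.set(id, value);
  return value;
}
```

The sketch leaves out writing REST results back to the shared store, which the real collection handles on add. The point is only the ordering: the synchronous path never waits, and every miss costs at most one round trip per tier.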

The store contract

RedisCacheStore is a thin adapter over a client matching RedisLike (get, set, del, optional expire, mget, scan). Athena pulls in no Redis dependency itself; you pass your own client. It uses plain GET/SET so it works on vanilla Redis (no modules required).

To use a different backend (SQL, Memcached, HTTP), implement RemoteCacheStore:

interface RemoteCacheStore {
  getEntity(namespace: string, id: string): Promise<string | null>;
  getEntities(namespace: string, ids: string[]): Promise<Array<string | null>>;
  getAllEntities(namespace: string): Promise<string[]>;
  setEntity(namespace: string, id: string, value: string, ttlSeconds?: number): Promise<void>;
  removeEntity(namespace: string, id: string): Promise<void>;
}
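As a starting point for a custom backend, the following is a minimal in-memory implementation of the interface above, the kind of thing that could stand in for Redis in unit tests. InMemoryCacheStore is a hypothetical name, not something Athena ships. Two points are assumptions because the contract does not spell them out: getAllEntities is taken to return the stored values (not the ids), and an omitted ttlSeconds is taken to mean no expiry.

```typescript
// Interface repeated from the contract above so the sketch is self-contained.
interface RemoteCacheStore {
  getEntity(namespace: string, id: string): Promise<string | null>;
  getEntities(namespace: string, ids: string[]): Promise<Array<string | null>>;
  getAllEntities(namespace: string): Promise<string[]>;
  setEntity(namespace: string, id: string, value: string, ttlSeconds?: number): Promise<void>;
  removeEntity(namespace: string, id: string): Promise<void>;
}

interface Entry {
  value: string;
  expiresAt: number | null; // epoch milliseconds, or null for no expiry
}

// Hypothetical test double; one Map per namespace keeps namespaces from colliding.
class InMemoryCacheStore implements RemoteCacheStore {
  private readonly buckets = new Map<string, Map<string, Entry>>();
  private readonly now: () => number;

  // The clock is injectable so tests can advance time without sleeping.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Returns the live value, lazily dropping entries whose TTL has passed.
  private read(namespace: string, id: string): string | null {
    const bucket = this.buckets.get(namespace);
    const entry = bucket?.get(id);
    if (!bucket || !entry) return null;
    if (entry.expiresAt !== null && entry.expiresAt <= this.now()) {
      bucket.delete(id);
      return null;
    }
    return entry.value;
  }

  async getEntity(namespace: string, id: string): Promise<string | null> {
    return this.read(namespace, id);
  }

  async getEntities(namespace: string, ids: string[]): Promise<Array<string | null>> {
    return ids.map((id) => this.read(namespace, id));
  }

  // Assumption: returns stored values, skipping expired ones.
  async getAllEntities(namespace: string): Promise<string[]> {
    const bucket = this.buckets.get(namespace);
    if (!bucket) return [];
    const values: string[] = [];
    for (const id of Array.from(bucket.keys())) {
      const value = this.read(namespace, id);
      if (value !== null) values.push(value);
    }
    return values;
  }

  async setEntity(namespace: string, id: string, value: string, ttlSeconds?: number): Promise<void> {
    let bucket = this.buckets.get(namespace);
    if (!bucket) {
      bucket = new Map<string, Entry>();
      this.buckets.set(namespace, bucket);
    }
    const expiresAt = ttlSeconds === undefined ? null : this.now() + ttlSeconds * 1000;
    bucket.set(id, { value, expiresAt });
  }

  async removeEntity(namespace: string, id: string): Promise<void> {
    this.buckets.get(namespace)?.delete(id);
  }
}
```

A store like this lives in one process, so it does nothing for cross-cluster deduplication; it is only useful for exercising code that depends on the contract.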

Env-gated lean mode

Gate the aggressive configuration behind your own env var so only large clusters opt in and every other deployment keeps defaults:

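// NullCollection, VoiceState, and StageInstance are assumed to be exported from 'athena' like User and Member.
import { Client, RedisCacheStore, RemoteBackedCollection, NullCollection, User, Member, VoiceState, StageInstance } from 'athena';

// redis, token, and intents are assumed to be defined elsewhere, as in the earlier example.
const lean = process.env.ATHENA_LEAN === '1';
const store = lean ? new RedisCacheStore(redis, { prefix: 'zira:' }) : undefined;
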
new Client(token, {
  intents,
  cache: lean
    ? {
        remoteStore: store,
        users: () => new RemoteBackedCollection(User, 50_000, store!, 'user'),
        members: (g) => new RemoteBackedCollection(Member, 50_000, store!, `member:${g.id}`),
        voiceStates: () => new NullCollection(VoiceState),
        stageInstances: () => new NullCollection(StageInstance)
      }
    : undefined
});

Operational guidance

  • Size the hot set so active guilds stay resident; cold lookups hit Redis or REST.
  • Share one Redis key prefix across the whole fleet so clusters deduplicate.
  • Disable caches you never read with NullCollection (often voiceStates, stageInstances, threads for a reaction-role bot).
  • Keep compress on and use disableEvents to cut gateway parsing.
  • Member write-through currently happens on add (first sighting). If you need every role mutation reflected in Redis immediately, also write through on update in your handler, or call store.setEntity after edits.
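For the last point, a helper along these lines could be called from your own update handler after a role edit. It is a sketch under assumptions: writeMemberThrough, EntityWriter, and MemberPayload are hypothetical names, the namespace mirrors the `member:${guild.id}` convention used when constructing the collection, and the value is serialized as JSON, which may not match the format RemoteBackedCollection expects when it reconstructs a member. Check that format before relying on this.

```typescript
// Only the one method this helper needs from RemoteCacheStore.
interface EntityWriter {
  setEntity(namespace: string, id: string, value: string, ttlSeconds?: number): Promise<void>;
}

// Hypothetical minimal payload shape; a real member payload carries more fields.
interface MemberPayload {
  id: string;
  roles: string[];
}

// Fire-and-forget write: the caller is never blocked, and failures go to onError
// instead of surfacing as unhandled rejections.
function writeMemberThrough(
  store: EntityWriter,
  guildID: string,
  member: MemberPayload,
  onError: (err: unknown) => void
): void {
  // Assumption: JSON serialization and the `member:<guildID>` namespace.
  store
    .setEntity(`member:${guildID}`, member.id, JSON.stringify(member))
    .catch(onError);
}
```

Keeping the write fire-and-forget matches the behavior described for add: an unavailable store degrades freshness but does not stall event handling.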

Tuning checklist

  • maxShards and cluster layout sized for your guild count.
  • messageLimit low (reaction-role bots rarely need message history).
  • GuildPresences off unless required.
  • Hot-set sizes tuned per cluster from real memory numbers.
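One rough way to turn measured memory into a hot-set size is plain arithmetic: decide how much of the heap budget the cache may use and divide by the measured cost per entry. The helper below is a hypothetical sketch, not part of Athena, and the numbers in the usage line are placeholders rather than measurements; take bytesPerEntry from your own heap snapshots.

```typescript
// Hypothetical helper: derive an entry limit from a heap budget.
// heapBudgetBytes: heap you are willing to give the process
// cacheShare:      fraction of that budget reserved for the hot set (0 < share <= 1)
// bytesPerEntry:   measured average retained size of one cached record
function hotSetLimit(heapBudgetBytes: number, cacheShare: number, bytesPerEntry: number): number {
  if (heapBudgetBytes <= 0 || bytesPerEntry <= 0 || cacheShare <= 0 || cacheShare > 1) {
    throw new RangeError("budget and entry size must be positive, and share must be in (0, 1]");
  }
  return Math.floor((heapBudgetBytes * cacheShare) / bytesPerEntry);
}

// Placeholder numbers: 1 GiB budget, half of it for the cache, 2 KiB per entry.
const limit = hotSetLimit(1024 * 1024 * 1024, 0.5, 2048); // 262144
```

If several collections share the budget, split cacheShare between them rather than applying the full share to each.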