Scaling to millions
Notes for bots in the hundreds of thousands to millions of guilds, running as many cluster processes. This builds on "Gateway and sharding" and "Caching and memory".
The core problem: cross-cluster duplication
Each cluster is a separate Node process with its own heap. A user who is in guilds spread across many clusters is cached once per cluster, so a popular user can sit in many heaps at once. JavaScript heaps cannot be shared between processes, so the library cannot dedupe this on its own. A shared external store (Redis) is the only way to hold one copy fleet-wide.
The catch: Athena's cache is synchronous (gateway dispatch, permission math, and event construction all call .get() and cannot await). So Redis cannot transparently back .get(). The supported pattern keeps a bounded hot set in heap and resolves misses asynchronously.
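To make the duplication concrete, here is a small simulation. It is not Athena code: each cluster is modeled as its own Map, and a user is cached in every cluster that hosts at least one of their guilds. The cluster count, the guild IDs, and the modulo assignment in `clusterFor` are illustrative assumptions.

```typescript
// Illustrative model only: each "cluster" is a Map standing in for one process heap.
interface Membership {
  userId: string;
  guildIds: number[]; // guilds this user belongs to
}

// Hypothetical guild-to-cluster assignment; real layouts depend on your shard plan.
function clusterFor(guildId: number, clusterCount: number): number {
  return guildId % clusterCount;
}

// Counts how many cached copies exist fleet-wide when every cluster caches
// each user it sees. Assumes one Membership entry per distinct user.
function countCachedCopies(memberships: Membership[], clusterCount: number) {
  const heaps: Array<Map<string, true>> = Array.from(
    { length: clusterCount },
    () => new Map<string, true>()
  );
  for (const { userId, guildIds } of memberships) {
    for (const guildId of guildIds) {
      heaps[clusterFor(guildId, clusterCount)].set(userId, true);
    }
  }
  const totalCopies = heaps.reduce((sum, heap) => sum + heap.size, 0);
  return { uniqueUsers: memberships.length, totalCopies };
}

const result = countCachedCopies(
  [
    { userId: 'popular', guildIds: [0, 1, 2, 3] }, // lands on all four clusters
    { userId: 'casual', guildIds: [5] } // lands on one cluster
  ],
  4
);
// result is { uniqueUsers: 2, totalCopies: 5 }
```

Two users cost five cached copies here; a shared store would hold two.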
The pattern
- Cap each cluster's heap with a bounded hot set.
- Write every seen record through to a shared Redis, keyed so all clusters share one copy.
- Resolve on demand with async accessors: hot set, then Redis, then REST.
import Redis from 'ioredis';
import { Client, RedisCacheStore, RemoteBackedCollection, User, Member } from 'athena';

const store = new RedisCacheStore(new Redis(process.env.REDIS_URL), {
  prefix: 'zira:', // share the SAME prefix across all clusters to dedupe
  ttlSeconds: 86_400, // optional expiry
  updateTTLOnGet: true // optional sliding expiration
});

const client = new Client(token, {
  cache: {
    remoteStore: store,
    users: () => new RemoteBackedCollection(User, 50_000, store, 'user'),
    members: (guild) => new RemoteBackedCollection(Member, 50_000, store, `member:${guild.id}`)
  }
});

Then resolve through the tiers instead of synchronous .get():
const user = await client.fetchUser(userID); // hot set -> Redis -> REST
const member = await guild.fetchMember(userID); // hot set -> Redis -> REST

RemoteBackedCollection writes the raw payload through on add (fire-and-forget, never blocking dispatch) and reconstructs it on fetch. client.users.get() / guild.members.get() still work but see only the local hot set.
Net effect: per-process heap drops from "everything" to a bounded hot set, with one shared copy in Redis. For a reaction-role workload, the access pattern (a reaction arrives, look up the member, toggle a role) is already async, so awaiting a fetch fits naturally.
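The three-tier resolution described above (hot set, then shared store, then REST) can be sketched without Athena or Redis. Everything here is a hypothetical stand-in: `HotSet`, `remote`, `fetchFromRest`, and `resolveUser` are illustration names, the shared store is a plain Map, and the REST call is faked. It shows the control flow only, not how the library implements it.

```typescript
// Hypothetical stand-ins: none of these names are Athena APIs.
interface UserPayload {
  id: string;
  username: string;
}

// Tier 1: bounded in-heap hot set with least-recently-used eviction.
class HotSet<V> {
  private readonly entries = new Map<string, V>();
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  get(id: string): V | undefined {
    const value = this.entries.get(id);
    if (value !== undefined) {
      // Re-insert to mark as most recently used (Map keeps insertion order).
      this.entries.delete(id);
      this.entries.set(id, value);
    }
    return value;
  }

  set(id: string, value: V): void {
    this.entries.delete(id);
    this.entries.set(id, value);
    if (this.entries.size > this.limit) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}

// Tier 2: a Map standing in for the shared store, holding serialized payloads.
const remote = new Map<string, string>();

// Tier 3: a fake REST call; a real bot would hit the HTTP API here.
let restCalls = 0;
async function fetchFromRest(id: string): Promise<UserPayload> {
  restCalls++;
  return { id, username: `user-${id}` };
}

const hot = new HotSet<UserPayload>(2);

async function resolveUser(id: string): Promise<UserPayload> {
  const local = hot.get(id);
  if (local) return local; // tier 1 hit

  const raw = remote.get(`user:${id}`);
  if (raw !== undefined) {
    const cached = JSON.parse(raw) as UserPayload; // tier 2 hit: rebuild from payload
    hot.set(id, cached);
    return cached;
  }

  const fetched = await fetchFromRest(id); // tier 3: last resort
  remote.set(`user:${id}`, JSON.stringify(fetched)); // write through for other clusters
  hot.set(id, fetched);
  return fetched;
}
```

With a hot-set limit of 2, resolving users 1, 2, and 3 evicts user 1 from the heap, but a later lookup of user 1 is served from the shared store rather than REST.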
The store contract
RedisCacheStore is a thin adapter over a client matching RedisLike (get, set, del, optional expire, mget, scan). Athena pulls in no Redis dependency itself; you pass your own client. It uses plain GET/SET so it works on vanilla Redis (no modules required).
To use a different backend (SQL, Memcached, HTTP), implement RemoteCacheStore:
interface RemoteCacheStore {
  getEntity(namespace: string, id: string): Promise<string | null>;
  getEntities(namespace: string, ids: string[]): Promise<Array<string | null>>;
  getAllEntities(namespace: string): Promise<string[]>;
  setEntity(namespace: string, id: string, value: string, ttlSeconds?: number): Promise<void>;
  removeEntity(namespace: string, id: string): Promise<void>;
}

Env-gated lean mode
Gate the aggressive configuration behind your own env var so only large clusters opt in and every other deployment keeps defaults:
const lean = process.env.ATHENA_LEAN === '1';
const store = lean ? new RedisCacheStore(redis, { prefix: 'zira:' }) : undefined;

new Client(token, {
  intents,
  cache: lean
    ? {
        remoteStore: store,
        users: () => new RemoteBackedCollection(User, 50_000, store!, 'user'),
        members: (g) => new RemoteBackedCollection(Member, 50_000, store!, `member:${g.id}`),
        voiceStates: () => new NullCollection(VoiceState),
        stageInstances: () => new NullCollection(StageInstance)
      }
    : undefined
});

Operational guidance
- Size the hot set so active guilds stay resident; cold lookups hit Redis or REST.
- Share one Redis key prefix across the whole fleet so clusters deduplicate.
- Disable caches you never read with NullCollection (often voiceStates, stageInstances, threads for a reaction-role bot).
- Keep compress on and use disableEvents to cut gateway parsing.
- Member write-through currently happens on add (first sighting). If you need every role mutation reflected in Redis immediately, also write through on update in your handler, or call store.setEntity after edits.
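The last point, writing a member through on update, might look like the sketch below. The `RemoteCacheStore` interface is copied from the contract in this document; everything else is an assumption for illustration: `MemoryStore` is an in-memory stand-in so the example runs without Redis (it ignores TTLs and assumes getAllEntities returns values), `MemberPayload` shows only the fields used, and `writeThroughMember` is a hypothetical helper you would call from your own update handler. The `member:${guildID}` namespace mirrors the one used in the configuration example.

```typescript
// Copied from the RemoteCacheStore contract in this document.
interface RemoteCacheStore {
  getEntity(namespace: string, id: string): Promise<string | null>;
  getEntities(namespace: string, ids: string[]): Promise<Array<string | null>>;
  getAllEntities(namespace: string): Promise<string[]>;
  setEntity(namespace: string, id: string, value: string, ttlSeconds?: number): Promise<void>;
  removeEntity(namespace: string, id: string): Promise<void>;
}

// In-memory stand-in for a real backend. TTLs are accepted but ignored.
class MemoryStore implements RemoteCacheStore {
  private readonly namespaces = new Map<string, Map<string, string>>();

  private bucket(namespace: string): Map<string, string> {
    let bucket = this.namespaces.get(namespace);
    if (!bucket) {
      bucket = new Map<string, string>();
      this.namespaces.set(namespace, bucket);
    }
    return bucket;
  }

  async getEntity(namespace: string, id: string): Promise<string | null> {
    return this.bucket(namespace).get(id) ?? null;
  }

  async getEntities(namespace: string, ids: string[]): Promise<Array<string | null>> {
    return ids.map((id) => this.bucket(namespace).get(id) ?? null);
  }

  async getAllEntities(namespace: string): Promise<string[]> {
    return Array.from(this.bucket(namespace).values());
  }

  async setEntity(namespace: string, id: string, value: string, _ttlSeconds?: number): Promise<void> {
    this.bucket(namespace).set(id, value);
  }

  async removeEntity(namespace: string, id: string): Promise<void> {
    this.bucket(namespace).delete(id);
  }
}

// Hypothetical shape of a raw member payload; only the fields used here.
interface MemberPayload {
  user: { id: string };
  roles: string[];
}

// Overwrites the shared copy with the latest payload. The returned promise
// never rejects, so callers on the event path can ignore it (fire-and-forget)
// and failures go to onError instead.
function writeThroughMember(
  store: RemoteCacheStore,
  guildID: string,
  payload: MemberPayload,
  onError: (err: unknown) => void = () => {}
): Promise<void> {
  return store
    .setEntity(`member:${guildID}`, payload.user.id, JSON.stringify(payload))
    .catch(onError);
}
```

Returning the promise rather than discarding it keeps the helper testable while still letting an event handler call it without awaiting.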
Tuning checklist
- maxShards and cluster layout sized for your guild count.
- messageLimit low (reaction-role bots rarely need message history).
- GuildPresences off unless required.
- Hot-set sizes tuned per cluster from real memory numbers.
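For the last item, one rough way to turn memory measurements into a hot-set size is to divide a heap budget by the measured average cost per cached entry. The helper below is a hypothetical sketch, and the 64 MiB budget and 1,280-byte entry size are placeholder numbers, not measurements; real values would come from your own clusters, for example from `process.memoryUsage().heapUsed` sampled before and after a cache fills, or from heap snapshots.

```typescript
const MIB = 1024 * 1024;

// Returns how many entries fit in the given heap budget.
// Both inputs are numbers you measure yourself; nothing here is an Athena API.
function hotSetCapacity(heapBudgetBytes: number, avgEntryBytes: number): number {
  if (heapBudgetBytes < 0 || avgEntryBytes <= 0) {
    throw new RangeError('budget must be >= 0 and entry size must be > 0');
  }
  return Math.floor(heapBudgetBytes / avgEntryBytes);
}

// Placeholder numbers: a 64 MiB budget for the user hot set at ~1,280 bytes per entry.
const userCapacity = hotSetCapacity(64 * MIB, 1280); // 52428
```

The result is only a starting point; re-measure under production load and adjust.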