Telegram bots produce event-heavy workloads. That changes what a good data model looks like: read paths should stay simple, write amplification should stay predictable, and indexing should reflect message traffic shape.
Most developers start with a data model that works fine for a hundred users and breaks at ten thousand. The failure mode is not dramatic — queries just get slower, timeouts start appearing, and the bot starts feeling unresponsive. This guide covers the patterns that avoid that trajectory.
Separate hot paths from reporting data
Do not overload the same collection for real-time bot lookups and analytical queries. Split them when the query patterns diverge. A users collection that handles both session lookups on every message and aggregation queries for a dashboard will eventually have the two workloads contending for the same indexes and working set.
- Index by chatId and userId for all runtime message lookups.
- Keep message logs in an append-only collection separate from user records.
- Run heavier aggregation queries in the background and store results in dedicated summary collections.
- Use TTL indexes on session and state documents to auto-expire stale data.
- Avoid storing large arrays on user documents that get fetched on every request.
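A minimal sketch of the split above, using the Node.js MongoDB driver. The collection and field names (`sessions`, `message_log`) are illustrative, not prescribed: the hot path does one indexed lookup per message, while logging is a lean append-only insert that is never read on that path.

```javascript
// Build the lean log entry written on every incoming message.
// Keeping this separate from the user/session document means the
// hot-path lookup never has to fetch an unbounded array.
function toLogEntry(msg, now = new Date()) {
  return {
    chatId: msg.chat.id,
    userId: msg.from.id,
    text: msg.text,
    receivedAt: now, // pair with a TTL index to expire old entries
  };
}

// Usage (names illustrative):
//   await db.collection('message_log').insertOne(toLogEntry(msg));
//   const session = await db.collection('sessions').findOne({ chatId: msg.chat.id });
```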
Document shape for user state
The most common performance problem in bot codebases is a user document that grows unbounded. Order history, message history, event logs — all appended to the same document. MongoDB documents have a 16MB limit, but you will hit performance problems well before that.
// Avoid: everything on the user document
{
_id: userId,
name: 'Alice',
settings: { ... },
orders: [ ...potentially thousands ],
messages: [ ...unbounded ],
}
// Better: keep user document lean, reference related collections
{
_id: userId,
name: 'Alice',
settings: { notifications: true, language: 'en' },
subscriptionStatus: 'active',
createdAt: ISODate('2026-01-10'),
}
// Orders in their own collection, indexed by userId
{
_id: orderId,
userId: userId,
product: 'Pro Plan',
status: 'delivered',
createdAt: ISODate('2026-01-15'),
}
Indexing for message throughput
Every query your bot runs on every incoming message should be covered by an index. Even small bots process hundreds of messages per minute during active periods. A collection scan on an unindexed field at that rate will cause visible latency.
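One way to confirm a hot-path query is actually covered is to run it with `.explain()` and inspect the winning plan. A small checker might look like this; it walks the standard `queryPlanner.winningPlan` shape that `explain()` returns:

```javascript
// Returns true if the winning plan used an index scan (IXSCAN)
// rather than a full collection scan (COLLSCAN).
function usesIndex(explainResult) {
  let stage = explainResult.queryPlanner.winningPlan;
  while (stage) {
    if (stage.stage === 'COLLSCAN') return false;
    if (stage.stage === 'IXSCAN') return true;
    stage = stage.inputStage; // descend through FETCH/LIMIT/etc. wrappers
  }
  return false;
}

// Usage:
//   const plan = await db.collection('users').find({ chatId }).explain();
//   if (!usesIndex(plan)) console.warn('users.chatId query is not indexed');
```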
// Create these indexes on startup or in a migration
await db.collection('users').createIndex({ chatId: 1 }, { unique: true })
await db.collection('sessions').createIndex({ userId: 1, botId: 1 })
await db.collection('sessions').createIndex(
{ updatedAt: 1 },
{ expireAfterSeconds: 86400 } // TTL: auto-delete after 24h
)
await db.collection('orders').createIndex({ userId: 1, createdAt: -1 })
await db.collection('orders').createIndex({ status: 1, createdAt: -1 })
Aggregation patterns for bot dashboards
Many bots expose an admin panel that shows statistics: total users, active subscriptions, revenue per period. These queries should never run on the primary read path. Use MongoDB aggregations scheduled in the background and cache the results in a summary collection.
// Run this aggregation in a background job, not on each request
const summary = await db.collection('orders').aggregate([
{ $match: { status: 'completed', createdAt: { $gte: startOfMonth } } },
{ $group: {
_id: null,
totalRevenue: { $sum: '$amount' },
orderCount: { $sum: 1 },
uniqueUsers: { $addToSet: '$userId' },
}},
{ $project: {
_id: 0, // drop the group _id so it is not written into the cache document
totalRevenue: 1,
orderCount: 1,
uniqueUserCount: { $size: '$uniqueUsers' },
}},
]).toArray()
// Store the result
await db.collection('stats_cache').updateOne(
{ key: 'monthly_summary' },
{ $set: { ...summary[0], updatedAt: new Date() } },
{ upsert: true }
)
The rule of thumb: if a query runs on every message, it needs an index. If it runs for a dashboard, it needs to be cached.
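On the dashboard side, the read path then becomes a single lookup against the cache plus a staleness check, rather than an inline aggregation. A sketch, where `stats_cache`, `maxAgeMs`, and `scheduleRefresh` are illustrative names:

```javascript
// A cached summary counts as usable if it exists and was refreshed
// within the allowed window; otherwise the caller should trigger a
// background refresh instead of aggregating inline.
function isFresh(doc, maxAgeMs, now = Date.now()) {
  return !!doc && now - doc.updatedAt.getTime() <= maxAgeMs;
}

// Usage:
//   const cached = await db.collection('stats_cache').findOne({ key: 'monthly_summary' });
//   if (!isFresh(cached, 10 * 60 * 1000)) scheduleRefresh(); // hypothetical trigger
//   render(cached); // serve possibly-stale numbers rather than block the request
```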