How stats work
This document explains the real-time analytics system for the markdown site.
Overview
The stats page at /stats shows live visitor data and page view counts. All stats update automatically via Convex subscriptions. No page refresh required.
Aggregate component (v1.15+)
Starting with v1.15, the stats system uses the @convex-dev/aggregate component for efficient O(log n) counts instead of O(n) table scans. This provides significant performance improvements as the page views table grows.
Before (O(n) approach)
The old implementation collected all page views and iterated through them to calculate counts:
// Old approach: O(n) full table scan
const allViews = await ctx.db
.query("pageViews")
.withIndex("by_timestamp")
.order("asc")
.collect();
// Manual aggregation by iterating through all documents
const viewsByPath: Record<string, number> = {};
const uniqueSessions = new Set<string>();
for (const view of allViews) {
viewsByPath[view.path] = (viewsByPath[view.path] || 0) + 1;
uniqueSessions.add(view.sessionId);
}
return {
totalPageViews: allViews.length,
uniqueVisitors: uniqueSessions.size,
};
Problems with this approach:
- Query time grows linearly with page view count
- Memory usage increases with table size
- Full table read on every stats query
- Slower response times as data grows
After (O(log n) with aggregate component)
The new implementation uses the Convex aggregate component for denormalized counts:
// New approach: O(log n) using aggregate component
const totalPageViewsCount = await totalPageViews.count(ctx);
const uniqueVisitorsCount = await uniqueVisitors.count(ctx);
const viewsPerPath = await pageViewsByPath.count(ctx, { namespace: path });
Benefits of the aggregate approach:
- O(log n) count operations regardless of table size
- Counts are pre-computed and maintained incrementally
- Minimal memory usage per query
- Consistent fast response times at any scale
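To make "pre-computed and maintained incrementally" concrete, here is a deliberately simplified in-memory sketch. It is not how the aggregate component is implemented; the `IncrementalCounts` class and its method names are hypothetical and exist only to contrast write-time bookkeeping with read-time scanning.

```typescript
// Simplified model: counters are updated at write time instead of scanning at read time.
// This only illustrates the idea; it is not the aggregate component's implementation.
interface PageView {
  path: string;
  sessionId: string;
}

class IncrementalCounts {
  private total = 0;
  private byPath = new Map<string, number>();
  private sessions = new Set<string>();

  // Called once per inserted page view, so reads never touch the full history.
  record(view: PageView): void {
    this.total += 1;
    this.byPath.set(view.path, (this.byPath.get(view.path) ?? 0) + 1);
    this.sessions.add(view.sessionId);
  }

  totalPageViews(): number {
    return this.total;
  }

  uniqueVisitors(): number {
    return this.sessions.size;
  }

  viewsForPath(path: string): number {
    return this.byPath.get(path) ?? 0;
  }
}

const counts = new IncrementalCounts();
const sampleViews: PageView[] = [
  { path: "/", sessionId: "a" },
  { path: "/about", sessionId: "a" },
  { path: "/", sessionId: "b" },
];
for (const view of sampleViews) {
  counts.record(view);
}
```

Reading a count here costs the same whether three views or three million have been recorded, which is the property the aggregate approach is after.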
Aggregate definitions
Three TableAggregate instances track different metrics:
// Total page views count (global count)
const totalPageViews = new TableAggregate<{
Key: null;
DataModel: DataModel;
TableName: "pageViews";
}>(components.totalPageViews, {
sortKey: () => null,
});
// Views by path (namespace per path for per-page counts)
const pageViewsByPath = new TableAggregate<{
Namespace: string;
Key: number;
DataModel: DataModel;
TableName: "pageViews";
}>(components.pageViewsByPath, {
namespace: (doc) => doc.path,
sortKey: (doc) => doc.timestamp,
});
// Unique visitors (sessionId as key for distinct count)
const uniqueVisitors = new TableAggregate<{
Key: string;
DataModel: DataModel;
TableName: "pageViews";
}>(components.uniqueVisitors, {
sortKey: (doc) => doc.sessionId,
});
Backfill existing data
After deploying the aggregate component, run the backfill mutation to populate counts from existing page views:
npx convex run stats:backfillAggregates
Chunked backfilling: The backfill process handles large datasets by processing records in batches of 500. This prevents memory limit issues (Convex has a 16MB limit per function execution). The mutation schedules itself to continue processing until all records are backfilled.
How it works:
- backfillAggregates starts the process and schedules the first chunk
- backfillAggregatesChunk processes 500 records at a time using pagination
- If more records exist, it schedules itself to continue with the next batch
- Progress is logged (check Convex dashboard logs)
- Completes when all records are processed
This is idempotent and safe to run multiple times. It uses insertIfDoesNotExist to avoid duplicates.
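The chunking steps above can be sketched as a plain in-memory model. This is an illustration under stated assumptions, not the actual mutation: `backfillChunk`, `runBackfill`, and `BackfillState` are hypothetical names, an array index stands in for the pagination cursor, and a loop stands in for the scheduler.

```typescript
// In-memory model of chunked backfilling. All names here are hypothetical.
interface PageView {
  path: string;
  sessionId: string;
}

interface BackfillState {
  cursor: number; // index of the next record to process (stands in for a pagination cursor)
  seenSessionIds: string[]; // carried across chunks for unique visitor counting
  totalPageViews: number;
  uniqueVisitors: number;
}

const BATCH_SIZE = 500;

// Process one batch and return the state to hand to the next chunk.
function backfillChunk(
  records: PageView[],
  state: BackfillState,
  batchSize: number = BATCH_SIZE,
): { state: BackfillState; isDone: boolean } {
  const page = records.slice(state.cursor, state.cursor + batchSize);
  const seen = new Set(state.seenSessionIds);
  let uniqueVisitors = state.uniqueVisitors;
  for (const view of page) {
    if (!seen.has(view.sessionId)) {
      seen.add(view.sessionId);
      uniqueVisitors += 1;
    }
  }
  const cursor = state.cursor + page.length;
  return {
    state: {
      cursor,
      seenSessionIds: Array.from(seen),
      totalPageViews: state.totalPageViews + page.length,
      uniqueVisitors,
    },
    isDone: cursor >= records.length,
  };
}

// Each loop iteration plays the role of one scheduled chunk.
function runBackfill(
  records: PageView[],
  batchSize: number = BATCH_SIZE,
): { state: BackfillState; chunks: number } {
  let state: BackfillState = {
    cursor: 0,
    seenSessionIds: [],
    totalPageViews: 0,
    uniqueVisitors: 0,
  };
  let chunks = 0;
  let isDone = records.length === 0;
  while (!isDone) {
    const result = backfillChunk(records, state, batchSize);
    state = result.state;
    isDone = result.isDone;
    chunks += 1;
  }
  return { state, chunks };
}
```

With 1,200 records this model runs three chunks (500, 500, 200). Passing the seen session IDs from one chunk to the next is what keeps the unique visitor count correct when a session's views are split across batches.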
Fallback behavior: While aggregates are being backfilled (or if backfilling hasn't run yet), the getStats query uses direct counting from the pageViews table to ensure accurate stats are always displayed. This is slightly slower but guarantees correct numbers.
Data flow
- Visitor loads any page
- usePageTracking hook fires on mount and path change
- Page view event recorded to pageViews table
- Session heartbeat sent to activeSessions table
- Stats page queries both tables and displays results
Database tables
pageViews
Stores individual view events. Uses the event records pattern to avoid write conflicts.
| Field | Type | Purpose |
|---|---|---|
| path | string | URL path visited |
| pageType | string | "home", "blog", "page", or "stats" |
| sessionId | string | Anonymous UUID per browser |
| timestamp | number | Unix timestamp of visit in milliseconds (Date.now()) |
Indexes:
- by_path for filtering by page
- by_timestamp for ordering
- by_session_path for deduplication
activeSessions
Tracks who is currently on the site. Sessions expire after 2 minutes without a heartbeat.
| Field | Type | Purpose |
|---|---|---|
| sessionId | string | Anonymous UUID per browser |
| currentPath | string | Page visitor is currently viewing |
| lastSeen | number | Last heartbeat timestamp in milliseconds |
Indexes:
- by_sessionId for upserts
- by_lastSeen for cleanup queries
Frontend tracking
The usePageTracking hook in src/hooks/usePageTracking.ts handles all client-side tracking.
Session ID generation
Each browser gets a persistent UUID stored in localStorage. No cookies, no PII.
const SESSION_ID_KEY = "markdown_blog_session_id";
function getSessionId(): string {
let sessionId = localStorage.getItem(SESSION_ID_KEY);
if (!sessionId) {
sessionId = generateSessionId();
localStorage.setItem(SESSION_ID_KEY, sessionId);
}
return sessionId;
}
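The snippet above calls generateSessionId without showing it. One plausible shape, offered as an assumption rather than the project's actual helper, prefers the platform's `crypto.randomUUID()` and falls back to a `Math.random`-based version 4 layout in environments that lack it.

```typescript
// Hypothetical implementation: the source does not show generateSessionId.
function generateSessionId(): string {
  // Look the API up defensively so the function also works where it is missing.
  const cryptoApi = (
    globalThis as unknown as { crypto?: { randomUUID?: () => string } }
  ).crypto;
  if (cryptoApi && typeof cryptoApi.randomUUID === "function") {
    return cryptoApi.randomUUID();
  }
  // Fallback: UUID version 4 layout built from Math.random.
  // Adequate for an anonymous analytics ID, not for anything security-sensitive.
  return "xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx".replace(/[xy]/g, (c) => {
    const r = Math.floor(Math.random() * 16);
    const v = c === "x" ? r : (r & 0x3) | 0x8;
    return v.toString(16);
  });
}
```

Either branch yields a random identifier with no link to the user or device, which is consistent with the "no cookies, no PII" goal.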
Page view recording
Records a view when the URL path changes. Deduplication happens server-side.
useEffect(() => {
const path = location.pathname;
if (lastRecordedPath.current !== path) {
lastRecordedPath.current = path;
recordPageView({ path, pageType: getPageType(path), sessionId });
}
}, [location.pathname, recordPageView]);
Heartbeat system
Sends a ping every 30 seconds while the page is open. This powers the "Active Now" count.
Uses refs to prevent duplicate calls and avoid write conflicts:
const HEARTBEAT_INTERVAL_MS = 30 * 1000;
const HEARTBEAT_DEBOUNCE_MS = 5 * 1000;
// Track heartbeat state to prevent duplicate calls
const isHeartbeatPending = useRef(false);
const lastHeartbeatTime = useRef(0);
const lastHeartbeatPath = useRef<string | null>(null);
const sendHeartbeat = useCallback(
async (path: string) => {
const sessionId = sessionIdRef.current;
if (!sessionId) return;
const now = Date.now();
// Skip if heartbeat is already pending
if (isHeartbeatPending.current) {
return;
}
// Skip if same path and sent recently (debounce)
if (
lastHeartbeatPath.current === path &&
now - lastHeartbeatTime.current < HEARTBEAT_DEBOUNCE_MS
) {
return;
}
isHeartbeatPending.current = true;
lastHeartbeatTime.current = now;
lastHeartbeatPath.current = path;
try {
await heartbeatMutation({ sessionId, currentPath: path });
} catch {
// Silently fail
} finally {
isHeartbeatPending.current = false;
}
},
[heartbeatMutation],
);
useEffect(() => {
const path = location.pathname;
sendHeartbeat(path);
const intervalId = setInterval(() => {
sendHeartbeat(path);
}, HEARTBEAT_INTERVAL_MS);
return () => clearInterval(intervalId);
}, [location.pathname, sendHeartbeat]);
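To see how the guard clauses behave over time, the checks in sendHeartbeat can be pulled out into a pure function and replayed against a short timeline. The helper names `shouldSendHeartbeat` and `replay` are hypothetical, and the replay assumes every sent heartbeat completes immediately, so the pending flag never blocks.

```typescript
const HEARTBEAT_DEBOUNCE_MS = 5 * 1000;

interface HeartbeatState {
  isPending: boolean;
  lastTime: number;
  lastPath: string | null;
}

// Pure version of the guard clauses in sendHeartbeat (hypothetical helper).
function shouldSendHeartbeat(state: HeartbeatState, path: string, now: number): boolean {
  if (state.isPending) return false; // a call is already in flight
  const samePathRecently =
    state.lastPath === path && now - state.lastTime < HEARTBEAT_DEBOUNCE_MS;
  return !samePathRecently;
}

// Replay a list of attempts and report which ones would actually be sent.
function replay(attempts: { at: number; path: string }[]): boolean[] {
  const state: HeartbeatState = { isPending: false, lastTime: 0, lastPath: null };
  return attempts.map(({ at, path }) => {
    const send = shouldSendHeartbeat(state, path, at);
    if (send) {
      state.lastTime = at;
      state.lastPath = path;
    }
    return send;
  });
}

const sent = replay([
  { at: 1000, path: "/" }, // mount
  { at: 3000, path: "/" }, // same path 2 seconds later
  { at: 4000, path: "/stats" }, // navigation to a new path
  { at: 34000, path: "/stats" }, // 30-second interval tick
]);
// sent is [true, false, true, true]
```

The second attempt is debounced because it repeats the same path inside the 5-second window. The navigation goes through even though it falls inside that window, because the debounce only applies to an unchanged path.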
Backend mutations
recordPageView
Located in convex/stats.ts. Records view events with deduplication and updates aggregate components.
Deduplication window: 30 minutes. The same session viewing the same path within 30 minutes counts as one view.
export const recordPageView = mutation({
args: {
path: v.string(),
pageType: v.string(),
sessionId: v.string(),
},
returns: v.null(),
handler: async (ctx, args) => {
const dedupCutoff = Date.now() - DEDUP_WINDOW_MS;
const recentView = await ctx.db
.query("pageViews")
.withIndex("by_session_path", (q) =>
q.eq("sessionId", args.sessionId).eq("path", args.path)
)
.order("desc")
.first();
if (recentView && recentView.timestamp > dedupCutoff) {
return null;
}
// Check if this is a new unique visitor
const existingSessionView = await ctx.db
.query("pageViews")
.withIndex("by_session_path", (q) => q.eq("sessionId", args.sessionId))
.first();
const isNewVisitor = !existingSessionView;
// Insert new view event
const id = await ctx.db.insert("pageViews", {
path: args.path,
pageType: args.pageType,
sessionId: args.sessionId,
timestamp: Date.now(),
});
const doc = await ctx.db.get(id);
// Update aggregate components for O(log n) counts
if (doc) {
await pageViewsByPath.insertIfDoesNotExist(ctx, doc);
await totalPageViews.insertIfDoesNotExist(ctx, doc);
if (isNewVisitor) {
await uniqueVisitors.insertIfDoesNotExist(ctx, doc);
}
}
return null;
},
});
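The deduplication check in the handler above comes down to one timestamp comparison. As a small sketch, it can be isolated into a pure function to make the boundary behavior explicit. `isDuplicateView` is a hypothetical helper, and the constant's value is taken from the 30-minute window described earlier.

```typescript
const DEDUP_WINDOW_MS = 30 * 60 * 1000;

// Mirrors `recentView && recentView.timestamp > dedupCutoff` from the handler.
// lastViewTimestamp is null when the session has never viewed this path.
function isDuplicateView(lastViewTimestamp: number | null, now: number): boolean {
  if (lastViewTimestamp === null) return false;
  const dedupCutoff = now - DEDUP_WINDOW_MS;
  return lastViewTimestamp > dedupCutoff;
}

const now = 10_000_000;
const fiveMinutesAgo = now - 5 * 60 * 1000;
const exactlyThirtyMinutesAgo = now - DEDUP_WINDOW_MS;

const repeatVisit = isDuplicateView(fiveMinutesAgo, now); // true: inside the window
const boundaryVisit = isDuplicateView(exactlyThirtyMinutesAgo, now); // false: strict comparison
const firstVisit = isDuplicateView(null, now); // false: nothing to deduplicate against
```

Because the comparison is strict, a view recorded exactly 30 minutes after the previous one counts as a new view.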
heartbeat
Creates or updates a session record. Uses indexed lookup for upsert with a 10-second dedup window to prevent write conflicts.
const HEARTBEAT_DEDUP_MS = 10 * 1000;
export const heartbeat = mutation({
args: {
sessionId: v.string(),
currentPath: v.string(),
},
returns: v.null(),
handler: async (ctx, args) => {
const now = Date.now();
const existingSession = await ctx.db
.query("activeSessions")
.withIndex("by_sessionId", (q) => q.eq("sessionId", args.sessionId))
.first();
if (existingSession) {
// Early return if same path and recently updated (idempotent)
if (
existingSession.currentPath === args.currentPath &&
now - existingSession.lastSeen < HEARTBEAT_DEDUP_MS
) {
return null;
}
await ctx.db.patch(existingSession._id, {
currentPath: args.currentPath,
lastSeen: now,
});
return null;
}
await ctx.db.insert("activeSessions", {
sessionId: args.sessionId,
currentPath: args.currentPath,
lastSeen: now,
});
return null;
},
});
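The handler above has three outcomes: insert a new session, patch an existing one, or return early. The sketch below restates that branching as a pure function so each case can be checked in isolation. `decideHeartbeatAction`, `HeartbeatAction`, and `SessionRow` are hypothetical names introduced for illustration.

```typescript
const HEARTBEAT_DEDUP_MS = 10 * 1000;

type HeartbeatAction = "insert" | "patch" | "skip";

interface SessionRow {
  currentPath: string;
  lastSeen: number;
}

// Same branching as the heartbeat handler, without the database calls.
function decideHeartbeatAction(
  existing: SessionRow | null,
  currentPath: string,
  now: number,
): HeartbeatAction {
  if (existing === null) return "insert";
  const unchanged =
    existing.currentPath === currentPath &&
    now - existing.lastSeen < HEARTBEAT_DEDUP_MS;
  return unchanged ? "skip" : "patch";
}

const t = 1_000_000;
const newSession = decideHeartbeatAction(null, "/", t); // "insert"
const quickRepeat = decideHeartbeatAction({ currentPath: "/", lastSeen: t - 4000 }, "/", t); // "skip"
const pathChange = decideHeartbeatAction({ currentPath: "/", lastSeen: t - 1000 }, "/stats", t); // "patch"
const regularTick = decideHeartbeatAction({ currentPath: "/", lastSeen: t - 30000 }, "/", t); // "patch"
```

Since the client sends a heartbeat every 30 seconds, a regular tick from a single tab always falls outside the 10-second window and results in a patch. The skip branch most likely matters when several calls for the same session arrive close together, for example from multiple tabs that share one localStorage session ID.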
Backend query
getStats
Returns all stats for the /stats page. Single query, real-time subscription. Uses aggregate components for O(log n) counts instead of O(n) table scans.
What it returns:
| Field | Type | Description |
|---|---|---|
| activeVisitors | number | Sessions with heartbeat in last 2 minutes |
| activeByPath | array | Breakdown of active visitors by current page |
| totalPageViews | number | All recorded views since tracking started (via aggregate) |
| uniqueVisitors | number | Count of distinct session IDs (via aggregate) |
| publishedPosts | number | Blog posts with published: true |
| publishedPages | number | Static pages with published: true |
| trackingSince | number or null | Timestamp of earliest view event |
| pageStats | array | Views per page with title and type (per-path aggregate counts) |
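As a rough sketch of how activeVisitors and activeByPath could be derived from session rows, the function below filters by the 2-minute timeout and groups by current path. The element shape `{ path, count }`, the descending sort, and the name `summarizeActiveSessions` are assumptions; the source only states that activeByPath is an array.

```typescript
const SESSION_TIMEOUT_MS = 2 * 60 * 1000;

interface ActiveSession {
  sessionId: string;
  currentPath: string;
  lastSeen: number;
}

interface ActiveSummary {
  activeVisitors: number;
  activeByPath: { path: string; count: number }[]; // assumed element shape
}

function summarizeActiveSessions(sessions: ActiveSession[], now: number): ActiveSummary {
  const cutoff = now - SESSION_TIMEOUT_MS;
  // A session is active if its last heartbeat is at or after the cutoff.
  const active = sessions.filter((s) => s.lastSeen >= cutoff);
  const counts = new Map<string, number>();
  for (const s of active) {
    counts.set(s.currentPath, (counts.get(s.currentPath) ?? 0) + 1);
  }
  // Assumed ordering: busiest paths first.
  const activeByPath = Array.from(counts, ([path, count]) => ({ path, count })).sort(
    (a, b) => b.count - a.count,
  );
  return { activeVisitors: active.length, activeByPath };
}

const nowMs = 1_000_000;
const summary = summarizeActiveSessions(
  [
    { sessionId: "a", currentPath: "/", lastSeen: nowMs - 10_000 },
    { sessionId: "b", currentPath: "/stats", lastSeen: nowMs - 60_000 },
    { sessionId: "c", currentPath: "/", lastSeen: nowMs - 119_000 },
    { sessionId: "d", currentPath: "/old", lastSeen: nowMs - 121_000 }, // stale
  ],
  nowMs,
);
```

Filtering by the timeout at read time means a stale row is excluded from the count even if the cleanup job has not deleted it yet.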
Aggregate usage in getStats
// O(log n) counts using aggregate component
const totalPageViewsCount = await totalPageViews.count(ctx);
const uniqueVisitorsCount = await uniqueVisitors.count(ctx);
// Per-path counts using namespace
const views = await pageViewsByPath.count(ctx, { namespace: path });
Title matching
The query matches URL paths to post/page titles by slug:
const slug = path.startsWith("/") ? path.slice(1) : path;
const post = posts.find((p) => p.slug === slug);
const page = pages.find((p) => p.slug === slug);
if (post) {
title = post.title;
pageType = "blog";
} else if (page) {
title = page.title;
pageType = "page";
}
Cleanup cron
Stale sessions are cleaned up every 5 minutes via cron job in convex/crons.ts.
crons.interval(
"cleanup stale sessions",
{ minutes: 5 },
internal.stats.cleanupStaleSessions,
{}
);
The cleanup mutation deletes sessions whose last heartbeat is more than 2 minutes old:
const cutoff = Date.now() - SESSION_TIMEOUT_MS;
const staleSessions = await ctx.db
.query("activeSessions")
.withIndex("by_lastSeen", (q) => q.lt("lastSeen", cutoff))
.collect();
await Promise.all(staleSessions.map((s) => ctx.db.delete(s._id)));
How new content appears in stats
When you add a new markdown post or page and sync it to Convex:
- Post/page counts update instantly. The publishedPosts and publishedPages values come from live queries to the posts and pages tables.
- Views appear after first visit. A page only shows in "Views by Page" after someone visits it.
- Titles resolve automatically. The getStats query matches paths to slugs, so new content gets its proper title displayed.
No manual configuration required. Sync content, and stats track it.
Privacy
- No cookies
- No PII stored
- Session IDs are random UUIDs
- No IP addresses logged
- No fingerprinting
- Data stays in your Convex deployment
Configuration constants
| Constant | Value | Location | Purpose |
|---|---|---|---|
| DEDUP_WINDOW_MS | 30 minutes | convex/stats.ts | Page view deduplication |
| SESSION_TIMEOUT_MS | 2 minutes | convex/stats.ts | Active session expiry |
| HEARTBEAT_DEDUP_MS | 10 seconds | convex/stats.ts | Backend idempotency window |
| HEARTBEAT_INTERVAL_MS | 30 seconds | src/hooks/usePageTracking.ts | Client heartbeat frequency |
| HEARTBEAT_DEBOUNCE_MS | 5 seconds | src/hooks/usePageTracking.ts | Frontend debounce window |
Files involved
| File | Purpose |
|---|---|
| convex/stats.ts | All stats mutations, queries, and aggregate definitions |
| convex/convex.config.ts | Aggregate component registration (pageViewsByPath, totalPageViews, uniqueVisitors) |
| convex/schema.ts | Table definitions for pageViews and activeSessions |
| convex/crons.ts | Scheduled cleanup job |
| src/hooks/usePageTracking.ts | Client-side tracking hook |
| src/pages/Stats.tsx | Stats page UI |
Write conflict prevention
The stats system uses several patterns to avoid write conflicts in the activeSessions table:
Backend (convex/stats.ts):
- 10-second dedup window: skips updates if session was recently updated with same path
- Indexed queries: uses by_sessionId index for efficient lookups
- Early returns: mutation is idempotent and safe to call multiple times
Frontend (src/hooks/usePageTracking.ts):
- 5-second debounce: prevents rapid duplicate calls from the same tab
- Pending state ref: blocks overlapping async calls
- Path tracking ref: skips redundant heartbeats for same path
See prds/howtoavoidwriteconflicts.md for the full implementation details.