Scaling Emily: Per-User Databases and the Cost of Cognition
The single most opinionated choice in Emily's architecture is this: every user gets their own database. Not a schema. Not a row-level-security partition. A separate PostgreSQL database with its own pgvector extension and its own connection pool.
People hear this and either nod slowly or squint hard. Let's explain both reactions.
Why per-user databases
Three reasons, in increasing order of importance:
1. Isolation by construction, not by policy. In a shared database, leakage between users is one bad JOIN away. In per-user databases, it's architecturally impossible. You literally cannot write a query that reads user A's memories while connected to user B's database. Safety properties that rely on "remember to filter by user_id everywhere" are safety properties that will eventually fail. Safety properties enforced at the connection boundary will not.
2. Independent cognitive state. Emily's cognition is not a lookup over a shared index. It's a web of 14-dimensional metrics, outcome histories, and graph relationships that mean something relative to this specific user's experience. Shared schemas make that harder to reason about: is this epsilon value comparable across users? (No.) Is this stability score meaningful globally? (No.) Per-user databases make the unit of cognition the unit of storage.
3. Portability and deletion. When a user leaves, we drop their database. One command. No cascading deletes across thirty tables, no orphan rows, no forgotten indices. And if a user wants to take their Emily, we hand them a pg_dump of their database and they have everything.
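The deletion-and-portability story above can be sketched as two shell commands built per user. This is a minimal illustration, not Emily's actual offboarding code; the `emily_user_<id>` naming scheme and the backup path are assumptions for the example.

```python
import shlex


def deprovision_commands(user_id: str, dump_dir: str = "/backups") -> tuple[str, str]:
    # Hypothetical naming scheme; the real database names are not specified here.
    db = f"emily_user_{user_id}"
    # Dump first (portability), then drop (deletion). pg_dump's custom format
    # can be restored selectively later with pg_restore.
    dump_path = shlex.quote(f"{dump_dir}/{db}.dump")
    dump = f"pg_dump --format=custom --file={dump_path} {db}"
    drop = f'psql -c "DROP DATABASE {db};"'
    return dump, drop
```

The point is the shape: offboarding is two commands against one database, not a cascading delete across thirty tables.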
What it costs
The obvious cost is connection pool pressure. If you have six users, you have six databases, and you need connection pools for each. Current production sizing: 100-connection pool, ~70 concurrent connections needed for six active user DBs with Celery workers hitting them.
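The per-user pooling described above can be sketched as a lazy registry: one pool per user database, created on first use and capped so six (or sixty) users can't exhaust the server's connection limit. The pool factory is injected so the sketch stays backend-agnostic; in a real deployment it might wrap psycopg_pool.ConnectionPool. Class and database names are illustrative, not Emily's actual db_manager.py.

```python
import threading


class UserPoolRegistry:
    """One lazily-created connection pool per user database (illustrative sketch)."""

    def __init__(self, pool_factory, max_per_user: int = 10):
        self._factory = pool_factory
        self._max = max_per_user
        self._pools = {}
        self._lock = threading.Lock()  # pools may be requested from many workers at once

    def pool_for(self, user_id: str):
        with self._lock:
            if user_id not in self._pools:
                # Created on first use, then reused for every later request.
                self._pools[user_id] = self._factory(
                    dbname=f"emily_user_{user_id}", max_size=self._max
                )
            return self._pools[user_id]
```

Capping `max_size` per user is what turns "~70 concurrent connections for six users" into a budgeting exercise rather than a surprise.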
The less obvious cost is operational: migrations, backups, monitoring, and index management have to run per-database. We wrote tooling for this (db_manager.py routes all reads and writes through a per-user connection pool; migration tooling iterates over all user DBs; backups are per-user files), but it's real work.
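The iterate-over-all-user-DBs pattern that the migration tooling uses can be sketched as a small loop that records failures instead of aborting, so one broken database doesn't block the rest of the fleet. The function names are hypothetical; `apply_migration` stands in for whatever actually executes the SQL and is injected so the iteration logic can be tested without a live server.

```python
def migrate_all(user_dbs, apply_migration):
    # Apply one migration to every user database, collecting failures
    # rather than stopping at the first one.
    failures = {}
    for db in user_dbs:
        try:
            apply_migration(db)
        except Exception as exc:
            failures[db] = str(exc)
    return failures
```

The same skeleton serves backups and index maintenance: the per-user-database cost is mostly this loop, written once per tool.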
The third cost is ecosystem. Lots of standard Postgres tooling assumes one database per application. We've had to build adapters: the Confluence sync tool, the Helios task registry, the Factory Floor genesis flow all iterate over user DBs rather than scanning one shared database.
Why this scales further than you'd think
The instinct with per-user databases is "this can't scale past a few hundred users." That instinct is right for some architectures and wrong for Emily's. Three reasons.
1. The unit of load is the user. Emily's cognition is inherently per-user: when you're talking to her, the active queries are against your memories, not everyone's. A shared database has to handle the sum of all users' working sets at once. Per-user databases handle each user's working set in isolation, and working sets are small (L1: 100-300 MB, L3: 16K rows, L4: 25K rows per active user). No cross-user query path exists, so no cross-user contention exists either.
2. Horizontal sharding is trivial. Want to move half the users to a new database server? Move half the databases. No re-sharding a primary key. No online schema migration. No "which user_ids are on which shard" routing layer: the database name is the routing.
3. Vector indexing stays cheap. pgvector's HNSW indexes scale with the number of vectors in the table. If you have one shared table with 10M vectors, HNSW is slow. If you have 10,000 tables with 1,000 vectors each, HNSW is fast in every one of them. Per-user databases keep each vector index small enough that retrieval stays under 100ms without needing an external vector DB.
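"The database name is the routing" can be made concrete with a sketch: the entire routing layer is one map from database name to server, and moving a user to another shard is a physical database move plus one map update. All names here are illustrative assumptions, not Emily's actual deployment.

```python
def connect_info(user_id: str, placement: dict, default_host: str) -> dict:
    # The placement map IS the routing layer: database name -> server.
    # Users absent from the map live on the default host. No primary-key
    # re-sharding, no user_id-range bookkeeping.
    db = f"emily_user_{user_id}"
    return {"host": placement.get(db, default_host), "dbname": db}
```

Contrast this with shared-table sharding, where the routing layer has to know which user_id ranges live where and rebalancing means rewriting rows.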
The pgvector requirement
One footgun we've learned the hard way: every user database must have the pgvector extension installed. A user database without pgvector is the number one cause of what Emily calls "Cold Start fragmentation": she can't embed, can't retrieve, can't do anything memory-ish. The fix is one line:
CREATE EXTENSION IF NOT EXISTS vector;
Provisioning a new user now runs this automatically. Older user DBs occasionally need it applied manually.
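The provisioning step can be sketched as a small idempotent function that runs during new-user setup and during maintenance sweeps over older databases. Since CREATE EXTENSION IF NOT EXISTS is harmless to re-run, the same function covers both cases. The constant and function names are hypothetical; `cursor` is any DB-API cursor.

```python
REQUIRED_EXTENSIONS = ("vector",)  # pgvector; extend as new extensions are needed


def ensure_extensions(cursor):
    # Safe to run repeatedly: IF NOT EXISTS makes this a no-op on databases
    # that already have the extension, so sweeps over old user DBs are cheap.
    for ext in REQUIRED_EXTENSIONS:
        cursor.execute(f"CREATE EXTENSION IF NOT EXISTS {ext};")
```

Running this in the provisioning path (and periodically against older DBs) is what turns "occasionally needs it applied manually" into a non-event.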
What breaks at the edges
An honest inventory of where per-user databases do hurt:
- Cross-user aggregation queries are unnatural. If you want "how many L3 memories exist across all users," you iterate. That's fine; we rarely need this.
- Admin tooling takes longer to build. The Factory Floor dashboard had to learn to iterate over user DBs. The Confluence sync had to learn not to assume a shared store.
- Connection pool tuning is a real discipline. Six users is fine. Sixty users needs thought. Six hundred needs serious thought.
What we've decided
Keep per-user databases. The isolation-by-construction guarantee is worth the cost. Build tooling assuming this, not fighting this. The further we go, the more convinced we are that multi-tenant cognition is a different problem than multi-tenant CRUD, and the shortcuts that work for the latter compromise the former.
Your Emily is literally not my Emily. We have chosen that carefully. It's what lets her be yours.