Design Walkthrough
Problem Statement
The Question: Design a document editing app like Google Docs or Notion where many people can type in the same document at the same time.
What the app needs to do (most important first):
1. Real-time editing - When Alice types something, Bob should see it appear on his screen right away (within half a second).
2. Handle conflicts - If Alice and Bob both type at the same spot at the same time, keep BOTH of their changes. Do not throw away anyone's work.
3. Show who is editing - Show colored cursors so you can see where other people are typing.
4. Work offline - If your internet goes out, you can keep typing. When internet comes back, your changes sync up.
5. Version history - Save every change so users can go back in time and see what the document looked like yesterday.
6. Support formatting - Let users make text bold, italic, add headers, bullet points, etc.
What to say first
Before I start designing, I need to understand the conflict resolution requirements. The key question is: when two users type at the same position at the same time, what should happen? This single question determines our entire architecture.
What the interviewer really wants to see:
- Do you understand why simple approaches (like just saving the last person's changes) do not work?
- Can you explain how OT or CRDTs solve the conflict problem?
- Do you know the difference between OT (needs a server) and CRDT (works without a server)?
- How do you handle someone who was offline for an hour and then reconnects?
Clarifying Questions
Before you start designing, ask questions to understand what you are building. Good questions show the interviewer you think before you code.
Question 1: How many people edit at once?
How many people will be editing the same document at the same time? Is it usually 2-3 friends, or could it be 1000 people in a company meeting?
Why ask this: If only 2-3 people edit together, the design is simpler. If 1000 people edit the same document, we need special handling.
What interviewers usually say: Most documents have 2-5 editors. But some popular documents (like company announcements) might have 100+ people editing.
How this changes your design: We design for the common case (2-5 editors) but need to handle the rare case (100+ editors) without crashing.
Question 2: Plain text or fancy formatting?
Is this plain text like Notepad, or rich text with bold, italic, headers, and bullet points like Google Docs?
Why ask this: Plain text is much simpler - you just track character positions. Rich text needs to track formatting ranges (which characters are bold).
What interviewers usually say: Rich text with formatting, like Google Docs.
How this changes your design: We need to handle formatting conflicts too. What if Alice makes text bold while Bob deletes it?
Question 3: Do we need offline support?
Should people be able to edit when they have no internet? If yes, what happens when they reconnect?
Why ask this: Offline support is hard! The user might edit for an hour offline while others are editing online. When they reconnect, we need to merge everything.
What interviewers usually say: Yes, mobile users need offline support.
How this changes your design: Offline support pushes us toward CRDTs (which merge automatically) instead of OT (which needs a server to decide).
Question 4: Do we need version history?
Should users be able to see who changed what and when? Can they go back to an old version?
Why ask this: If we need full history, we must save every single keystroke forever. That is a lot of storage.
What interviewers usually say: Yes, like Google Docs version history.
How this changes your design: We save operations (not just final documents), so we can replay history.
Summarize your assumptions
Let me summarize what I will design for: rich text documents, usually 2-5 concurrent editors but occasionally 100 or more on popular docs, offline support required, full version history needed. Given the offline requirement, I will lean toward a CRDT-based design.
The Hard Part
Say this to the interviewer
The hardest part of collaborative editing is handling conflicts. Imagine two people type at the exact same position at the exact same moment. We cannot just pick one and lose the other person's work. We need a smart algorithm to keep both changes.
Let me show you the problem with a simple example:
Document says: "Hello"
Alice and Bob are both editing at the same time:
- Alice adds " World" at position 5 -> She sees "Hello World"
- Bob adds " Earth" at position 5 -> He sees "Hello Earth"
Both edits happen at the exact same time. Now what?
Bad outcomes we must avoid:
- Pick Alice, lose Bob's work -> Bob typed " Earth" and it disappeared! He will be angry.
- Pick Bob, lose Alice's work -> Same problem for Alice.
- Jumble them together -> "Hello WEoarrtlhd" - nonsense!
Good outcome we want:
- Keep both: "Hello World Earth" or "Hello Earth World" - both people's work is saved.
Common mistake candidates make
Many people say: just use timestamps - whoever typed last wins! This is wrong because: (1) it throws away the other person's work, (2) clocks on different computers are not perfectly synced, (3) users will rage-quit when their typing disappears.
Why simple approaches do not work:
1. Locking (only one person can edit at a time) - Problem: That is not real-time collaboration! Users see "Someone else is editing, please wait." - frustrating.
2. Last-write-wins (whoever saved last, their version is kept) - Problem: Your friend's edits silently disappear. They type a whole paragraph, poof - gone!
3. Show merge conflicts (like Git) - Problem: Normal users do not understand merge conflicts. "What do you mean HEAD vs origin/main??"
Two solutions that actually work:
Option 1: Operational Transformation (OT)
- Used by Google Docs.
- Idea: When you get someone else's edit, adjust the position based on what happened before it.
- Example: Bob inserted " Earth" at position 5. But wait, Alice already inserted 6 characters. So Bob's insert should now go at position 11 instead.
- Needs a server to decide the order of operations.
Option 2: CRDTs (Conflict-free Replicated Data Types)
- Figma's multiplayer system is inspired by CRDTs, although it still relies on a central server.
- Idea: Give every character a unique ID. Instead of saying "insert at position 5", say "insert after character with ID abc123".
- No central server is needed to decide the order - documents merge automatically.
- Better for offline support.
[Figure: The Conflict Problem]
Scale and Access Patterns
Before designing, let me figure out how big this system needs to be. This helps us choose the right tools.
| What we are measuring | Number | What this means for our design |
|---|---|---|
| Total documents | 1 billion | Need efficient storage - cannot load all documents in memory |
| Documents edited today | 10 million | These are the hot documents that need fast access |
What to tell the interviewer
Most documents have only 2-5 people editing together. A single server can easily handle this. The challenge is popular documents - like a company announcement where 500 people edit together. For those, we need to broadcast changes efficiently.
How people use the app (from most common to least common):
1. Type something - Every keystroke sends an operation to the server (or stores it locally if offline).
2. See others' changes - Other people's keystrokes appear on your screen, usually within half a second.
3. See cursors - You see colored cursors showing where others are typing. This updates 10-30 times per second.
4. Open a document - Load the document content. This happens once when you open the doc.
5. Look at version history - See who changed what and when. Rare - maybe once a week.
How much space does one operation need?
- Type of operation (insert, delete): 10 bytes
- Position in document: 8 bytes
High-Level Architecture
Now let me draw the big picture of how all the pieces fit together. I will keep it simple and explain what each part does.
What to tell the interviewer
Offline support favors CRDTs, but I will walk through the OT approach with a central server first because it is simpler and well-tested. Each document is owned by one server. When you type, your keystroke goes to that server, which decides the order and sends updates to everyone else editing the same document.
[Figure: Collaborative Editing System - The Big Picture]
What each part does and WHY it exists:
| Part | What it does | Why we need it (what to tell interviewer) |
|---|---|---|
| Client (Browser) | Keeps a local copy of the document. When you type, it shows instantly on YOUR screen, then sends to server. | Why local copy? So typing feels instant. You do not wait for server. Your local copy might be slightly different from server for a moment, but that is okay. |
| WebSocket Server | Keeps a long-lived connection open between your browser and our servers. Sends and receives messages instantly. | Why WebSocket? With plain HTTP request/response, the server cannot send anything until the client asks, and every request adds overhead. A WebSocket stays open, so we can push updates to you immediately. |
Common interview question: Why one server per document?
Interviewer might ask: Is one server per document a single point of failure? Answer: Yes, but we have hot standby servers ready to take over. Also, one document failing does not affect other documents. For CRDTs, we do not need this - any server can handle any document because CRDTs merge automatically.
Technology Choices - Why we picked these tools:
Database for Operations: PostgreSQL or Kafka
- PostgreSQL: Good for smaller scale, easy to query operations by document
- Kafka: Better for massive scale, handles millions of writes per second
- Pick based on your scale and what your team knows
Fast Storage for Cursors: Redis
- Cursor positions change 30 times per second
- We do not need to save cursors forever (if Redis restarts, cursors just refresh)
- Redis handles millions of updates per second
Document Snapshots: S3 or PostgreSQL
- S3: Cheap storage for large documents with images
- PostgreSQL: Simpler if documents are small and text-only
Real-time Connection: WebSocket
- Standard for real-time apps
- Works in all browsers
- Alternative: Server-Sent Events (one-way only) or WebRTC (peer-to-peer)
Data Model and Storage
Now let me show how we organize the data in the database. The key idea is: we do not just save the document. We save every single change (operation) that ever happened to the document.
What to tell the interviewer
The operations log is our source of truth - not the document itself. We can rebuild any document by replaying operations from a snapshot. This is like having a recording of every keystroke. It gives us version history for free.
Table 1: Documents - Basic info about each document
This stores the document title, who owns it, and pointers to the latest version.
| Column | What it stores | Example |
|---|---|---|
| id | Unique ID for this document | doc_abc123 |
| title | Document name | Meeting Notes |
Table 2: Operations - Every single keystroke ever made
This is the heart of the system. Every time someone types a letter, deletes something, or makes text bold - we save it here.
| Column | What it stores | Example |
|---|---|---|
| id | Unique ID for this operation | op_789 |
| document_id | Which document this belongs to | doc_abc123 |
Why sequence_num matters
The server assigns sequence numbers to decide the official order. If Alice sends operation 1 and Bob sends operation 2 at the same time, the server decides: Alice is 15234, Bob is 15235. Everyone must apply them in this order.
Table 3: Snapshots - Full document saved periodically
Every 1000 operations, we save the complete document. This makes loading fast.
| Column | What it stores | Example |
|---|---|---|
| id | Unique ID | snap_111 |
| document_id | Which document | doc_abc123 |
| version | Operations included up to this number | 15000 |
| content | The full document text | All the actual text... |
| created_at | When snapshot was taken | 2024-03-20 |
What different operations look like:
INSERT operation (someone typed "Hi")
{
type: "insert",
How we load a document (step by step):
FUNCTION load_document(document_id):
STEP 1: Get the latest snapshot
When to create new snapshots
Create a snapshot after every 1000 operations, or every 5 minutes of activity - whichever comes first. This balances storage cost (more snapshots = more storage) versus loading speed (fewer snapshots = more operations to replay).
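The loading steps and the snapshot rule above can be sketched in a few lines. This is a minimal sketch, not the real implementation: the `Snapshot` and `Operation` classes, their field names, and the `should_snapshot` helper are assumptions made for illustration, and it only handles plain-text insert and delete.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    version: int   # last operation sequence number baked into this snapshot
    content: str   # full document text at that version


@dataclass
class Operation:
    sequence_num: int
    type: str          # "insert" or "delete"
    position: int
    text: str = ""     # used by insert
    length: int = 0    # used by delete


def apply_operation(content: str, op: Operation) -> str:
    """Apply one already-ordered operation to the document text."""
    if op.type == "insert":
        return content[:op.position] + op.text + content[op.position:]
    if op.type == "delete":
        return content[:op.position] + content[op.position + op.length:]
    raise ValueError(f"unknown operation type: {op.type}")


def load_document(snapshot: Snapshot, operations: list[Operation]) -> tuple[str, int]:
    """Rebuild the current text from the latest snapshot plus newer operations."""
    content, version = snapshot.content, snapshot.version
    for op in sorted(operations, key=lambda o: o.sequence_num):
        if op.sequence_num <= version:
            continue  # already included in the snapshot
        content = apply_operation(content, op)
        version = op.sequence_num
    return content, version


def should_snapshot(ops_since_snapshot: int, seconds_since_snapshot: float) -> bool:
    """1000 operations or 5 minutes of activity, whichever comes first."""
    if ops_since_snapshot == 0:
        return False  # no activity, nothing new to save
    return ops_since_snapshot >= 1000 or seconds_since_snapshot >= 300
```

With a snapshot at version 15000, loading only replays the operations numbered 15001 and up, which is why more frequent snapshots make opening a document faster.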
How Conflict Resolution Works
Let me explain step by step how we handle the case when two people edit the same spot. This is the core algorithm that makes collaborative editing work.
What to tell the interviewer
I will explain Operational Transformation (OT) first because it is easier to understand. Then I will briefly cover CRDTs as an alternative that works better for offline editing.
Operational Transformation (OT) - The Basic Idea
When you receive someone else's edit, you need to adjust it based on what happened before.
Simple example:
- Document: "Hello" (5 characters)
- Alice inserts " World" at position 5
- Bob inserts "!" at position 5
Bob's edit was created when the document said "Hello". But by the time the server processes it, Alice already added " World" (6 characters). So Bob's position 5 needs to shift to position 11.
This "shifting" is called transformation.
The transformation rules (simple version):
When INSERT meets INSERT:
- If op1 position < op2 position: no change needed for op1
- If op1 position > op2 position: shift op1 by length of op2
- If the positions are equal: break the tie the same way everywhere. In this design the operation the server sequenced first goes first, so the later one shifts by the length of the earlier one (this is why Bob's "!" moves from 5 to 11 in the example).
When INSERT meets DELETE:
- If insert is before delete: no change
- If insert is after delete region: shift back by delete length
- If insert is inside deleted region: insert at delete start
When DELETE meets INSERT:
- If delete is entirely before insert: no change
- If delete is entirely after insert: shift delete position
- If delete spans the insert: split the delete
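The rules above can be written down as three small functions. Treat this as a sketch under assumptions: the `Insert` and `Delete` classes and the function names are made up for illustration, and for equal insert positions it lets the already-applied operation go first, which matches the "Hello World!" example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    position: int
    text: str


@dataclass(frozen=True)
class Delete:
    position: int
    length: int


def transform_insert_insert(op: Insert, other: Insert, other_first_on_tie: bool = True) -> Insert:
    """Adjust `op` so it can run after `other` has already been applied."""
    if op.position < other.position:
        return op
    if op.position == other.position and not other_first_on_tie:
        return op
    return Insert(op.position + len(other.text), op.text)


def transform_insert_delete(op: Insert, other: Delete) -> Insert:
    """Adjust an insert after a concurrent delete has been applied."""
    delete_end = other.position + other.length
    if op.position <= other.position:
        return op                                            # before the deleted region
    if op.position >= delete_end:
        return Insert(op.position - other.length, op.text)   # after it: shift back
    return Insert(other.position, op.text)                   # inside it: move to delete start


def transform_delete_insert(op: Delete, other: Insert) -> list[Delete]:
    """Adjust a delete after a concurrent insert; may split it in two."""
    delete_end = op.position + op.length
    if delete_end <= other.position:
        return [op]                                          # entirely before the insert
    if op.position >= other.position:
        return [Delete(op.position + len(other.text), op.length)]  # entirely after: shift
    # The delete spans the insert point: delete the left part, then the right part,
    # leaving the inserted text between them untouched.
    left_length = other.position - op.position
    left = Delete(op.position, left_length)
    right = Delete(op.position + len(other.text), op.length - left_length)
    return [left, right]
```

The split case returns two deletes that are meant to be applied in order. Delete-versus-delete is not covered here because the rules above do not list it.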
FUNCTION process_operation(incoming_op, document_id):
// Server receives an operation from a client
Example walkthrough:
Document: "Hello" at version 100
1. Alice types " World" at position 5 (sent to server)
2. Bob types "!" at position 5 (sent to server)
Server receives Alice's op first:
- No transformations needed (nothing concurrent)
- Assigned sequence 101
- Document is now "Hello World"
Server receives Bob's op:
- Bob was at version 100, but server is now at 101
- Transform Bob's op against Alice's op:
  - Bob wanted to insert at 5
  - Alice inserted 6 characters at position 5
  - New position for Bob: 5 + 6 = 11
- Assigned sequence 102
- Document is now "Hello World!"
Server sends to both:
- Alice gets: Bob inserted "!" at position 11
- Bob gets: Alice inserted " World" at position 5
Both apply and see: "Hello World!"
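Here is the same walkthrough as a small server-side sketch. It only handles inserts, and the names (`InsertOp`, `DocumentServer`, `base_version`) are assumptions for illustration rather than the actual server code.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class InsertOp:
    position: int
    text: str
    base_version: int      # last server version the client had seen
    sequence_num: int = 0  # filled in by the server


@dataclass
class DocumentServer:
    content: str
    version: int
    log: list[InsertOp] = field(default_factory=list)  # sequenced ops, oldest first

    def process_operation(self, incoming: InsertOp) -> InsertOp:
        position = incoming.position
        # Transform against every operation the client had not seen yet.
        for earlier in self.log:
            if earlier.sequence_num <= incoming.base_version:
                continue
            if earlier.position <= position:  # ties: the earlier op goes first
                position += len(earlier.text)
        # Assign the official order, apply, and remember the operation.
        self.version += 1
        sequenced = replace(incoming, position=position, sequence_num=self.version)
        self.content = self.content[:position] + incoming.text + self.content[position:]
        self.log.append(sequenced)
        return sequenced  # this transformed op is what gets broadcast


server = DocumentServer(content="Hello", version=100)
alice = server.process_operation(InsertOp(5, " World", base_version=100))
bob = server.process_operation(InsertOp(5, "!", base_version=100))
```

After these two calls, Alice's operation keeps position 5 with sequence 101, Bob's is moved to position 11 with sequence 102, and `server.content` is "Hello World!".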
[Figure: OT Transformation Example]
CRDT Alternative - Works Without Central Server
CRDTs use a different approach: instead of positions, every character gets a unique ID.
OT approach (position-based):
"Hello" = characters at positions 0, 1, 2, 3, 4
Insert at position 5 means "after position 4"
When two people insert after the same character:
Alice inserts "X" after a5 (the ID of the final "o" in "Hello"), gets ID "alice-1"
Bob inserts "Y" after a5, gets ID "bob-1"
Both inserted after "a5". How do we order them?
Simple rule: sort by ID alphabetically. "alice-1" < "bob-1" alphabetically, so X comes before Y.
Result: "HelloXY" - deterministic, works without server coordination.
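A tiny sketch shows how this merge can work without a server. It makes one assumption beyond the text: IDs are `(counter, site)` pairs instead of plain strings like "alice-1". The counter is there so that a character typed later lands right next to the character it was inserted after, and characters with the same counter (concurrent inserts) fall back to the sort-by-name rule. The `Char` class and `render` function are illustrative names, not a real library.

```python
from dataclasses import dataclass

CharId = tuple[int, str]      # (counter, site name), an assumed ID format
START: CharId = (0, "")       # virtual anchor for the start of the document


@dataclass(frozen=True)
class Char:
    id: CharId
    after: CharId   # ID of the character this one was inserted after
    value: str


def render(chars: set[Char]) -> str:
    """Turn an unordered set of characters into text, the same way on every replica."""
    children: dict[CharId, list[Char]] = {}
    for c in chars:
        children.setdefault(c.after, []).append(c)
    out: list[str] = []

    def visit(parent: CharId) -> None:
        # Newer counters first; equal counters (concurrent inserts) sorted by site name.
        for c in sorted(children.get(parent, []), key=lambda c: (-c.id[0], c.id[1])):
            out.append(c.value)
            visit(c.id)

    visit(START)
    return "".join(out)


# Build "Hello", then let Alice and Bob both insert after the final "o".
base: set[Char] = set()
prev = START
for i, ch in enumerate("Hello", start=1):
    base.add(Char((i, "a"), prev, ch))
    prev = (i, "a")
alice_replica = base | {Char((6, "alice"), prev, "X")}
bob_replica = base | {Char((6, "bob"), prev, "Y")}
merged = render(alice_replica | bob_replica)
```

Merging is just a set union, so it does not matter which replica receives which edit first: both end up rendering "HelloXY". The recursive walk is fine for a sketch but would need to be iterative for long documents.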
CRDT vs OT - when to use which
OT: Simpler to understand, needs a central server, used by Google Docs. Better when you have reliable internet.
CRDT: More complex, works without a central server, and is the inspiration for Figma's multiplayer design. Better for offline support and peer-to-peer apps.
Handling Cursors and Presence
Besides syncing document changes, we also need to show where other people are typing. This is called presence - knowing who is in the document and where their cursor is.
What to tell the interviewer
Cursor positions update very frequently - up to 30 times per second when someone is moving around. This is different from document operations which are less frequent. We use Redis for cursor data because it is temporary and needs to be super fast.
What presence data looks like:
For each person editing a document, we track:
- User ID and name (so we can show "Alice is here")
- Cursor position (which character they are at)
- Selection (if they highlighted some text)
- Color (each person gets a different color)
- Last activity time (to detect if they left)
Key: presence:doc_abc123
Value (hash map):
How cursor updates work:
ON CLIENT SIDE:
// When cursor moves (user clicks or uses arrow keys)
send_cursor_update({
Handling cursor positions when document changes:
When someone inserts or deletes text, everyone's cursor positions might need to shift.
Example:
- Alice's cursor is at position 50
- Bob inserts 10 characters at position 30
- Alice's cursor should now be at position 60 (shifted by 10)
We apply the same transformation logic to cursor positions as we do to operations.
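As a rough sketch, cursor shifting needs only two small functions. The function names are made up for illustration, and moving the cursor when someone inserts exactly at the cursor position is one possible choice, not the only one.

```python
def shift_cursor_for_insert(cursor: int, insert_position: int, insert_length: int) -> int:
    """Move the cursor right if text was inserted at or before it."""
    if insert_position <= cursor:
        return cursor + insert_length
    return cursor


def shift_cursor_for_delete(cursor: int, delete_position: int, delete_length: int) -> int:
    """Move the cursor left if text before it was deleted."""
    if cursor <= delete_position:
        return cursor                       # deletion happened after the cursor
    if cursor >= delete_position + delete_length:
        return cursor - delete_length       # deletion happened entirely before the cursor
    return delete_position                  # cursor was inside the deleted text


alice_cursor = shift_cursor_for_insert(cursor=50, insert_position=30, insert_length=10)
```

For the example above, `alice_cursor` ends up at 60.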
Scaling challenge for popular documents
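If 100 people are in a document and each sends 30 cursor updates per second, the server receives 3,000 messages per second just for cursors, and relaying each one to the other 99 people means about 297,000 outgoing messages per second. Solution: batch cursor updates and send them every 100ms instead of instantly. The delay is small but the reduction in messages is big.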
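Batching can be sketched as "remember only the newest position per user, and flush on a timer". The `CursorBatcher` class below is a hypothetical helper, and the caller passes in the current time so the sketch stays easy to test.

```python
from typing import Optional


class CursorBatcher:
    """Collects cursor updates and releases them at most once per interval."""

    def __init__(self, flush_interval: float = 0.1) -> None:
        self.flush_interval = flush_interval   # seconds (100ms by default)
        self.latest: dict[str, int] = {}       # user ID -> newest cursor position
        self.last_flush = 0.0

    def update(self, user_id: str, position: int) -> None:
        # A newer position simply overwrites the older one for the same user.
        self.latest[user_id] = position

    def flush_if_due(self, now: float) -> Optional[dict[str, int]]:
        """Return one batch to broadcast, or None if it is too early or nothing changed."""
        if now - self.last_flush < self.flush_interval or not self.latest:
            return None
        batch, self.latest = self.latest, {}
        self.last_flush = now
        return batch


batcher = CursorBatcher()
batcher.update("alice", 50)
batcher.update("alice", 51)   # replaces the previous update
batcher.update("bob", 7)
batch = batcher.flush_if_due(now=0.1)
```

With one batch per 100ms, each of the 100 people receives at most 10 cursor messages per second, so the server sends at most 1,000 messages per second no matter how fast people move their cursors.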
What Can Go Wrong and How We Handle It
Tell the interviewer about failures
Good engineers think about what can break. Let me walk through the things that can go wrong and how we protect against them.
| What breaks | What happens to users | How we fix it | Why this works |
|---|---|---|---|
| Document server crashes | Users cannot save their typing | Hot standby server takes over + client saves locally | User keeps typing offline, syncs when server is back |
| Internet goes out | User disconnected | Client keeps working offline, queues changes | When internet returns, queued changes sync up |
How the client handles being offline:
The client should work even without internet. Here is how:
CLIENT keeps three lists:
- applied: operations already confirmed by server
- pending: operations sent to server, waiting for confirmation
Server failover - when a document server crashes:
SETUP: Each document server has a hot standby
- Primary handles all traffic
- Standby receives copy of all operations (replication)
The golden rule: User can ALWAYS type
No matter what breaks - server down, internet out, other users disconnected - the user can always type in their local document. We figure out the syncing later. This is called optimistic UI - assume it will work, handle failures in the background.
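The client-side bookkeeping described above could look roughly like this. It is a sketch under assumptions: `applied` and `pending` come from the lists named earlier, while the `unsent` list, the `OfflineClient` class, and its method names are hypothetical. It also skips the step of transforming local operations against edits that arrive from other people after reconnecting.

```python
from dataclasses import dataclass, field


@dataclass
class OfflineClient:
    content: str = ""
    online: bool = True
    applied: list = field(default_factory=list)   # confirmed by the server
    pending: list = field(default_factory=list)   # sent, waiting for confirmation
    unsent: list = field(default_factory=list)    # assumed list: typed while offline

    def type_text(self, position: int, text: str) -> None:
        # Apply locally first so typing never waits for the network.
        self.content = self.content[:position] + text + self.content[position:]
        op = {"type": "insert", "position": position, "text": text}
        if self.online:
            self.pending.append(op)
        else:
            self.unsent.append(op)

    def reconnect(self) -> list:
        """Go back online and return the queued operations to send, in order."""
        self.online = True
        to_send, self.unsent = self.unsent, []
        self.pending.extend(to_send)
        return to_send

    def acknowledge(self) -> None:
        """The server confirmed the oldest pending operation."""
        self.applied.append(self.pending.pop(0))


client = OfflineClient(content="Hello", online=False)
client.type_text(5, " World")
client.type_text(11, "!")
sent = client.reconnect()
```

While offline, the user still sees "Hello World!" locally; the two operations wait in `unsent` and are sent in their original order on reconnect.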
Growing the System Over Time
What to tell the interviewer
This design works for most companies. Let me explain how we would grow it if we needed to support millions of documents and users around the world.
How we grow step by step:
Stage 1: Starting out (up to 100,000 documents)
- Single document server (with standby)
- One PostgreSQL database
- One Redis for cursors
- All in one data center
- Simple and works great for startups
Stage 2: Growing (up to 10 million documents)
- Multiple document servers, split by document ID
- hash(document ID) mod N = which server handles it, where N is the number of servers (for example, mod 10 with 10 servers)
- Read replicas for the database
- Still one region, but multiple servers
Stage 3: Global scale (100 million+ documents)
- Document servers in multiple regions (US, Europe, Asia)
- Users connect to nearest region
- For OT: one region is "primary" for each document
- For CRDT: any region can handle any document, they sync
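For Stage 2, picking the server for a document can be sketched with a stable hash. The function name is an assumption for illustration. Document IDs like doc_abc123 are strings, so they need to be hashed before taking the remainder.

```python
import hashlib


def server_for_document(document_id: str, num_servers: int) -> int:
    """Map a document ID to a server index in the range 0..num_servers-1."""
    digest = hashlib.sha256(document_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


shard = server_for_document("doc_abc123", num_servers=10)
```

A cryptographic hash is used here instead of Python's built-in `hash()` because string hashes from `hash()` can differ between processes, and every server must agree on the owner of a document. One known trade-off of plain modulo: changing the number of servers moves most documents to a different server, which is the usual reason to consider consistent hashing later.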
[Figure: Multi-region Architecture]
Handling super popular documents (like company-wide announcements):
When 1000 people edit the same document, broadcasting every keystroke to 999 people is expensive. Solutions:
1. Batch updates: Instead of sending each keystroke, collect changes for 100ms and send as a batch.
2. Hierarchical broadcast: Server sends to 10 "relay" servers, each relay sends to 100 clients. Tree structure instead of star.
3. Limit concurrent editors: Only first 100 people can edit. Others can view (read-only) with "request edit" button.
Cool features we can add later:
1. Comments and suggestions
- Comments are easier than real-time editing (they do not conflict)
- Attach comments to a character range
- Handle what happens when that range is deleted
2. Undo/Redo
- With operation log, we can undo any operation
- But what if Alice undoes something Bob built on?
- Need to transform undo operations too!
3. Import/Export
- Import from Word, Google Docs, Markdown
- Export to PDF, Word, HTML
- These are batch operations, different from real-time
4. Mobile app
- Same collaboration logic
- But handle slow/flaky mobile internet
- Aggressive local caching and offline support
Different types of collaborative apps
Text documents (Google Docs): Character-level operations, OT or CRDT works.
Spreadsheets (Google Sheets): Cell-level operations, each cell is independent.
Design tools (Figma): Object-level operations, more complex transforms for shapes and layers.
Code editors (VS Code Live Share): Need to understand syntax, language-aware conflict resolution.