System Design Masterclass
Tags: collaboration, crdt, operational-transform, real-time, websockets, advanced

Design Collaborative Editing System

Design real-time collaborative editing like Notion or Google Docs

Millions of documents, 100+ editors per document | Similar to Google Docs, Notion, Figma, Dropbox Paper, Microsoft Office Online, Coda

Summary

Collaborative editing lets many people type in the same document at the same time. Think of Google Docs - you and your friend can both write in the same document, and you see each other's changes almost instantly. The hard part is: what happens when two people type at the exact same spot? Someone types "cat" and someone else types "dog" at position 5 - we cannot just pick one and lose the other person's work. Handling this is called conflict resolution. There are two main ways to solve it: OT (Operational Transformation), used by Google Docs, and CRDTs (Conflict-free Replicated Data Types, data structures that merge automatically), which inspired Figma's multiplayer design. Companies like Google, Notion, and Figma ask this question in interviews.

Key Takeaways

Core Problem

The main job is to let many people type in the same document at the same time. Everyone should see each other's changes right away, and nobody's work should get lost.

The Hard Part

When two people type at the exact same spot at the exact same time, we cannot just pick one and throw away the other. We need a smart way to keep both changes.

Scaling Axis

Each document can be handled by one server. Most documents have 1-3 people editing. The hard part is popular documents with 100+ people editing at once.

Critical Invariant

Everyone must end up with the exact same document. If Alice sees Hello World and Bob sees Hello Earth - that is a critical bug. All users must converge to the same content.

Performance Requirement

When you type, you should see your letter instantly (under 16 milliseconds). When someone else types, you should see it within half a second.

Key Tradeoff

OT is simpler and needs a central server to decide order. CRDTs are more complex but work without a server - great for offline editing.

Design Walkthrough

Problem Statement

The Question: Design a document editing app like Google Docs or Notion where many people can type in the same document at the same time.

What the app needs to do (most important first):

  1. Real-time editing - When Alice types something, Bob should see it appear on his screen right away (within half a second).
  2. Handle conflicts - If Alice and Bob both type at the same spot at the same time, keep BOTH of their changes. Do not throw away anyone's work.
  3. Show who is editing - Show colored cursors so you can see where other people are typing.
  4. Work offline - If your internet goes out, you can keep typing. When internet comes back, your changes sync up.
  5. Version history - Save every change so users can go back in time and see what the document looked like yesterday.
  6. Support formatting - Let users make text bold, italic, add headers, bullet points, etc.

What to say first

Before I start designing, I need to understand the conflict resolution requirements. The key question is: when two users type at the same position at the same time, what should happen? This single question determines our entire architecture.

What the interviewer really wants to see:
- Do you understand why simple approaches (like just saving the last person's changes) do not work?
- Can you explain how OT or CRDTs solve the conflict problem?
- Do you know the difference between OT (needs a server) and CRDT (works without server)?
- How do you handle someone who was offline for an hour and then reconnects?

Clarifying Questions

Before you start designing, ask questions to understand what you are building. Good questions show the interviewer you think before you code.

Question 1: How many people edit at once?

How many people will be editing the same document at the same time? Is it usually 2-3 friends, or could it be 1000 people in a company meeting?

Why ask this: If only 2-3 people edit together, the design is simpler. If 1000 people edit the same document, we need special handling.

What interviewers usually say: Most documents have 2-5 editors. But some popular documents (like company announcements) might have 100+ people editing.

How this changes your design: We design for the common case (2-5 editors) but need to handle the rare case (100+ editors) without crashing.

Question 2: Plain text or fancy formatting?

Is this plain text like Notepad, or rich text with bold, italic, headers, and bullet points like Google Docs?

Why ask this: Plain text is much simpler - you just track character positions. Rich text needs to track formatting ranges (which characters are bold).

What interviewers usually say: Rich text with formatting, like Google Docs.

How this changes your design: We need to handle formatting conflicts too. What if Alice makes text bold while Bob deletes it?

Question 3: Do we need offline support?

Should people be able to edit when they have no internet? If yes, what happens when they reconnect?

Why ask this: Offline support is hard! The user might edit for an hour offline while others are editing online. When they reconnect, we need to merge everything.

What interviewers usually say: Yes, mobile users need offline support.

How this changes your design: Offline support pushes us toward CRDTs (which merge automatically) instead of OT (which needs a server to decide).

Question 4: Do we need version history?

Should users be able to see who changed what and when? Can they go back to an old version?

Why ask this: If we need full history, we must save every single keystroke forever. That is a lot of storage.

What interviewers usually say: Yes, like Google Docs version history.

How this changes your design: We save operations (not just final documents), so we can replay history.

Summarize your assumptions

Let me summarize what I will design for: Rich text documents, usually 2-5 concurrent editors but up to 100 on popular docs, offline support required, full version history needed. Given the offline requirement, I will lean toward a CRDT-based design.

The Hard Part

Say this to the interviewer

The hardest part of collaborative editing is handling conflicts. Imagine two people type at the exact same position at the exact same moment. We cannot just pick one and lose the other person's work. We need a smart algorithm to keep both changes.

Let me show you the problem with a simple example:

Document says: "Hello"

Alice and Bob are both editing at the same time:
- Alice adds " World" at position 5 -> She sees "Hello World"
- Bob adds " Earth" at position 5 -> He sees "Hello Earth"

Both edits happen at the exact same time. Now what?

Bad outcomes we must avoid:
- Pick Alice, lose Bob's work -> Bob typed " Earth" and it disappeared! He will be angry.
- Pick Bob, lose Alice's work -> Same problem for Alice.
- Jumble them together -> "Hello WEoarrtlhd" - nonsense!

Good outcome we want:
- Keep both: "Hello World Earth" or "Hello Earth World" - both people's work is saved.

Common mistake candidates make

Many people say: just use timestamps - whoever typed last wins! This is wrong because: (1) it throws away the other person's work, (2) clocks on different computers are not perfectly synced, (3) users will rage-quit when their typing disappears.

Why simple approaches do not work:

1. Locking (only one person can edit at a time) - Problem: That is not real-time collaboration! Users see "Someone else is editing, please wait." - frustrating.

2. Last-write-wins (whoever saved last, their version is kept) - Problem: Your friend's edits silently disappear. They type a whole paragraph, poof - gone!

3. Show merge conflicts (like Git) - Problem: Normal users do not understand merge conflicts. "What do you mean HEAD vs origin/main??"
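
To make the last-write-wins failure concrete, here is a minimal sketch. The `Write` record and the timestamps are made up for illustration; they are not part of any real system described here. Whichever write carries the larger timestamp replaces the whole document, so the other author's text is gone without any error.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Write:
    author: str
    text: str          # the full document as this author sees it
    timestamp: float   # client wall-clock time, which may be skewed


def last_write_wins(writes: List[Write]) -> Write:
    """Keep only the write with the highest timestamp; drop all others."""
    return max(writes, key=lambda w: w.timestamp)


# Alice and Bob both start from "Hello" and save at almost the same moment.
alice = Write("alice", "Hello World", timestamp=1000.002)
bob = Write("bob", "Hello Earth", timestamp=1000.001)

winner = last_write_wins([alice, bob])
lost_authors = [w.author for w in (alice, bob) if w is not winner]
```

Here `winner.text` is "Hello World" and Bob's " Earth" is silently discarded. A one-millisecond clock difference decided whose work survived.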

Two solutions that actually work:

Option 1: Operational Transformation (OT) - Used by Google Docs
- Idea: When you get someone else's edit, adjust the position based on what happened before it.
- Example: Bob inserted " Earth" at position 5. But wait, Alice already inserted 6 characters at that spot. So Bob's insert should now go at position 11 instead.
- Needs a server to decide the order of operations.

Option 2: CRDTs (Conflict-free Replicated Data Types) - Figma's multiplayer design is inspired by them
- Idea: Give every character a unique ID. Instead of saying "insert at position 5", say "insert after character with ID abc123".
- No central server is needed to decide order - documents merge automatically.
- Better for offline support.

[Figure: The Conflict Problem]

Scale and Access Patterns

Before designing, let me figure out how big this system needs to be. This helps us choose the right tools.

What we are measuring | Number | What this means for our design
Total documents | 1 billion | Need efficient storage - cannot load all documents in memory
Documents edited today | 10 million | These are the hot documents that need fast access
+ 6 more rows...

What to tell the interviewer

Most documents have only 2-5 people editing together. A single server can easily handle this. The challenge is popular documents - like a company announcement where 500 people edit together. For those, we need to broadcast changes efficiently.

How people use the app (from most common to least common):

  1. Type something - Every keystroke sends an operation to the server (or stores it locally if offline).
  2. See others' changes - Other people's keystrokes appear on your screen, usually within half a second.
  3. See cursors - You see colored cursors showing where others are typing. This updates 10-30 times per second.
  4. Open a document - Load the document content. This happens once when you open the doc.
  5. Look at version history - See who changed what and when. Rare - maybe once a week.

How much space does one operation need?
- Type of operation (insert, delete): 10 bytes
- Position in document: 8 bytes
+ 20 more lines...

High-Level Architecture

Now let me draw the big picture of how all the pieces fit together. I will keep it simple and explain what each part does.

What to tell the interviewer

For this walkthrough I will start with the OT approach with a central server, which is simpler and well-tested, and come back to CRDTs as the alternative when offline editing dominates. Each document is owned by one server. When you type, your keystroke goes to that server, which decides the order and sends updates to everyone else editing the same document.

[Figure: Collaborative Editing System - The Big Picture]

What each part does and WHY it exists:

Part | What it does | Why we need it (what to tell interviewer)
Client (Browser) | Keeps a local copy of the document. When you type, it shows instantly on YOUR screen, then sends to server. | Why local copy? So typing feels instant. You do not wait for server. Your local copy might be slightly different from server for a moment, but that is okay.
WebSocket Server | Keeps a long-lived connection open between your browser and our servers. Sends and receives messages instantly. | Why WebSocket? Plain HTTP request/response pays per-request overhead and does not let the server push to you on its own. WebSocket stays open so we can push updates to you immediately.
+ 4 more rows...

Common interview question: Why one server per document?

Interviewer might ask: Is one server per document a single point of failure? Answer: Yes, but we have hot standby servers ready to take over. Also, one document failing does not affect other documents. For CRDTs, we do not need this - any server can handle any document because CRDTs merge automatically.

Technology Choices - Why we picked these tools:

Storage for Operations: PostgreSQL or Kafka
- PostgreSQL: Good for smaller scale, easy to query operations by document
- Kafka: An append-only log rather than a database; better for massive scale, where a cluster can handle millions of writes per second
- Pick based on your scale and what your team knows

Fast Storage for Cursors: Redis
- Cursor positions change up to 30 times per second
- We do not need to save cursors forever (if Redis restarts, cursors just refresh)
- Redis keeps data in memory, so it sustains very high update rates (a cluster can reach millions of updates per second)

Document Snapshots: S3 or PostgreSQL
- S3: Cheap storage for large documents with images
- PostgreSQL: Simpler if documents are small text-only

Real-time Connection: WebSocket
- Standard for real-time apps
- Works in all browsers
- Alternative: Server-Sent Events (one-way only) or WebRTC (peer-to-peer)

Data Model and Storage

Now let me show how we organize the data in the database. The key idea is: we do not just save the document. We save every single change (operation) that ever happened to the document.

What to tell the interviewer

The operations log is our source of truth - not the document itself. We can rebuild any document by replaying operations from a snapshot. This is like having a recording of every keystroke. It gives us version history for free.

Table 1: Documents - Basic info about each document

This stores the document title, who owns it, and pointers to the latest version.

Column | What it stores | Example
id | Unique ID for this document | doc_abc123
title | Document name | Meeting Notes
+ 5 more rows...

Table 2: Operations - Every single keystroke ever made

This is the heart of the system. Every time someone types a letter, deletes something, or makes text bold - we save it here.

Column | What it stores | Example
id | Unique ID for this operation | op_789
document_id | Which document this belongs to | doc_abc123
+ 8 more rows...

Why sequence_num matters

The server assigns sequence numbers to decide the official order. If Alice sends operation 1 and Bob sends operation 2 at the same time, the server decides: Alice is 15234, Bob is 15235. Everyone must apply them in this order.

Table 3: Snapshots - Full document saved periodically

Every 1000 operations, we save the complete document. This makes loading fast.

Column | What it stores | Example
id | Unique ID | snap_111
document_id | Which document | doc_abc123
version | Operations included up to this number | 15000
content | The full document text | All the actual text...
created_at | When snapshot was taken | 2024-03-20

What different operations look like:

INSERT operation (someone typed "Hi")
{
  type: "insert",
+ 21 more lines...

How we load a document (step by step):

FUNCTION load_document(document_id):
    
    STEP 1: Get the latest snapshot
+ 26 more lines...

When to create new snapshots

Create a snapshot after every 1000 operations, or every 5 minutes of activity - whichever comes first. This balances storage cost (more snapshots = more storage) versus loading speed (fewer snapshots = more operations to replay).
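
As a rough illustration of loading from a snapshot plus the snapshot policy above, here is a minimal in-memory sketch. The `Snapshot` and `Operation` classes only mirror the columns named in this section (`version`, `content`, `sequence_num`, `type`, position); the `text` and `length` fields and all function names are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    version: int   # operations included up to this sequence number
    content: str


@dataclass(frozen=True)
class Operation:
    sequence_num: int
    type: str          # "insert" or "delete"
    position: int
    text: str = ""     # used by inserts (assumed field)
    length: int = 0    # used by deletes (assumed field)


def apply_operation(content: str, op: Operation) -> str:
    if op.type == "insert":
        return content[:op.position] + op.text + content[op.position:]
    if op.type == "delete":
        return content[:op.position] + content[op.position + op.length:]
    raise ValueError(f"unknown operation type: {op.type}")


def load_document(snapshot: Optional[Snapshot],
                  operations: List[Operation]) -> Tuple[str, int]:
    """Start from the snapshot and replay only the operations after it."""
    content = snapshot.content if snapshot else ""
    version = snapshot.version if snapshot else 0
    for op in sorted(operations, key=lambda o: o.sequence_num):
        if op.sequence_num <= version:
            continue  # already baked into the snapshot
        content = apply_operation(content, op)
        version = op.sequence_num
    return content, version


def should_snapshot(ops_since: int, seconds_since: float,
                    max_ops: int = 1000, max_seconds: float = 300.0) -> bool:
    """1000 operations or 5 minutes of activity, whichever comes first."""
    if ops_since >= max_ops:
        return True
    return ops_since > 0 and seconds_since >= max_seconds


snap = Snapshot(version=15000, content="Hello")
ops = [
    Operation(14999, "insert", 0, text="X"),        # older than the snapshot
    Operation(15002, "delete", 0, length=1),
    Operation(15001, "insert", 5, text=" World"),
]
loaded = load_document(snap, ops)
```

The time-based rule only fires when at least one operation arrived since the last snapshot, so idle documents do not accumulate identical snapshots.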

How Conflict Resolution Works

Let me explain step by step how we handle the case when two people edit the same spot. This is the core algorithm that makes collaborative editing work.

What to tell the interviewer

I will explain Operational Transformation (OT) first because it is easier to understand. Then I will briefly cover CRDTs as an alternative that works better for offline editing.

Operational Transformation (OT) - The Basic Idea

When you receive someone else's edit, you need to adjust it based on what happened before.

Simple example:
- Document: "Hello" (5 characters)
- Alice inserts " World" at position 5
- Bob inserts "!" at position 5

Bob's edit was created when the document said "Hello". But by the time the server processes it, Alice already added " World" (6 characters). So Bob's position 5 needs to shift to position 11.

This "shifting" is called transformation.

The transformation rules (simple version):

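When INSERT meets INSERT (op1 is the incoming operation, op2 was already applied):
- If op1 position < op2 position: no change needed for op1
- If op1 position > op2 position: shift op1 forward by length of op2
- If the positions are equal: use a fixed tie-break. Here the operation the server sequenced first (op2) stays in front, so op1 also shifts forward by length of op2

When INSERT meets DELETE:
- If insert is before delete: no change
- If insert is after delete region: shift back by delete length
- If insert is inside deleted region: insert at delete start

When DELETE meets INSERT:
- If delete is entirely before insert: no change
- If delete is entirely after insert: shift delete position forward by insert length
- If delete spans the insert: split the delete into two parts, one on each side of the inserted text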
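
The transformation rules can be sketched in a few lines of Python. This is an illustrative plain-text version, not a production OT library: the `Insert`/`Delete` classes and the `transform` signature are assumptions, equal-position inserts are resolved by letting the already-applied operation stay in front, and the delete-versus-delete case (not covered by the rules above) simply drops the overlapping characters.

```python
from dataclasses import dataclass, replace
from typing import List, Union


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str


@dataclass(frozen=True)
class Delete:
    pos: int
    length: int


Op = Union[Insert, Delete]


def apply(doc: str, op: Op) -> str:
    """Apply a single operation to a plain-text document."""
    if isinstance(op, Insert):
        return doc[:op.pos] + op.text + doc[op.pos:]
    return doc[:op.pos] + doc[op.pos + op.length:]


def transform(op: Op, applied: Op) -> List[Op]:
    """Rewrite `op` so it can run after `applied`, which was sequenced first.

    Returns a list because a delete that spans an insert is split in two.
    """
    if isinstance(applied, Insert):
        shift = len(applied.text)
        if isinstance(op, Insert):
            # Tie at the same position: the already-applied insert stays first.
            return [op] if op.pos < applied.pos else [replace(op, pos=op.pos + shift)]
        end = op.pos + op.length
        if end <= applied.pos:        # delete entirely before the insert
            return [op]
        if op.pos >= applied.pos:     # delete entirely after the insert
            return [replace(op, pos=op.pos + shift)]
        # Delete spans the insert: split it around the inserted text.
        left = Delete(op.pos, applied.pos - op.pos)
        right = Delete(applied.pos + shift, end - applied.pos)
        return [right, left]          # right part first keeps left's position valid

    # `applied` is a Delete covering [start, stop).
    start, stop = applied.pos, applied.pos + applied.length
    if isinstance(op, Insert):
        if op.pos <= start:           # insert before the deleted region
            return [op]
        if op.pos >= stop:            # insert after the deleted region
            return [replace(op, pos=op.pos - applied.length)]
        return [replace(op, pos=start)]  # insert inside the deleted region
    # Delete vs delete: remove only the characters that are still there.
    overlap = max(0, min(op.pos + op.length, stop) - max(op.pos, start))
    new_len = op.length - overlap
    if new_len == 0:
        return []
    if op.pos <= start:
        new_pos = op.pos
    elif op.pos >= stop:
        new_pos = op.pos - applied.length
    else:
        new_pos = start
    return [Delete(new_pos, new_len)]


# Alice's insert is applied first, then Bob's insert is transformed against it.
doc = apply("Hello", Insert(5, " World"))
for piece in transform(Insert(5, "!"), Insert(5, " World")):
    doc = apply(doc, piece)
```

After this runs, `doc` is "Hello World!". Real editors also have to transform formatting operations, which is where most of the complexity in OT comes from.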

FUNCTION process_operation(incoming_op, document_id):
    // Server receives an operation from a client
    
+ 31 more lines...

Example walkthrough:

Document: "Hello" at version 100

  1. Alice types " World" at position 5 (sent to server)
  2. Bob types "!" at position 5 (sent to server)

Server receives Alice's op first:
- No transformations needed (nothing concurrent)
- Assigned sequence 101
- Document is now "Hello World"

Server receives Bob's op:
- Bob was at version 100, but server is now at 101
- Transform Bob's op against Alice's op:
  - Bob wanted to insert at 5
  - Alice inserted 6 characters at position 5
  - New position for Bob: 5 + 6 = 11
- Assigned sequence 102
- Document is now "Hello World!"

Server sends to both:
- Alice gets: Bob inserted "!" at position 11
- Bob gets: Alice inserted " World" at position 5. Bob's local copy is "Hello!", so inserting at position 5 puts " World" in front of his "!"

Both apply and see: "Hello World!"
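
The walkthrough can be replayed with a small simulation. This is a sketch under simplifying assumptions: it handles inserts only, the `DocumentServer` class and its fields are hypothetical, and a real client would also transform incoming operations against its own unacknowledged ones.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Tuple


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    base_version: int  # server version the client had seen when it typed


def transform_insert(op: Insert, applied: Insert) -> Insert:
    """Shift `op` if `applied` landed at or before its position."""
    if op.pos < applied.pos:
        return op
    return replace(op, pos=op.pos + len(applied.text))


def apply_insert(text: str, op: Insert) -> str:
    return text[:op.pos] + op.text + text[op.pos:]


@dataclass
class DocumentServer:
    text: str
    version: int
    log: Dict[int, Insert] = field(default_factory=dict)  # sequence -> applied op

    def receive(self, op: Insert) -> Tuple[int, Insert]:
        # Transform against every operation the client had not seen yet.
        for seq in range(op.base_version + 1, self.version + 1):
            op = transform_insert(op, self.log[seq])
        self.version += 1                      # assign the next sequence number
        self.text = apply_insert(self.text, op)
        self.log[self.version] = op
        return self.version, op


server = DocumentServer(text="Hello", version=100)
seq_alice, op_alice = server.receive(Insert(5, " World", base_version=100))
seq_bob, op_bob = server.receive(Insert(5, "!", base_version=100))

# Each client applies the other person's operation as broadcast by the server.
alice_view = apply_insert("Hello World", op_bob)
bob_view = apply_insert("Hello!", op_alice)
```

Both `alice_view` and `bob_view` end up as "Hello World!", matching the server's text at sequence 102.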

[Figure: OT Transformation Example]

CRDT Alternative - Works Without Central Server

CRDTs use a different approach: instead of positions, every character gets a unique ID.

OT approach (position-based):
    "Hello" = characters at positions 0, 1, 2, 3, 4
    Insert at position 5 means "after position 4"
+ 16 more lines...

When two people insert after the same character:

- Alice inserts "X" after a5, gets ID "alice-1"
- Bob inserts "Y" after a5, gets ID "bob-1"

Both inserted after "a5". How do we order them?

Simple rule: sort by ID alphabetically. "alice-1" < "bob-1" alphabetically, so X comes before Y.

Result: "HelloXY" - deterministic, works without server coordination.
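
Here is a minimal sketch of that ID-based ordering, assuming a simplified sequence CRDT in which each character records the ID of the character it was inserted after and siblings are sorted by ID. The `Char` and `Replica` names are invented for the example, and deletes (usually handled with tombstones) are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Char:
    id: str
    after: Optional[str]  # ID of the character this follows; None = document start
    value: str


class Replica:
    def __init__(self) -> None:
        self.chars: Dict[str, Char] = {}

    def apply(self, char: Char) -> None:
        # Keyed by ID, so receiving the same insert twice is harmless.
        self.chars[char.id] = char

    def text(self) -> str:
        # Group characters by the character they were inserted after.
        children: Dict[Optional[str], List[Char]] = {}
        for c in self.chars.values():
            children.setdefault(c.after, []).append(c)

        def ordered(parent: Optional[str]) -> List[Char]:
            # Reverse-sorted so that pop() yields the smallest ID first.
            return sorted(children.get(parent, []), key=lambda c: c.id, reverse=True)

        out: List[str] = []
        stack = ordered(None)
        while stack:                      # iterative pre-order walk
            c = stack.pop()
            out.append(c.value)
            stack.extend(ordered(c.id))
        return "".join(out)


base = [Char("a1", None, "H"), Char("a2", "a1", "e"), Char("a3", "a2", "l"),
        Char("a4", "a3", "l"), Char("a5", "a4", "o")]
x = Char("alice-1", "a5", "X")
y = Char("bob-1", "a5", "Y")

replica_1, replica_2 = Replica(), Replica()
for c in base + [x, y]:      # replica 1 sees Alice's insert first
    replica_1.apply(c)
for c in base + [y, x]:      # replica 2 sees Bob's insert first
    replica_2.apply(c)
```

Both replicas render "HelloXY" even though they received the two inserts in opposite orders, because the rendered text depends only on the set of characters, not on arrival order.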

CRDT vs OT - when to use which

OT: Simpler to understand, needs a central server, used by Google Docs. Better when you have reliable internet.

CRDT: More complex, works without a central server, and inspired Figma's multiplayer design. Better for offline support and peer-to-peer apps.

Handling Cursors and Presence

Besides syncing document changes, we also need to show where other people are typing. This is called presence - knowing who is in the document and where their cursor is.

What to tell the interviewer

Cursor positions update very frequently - up to 30 times per second when someone is moving around. This is different from document operations which are less frequent. We use Redis for cursor data because it is temporary and needs to be super fast.

What presence data looks like:

For each person editing a document, we track:
- User ID and name (so we can show "Alice is here")
- Cursor position (which character they are at)
- Selection (if they highlighted some text)
- Color (each person gets a different color)
- Last activity time (to detect if they left)

Key: presence:doc_abc123

Value (hash map):
+ 16 more lines...

How cursor updates work:

ON CLIENT SIDE:
    // When cursor moves (user clicks or uses arrow keys)
    send_cursor_update({
+ 26 more lines...

Handling cursor positions when document changes:

When someone inserts or deletes text, everyone's cursor positions might need to shift.

Example:
- Alice's cursor is at position 50
- Bob inserts 10 characters at position 30
- Alice's cursor should now be at position 60 (shifted by 10)

We apply the same transformation logic to cursor positions as we do to operations.
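
A cursor shift of this kind might look like the sketch below. The function name and parameters are assumptions; one design choice is baked in: a remote insert exactly at the cursor pushes the cursor to the right.

```python
def transform_cursor(cursor: int, op_type: str, op_pos: int, op_len: int) -> int:
    """Return the cursor position after a remote insert or delete is applied."""
    if op_type == "insert":
        # Text inserted at or before the cursor pushes it to the right.
        return cursor + op_len if op_pos <= cursor else cursor
    if op_type == "delete":
        if cursor <= op_pos:
            return cursor                    # deletion is after the cursor
        # Deletion before the cursor pulls it left; if the cursor was inside
        # the deleted range, it collapses to the start of that range.
        return max(op_pos, cursor - op_len)
    raise ValueError(f"unknown operation type: {op_type}")


alice_cursor = transform_cursor(50, "insert", op_pos=30, op_len=10)
```

With Alice at position 50 and Bob inserting 10 characters at position 30, `alice_cursor` becomes 60, as in the example above.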

Scaling challenge for popular documents

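If 100 people are in a document and each sends 30 cursor updates per second, that is 3000 messages per second arriving at the server just for cursors, and each one has to be relayed to the other 99 people (about 297,000 deliveries per second). Solution: batch cursor updates and send every 100ms instead of instantly, keeping only the latest position per user. Small delay but big reduction in messages.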
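
The batching idea and the arithmetic behind it can be sketched as follows. `CursorBatcher` is a hypothetical in-memory helper; a real server would call `flush()` from a 100ms timer and broadcast the result to each connected client.

```python
from typing import Dict


class CursorBatcher:
    """Collects cursor updates and keeps only the latest position per user."""

    def __init__(self) -> None:
        self._latest: Dict[str, int] = {}

    def update(self, user_id: str, position: int) -> None:
        self._latest[user_id] = position     # newer position overwrites older

    def flush(self) -> Dict[str, int]:
        """Return everything collected since the last flush and start over."""
        batch, self._latest = self._latest, {}
        return batch


def unbatched_deliveries_per_second(users: int, updates_per_second: int) -> int:
    # Every update from every user is relayed to all the other users.
    return users * updates_per_second * (users - 1)


def batched_messages_per_second(users: int, flushes_per_second: int) -> int:
    # Each user receives one combined message per flush.
    return users * flushes_per_second


batcher = CursorBatcher()
batcher.update("alice", 10)
batcher.update("alice", 11)
batcher.update("bob", 40)
batcher.update("alice", 12)
batch = batcher.flush()

before = unbatched_deliveries_per_second(users=100, updates_per_second=30)
after = batched_messages_per_second(users=100, flushes_per_second=10)
```

Four updates collapse into one batch with two entries. For 100 users the server goes from about 297,000 individual deliveries per second to 1,000 batched messages per second, at the cost of up to 100ms of extra cursor lag.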

What Can Go Wrong and How We Handle It

Tell the interviewer about failures

Good engineers think about what can break. Let me walk through the things that can go wrong and how we protect against them.

What breaks | What happens to users | How we fix it | Why this works
Document server crashes | Users cannot save their typing | Hot standby server takes over + client saves locally | User keeps typing offline, syncs when server is back
Internet goes out | User disconnected | Client keeps working offline, queues changes | When internet returns, queued changes sync up
+ 4 more rows...

How the client handles being offline:

The client should work even without internet. Here is how:

CLIENT keeps three lists:
    - applied: operations already confirmed by server
    - pending: operations sent to server, waiting for confirmation
+ 19 more lines...

Server failover - when a document server crashes:

SETUP: Each document server has a hot standby
    - Primary handles all traffic
    - Standby receives copy of all operations (replication)
+ 13 more lines...

The golden rule: User can ALWAYS type

No matter what breaks - server down, internet out, other users disconnected - the user can always type in their local document. We figure out the syncing later. This is called optimistic UI - assume it will work, handle failures in the background.
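
A minimal sketch of optimistic UI on the client, under simplifying assumptions: inserts only, a single outbox instead of separate bookkeeping lists, and a hypothetical `send` callback standing in for the WebSocket. The `LocalEditor` name and its methods are invented for illustration.

```python
from collections import deque
from typing import Callable, Deque, Tuple

LocalOp = Tuple[str, int, str]  # ("insert", position, text)


class LocalEditor:
    def __init__(self, text: str = "") -> None:
        self.text = text
        self.online = False
        self.outbox: Deque[LocalOp] = deque()  # typed locally, not yet sent

    def type(self, pos: int, s: str) -> None:
        # Apply to the local copy first, so the user never waits on the network.
        self.text = self.text[:pos] + s + self.text[pos:]
        self.outbox.append(("insert", pos, s))

    def drain(self, send: Callable[[LocalOp], None]) -> int:
        """Send queued operations in order while online; return how many went out."""
        sent = 0
        while self.online and self.outbox:
            send(self.outbox[0])
            self.outbox.popleft()   # remove only after send() did not raise
            sent += 1
        return sent


received = []
editor = LocalEditor("Hello")
editor.type(5, " World")            # offline: still applied locally
editor.type(11, "!")
sent_offline = editor.drain(received.append)

editor.online = True                # connection is back
sent_online = editor.drain(received.append)
```

While offline, the text already reads "Hello World!" and nothing is sent. Once the connection returns, both operations go out in the order they were typed, and the server still has to transform them against whatever other users did in the meantime.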

Growing the System Over Time

What to tell the interviewer

This design works for most companies. Let me explain how we would grow it if we needed to support millions of documents and users around the world.

How we grow step by step:

Stage 1: Starting out (up to 100,000 documents)
- Single document server (with standby)
- One PostgreSQL database
- One Redis for cursors
- All in one data center
- Simple and works great for startups

Stage 2: Growing (up to 10 million documents)
- Multiple document servers, split by document ID
- With 10 servers: hash(document ID) mod 10 = which server handles it
- Read replicas for the database
- Still one region, but multiple servers

Stage 3: Global scale (100 million+ documents)
- Document servers in multiple regions (US, Europe, Asia)
- Users connect to nearest region
- For OT: one region is "primary" for each document
- For CRDT: any region can handle any document, they sync
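
For the Stage 2 split by document ID, a routing function could look like this sketch. It hashes the ID with SHA-256 instead of Python's built-in `hash()`, because string hashes from `hash()` differ between processes by default; the function name is an assumption.

```python
import hashlib


def server_for_document(document_id: str, num_servers: int) -> int:
    """Map a document ID to a server index in [0, num_servers)."""
    if num_servers <= 0:
        raise ValueError("num_servers must be positive")
    digest = hashlib.sha256(document_id.encode("utf-8")).digest()
    # The first 8 bytes give a stable integer that is the same on every machine.
    return int.from_bytes(digest[:8], "big") % num_servers


shard = server_for_document("doc_abc123", 10)
```

Every gateway computes the same `shard` for the same document, so all editors of one document land on one server. The drawback of plain modulo is that changing the server count remaps most documents; consistent hashing is the usual way to soften that.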

[Figure: Multi-region Architecture]

Handling super popular documents (like company-wide announcements):

When 1000 people edit the same document, broadcasting every keystroke to 999 people is expensive. Solutions:

  1. Batch updates: Instead of sending each keystroke, collect changes for 100ms and send as a batch.
  2. Hierarchical broadcast: Server sends to 10 "relay" servers, each relay sends to 100 clients. Tree structure instead of star.
  3. Limit concurrent editors: Only first 100 people can edit. Others can view (read-only) with "request edit" button.
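
The hierarchical broadcast in option 2 can be sketched as a simple round-robin assignment of clients to relays. The function names are hypothetical, and real relays would be separate processes holding the WebSocket connections.

```python
from typing import List


def assign_relays(client_ids: List[str], num_relays: int) -> List[List[str]]:
    """Spread clients evenly across relays, round-robin."""
    relays: List[List[str]] = [[] for _ in range(num_relays)]
    for i, client_id in enumerate(client_ids):
        relays[i % num_relays].append(client_id)
    return relays


def sends_by_document_server(num_clients: int, num_relays: int) -> int:
    """Messages the document server itself sends per operation."""
    # With no relays (star) it sends to every other client directly.
    return num_relays if num_relays > 0 else num_clients - 1


clients = [f"user_{i}" for i in range(1000)]
relays = assign_relays(clients, 10)
star_sends = sends_by_document_server(1000, 0)
tree_sends = sends_by_document_server(1000, 10)
```

With 1000 clients and 10 relays, each relay serves 100 clients, and the document server sends 10 messages per operation instead of 999. The total number of deliveries is about the same, but the load is spread across the relays.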

Cool features we can add later:

1. Comments and suggestions
- Comments are easier than real-time editing (they do not conflict)
- Attach comments to a character range
- Handle what happens when that range is deleted

2. Undo/Redo
- With operation log, we can undo any operation
- But what if Alice undoes something Bob built on?
- Need to transform undo operations too!

3. Import/Export
- Import from Word, Google Docs, Markdown
- Export to PDF, Word, HTML
- These are batch operations, different from real-time

4. Mobile app
- Same collaboration logic
- But handle slow/flaky mobile internet
- Aggressive local caching and offline support

Different types of collaborative apps

- Text documents (Google Docs): Character-level operations, OT or CRDT works.
- Spreadsheets (Google Sheets): Cell-level operations, each cell is independent.
- Design tools (Figma): Object-level operations, more complex transforms for shapes and layers.
- Code editors (VS Code Live Share): Need to understand syntax, language-aware conflict resolution.

Design Trade-offs

Operational Transformation (OT)

Advantages

  • Used by Google Docs - proven at massive scale
  • Simpler to understand than CRDTs
  • Lower storage (no unique IDs per character)

Disadvantages

  • Needs a central server to decide order
  • Complex transform functions for every operation type
  • Harder to support offline editing

When to use

Use when you have reliable internet and a central server. Good for web apps where users are usually online.