Design Walkthrough
Problem Statement
The Question: Design a payment system like Stripe that can handle millions of credit card payments every day.
What the system needs to do (most important first):
1. Accept payments - When someone clicks Pay Now, charge their credit card and send the money to the store.
2. Keep card numbers safe - Credit card numbers are secret. If hackers steal them, people lose money. We need super strong security.
3. Never charge twice - If something crashes, make sure we do not accidentally charge the same card twice for the same purchase.
4. Handle refunds - Sometimes people want their money back. We need to return the money to their card.
5. Catch fraud - Stop bad guys from using stolen credit cards.
6. Work with different cards - Support Visa, Mastercard, American Express, and others.
What to say first
Let me first understand what we are building. Should I focus on the whole payment flow (charging, refunds, sending money to stores) or just the part where we charge the card? Once I know the features, I will ask about how many payments we need to handle.
What the interviewer really wants to see:
- Do you know the payment journey? (Your card gets checked by 4 different companies before the payment is approved.)
- Can you make sure the same payment is never processed twice?
- Do you understand why card numbers need special protection? (The card industry has a security standard for this called PCI-DSS, and every company that handles card numbers must follow it.)
- What happens when something crashes in the middle of a payment?
How a payment travels (4 companies involved)
Clarifying Questions
Before you start designing, ask questions to understand what you are building. Good questions show the interviewer you think before you code.
Question 1: How big is this?
How many payments per day do we need to handle? What about during big sales like Black Friday when everyone shops at once?
Why ask this: Handling 100 payments per second needs a different design than handling 10,000 payments per second.
What interviewers usually say: 10 million payments per day. Normal busy hours: about 500 payments per second. Black Friday: 5,000 payments per second.
How this changes your design: We need many computers working together. But each payment is independent, so this is actually easier than it sounds.
Question 2: How fast should it be?
How quickly should the payment finish? Is this for online shopping or in-person at a store?
Why ask this: When you tap your card at a coffee shop, you expect it to beep in 1 second. Online, people will wait 2-3 seconds.
What interviewers usually say: Less than 2 seconds. Card networks like Visa and Mastercard also put strict time limits on how long an approval may take, so a slow system risks having payments time out.
How this changes your design: The main payment must be fast. We can do slower things (like updating reports) later in the background.
Question 3: What kinds of payments?
Just one-time payments? Or do we also need subscriptions (like Netflix charging you every month automatically)?
Why ask this: Subscriptions are more complicated. We need to remember cards and charge them again later.
What interviewers usually say: Start with one-time payments. Mention how subscriptions would work, but do not spend too much time on it.
How this changes your design: For subscriptions, we need to save card information (safely!) and have a system that wakes up and charges people on schedule.
Question 4: Which countries and currencies?
Are we handling just US dollars? Or do we need to support euros, yen, and other currencies?
Why ask this: Different countries have different rules and different payment methods.
What interviewers usually say: Start with US dollars and the main cards (Visa, Mastercard). Mention that we could add other currencies later.
How this changes your design: For multiple currencies, we need to convert money and work with banks in different countries.
Summarize your assumptions
Let me summarize what I will design for: 10 million payments per day, payments must finish in 2 seconds, US dollars only, Visa and Mastercard, full payment flow (charge, capture, refund), and we must follow the card industry's security standard (PCI-DSS).
The Hard Part
Say this to the interviewer
The hardest part of payments is making sure we never charge someone twice, even when computers crash. Think about it: what if our system crashes AFTER charging the card but BEFORE saving that we charged it? When it restarts, it does not remember. If it tries again, the person pays twice!
Why this is really tricky (explained simply):
1. Four companies are involved - Your payment goes through the store, our system, Visa/Mastercard, and your bank. Any of them can crash or lose its network connection.
2. The double-charge nightmare - We send the charge to Visa, our computer crashes, Visa approves it, but we never saved that. When we restart and try again... double charge!
3. The lost payment nightmare - Visa approved the payment, but we crashed before telling the store. The store thinks the payment failed, but the customer was charged.
4. Half-finished payments - What if the charge worked but the refund failed? Or we sent the money but did not record it? These mismatches are hard to fix.
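| Time | Our System | Visa | Result |
|---|---|---|---|
| 1 | Send charge request | Received | |
| 2 | Crash before saving the result | Approve the charge | Card charged, but we have no record |
| 3 | Restart and send the same charge again | Approve again | Customer charged twice |

Common mistake candidates make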
Many people only design the happy path (when everything works). Interviewers will ask: What if the database fails after charging the card? You MUST have an answer. The answer is: use idempotency keys (unique IDs) and a state machine.
How we solve these problems:
Solution 1: Idempotency Keys (unique IDs)
- Every payment request has a unique ID.
- If we see the same ID twice, we return the saved result instead of charging again.
- Example: Store sends ID pay_abc123 twice → we only charge once and return the same result both times.
Solution 2: State Machine (tracking where we are)
- Payments move through steps: CREATED → PROCESSING → APPROVED → CAPTURED.
- Each step change is saved to the database.
- If we crash, we can see exactly where we stopped and continue from there.
Solution 3: Write Before You Act
- Before calling Visa, write down: "I am about to call Visa for $50."
- If we crash after Visa says yes, we know to check with Visa when we restart.
- We never lose track of what we were doing.
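Solution 3 can be sketched as an intent log: write down what we are about to do, do it, then write down the outcome. On restart, any intent without an outcome is checked with the bank instead of being blindly retried. This is a minimal in-memory sketch; `IntentLog`, `charge_with_intent`, `recover`, and the fake bank are hypothetical names, and a real system would keep the log in the database rather than in a Python list.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class IntentLog:
    """Hypothetical write-ahead log; a real system would store this durably."""
    entries: list = field(default_factory=list)

    def write(self, payment_id: str, action: str, amount_cents: int) -> None:
        self.entries.append({"payment_id": payment_id, "action": action,
                             "amount_cents": amount_cents, "outcome": None})

    def complete(self, payment_id: str, outcome: str) -> None:
        for entry in self.entries:
            if entry["payment_id"] == payment_id and entry["outcome"] is None:
                entry["outcome"] = outcome

    def unfinished(self) -> list:
        return [e for e in self.entries if e["outcome"] is None]


def charge_with_intent(log: IntentLog, payment_id: str, amount_cents: int,
                       call_bank: Callable[[str, int], str]) -> str:
    log.write(payment_id, "charge", amount_cents)  # 1. record the intent first
    outcome = call_bank(payment_id, amount_cents)   # 2. act (a crash here leaves the intent open)
    log.complete(payment_id, outcome)               # 3. record the outcome
    return outcome


def recover(log: IntentLog, ask_bank_status: Callable[[str], str]) -> None:
    # On restart: ask the bank what happened instead of charging again.
    for entry in log.unfinished():
        log.complete(entry["payment_id"], ask_bank_status(entry["payment_id"]))


bank_ledger = {}  # what the (fake) bank actually did


def crashing_bank(payment_id: str, amount_cents: int) -> str:
    bank_ledger[payment_id] = "approved"                     # the bank approves...
    raise RuntimeError("our server crashed before saving")  # ...but we crash


log = IntentLog()
try:
    charge_with_intent(log, "pay_abc123", 5000, crashing_bank)
except RuntimeError:
    pass  # simulate the crash and restart

recover(log, lambda pid: bank_ledger.get(pid, "not_found"))
print(log.entries[0]["outcome"])  # approved, learned from the bank, not by charging again
```

The key design choice is that recovery only ever asks the bank for the status; it never re-sends the charge.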
Solution 4: Daily Check (Reconciliation)
- Every day, compare our records with the bank's records.
- If anything is different, investigate and fix it.
- This catches any mistakes that slipped through.
How idempotency keys prevent double-charges
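A minimal sketch of the idempotency behaviour described in Solution 1: the first request with a given key does the work and saves the result, and any repeat with the same key gets the saved result back without charging again. The `IdempotentPayments` class and the fake `charge_card` function are hypothetical, and a plain dict stands in for the real storage (Redis plus the database in this design).

```python
class IdempotentPayments:
    def __init__(self, charge_card):
        self._charge_card = charge_card  # function that actually charges the card
        self._results = {}               # (merchant_id, idempotency_key) -> saved response

    def pay(self, merchant_id: str, idempotency_key: str, amount_cents: int) -> dict:
        lookup = (merchant_id, idempotency_key)
        if lookup in self._results:
            # Seen this key before: return the saved result, do NOT charge again.
            return self._results[lookup]
        result = self._charge_card(amount_cents)
        self._results[lookup] = result
        return result


charges = []  # every real charge the fake card network received


def charge_card(amount_cents: int) -> dict:
    charges.append(amount_cents)
    return {"status": "approved", "charge_number": len(charges)}


payments = IdempotentPayments(charge_card)
first = payments.pay("store_xyz", "pay_abc123", 5000)
retry = payments.pay("store_xyz", "pay_abc123", 5000)  # store retries after a timeout
print(first == retry, len(charges))  # True 1
```

This version only handles retries that arrive one after another. Two requests with the same key arriving at the same instant need an atomic "set if absent" step, such as Redis `SET` with the `NX` option.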
Scale and Access Patterns
Before designing, let me figure out how big this system needs to be. This helps us choose the right tools.
| What we are measuring | Number | What this means for our design |
|---|---|---|
| Daily payments | 10 million | About 115 payments per second on average - very manageable |
| Peak times (Black Friday) | 5,000 per second | Need extra computers ready for busy times |
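The table numbers come from simple arithmetic (10 million per day works out to about 115.7 per second, which the table rounds to 115). This sketch reproduces them and adds a rough storage figure using the per-payment sizes estimated in this section (about 500 bytes of payment data plus a 50-byte card token). It ignores indexes, backup copies, and the event history table, so treat the storage number as a lower bound.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400
daily_payments = 10_000_000
peak_per_second = 5_000          # Black Friday estimate from the interviewer

average_per_second = daily_payments / SECONDS_PER_DAY
peak_ratio = peak_per_second / average_per_second

bytes_per_payment = 500 + 50     # payment row + card token
daily_storage_gb = daily_payments * bytes_per_payment / 1e9
yearly_storage_tb = daily_storage_gb * 365 / 1000

print(f"average: {average_per_second:.0f} payments/second")  # average: 116 payments/second
print(f"peak is about {peak_ratio:.0f}x the average")         # peak is about 43x the average
print(f"storage: {daily_storage_gb:.1f} GB/day, {yearly_storage_tb:.1f} TB/year")  # 5.5 GB/day, 2.0 TB/year
```

The takeaway: the peak is roughly 43 times the daily average, so capacity must be planned for Black Friday, while a couple of terabytes per year is easy for a single PostgreSQL setup to hold.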
What to tell the interviewer
Each payment is independent - one person paying for shoes does not affect another person paying for pizza. This means we can easily add more computers to handle more payments. The challenge is not handling many payments, it is making sure each payment is correct.
How people use the payment system (from most common to least common):
1. Process a payment - Someone clicks Pay Now. This is the #1 thing we do, and it must be fast.
2. Check payment status - Did my payment go through? Stores and customers ask this a lot.
3. Process a refund - Give money back. Less common but important.
4. Look up old payments - Show payment history. Can be a bit slower since it is not urgent.
How much space does one payment need?
- Payment info (amount, time, status): about 500 bytes
- Card token (secret reference to saved card): about 50 bytes

High-Level Architecture
Now let me draw the big picture of how all the pieces fit together. I will keep it simple and explain what each part does.
What to tell the interviewer
I will split the system into three security zones: (1) the public zone that talks to stores, (2) the main payment zone that processes payments, and (3) a super-secure zone just for card numbers. Keeping card numbers separate makes security easier.
Payment System - The Big Picture
What each part does and WHY it is separate:
| Part | What it does | Why it is separate (what to tell interviewer) |
|---|---|---|
| API Server | Receives payment requests from stores. Checks if the store is real. Checks if the request makes sense. | This is the front door. It handles security and validation so the payment service can focus on processing. |
| Payment Service | The brain. Processes payments, manages the state machine, talks to banks. | This is where the magic happens. Keeping it focused on payments makes it easier to get right. |
Common interview question: Why not just one big service?
Interviewers often ask: Can't you just put everything in one service? Your answer: Yes, for a small system that works fine. We split because: (1) Card numbers need extra security - separating them limits what hackers can steal. (2) Different parts need different speeds - fraud checking is slow, payment processing must be fast. (3) Different parts fail differently - if fraud checking breaks, we can still process payments.
Technology Choices - Why we picked these tools:
Database: PostgreSQL (Recommended)
- Why: Great at keeping data safe and correct. Supports transactions (all-or-nothing operations). Most engineers know SQL.
- Other options:
  - MySQL: Also good - pick what your team knows
  - MongoDB: Not ideal because payment data has lots of relationships
Fast Cache: Redis (Recommended)
- Why: Super fast, can store idempotency keys, and so widely used that help is easy to find.
- We use it for: Remembering which payment IDs we have seen (to prevent double-charges).
Message Queue: Kafka or RabbitMQ
- Why: Some tasks can wait (like sending receipts). We put them in a queue and do them later.
- Kafka: Better for big volume
- RabbitMQ: Easier to set up
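Encryption: HSM (Hardware Security Module)
- Why: Special tamper-resistant hardware for encryption. The keys that protect card numbers live inside a physical device and never leave it, which makes stealing them remotely extremely hard.
- PCI-DSS requires strong encryption and careful key management for stored card numbers, and an HSM is a common way to meet those requirements.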
Important interview tip
Pick technologies YOU know! If you have used MySQL at your job, use MySQL in your design. Interviewers care more about your reasoning than the specific tool. Say something like: I will use PostgreSQL because I have experience with it, but MySQL would work just as well.
Data Model and Storage
Now let me show how we organize the data in the database. Think of tables like spreadsheets - each one stores a different type of information.
What to tell the interviewer
I will use PostgreSQL as the main database because payments need strong guarantees - if we say the payment worked, it really worked. I will use Redis for the idempotency cache because we need to check every single payment very fast.
Table 1: Payments - The main table that stores every payment
When someone pays, we create a row here. We track where the payment is in the process (created, processing, approved, etc.).
| Column | What it stores | Example |
|---|---|---|
| id | Unique ID for this payment | pay_abc123 |
| idempotency_key | The ID the store sent (to prevent double-charges) | order_456_charge |
Database Index
We add an INDEX on (merchant_id, created_at). This makes finding a store's recent payments FAST - like a book index that helps you find pages quickly.
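A sketch of the Payments table and its index, written with Python's built-in `sqlite3` so it runs anywhere; the design itself uses PostgreSQL, whose DDL for this is very similar. Only `id` and `idempotency_key` come from the table above. The other columns (`merchant_id`, `amount_cents`, `currency`, `status`, `created_at`) are assumptions based on the rest of this walkthrough. The UNIQUE constraint on `(merchant_id, idempotency_key)` is an extra safety net behind the Redis cache: even if the cache misses a duplicate, the database refuses a second row for the same key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payments (
    id              TEXT PRIMARY KEY,           -- e.g. pay_abc123
    merchant_id     TEXT NOT NULL,              -- assumed column
    idempotency_key TEXT NOT NULL,              -- e.g. order_456_charge
    amount_cents    INTEGER NOT NULL,           -- money as integer cents, never floats
    currency        TEXT NOT NULL DEFAULT 'USD',
    status          TEXT NOT NULL DEFAULT 'Created',
    created_at      TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (merchant_id, idempotency_key)       -- one key can never create two payments
);
-- Fast lookup of a store's recent payments.
CREATE INDEX idx_payments_merchant_created ON payments (merchant_id, created_at);
""")

insert = ("INSERT INTO payments (id, merchant_id, idempotency_key, amount_cents) "
          "VALUES (?, ?, ?, ?)")
conn.execute(insert, ("pay_abc123", "store_xyz", "order_456_charge", 5000))
try:
    # Same store, same idempotency key: the database rejects it.
    conn.execute(insert, ("pay_def456", "store_xyz", "order_456_charge", 5000))
except sqlite3.IntegrityError:
    print("duplicate idempotency key rejected")

recent = conn.execute(
    "SELECT id, amount_cents FROM payments WHERE merchant_id = ? "
    "ORDER BY created_at DESC LIMIT 10", ("store_xyz",)).fetchall()
print(recent)  # [('pay_abc123', 5000)]
```

The recent-payments query filters on `merchant_id` and sorts by `created_at`, which is exactly the column order of the index.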
Table 2: Payment Events - History of everything that happened to a payment
Every time a payment changes status, we save a record. This helps us understand what happened if something goes wrong.
| Column | What it stores | Example |
|---|---|---|
| id | Unique ID for this event | evt_111 |
| payment_id | Which payment this belongs to | pay_abc123 |
Table 3: Card Tokens - Saved cards (stored in the super secure zone)
When customers save their card for later, we store it here. The actual card number is encrypted and only the Card Safe can read it.
| Column | What it stores | Example |
|---|---|---|
| id | Unique ID for this saved card | card_tok_789 |
| merchant_id | Which store this card is saved for | store_xyz |
Important: Card numbers are encrypted
The encrypted_pan column contains the card number, but it is scrambled using a special key stored in the HSM (encryption hardware). Even if someone steals our database, they cannot read the card numbers without that key, and the HSM is physically secured and designed so the key never leaves the device.
How we use the Idempotency Cache (Redis)
Every payment request must have a unique idempotency key. Before processing any payment, we check if we have seen this key before.
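Here is a runnable stand-in for that cache check. It assumes the lookup key combines the merchant ID with the store's key (so two stores can safely reuse the same key) and that entries expire after 24 hours, the window listed under "Never charge twice". A lock plus a dict imitates what Redis would do atomically with `SET key value NX EX 86400`; the class and method names are hypothetical.

```python
import threading
import time


class IdempotencyCache:
    """In-memory stand-in for the Redis idempotency cache (hypothetical API)."""

    def __init__(self, ttl_seconds: float = 24 * 60 * 60, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._entries = {}  # lookup key -> (expires_at, value)

    @staticmethod
    def lookup_key(key: str, merchant_id: str) -> str:
        # Scope the key to the merchant so different stores cannot collide.
        return f"idem:{merchant_id}:{key}"

    def claim(self, key: str, merchant_id: str) -> tuple:
        """Atomically claim a key. Returns (True, None) for a new request,
        or (False, saved_value) if the key was seen and has not expired."""
        k = self.lookup_key(key, merchant_id)
        now = self._clock()
        with self._lock:
            entry = self._entries.get(k)
            if entry is not None and entry[0] > now:
                return False, entry[1]
            # New or expired key: mark it as in progress (expired entries are overwritten here).
            self._entries[k] = (now + self._ttl, "IN_PROGRESS")
            return True, None

    def save_result(self, key: str, merchant_id: str, result) -> None:
        k = self.lookup_key(key, merchant_id)
        with self._lock:
            self._entries[k] = (self._clock() + self._ttl, result)


fake_now = [0.0]
cache = IdempotencyCache(clock=lambda: fake_now[0])
print(cache.claim("order_456_charge", "store_xyz"))  # (True, None): new request
cache.save_result("order_456_charge", "store_xyz", "approved")
print(cache.claim("order_456_charge", "store_xyz"))  # (False, 'approved'): duplicate
print(cache.claim("order_456_charge", "store_abc"))  # (True, None): different store
fake_now[0] += 24 * 60 * 60 + 1                       # a day later the key has expired
print(cache.claim("order_456_charge", "store_xyz"))  # (True, None)
```

A caller that gets back `(False, "IN_PROGRESS")` should tell the store the payment is still being processed rather than starting a second charge.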
```
FUNCTION check_idempotency_key(key, merchant_id):
    // Make a unique lookup key
```

Payment State Machine
A payment goes through several steps. We track these steps carefully so we always know where a payment is, even if our system crashes.
What to tell the interviewer
I use a state machine to track payments. A state machine is like a flowchart - the payment can only move along certain paths. This prevents impossible situations like a payment being refunded before it was charged.
Payment States - Where can a payment be?
What each status means (in simple terms):
- Created: Payment request received, but we have not talked to the bank yet.
- Processing: We asked the bank to approve the payment and are waiting for an answer.
- Approved: The bank said the card is good and reserved the money. But we have not taken it yet!
- Captured: We actually took the money from the customer's card.
- Cancelled: The store decided not to take the money (before capturing).
- Failed: Something went wrong - card declined, fraud detected, etc.
- Refunded: We gave the money back to the customer.
- Settled: The money was sent to the store's bank account. All done!
Why separate Approved and Captured?
Imagine you book a hotel. They check if your card is good (Approved) but do not charge until you check out (Captured). This is called "authorize and capture". If you cancel, they just release the approval - no money ever moved!
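The allowed moves can be written as a lookup table that every status change must pass through. The states come from the list above; the exact set of arrows (for example, that a refund is only allowed from Captured or Settled, and that Failed can be reached from Created when fraud is detected before calling the bank) is an assumption for this sketch. Partial refunds and disputes are left out.

```python
from enum import Enum


class Status(Enum):
    CREATED = "Created"
    PROCESSING = "Processing"
    APPROVED = "Approved"
    CAPTURED = "Captured"
    CANCELLED = "Cancelled"
    FAILED = "Failed"
    REFUNDED = "Refunded"
    SETTLED = "Settled"


# Assumed set of valid moves; anything not listed here is rejected.
VALID_MOVES = {
    Status.CREATED:    {Status.PROCESSING, Status.FAILED},
    Status.PROCESSING: {Status.APPROVED, Status.FAILED},
    Status.APPROVED:   {Status.CAPTURED, Status.CANCELLED},
    Status.CAPTURED:   {Status.SETTLED, Status.REFUNDED},
    Status.SETTLED:    {Status.REFUNDED},
    # Cancelled, Failed and Refunded are final: no moves out of them.
    Status.CANCELLED:  set(),
    Status.FAILED:     set(),
    Status.REFUNDED:   set(),
}


class InvalidTransition(Exception):
    pass


def transition(current: Status, target: Status) -> Status:
    """Return the new status, or raise if the move is not allowed."""
    if target not in VALID_MOVES[current]:
        raise InvalidTransition(f"{current.value} -> {target.value} is not allowed")
    return target


status = Status.CREATED
for step in (Status.PROCESSING, Status.APPROVED, Status.CAPTURED):
    status = transition(status, step)
print(status.value)  # Captured

try:
    transition(Status.CREATED, Status.REFUNDED)  # refund before charging
except InvalidTransition as err:
    print(err)  # Created -> Refunded is not allowed
```

In the real service, `transition` would run inside the same database transaction that saves the new status, so the check and the write cannot drift apart.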
```
VALID MOVES:
Created --> Processing (we started working on it)
Processing --> Approved (bank said yes)
```

Critical: Why we use FOR UPDATE
The FOR UPDATE in the query is super important. Without it, two requests could both read status = Approved at the same time, and both try to capture the payment. With FOR UPDATE, the database makes them wait in line - only one can change the payment at a time.
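This sketch shows a closely related way to get the same "only one request wins" guarantee: a conditional UPDATE that changes the row only if it is still Approved, then checks how many rows changed. It uses SQLite purely so the example runs with Python's standard library (SQLite has no FOR UPDATE). In PostgreSQL you can either lock the row with SELECT ... FOR UPDATE inside a transaction, as described above, or use this same conditional UPDATE. The table layout and the `capture` function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
conn.execute("INSERT INTO payments VALUES ('pay_abc123', 'Approved')")
conn.commit()


def capture(conn: sqlite3.Connection, payment_id: str) -> bool:
    """Move Approved -> Captured. Returns True only for the request that wins."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE payments SET status = 'Captured' "
            "WHERE id = ? AND status = 'Approved'",  # check and change in one statement
            (payment_id,),
        )
    return cur.rowcount == 1  # 0 rows changed means someone else already captured it


print(capture(conn, "pay_abc123"))  # True: this request captured the payment
print(capture(conn, "pay_abc123"))  # False: already Captured, nothing changed
```

Because the status check lives in the WHERE clause, the database evaluates it under its own row lock; a second concurrent capture finds the status already changed and updates zero rows.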
Processing a Payment Step by Step
Let me walk through exactly what happens when someone clicks Pay Now, including what we do when things go wrong.
Payment Flow (everything that happens)
```
FUNCTION process_payment(request):
    // request contains: idempotency_key, merchant_id, amount, card_token
```
Critical: Timeout Handling
When the bank does not respond in time, we MUST NOT assume the payment failed. The charge might have gone through! If we mark it as failed and the store retries, we could double-charge. Instead, we mark the payment for review (and check its status with the bank), then tell the store to retry later with the same idempotency key. Because the key is the same, the retry returns the saved result instead of charging again.
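A sketch of that timeout rule: if the bank does not answer within the deadline, the payment is marked for review rather than failed, and the store gets a "pending" answer that it can safely retry with the same idempotency key. `authorize`, the fake banks, and the `NeedsReview` status name are hypothetical; the 2-second deadline comes from the latency requirement earlier. A thread pool with `future.result(timeout=...)` is just one way to bound the wait in Python.

```python
import concurrent.futures
import time

BANK_TIMEOUT_SECONDS = 2.0
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def authorize(payment: dict, call_bank, timeout: float = BANK_TIMEOUT_SECONDS) -> dict:
    payment["status"] = "Processing"  # a real system writes this to the database first
    future = _pool.submit(call_bank, payment["amount_cents"])
    try:
        approved = future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        # The bank may still approve this charge. Do NOT mark it Failed.
        payment["status"] = "NeedsReview"  # hypothetical status: check with the bank later
        return {"status": "pending", "retry_with_same_key": True}
    payment["status"] = "Approved" if approved else "Failed"
    return {"status": payment["status"].lower()}


def fast_bank(amount_cents: int) -> bool:
    return True


def slow_bank(amount_cents: int) -> bool:
    time.sleep(0.3)  # answers after our deadline
    return True


print(authorize({"amount_cents": 5000}, fast_bank))               # {'status': 'approved'}
print(authorize({"amount_cents": 5000}, slow_bank, timeout=0.1))  # pending, retry with same key
```

Note that the slow call keeps running in the background after we stop waiting; that is exactly why the outcome is unknown rather than failed.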
What Can Go Wrong and How We Handle It
Tell the interviewer about failures
Good engineers think about what can break. Let me walk through the things that can go wrong and how we protect against them.
Common failures and how we handle them:
| What breaks | What happens to users | How we fix it | Why this works |
|---|---|---|---|
| Database goes down | Cannot process payments | Keep backup databases ready + automatic switch | Backup takes over in less than 30 seconds |
| Redis (idempotency cache) goes down | Cannot check for double-charges | Fail closed - reject all payments until fixed | Better to reject than risk double-charging |
Important Decision: Fail Open vs Fail Closed
When something breaks, we have two choices:
- Fail Open: Let the payment through anyway and hope for the best.
- Fail Closed: Reject the payment and tell them to try again.
For payments, we almost always Fail Closed because:
- A rejected payment can be tried again.
- A double-charge requires a refund and an apology, and it damages customer trust.
- Stores prefer "declined" over "fraudulent charge".
```
// We connect to multiple bank networks
// If one is down, we try the next
```
What is a Circuit Breaker?
A circuit breaker is like the one in your house. If too much electricity flows, it flips off to prevent a fire. Our circuit breaker does the same for failing services - if a bank fails too many times, we stop trying for a while. This prevents one broken bank from slowing down our whole system.
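A minimal circuit breaker combined with the "if one bank network is down, try the next" idea. The thresholds (3 failures, 30-second cool-down), class names, and fake networks are assumptions for illustration. Production breakers usually add a half-open state that lets a single trial request through; this sketch approximates that by allowing calls again once the cool-down ends. It only fails over on a refused connection, because then the request never reached the bank; a timeout after sending is ambiguous and must follow the timeout rule instead, or we risk charging twice.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, cooldown_seconds: float = 30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means closed: calls are allowed

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown_seconds:
            # Cool-down over: close the breaker and give the bank another chance.
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record_success(self) -> None:
        self.failures = 0

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # trip: stop calling this bank for a while


def charge_with_failover(amount_cents: int, networks: list):
    """networks: list of (name, call_fn, breaker). Try each healthy one in order."""
    for name, call_fn, breaker in networks:
        if not breaker.allow():
            continue  # skip networks whose breaker is open
        try:
            result = call_fn(amount_cents)
        except ConnectionRefusedError:
            # Safe to fail over: the request never reached this bank.
            breaker.record_failure()
            continue
        breaker.record_success()
        return name, result
    raise RuntimeError("no bank network available")  # fail closed


def primary_down(amount_cents: int) -> str:
    raise ConnectionRefusedError("primary network unreachable")


def backup_ok(amount_cents: int) -> str:
    return "approved"


fake_now = [0.0]
primary_breaker = CircuitBreaker(clock=lambda: fake_now[0])
backup_breaker = CircuitBreaker(clock=lambda: fake_now[0])
networks = [("primary", primary_down, primary_breaker),
            ("backup", backup_ok, backup_breaker)]

for _ in range(4):
    print(charge_with_failover(5000, networks))  # ('backup', 'approved') each time
print(primary_breaker.allow())  # False: tripped after 3 failures, still cooling down
```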
Making Sure Our Records Match the Bank
System Rules That Must Never Break
1. NEVER charge a customer twice for the same thing
2. NEVER lose a payment that went through
3. Every penny must be accounted for (money in = money out)
4. Card numbers must always be encrypted
Even with all our safety measures, mistakes can happen. What if a cosmic ray flips a bit in our database? What if a bank sends us wrong information? We need a safety net that catches ANY mistake.
The safety net is called Reconciliation - we compare our records with the bank records every day.
```
// This runs every day at 6 AM, after banks send us their daily report
FUNCTION daily_reconciliation():
```

Why reconciliation matters
Reconciliation is our last line of defense. Even if our system has bugs, even if the bank makes mistakes, reconciliation catches them. Companies that handle money MUST do daily reconciliation - financial auditors expect every transaction in our records to match the bank's.
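A sketch of the daily comparison: match our captured payments against the bank's report by payment ID and flag anything missing on either side or with a different amount. The record format (a dict of payment ID to amount in cents) is an assumption; real bank reports arrive as files in network-specific formats that have to be parsed first.

```python
def reconcile(ours: dict, bank: dict) -> dict:
    """Compare {payment_id: amount_cents} maps and return every mismatch."""
    return {
        # We think it was charged, but the bank has no record of it.
        "missing_at_bank": sorted(ours.keys() - bank.keys()),
        # The bank charged it, but we have no record (the "lost payment" case).
        "missing_in_our_records": sorted(bank.keys() - ours.keys()),
        # Both sides have it, but the amounts disagree.
        "amount_mismatch": sorted(
            pid for pid in ours.keys() & bank.keys() if ours[pid] != bank[pid]
        ),
    }


our_records = {"pay_1": 5000, "pay_2": 1250, "pay_3": 900}
bank_report = {"pay_1": 5000, "pay_2": 1205, "pay_4": 3000}
print(reconcile(our_records, bank_report))
# {'missing_at_bank': ['pay_3'], 'missing_in_our_records': ['pay_4'], 'amount_mismatch': ['pay_2']}
```

Every non-empty list becomes an investigation ticket; the goal is for all three lists to be empty every morning.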
| Rule | How we enforce it | How we verify it |
|---|---|---|
| Never charge twice | Idempotency keys with 24-hour memory | Check every request against cache before processing |
| Never lose payments | Write-ahead logging + state machine | Daily reconciliation catches any missing records |
| Money must balance | Double-entry accounting (credits = debits) | End-of-day totals must sum to zero |
| Cards always encrypted | HSM hardware + separate Card Safe zone | PCI-DSS audit + security testing |
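The "money must balance" rule relies on double-entry accounting: every movement of money is recorded as entries that cancel out, so the sum over all accounts is always zero. This sketch uses signed integer cents and made-up account names (`customer_card`, `merchant_balance`, `platform_fees`); the 175-cent fee is purely illustrative, not a real price.

```python
from collections import defaultdict

ledger = []  # each entry: (transaction_id, account, amount_cents)


def record(transaction_id: str, entries: list) -> None:
    """Append a transaction only if its entries sum to zero."""
    if sum(amount for _, amount in entries) != 0:
        raise ValueError(f"{transaction_id} does not balance")
    ledger.extend((transaction_id, account, amount) for account, amount in entries)


def balances() -> dict:
    totals = defaultdict(int)
    for _, account, amount in ledger:
        totals[account] += amount
    return dict(totals)


# A $50.00 charge: 5000 leaves the customer (negative), 4825 is owed to the store,
# and we keep an illustrative 175 fee.
record("pay_abc123", [("customer_card", -5000),
                      ("merchant_balance", 4825),
                      ("platform_fees", 175)])

print(balances())                # {'customer_card': -5000, 'merchant_balance': 4825, 'platform_fees': 175}
print(sum(balances().values()))  # 0: the end-of-day check

try:
    record("bad_txn", [("customer_card", -5000), ("merchant_balance", 5001)])
except ValueError as err:
    print(err)  # bad_txn does not balance
```

Rejecting unbalanced transactions at write time means the end-of-day "sum to zero" check should never fail; if it does, something outside this code path changed the data.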
Growing the System Over Time
What to tell the interviewer
This design handles 10 million payments per day easily. Let me explain how it grows if we need to handle 100 times more, or add features like subscriptions and international payments.
How we grow step by step:
Stage 1: Starting out (up to 10 million payments per day)
- One PostgreSQL database with backup copies
- One Redis cluster for idempotency cache
- Primary + backup bank connections
- All servers in one location (like US-East)
- This handles most businesses easily
Stage 2: Medium scale (up to 100 million payments per day)
- Add read-only database copies for reports and dashboards
- Split payment processing across multiple server groups
- Each group handles certain merchants
- Still one main database for writes (strong consistency)
Stage 3: Big scale (up to 1 billion payments per day)
- Multiple data centers in different countries
- Each region handles local payments
- Cross-region payments get routed to the right place
- Large payment companies such as Stripe and PayPal run multi-region setups along these lines
Multi-Region Architecture (Stage 3)
Cool features we can add later:
| Feature | What it is | How hard to add |
|---|---|---|
| Subscriptions | Charge customers automatically every month (like Netflix) | Medium - need scheduler and saved cards |
| Multiple currencies | Accept euros, yen, pounds, not just dollars | Medium - need currency conversion rates |
What if we needed super fast payments?
For things like stock trading where milliseconds matter, I would use in-memory processing instead of a database. The payment would be recorded in memory first (super fast), then written to disk later (safer but slower). This trades some safety for speed - good for trading, not good for regular purchases.
Key Takeaways for the Interview:
1. Safety over speed: In payments, being correct is more important than being fast. Design to never charge twice.
2. Idempotency keys are essential: Every payment must have a unique ID. If we see the same ID twice, return the saved result.
3. State machines prevent bugs: Payments move through defined steps (created → processing → approved → captured). This prevents impossible states.
4. Reconciliation is mandatory: Compare your records with the bank every day. This catches any mistakes that slipped through.
5. Security standards shape the design: PCI-DSS requires protecting card numbers, and separating them into a secure zone keeps that job manageable. This is not optional - card networks require PCI-DSS compliance from anyone who handles card numbers.