Design Walkthrough
Problem Statement
The Question: Design a digital wallet system that can serve billions of users globally, supporting deposits, withdrawals, P2P transfers, and merchant payments.
A payment wallet must handle:
- Balance Management: Track exact balance for each user
- Money Movement: Transfer between wallets atomically
- External Integration: Connect to banks, cards, payment networks
- Compliance: KYC/AML, transaction limits, regulatory reporting
- Global Scale: Low latency for users worldwide
What to say first
This is a financial system, so correctness is paramount. Before I design, let me clarify the consistency requirements, scale expectations, and what types of transactions we need to support.
Hidden requirements interviewers are testing:
- Do you understand why financial systems need strong consistency?
- Can you prevent double-spend in distributed systems?
- Do you know how to design an immutable audit trail?
- Can you handle the complexity of global regulations?
Clarifying Questions
Financial systems have non-negotiable requirements. Ask these questions to scope the problem.
Question 1: Transaction Types
What transaction types do we need? P2P transfers, merchant payments, deposits, withdrawals, refunds?
Why this matters: Different transaction types have different consistency requirements.
Typical answer: All of the above, with P2P being most common.
Architecture impact: Need a flexible transaction engine, not just simple balance updates.
Question 2: Consistency Requirements
Is eventual consistency acceptable for any operation, or do all balance-affecting operations need strong consistency?
Why this matters: This is the fundamental tradeoff in distributed payments.
Typical answer: Balance changes must be strongly consistent. Read-only queries can be eventually consistent.
Architecture impact: Need distributed transactions or careful partitioning.
Question 3: Global Requirements
Is this a single-region or multi-region deployment? Do users in Asia transact with users in Europe?
Why this matters: Cross-region transactions are much harder.
Typical answer: Multi-region, but most transactions are within-region.
Architecture impact: Regional deployment with cross-region settlement for inter-region transfers.
Question 4: Scale
How many users? What is the peak transactions per second? What is the average transaction size?
Why this matters: Determines the partitioning strategy.
Typical answer: 1B users, 100K TPS peak, $50 average.
Architecture impact: Need horizontal scaling; cannot rely on a single database.
Stating assumptions
I will assume: 1B users globally, 100K TPS peak, P2P transfers are 80% of transactions, strong consistency required for all balance changes, multi-region deployment with 95% of transactions within-region.
The Hard Part
Say this out loud
The hard part here is preventing double-spend while maintaining low latency at global scale. If Alice has $100 and sends $100 to both Bob and Charlie simultaneously, only one transaction can succeed.
Why this is genuinely hard:
1. Double-Spend Problem: Two concurrent requests to spend the same money. Without proper locking, both could succeed, creating money out of thin air.
2. Distributed Transactions: A P2P transfer touches two wallets. Both balance updates must succeed or both must fail. This is a distributed transaction.
3. Global Latency: Users expect sub-second payments. Synchronous cross-region coordination adds 100-300ms latency.
4. Auditability: Every cent must be traceable. Regulators require complete transaction history.
Double-Spend Problem
Common mistake
Candidates often forget that read-then-write is not atomic. You must use database transactions with proper isolation levels, or optimistic concurrency control.
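One way to avoid the read-then-write race is to fold the balance check into the write itself. The sketch below is a minimal illustration, assuming a hypothetical `wallets` table with a `balance_cents` column and using an in-memory SQLite database as a stand-in for a real partition. The helper name `debit_if_funded` is invented for this example.

```python
import sqlite3

def debit_if_funded(conn: sqlite3.Connection, wallet_id: str, amount_cents: int) -> bool:
    """Debit in a single statement; return False when funds are insufficient."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            # The balance check lives in the WHERE clause, so there is no
            # separate read that a concurrent request could race against.
            "UPDATE wallets SET balance_cents = balance_cents - ? "
            "WHERE wallet_id = ? AND balance_cents >= ?",
            (amount_cents, wallet_id, amount_cents),
        )
    return cur.rowcount == 1

# Hypothetical setup: Alice holds $100.00, stored as integer cents.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wallets (wallet_id TEXT PRIMARY KEY, balance_cents INTEGER NOT NULL)"
)
conn.execute("INSERT INTO wallets VALUES ('alice', 10000)")
conn.commit()

first = debit_if_funded(conn, "alice", 10000)   # the $100 meant for Bob
second = debit_if_funded(conn, "alice", 10000)  # the $100 meant for Charlie
print(first, second)  # True False
```

The second request finds no row that satisfies the guard, so it changes nothing and can be rejected cleanly. A production system would pair this with the appropriate isolation level or row locking for its database.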
The fundamental constraint:
For any wallet W at any time T:
Balance(W, T) = Initial_Deposit + Sum(Credits) - Sum(Debits)
This is the conservation law of money in the system. Every design decision must preserve this invariant.
Scale & Access Patterns
Let me estimate the scale and understand access patterns for a global payment wallet.
| Dimension | Value | Impact |
|---|---|---|
| Total Users | 1 Billion | Need massive horizontal scaling |
| Daily Active Users | 100 Million | 10% DAU is typical for payment apps |
What to say
At 100K TPS with strong consistency requirements, we cannot use a single database. We need to partition by wallet_id and ensure transactions within a partition are serializable.
Access Pattern Analysis:
- Hot wallets exist: Merchants and popular users have 1000x more transactions
- Temporal patterns: Payday spikes, holiday shopping, regional festivals
- Read vs Write: Balance checks (read) are 5x more frequent than transfers (write)
- Locality: 95% of P2P transfers are within same country/region
Storage needed:
- 1B users x 1KB account data = 1TB accounts
- 300M txns/day x 500 bytes = 150GB/day transactions
High-Level Architecture
Let me design a globally scalable payment wallet architecture.
What to say
I will use event sourcing with a ledger-based architecture. Every state change is recorded as an immutable event. The wallet balance is derived from the event log, ensuring complete auditability.
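To make the "balance is derived from the event log" idea concrete, here is a minimal sketch. The `WalletEvent` shape and the `derive_balance` helper are assumptions made for illustration, not part of the original design; amounts are integer cents to avoid floating-point rounding.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass(frozen=True)  # frozen: events are immutable once recorded
class WalletEvent:
    wallet_id: str
    kind: str          # "credit" or "debit"
    amount_cents: int

def derive_balance(events: Iterable[WalletEvent], wallet_id: str) -> int:
    """Replay the log: balance = sum(credits) - sum(debits) for one wallet."""
    balance = 0
    for event in events:
        if event.wallet_id != wallet_id:
            continue
        if event.kind == "credit":
            balance += event.amount_cents
        elif event.kind == "debit":
            balance -= event.amount_cents
        else:
            raise ValueError(f"unknown event kind: {event.kind}")
    return balance

# Hypothetical log: Alice deposits $100.00, then sends $25.00 to Bob.
log = [
    WalletEvent("alice", "credit", 10000),
    WalletEvent("alice", "debit", 2500),
    WalletEvent("bob", "credit", 2500),
]
alice_balance = derive_balance(log, "alice")
bob_balance = derive_balance(log, "bob")
print(alice_balance, bob_balance)  # 7500 2500
```

Replaying the full log on every read would be too slow at this scale, so a real system would keep a materialized balance alongside the log and treat the log as the source of truth.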
Payment Wallet Architecture
Component Responsibilities:
1. Wallet Service
- Manages wallet state (balance, status, limits)
- Validates transaction requests
- Enforces business rules (daily limits, KYC status)
2. Transaction Service
- Orchestrates money movement
- Ensures atomicity across wallets
- Handles idempotency
3. Ledger Service
- Immutable record of all money movements
- Double-entry bookkeeping
- Source of truth for auditing
4. External Gateways
- Bank integration (ACH, wire)
- Card network integration
- Partner payment systems
Real-world reference
PayPal uses a similar architecture with a centralized ledger. Stripe uses event sourcing for their payment processing. Square partitions by merchant_id for their seller ecosystem.
Data Model & Storage
The data model is critical for a payment system. We use double-entry bookkeeping where every transaction has equal debits and credits.
What to say
I will use a double-entry ledger model. Every transaction creates two entries: a debit from one account and a credit to another. The sum of all debits must equal the sum of all credits.
-- Wallet/Account table
CREATE TABLE wallets (
    wallet_id UUID PRIMARY KEY,
Double-Entry Bookkeeping:
Every money movement creates balanced entries:
Transaction: txn_123
Ledger Entries:
Important detail
We store balance_after in each ledger entry. This allows reconstructing balance at any point in time without replaying all transactions - essential for auditing and dispute resolution.
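The following sketch shows one possible shape for balanced ledger entries that carry `balance_after`, and how a point-in-time balance can then be looked up rather than replayed. The field names other than `balance_after`, the `seq` ordering column, and both helper functions are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass(frozen=True)
class LedgerEntry:
    txn_id: str
    wallet_id: str
    direction: str      # "debit" or "credit"
    amount_cents: int
    balance_after: int  # wallet balance immediately after this entry
    seq: int            # position in the ledger (assumed ordering column)

def post_transfer(ledger: List[LedgerEntry], balances: Dict[str, int],
                  txn_id: str, src: str, dst: str, amount_cents: int) -> None:
    """Append one debit and one matching credit for a transfer."""
    if balances.get(src, 0) < amount_cents:
        raise ValueError("insufficient funds")
    balances[src] -= amount_cents
    balances[dst] = balances.get(dst, 0) + amount_cents
    seq = len(ledger)
    ledger.append(LedgerEntry(txn_id, src, "debit", amount_cents, balances[src], seq))
    ledger.append(LedgerEntry(txn_id, dst, "credit", amount_cents, balances[dst], seq + 1))

def balance_as_of(ledger: List[LedgerEntry], wallet_id: str, seq: int) -> Optional[int]:
    """Find the wallet's most recent entry at or before seq and read balance_after."""
    for entry in reversed(ledger[: seq + 1]):
        if entry.wallet_id == wallet_id:
            return entry.balance_after
    return None  # the wallet has no entries up to that point

balances = {"alice": 10000, "bob": 0}
ledger: List[LedgerEntry] = []
post_transfer(ledger, balances, "txn_123", "alice", "bob", 2500)
post_transfer(ledger, balances, "txn_124", "bob", "alice", 1000)

total_debits = sum(e.amount_cents for e in ledger if e.direction == "debit")
total_credits = sum(e.amount_cents for e in ledger if e.direction == "credit")
print(balance_as_of(ledger, "alice", 1), balance_as_of(ledger, "alice", 3))  # 7500 8500
```

In a database the backward scan would be an indexed lookup on the wallet and sequence columns; the list scan here only keeps the example self-contained.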
Partitioning Strategy:
- Partition by wallet_id hash
- P2P transfers between wallets in SAME partition: single database transaction
- P2P transfers between wallets in DIFFERENT partitions: saga pattern with compensation
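import hashlib

NUM_PARTITIONS = 1024  # illustrative value, not from the original design

def get_partition(wallet_id: str) -> int:
    """Stable hash to determine partition (same result in every process)."""
    # Built-in hash() is salted per process for str, so it cannot route requests.
    digest = hashlib.sha256(wallet_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS

def is_same_partition(wallet_a: str, wallet_b: str) -> bool:
    """Check if two wallets are on the same partition."""
    return get_partition(wallet_a) == get_partition(wallet_b)

# Optimization: Assign related wallets to the same partition,
# e.g., family members, frequent transfer pairs
Transaction Flow Deep Dive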
Let me walk through how a P2P transfer actually works, handling the two cases: same-partition and cross-partition.
Case 1: Same-Partition Transfer (Simple)
Both wallets are on the same database partition. We use a single ACID transaction.
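A minimal sketch of what such a transaction could look like, using an in-memory SQLite database as a stand-in for one partition. The table layout, the `signed_amount_cents` column, and the full function signature are assumptions for illustration; only the function name and its first two parameters come from the walkthrough.

```python
import sqlite3

def transfer_same_partition(conn: sqlite3.Connection, txn_id: str,
                            source_wallet: str, dest_wallet: str,
                            amount_cents: int) -> None:
    """Debit, credit, and ledger writes commit together or not at all."""
    with conn:  # one transaction: any exception below rolls everything back
        debited = conn.execute(
            "UPDATE wallets SET balance_cents = balance_cents - ? "
            "WHERE wallet_id = ? AND balance_cents >= ?",
            (amount_cents, source_wallet, amount_cents),
        )
        if debited.rowcount != 1:
            raise ValueError("insufficient funds or unknown source wallet")
        credited = conn.execute(
            "UPDATE wallets SET balance_cents = balance_cents + ? WHERE wallet_id = ?",
            (amount_cents, dest_wallet),
        )
        if credited.rowcount != 1:
            raise LookupError("unknown destination wallet")
        conn.executemany(
            "INSERT INTO ledger (txn_id, wallet_id, signed_amount_cents) VALUES (?, ?, ?)",
            [(txn_id, source_wallet, -amount_cents), (txn_id, dest_wallet, amount_cents)],
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wallets (wallet_id TEXT PRIMARY KEY, balance_cents INTEGER NOT NULL)")
conn.execute("CREATE TABLE ledger (txn_id TEXT, wallet_id TEXT, signed_amount_cents INTEGER)")
conn.executemany("INSERT INTO wallets VALUES (?, ?)", [("alice", 10000), ("bob", 0)])
conn.commit()

transfer_same_partition(conn, "txn_123", "alice", "bob", 4000)
try:
    transfer_same_partition(conn, "txn_124", "alice", "ghost", 1000)
except LookupError:
    pass  # the debit from alice was rolled back along with the rest

balances = dict(conn.execute("SELECT wallet_id, balance_cents FROM wallets"))
ledger_rows = conn.execute("SELECT COUNT(*) FROM ledger").fetchone()[0]
print(balances, ledger_rows)
```

The failed second transfer leaves no trace: Alice keeps the $60.00 she had after the first transfer, and the ledger holds only the two entries for `txn_123`.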
def transfer_same_partition(
    source_wallet: str,
    dest_wallet: str,
Case 2: Cross-Partition Transfer (Complex)
Wallets are on different partitions. We use the Saga pattern with compensation.
Cross-Partition Saga
class TransferSaga:
    def __init__(self, txn_id: str):
        self.txn_id = txn_id
Critical consideration
Saga rollbacks can fail! You need monitoring, alerting, and manual intervention procedures. This is why same-partition transfers are strongly preferred.
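The saga flow, including the case where the compensation itself fails, can be sketched as follows. The `Partition` class is an in-memory stand-in for a database partition, and the state names and `run` method are assumptions for illustration rather than the walkthrough's actual implementation.

```python
from typing import Dict

class Partition:
    """In-memory stand-in for one database partition (hypothetical)."""
    def __init__(self, balances: Dict[str, int], fail_credits: bool = False):
        self.balances = balances
        self.fail_credits = fail_credits  # simulate an unreachable partition

    def debit(self, wallet_id: str, amount_cents: int) -> None:
        if self.balances[wallet_id] < amount_cents:
            raise ValueError("insufficient funds")
        self.balances[wallet_id] -= amount_cents

    def credit(self, wallet_id: str, amount_cents: int) -> None:
        if self.fail_credits:
            raise ConnectionError("partition unavailable")
        self.balances[wallet_id] += amount_cents

class TransferSaga:
    def __init__(self, txn_id: str):
        self.txn_id = txn_id
        self.state = "PENDING"

    def run(self, src_part: Partition, src: str,
            dst_part: Partition, dst: str, amount_cents: int) -> str:
        try:
            src_part.debit(src, amount_cents)        # step 1: local txn on source
        except ValueError:
            self.state = "REJECTED"
            return self.state
        try:
            dst_part.credit(dst, amount_cents)       # step 2: local txn on destination
            self.state = "COMPLETED"
        except Exception:
            try:
                src_part.credit(src, amount_cents)   # compensation: refund the source
                self.state = "COMPENSATED"
            except Exception:
                self.state = "NEEDS_MANUAL_REVIEW"   # rollback failed: alert a human
        return self.state

us = Partition({"alice": 10000})
eu = Partition({"bob": 0})
down = Partition({"carol": 0}, fail_credits=True)
stuck = Partition({"dave": 500}, fail_credits=True)

ok = TransferSaga("txn_1").run(us, "alice", eu, "bob", 2500)
undone = TransferSaga("txn_2").run(us, "alice", down, "carol", 1000)
manual = TransferSaga("txn_3").run(stuck, "dave", down, "carol", 500)
print(ok, undone, manual)  # COMPLETED COMPENSATED NEEDS_MANUAL_REVIEW
```

In the third run the money has left Dave's wallet and arrived nowhere, which is exactly the situation that needs monitoring and a manual procedure. A real saga would also persist its state after every step so that it can resume after a crash.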
Consistency & Invariants
System Invariants
1. Wallet balance must never go negative.
2. Sum of all debits must equal sum of all credits.
3. Every transaction must have an idempotency key.
4. Ledger entries are immutable - never update or delete.
Why strong consistency is non-negotiable:
Unlike social media (where eventual consistency is fine), payment systems have real-world consequences for inconsistency:
| Inconsistency Type | Business Impact | Legal Impact |
|---|---|---|
| Double-spend | Company loses money | Potential fraud liability |
| Lost transaction | User loses money | Regulatory violation |
| Balance mismatch | Accounting errors | Audit failure |
| Missing audit trail | Cannot resolve disputes | Compliance violation |
Business impact mapping
A 0.01% error rate at 100K TPS means 10 incorrect transactions per second. At $50 average, that is $500/second in potential losses or disputes. Strong consistency is a business requirement.
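The arithmetic behind that estimate, written out as a small sketch. The variable names are illustrative, and `Fraction` is used only to keep the percentage exact.

```python
from fractions import Fraction

peak_tps = 100_000                 # peak transactions per second
error_rate = Fraction(1, 10_000)   # 0.01% expressed as an exact fraction
avg_txn_usd = 50                   # average transaction size in dollars

bad_txns_per_second = peak_tps * error_rate
usd_at_risk_per_second = bad_txns_per_second * avg_txn_usd

print(bad_txns_per_second, usd_at_risk_per_second)  # 10 500
```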
Idempotency:
Every operation must be idempotent. Network failures and retries are common.
def transfer_with_idempotency(request: TransferRequest) -> TransferResponse:
    # Client provides idempotency key (usually UUID or hash of request)
    idempotency_key = request.idempotency_key
What to say
We choose strong consistency for all balance-affecting operations because the cost of inconsistency (financial loss, regulatory issues) far outweighs the performance cost. Read-only operations like balance display can use eventually consistent replicas.
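The idempotency requirement described in this section can be sketched as a wrapper that remembers the response for each key. The request and response fields beyond `idempotency_key`, the `IdempotentTransfers` class, and the in-memory dictionary are all assumptions for illustration; in a real system the key would be stored in the same database transaction as the balance change, typically under a unique constraint.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass(frozen=True)
class TransferRequest:
    idempotency_key: str
    source_wallet: str
    dest_wallet: str
    amount_cents: int

@dataclass(frozen=True)
class TransferResponse:
    txn_id: str
    status: str

class IdempotentTransfers:
    """Runs each idempotency key at most once and replays the stored response."""
    def __init__(self, execute: Callable[[TransferRequest], TransferResponse]):
        self._execute = execute
        self._seen: Dict[str, Tuple[TransferRequest, TransferResponse]] = {}

    def transfer(self, request: TransferRequest) -> TransferResponse:
        cached = self._seen.get(request.idempotency_key)
        if cached is not None:
            prev_request, prev_response = cached
            if prev_request != request:
                # Same key with a different payload is a client bug, not a retry.
                raise ValueError("idempotency key reused with a different request")
            return prev_response
        response = self._execute(request)
        self._seen[request.idempotency_key] = (request, response)
        return response

calls: List[TransferRequest] = []

def fake_execute(req: TransferRequest) -> TransferResponse:
    """Hypothetical stand-in for the real money movement."""
    calls.append(req)
    return TransferResponse(txn_id=f"txn_{len(calls)}", status="COMPLETED")

service = IdempotentTransfers(fake_execute)
request = TransferRequest("key-1", "alice", "bob", 2500)
first = service.transfer(request)
retry = service.transfer(request)  # e.g. the client timed out and resent
print(first == retry, len(calls))  # True 1
```

The retry returns the original response without moving money a second time.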
Failure Modes & Resilience
Proactively discuss failures
Payment systems must handle failures gracefully. Unlike other systems where we might fail open, payment systems must fail closed - it is better to reject a transaction than risk incorrect money movement.
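A minimal sketch of the fail-closed rule: no failure path ever reports success. The status strings and the idea of passing the transfer in as a callable are assumptions for illustration; a timeout is treated as "outcome unknown" and held for review rather than guessed at.

```python
from typing import Callable

def process_transfer(execute: Callable[[], str]) -> str:
    """Run a transfer; on any failure, refuse rather than guess."""
    try:
        return execute()
    except TimeoutError:
        # The transfer may or may not have been applied. Never report success;
        # hold it for reconciliation instead.
        return "PENDING_REVIEW"
    except Exception:
        # Any other failure: reject outright (fail closed).
        return "REJECTED"

def healthy() -> str:
    return "COMPLETED"

def slow_database() -> str:
    raise TimeoutError("no response within the deadline")

def partition_down() -> str:
    raise ConnectionError("cannot reach partition")

results = [process_transfer(f) for f in (healthy, slow_database, partition_down)]
print(results)  # ['COMPLETED', 'PENDING_REVIEW', 'REJECTED']
```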
| Failure | Impact | Mitigation | Why It Works |
|---|---|---|---|
| Database down | Cannot process transactions | Multi-AZ deployment, automatic failover | Standby promotes in <30 seconds |
| Network partition | Cannot reach other partition | Queue and retry, or reject | User retries, no inconsistency |
Fail Closed for Payments:
Unlike a rate limiter, which can fail open, payment systems must fail closed:
def process_transfer(request: TransferRequest) -> TransferResponse:
    try:
        # Attempt transaction with strict timeout
Reconciliation:
Despite best efforts, inconsistencies happen. Daily reconciliation catches them:
-- Check 1: Ledger balance vs wallet balance
SELECT
    w.wallet_id,
Real-world practice
PayPal runs reconciliation every 15 minutes. Banks do end-of-day reconciliation. Any discrepancy triggers immediate investigation and potential system pause.
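The first reconciliation check, ledger balance versus stored wallet balance, can be sketched in a few lines. The data shapes here (a dictionary of stored balances and a list of signed ledger amounts) and the `reconcile` helper are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def reconcile(wallet_balances: Dict[str, int],
              ledger: Iterable[Tuple[str, int]]) -> Dict[str, Tuple[int, int]]:
    """Return {wallet_id: (stored, derived)} for every wallet that disagrees."""
    derived: Dict[str, int] = defaultdict(int)
    for wallet_id, signed_amount_cents in ledger:
        derived[wallet_id] += signed_amount_cents  # credits positive, debits negative
    mismatches = {}
    for wallet_id in set(wallet_balances) | set(derived):
        stored = wallet_balances.get(wallet_id, 0)
        if stored != derived[wallet_id]:
            mismatches[wallet_id] = (stored, derived[wallet_id])
    return mismatches

# Hypothetical data: Bob's stored balance has drifted by $1.00.
ledger = [("alice", 10000), ("alice", -2500), ("bob", 2500)]
stored_balances = {"alice": 7500, "bob": 2600}
mismatches = reconcile(stored_balances, ledger)
print(mismatches)  # {'bob': (2600, 2500)}
```

Any non-empty result would feed the investigation process described above.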
Evolution & Scaling
What to say
This design works well for single-region deployment up to 50K TPS. Let me discuss how it evolves for truly global scale with users on every continent.
Evolution Path:
Stage 1: Single Region (up to 50K TPS)
- Single primary database cluster
- All transactions in one region
- Simple consistency model
Stage 2: Multi-Region Active-Passive (up to 100K TPS)
- Primary region handles all writes
- Read replicas in other regions
- Cross-region reads for balance checks only
Stage 3: Multi-Region Active-Active (up to 1M TPS)
- Each region owns a partition of wallets
- Cross-region transfers use async settlement
- Complex but necessary for global scale
Global Multi-Region Architecture
Cross-Region Transfer Settlement:
For transfers between regions (e.g., US user to EU user):
1. US user initiates transfer to EU user
2. US region:
- Debits US user immediately
Alternative approach
If instant global transfers were required (not just eventual), I would use CockroachDB or Spanner for globally distributed transactions. The latency penalty (200-500ms) might be acceptable for the consistency guarantee.
What I would do differently for...
Cryptocurrency wallet: Use blockchain as the ledger. Different consistency model (eventual with finality).
High-frequency trading: Optimize for latency. Single location, in-memory processing, eventual consistency to a separate ledger.
Micropayments: Batch transactions, use probabilistic verification, accept some loss for throughput.
Enterprise B2B: Longer settlement windows acceptable, focus on compliance and audit trails.