Lei Zhao
Reliable messaging system
Server push
Short polling and long polling
Short polling
❏ message delay up to the polling interval
❏ HTTP request/response overhead when no messages are received
❏ drains battery
Long polling
❏ instant message delivery
❏ low overhead
❏ low battery impact
Non-HTTP protocols
❏ some users are behind an HTTP-only proxy
❏ no external dependencies
❏ will support web clients
User sharding
❏ consistent hashing
❏ collaborative health checks
❏ membership
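To make the consistent hashing bullet concrete, here is a minimal sketch of a hash ring that maps a user id to a connection server. The class name `HashRing`, the server names, the MD5-based hash, and the virtual-node count are assumptions for illustration, not details from the talk.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring with virtual nodes (illustrative names only)."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._keys = []    # sorted hash positions on the ring
        self._owner = {}   # hash position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        # Any stable hash works; MD5 is used only for an even spread.
        digest = hashlib.md5(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, node):
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            if h in self._owner:
                continue  # extremely rare collision: keep the existing owner
            bisect.insort(self._keys, h)
            self._owner[h] = node

    def remove(self, node):
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            if self._owner.get(h) == node:
                del self._owner[h]
                del self._keys[bisect.bisect_left(self._keys, h)]

    def lookup(self, uid):
        """Return the node owning the first ring position clockwise of uid."""
        if not self._keys:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._keys, self._hash(uid)) % len(self._keys)
        return self._owner[self._keys[idx]]
```

Removing a server only remaps the users that server owned; everyone else keeps their shard, which is the property that makes this scheme attractive for a stateful connection layer.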
Implementation
Short polling
❏ connection layer is stateless
❏ requires a shared storage
❏ can serve latest state
Long polling
❏ connection layer is stateful
❏ requires async I/O
❏ shared storage is optional for sending ephemeral messages
❏ usually serves messages
❏ may introduce service coupling
Long polling protocol
/recv?uid=xxx&sessionid=yyy&seq=z
❏ session
❏ seq and ack
❏ 45-second timeout, then continue
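The session, seq/ack, and timeout bullets can be sketched as a server-side session object. This sketch assumes that the `seq` parameter of `/recv` acts as a cumulative ack ("I have everything below seq"); the slide does not spell that out. The `Session` class and its method names are hypothetical.

```python
import asyncio

POLL_TIMEOUT = 45.0  # seconds, taken from the slide


class Session:
    """Per-session state behind /recv (hypothetical sketch)."""

    def __init__(self):
        self.next_seq = 0
        self.pending = []               # unacked (seq, payload) pairs, in order
        self.wakeup = asyncio.Event()

    def push(self, payload):
        """Queue a message for this session and wake a waiting poll."""
        self.pending.append((self.next_seq, payload))
        self.next_seq += 1
        self.wakeup.set()

    async def recv(self, seq, timeout=POLL_TIMEOUT):
        """Handle one long poll.

        Assumption: seq acknowledges every message with a lower sequence
        number, so those can be dropped. Anything at or above seq is
        (re)delivered. An empty list means the poll timed out and the
        client should simply poll again with the same seq.
        """
        self.pending = [m for m in self.pending if m[0] >= seq]
        if not self.pending:
            self.wakeup.clear()
            try:
                await asyncio.wait_for(self.wakeup.wait(), timeout)
            except asyncio.TimeoutError:
                return []
        return list(self.pending)
```

Because a message stays in `pending` until a later poll acknowledges it, a response lost in transit is redelivered on the next request rather than dropped.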
Never fails, I know when I fail, or I don’t know when I fail.
Message delivery guarantee
❏ at most once
❏ at least once
❏ persistence required
❏ consistency model
❏ exactly once
❏ client retries
❏ server retries
❏ message dedupe
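One common way to approximate exactly-once delivery is at-least-once retries plus receiver-side dedupe on a message id. The sketch below assumes that approach; `Deduper`, `deliver_with_retries`, and the bounded LRU window are illustrative choices, not the talk's implementation.

```python
from collections import OrderedDict


class Deduper:
    """Remembers recently seen message ids in a bounded LRU window."""

    def __init__(self, capacity=10000):
        self.capacity = capacity
        self._seen = OrderedDict()

    def accept(self, msg_id):
        """Return True the first time msg_id is seen, False for a duplicate."""
        if msg_id in self._seen:
            self._seen.move_to_end(msg_id)
            return False
        self._seen[msg_id] = True
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # forget the oldest id
        return True


def deliver_with_retries(send, msg_id, payload, max_attempts=5):
    """Retry send(msg_id, payload) until it reports an ack.

    The same msg_id is reused on every attempt so the receiver can
    recognise a retry of a message whose ack was lost.
    """
    for _ in range(max_attempts):
        if send(msg_id, payload):
            return True
    return False
```

The window is bounded, so a duplicate arriving after its id has been evicted would be applied again; sizing the window is a trade-off between memory and the longest retry delay one expects.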
Message ordering
❏ transport on single connection is ordered
❏ client reordering by sender timestamp
❏ group chat
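Client-side reordering by sender timestamp might look like the sketch below. The `ChatMessage` fields and the tie-break on sender and per-sender sequence number are assumptions added so that the order is deterministic; the slide only names the timestamp.

```python
from dataclasses import dataclass, field


@dataclass(order=True, frozen=True)
class ChatMessage:
    # Comparison order: timestamp first, then sender and per-sender seq
    # as tie-breakers (the tie-break rule is an assumption).
    sent_at_ms: int
    sender: str
    seq: int
    text: str = field(compare=False)


def display_order(messages):
    """Return messages sorted for display, whatever order they arrived in."""
    return sorted(messages)
```

In a group chat, messages from different senders arrive over different paths, so arrival order is not meaningful. Sorting by sender timestamp gives every client the same view, though it is only as accurate as the senders' clocks.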
Failures
What if one node fails?
❏ Health checks
❏ Bring up spare machines
❏ Flap damping
What if half capacity is gone?
❏ Service degradation
❏ Cascading failures (avoid services killing each other)
❏ Limit connections
❏ Global kill switch
❏ Alerting threshold
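The connection limit and kill switch can be combined in a small admission check at the front of each server. This is a sketch under assumed names (`AdmissionControl`, `try_accept`, `release`); a real deployment would read the kill switch from shared configuration instead of a local attribute.

```python
class AdmissionControl:
    """Rejects new connections when over capacity or when switched off."""

    def __init__(self, max_connections):
        self.max_connections = max_connections
        self.active = 0
        self.kill_switch = False  # assumed to be flipped by an operator

    def try_accept(self):
        """Return True and count the connection if there is room for it."""
        if self.kill_switch or self.active >= self.max_connections:
            return False
        self.active += 1
        return True

    def release(self):
        """Call when a connection closes."""
        if self.active > 0:
            self.active -= 1
```

Rejecting early keeps the surviving servers within the load they can handle, so losing half the capacity degrades service for some users instead of overloading every remaining node.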
What if data center is down?
❏ DNS failover
❏ HTTP redirect
❏ Thundering herd problem
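A common mitigation for the thundering herd after a failover is for clients to reconnect with exponential backoff and random jitter, so they do not all hit the new data center at the same instant. The slide does not say which mitigation the talk had in mind. The function name and default values below are assumptions.

```python
import random


def reconnect_delay(attempt, base=1.0, cap=60.0, rng=random):
    """Seconds to wait before reconnect number `attempt` (0-based).

    The ceiling doubles with each attempt up to `cap`, and the actual
    delay is drawn uniformly from [0, ceiling] ("full jitter").
    """
    exponent = min(attempt, 30)  # keep 2 ** exponent small
    ceiling = min(cap, base * (2 ** exponent))
    return rng.uniform(0.0, ceiling)
```

A client would sleep for `reconnect_delay(n)` after its n-th consecutive failed connection attempt and reset `n` to zero once a poll succeeds.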