The difficulty of scaling Online Social Networks (OSNs) has introduced new
system design challenges that has often caused costly re-architecting for
services like Twitter and Facebook. The complexity of interconnection of
users in social networks has introduced new scalability challenges.
Conventional vertical scaling by resorting to full replication can be a
costly proposition. Horizontal scaling by partitioning and distributing data
among multiples servers - e.g. using DHTs - can lead to costly inter-server
communication.
We design, implement, and evaluate SPAR, a social partitioning and
replication middleware that transparently leverages the social graph
structure to achieve data locality while minimizing replication. SPAR
guarantees that for all users in an OSN, their direct neighbor's data is
co-located in the same server. The gains from this approach are multi-fold:
application developers can assume local semantics, i.e., develop as they
would for a single server; scalability is achieved by adding commodity
servers with low memory and network I/O requirements; and redundancy is
achieved at a fraction of the cost.
We detail our system design and an evaluation based on datasets from
Twitter, Orkut, and Facebook, with a working implementation. We show that
SPAR incurs minimum overhead, and can help a well-known open-source Twitter
clone reach Twitter's scale without changing a line of its application logic
and achieves higher throughput than Cassandra, Facebook's DHT based
key-value store database.