
Distributed SQL Summit Recap: Pinterest’s Exploration of Distributed SQL

VP Developer Relations

At the Distributed SQL Summit 2020, Lianghong Xu – Engineering Manager & Tech Lead, Pinterest – presented the talk “Pinterest’s Exploration of Distributed SQL”. In the talk he covered the evolution of storage at Pinterest, the role that the HBase ecosystem plays within the company, current challenges and opportunities for innovation, and finally, their exploration of Distributed SQL as a viable solution to some of these challenges.

The Storage Evolution at Pinterest

In 2012, Pinterest started out with a sharded MySQL deployment that handled pins, boards, and users. The following year they introduced HBase to their stack, using it as a columnar store and to provide graph service capabilities. In 2015, RocksDB was brought into the mix as a key-value store and to support machine-learning-style workloads. As Pinterest grew in popularity and new use cases emerged, the need for capabilities like distributed transactions and secondary indexes meant that additional middleware had to be developed to work in concert with HBase. Fast forward to 2020, and Pinterest has reached the point in its evolution where it makes sense to start exploring distributed SQL. Why? Because many use cases now require NoSQL scalability with SQL capabilities.

HBase at Pinterest

HBase has for many years been a foundational building block of Pinterest’s infrastructure. Hundreds of individual use cases are currently supported by roughly 50 HBase clusters, which manage 10+ petabytes of data and must satisfy over 100 million queries per second. Because HBase is relied on so heavily, Pinterest has augmented it over the years as the needs of the business changed. For example, HBase was combined with Memcache to deliver graph and columnar store capabilities. It was also coupled with Omid to handle distributed transactions and with Muse to provide indexing functionality.

Data Infrastructure Challenges

In this next part of his talk, Lianghong walked us through the challenges that Pinterest faces by relying so heavily on HBase. These included:

  • Limited Functionality: HBase’s simple interface often lacks the capabilities that new use cases demand.
  • High Complexity: Lots of moving parts; high “Keep the Lights On” (KTLO) and onboarding costs.
  • Data Inconsistency: A lack of transactional support means constantly having to account for “eventual consistency.”
  • Tail Inconsistency: Because HBase is written in Java, factors such as garbage collection pauses make tail latency unpredictable, so the system needs constant tuning.
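
To make the data inconsistency point concrete, here is a minimal sketch of what can go wrong when two related writes are not wrapped in a transaction, and how a SQL transaction avoids it. Nothing here comes from the talk: the function names, the `pins` and `board_counts` tables, and the simulated crash are all hypothetical, and SQLite is used only as a single-node stand-in for a SQL engine, not as an example of HBase or of a distributed SQL database.

```python
import sqlite3


class CrashBetweenWrites(Exception):
    """Simulates a failure that happens after the first write but before the second."""


def save_pin_without_transaction(pins, board_counts, pin_id, board, fail_midway=False):
    # Hypothetical non-transactional store: two independent writes.
    pins[pin_id] = board  # write 1: the pin record
    if fail_midway:
        raise CrashBetweenWrites("board counter was never updated")
    board_counts[board] = board_counts.get(board, 0) + 1  # write 2: derived counter


def make_db():
    # In-memory SQLite database with two hypothetical tables.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pins (pin_id TEXT PRIMARY KEY, board TEXT NOT NULL)")
    conn.execute("CREATE TABLE board_counts (board TEXT PRIMARY KEY, pin_count INTEGER NOT NULL)")
    return conn


def save_pin_with_transaction(conn, pin_id, board, fail_midway=False):
    # "with conn" commits if the block succeeds and rolls back if it raises,
    # so either both tables change or neither does.
    with conn:
        conn.execute("INSERT INTO pins (pin_id, board) VALUES (?, ?)", (pin_id, board))
        if fail_midway:
            raise CrashBetweenWrites("rolled back together with the pin insert")
        conn.execute("INSERT OR IGNORE INTO board_counts (board, pin_count) VALUES (?, 0)", (board,))
        conn.execute("UPDATE board_counts SET pin_count = pin_count + 1 WHERE board = ?", (board,))


def count_rows(conn, table):
    # The table name comes from trusted code in this sketch, never from user input.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

In the non-transactional version, a failure between the two writes leaves a pin that its board counter does not know about, which is the kind of drift application code then has to detect and repair. In the transactional version, the same failure leaves both tables untouched.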

Distributed SQL to the Rescue?

At this stage of Pinterest’s journey, Lianghong made a strong case for considering distributed SQL as a possible solution to many of the compounding problems that HBase presents. Some of the benefits that look promising include:

  • A unified storage backend
  • NoSQL scalability with support for rich SQL functionality
  • A highly available design for “always online” serving and short MTTR
  • The ability to tune the consistency model based on the workload
  • No loss in performance for existing applications

Pinterest looked at roughly 15 different distributed SQL systems, and YugabyteDB stood out as a viable solution. Lianghong concluded his talk by sharing some details and benchmarks from Pinterest’s evaluation of YugabyteDB.

Want to See More?

Check out all the talks from this year’s Distributed SQL Summit including Pinterest, Mastercard, Comcast, Kroger, and more on our Vimeo channel.
