Here's Why Big Tech Bets Big on Apache Cassandra - An Interview With Vinay Chella of Netflix

Written by Hackerhodl | Published 2021/04/28
Tech Story Tags: apache | cassandra | netflix | apache-cassandra | apache-netflix | netflix-uses-apache | zero-copy-streaming-cassandra | interview | web-monetization

TLDR Apache Cassandra is high-frequency trading ready in terms of availability, scale, and performance. Netflix uses Apache Cassandra heavily to satisfy its ever-growing persistence needs. Cassandra has no single point of failure as it automatically replicates to multiple nodes across data centers, making it highly available. Netflix is hosting a global party on Wednesday, April 28, to celebrate the upcoming 40 release milestone. The event is a one-day virtual event with three sessions that are one hour long each one hour.via the TL;DR App

Q1 - Tell us something about your background(s).

I am a distributed database engineer by heart and have been in the database space for over a decade, focusing on building highly reliable distributed systems and databases.
I started my journey with Apache Cassandra back in 2012 with version 0.7 with its thrift-based Hector client and I got to spend quite a reasonable amount of time with Nate McCall. 
I have come to Apache Cassandra and distributed databases ecosystem from relational databases world, so it is a pretty different mindset and experience building a different kind of monolith and microservices at webscale.  
I joined Netflix in 2014 to help build the company’s open-source Apache Cassandra ‘muscle’ and provide expertise in microservices with scalable persistence layers. Today, the company uses Apache Cassandra heavily to satisfy its ever-growing persistence needs.

Q2 - Imagine if you accidentally walked into a FinTech meet and had to explain Apache Cassandra to someone there. How would you do that?

I would say Apache Cassandra is high-frequency trading (HFT) ready in terms of availability, scale, and performance. You never have to think about planned downtimes for upgrades, rollouts, and unplanned outages; you just have to plan your consistency needs appropriately.

Q3 - Apache Cassandra is ‘Highly Available, Eventually Consistent database.’ What do you mean by that in terms regular joes can understand?

Well, to put it simply, your database is there when you need it most and answers the questions with the same accuracy every time. Cassandra has no single point of failure as it automatically replicates to multiple nodes across data centers, making it highly available.
The trade-off is some latencies in read to write performance, which means data is eventually consistent. However, you can tweak it to operate as tunable consistent for specific reads and writes. 

Q4 - Why do you think that ‘Zero Copy’ streaming was not an option in the earlier days?

Yes, Zero Copy streaming for faster scaling operations is a great feature. When I heard about streaming in Apache Cassandra 4.0, the first thing that came to my mind was,
“Why did we not do it in the first place?”
This feature certainly makes C* operations much easier and tractible for large-scale deployments. The ability for nodes to stream data between each other in their clusters via SSTables makes it a comfortable fit for Kubernetes and cloud environments.

Q5 - Who’s got the most intricately designed technology stack in the world, in your opinion?

Every company has a unique challenge to solve, and I find the core systems that solve these tough and unique problems always have an interesting way of approaching them. I can’t speak to the approach by other companies, but at Netflix, we question our architectural assumptions and dependencies constantly.
Most recently, we’ve discussed our approach to distributed tracing infrastructure, and you can read about that here.
Questioning our assumptions ensures that we are always on top of our technology stack’s performance, efficiency, and scalability. One of the recent outcomes of such questioning was to raise the level of abstractions for our developers (and you can read about the Bulldozer self-serve data platform on the Netflix TechBlog).
We have decades of experience operating high-performance and scalable datastores, such as Apache Cassandra.
This knowledge has enabled us to build higher-level abstractions on top of data store compositions, which has resulted in increased developer velocity and optimized data store access patterns.

Q6 - At Hacker Noon, we’re flooded with story submissions that claim to have the ability to build Netflix/VoD clones. Don your most cynical hat and tell us what you think?

Sorry, but I’m not a cynical kind of person; I see it as a positive sign that there is a huge interest in the community to solve this problem. It is encouraging for me to see this kind of response.
There are several famous people who’ve said it in slightly different ways, but without competition, where you have a monopoly, you can become complacent and settle for mediocrity and have little incentive to progress.

Q7 - In closing, what would you like to tell the Hacker Noon readers, some of whom are currently using Apache tools?

The Cassandra community is hosting a global party on Wednesday, April 28 to celebrate the upcoming 4.0 release milestone. It’s a one-day virtual event with three sessions that are one hour each so you can attend in your time zone.
You can register for the event here.
If you use or contribute to Cassandra, you can submit a lightning talk here: https://sessionize.com/cassandra.
We look forward to meeting current and new users around the globe.

Written by Hackerhodl | VP of BD and Blockchain Editor @ Hacker Noon
Published by HackerNoon on 2021/04/28