The Netflix Tech Blog: Netflix Data Benchmark: Benchmarking Cloud Data Stores
The Netflix member experience is offered to 83+ million global members, and delivered using thousands of microservices. These services are owned by multiple teams, each having their own build and release lifecycles, generating a variety of data that is stored in different types of data store systems. The Cloud Database Engineering (CDE) team manages those data store systems, so we run benchmarks to validate updates to these systems, perform capacity planning, and test our cloud instances with multiple workloads and under different failure scenarios. We were also interested in a tool that could evaluate and compare new data store systems as they appear in the market or in the open source domain, determine their performance characteristics and limitations, and gauge whether they could be used in production for relevant use cases. For these purposes, we wrote Netflix Data Benchmark (NDBench), a pluggable cloud-enabled benchmarking tool that can be used across any data store system. NDBench provides plugin support for the major data store systems that we use — Cassandra (Thrift and CQL), Dynomite (Redis), and Elasticsearch. It can also be extended to other client APIs.
As Netflix runs thousands of microservices, we are not always aware of the traffic that bundled microservices may generate on our backend systems. Understanding the performance implications of new microservices on our backend systems was also a difficult task. We needed a framework that could assist us in determining the behavior of our data store systems under various workloads, maintenance operations and instance types. We wanted to be mindful of provisioning our clusters, scaling them either horizontally (by adding nodes) or vertically (by upgrading the instance types), and operating under different workloads and conditions, such as node failures, network partitions, etc.
As new data store systems appear in the market, they tend to report performance numbers for the “sweet spot”, and are usually based on optimized hardware and benchmark configurations. Being a cloud-native database team, we want to make sure that our systems can provide high availability under multiple failure scenarios, and that we are utilizing our instance resources optimally. There are many other factors that affect the performance of a database deployed in the cloud, such as instance types, workload patterns, and types of deployments (island vs global). NDBench aids in simulating the performance benchmark by mimicking several production use cases.
There were also some additional requirements; for example, as we upgrade our data store systems (such as Cassandra upgrades) we wanted to test the systems prior to deploying them in production. For systems that we develop in-house, such as Dynomite, we wanted to automate the functional test pipelines, understand the performance of Dynomite under various conditions, and under different storage engines. Hence, we wanted a workload generator that could be integrated into our pipelines prior to promoting an AWS AMI to a production-ready AMI.