Open Sourcing Dr. Elephant: Self-Serve Performance Tuning for Hadoop and Spark | LinkedIn Engineering
We are proud to announce today that we are open sourcing Dr. Elephant, a powerful tool that helps users of Hadoop and Spark understand, analyze, and improve the performance of their flows. We first presented Dr. Elephant to the community last year during the eighth annual Hadoop Summit, a leading conference for the Apache Hadoop community.
Hadoop is a framework that facilitates the distributed storage and processing of large distributed datasets involving a number of components interacting with each other. Because of its large and complex framework, it is important to make sure every component performs optimally. While we can always optimize the underlying hardware resources, network infrastructure, OS, and other components of the stack, only users have control over optimizing the jobs that run on the cluster.
To help users understand and optimize their flows, we scheduled regular training sessions on how to tune the jobs, but this didn’t really solve our problem. At LinkedIn, we have employees with different levels of experience with Hadoop using different frameworks to run their Hadoop jobs. Additionally, the number of Hadoop users keeps increasing. This means that having regular sessions for different users on different frameworks is not an easy task, and it’s surely not scalable.
Up until a few years ago, the Hadoop team at LinkedIn analyzed flows on behalf of employees, gave advice on how to tune them, and approved them to run on production. As a first step to optimization, we looked at obvious optimization patterns based on some simple rules and gave advice to the users. But as the users grew, it was difficult to provide sufficient support resources due to delays in user intervention. There was no way to verify if we achieved optimal performance for the job or guarantee performance coverage. We therefore needed to standardize and automate the process.
The Hadoop experts reviewing the flows observed several common recurring optimization patterns, and based on this, we decided to embark on a new experimental project to optimize both Hadoop developer and Hadoop user time. This led to the birth of Dr. Elephant.