Hadoop performance troubleshooting with stack tracing, an introduction. | Databases at CERN
This post is about profiling and performance tuning of distributed workloads, in particular Hadoop applications. You will learn about a profiler tool we have developed and how we successfully applied it to tune Sqoop, improving the throughput of data transfer from Oracle to Hadoop.
Where is my Sqoop job spending CPU time?
One of the data feeds into our Hadoop service comes from Oracle databases, and for this we make use of Sqoop. In particular, we use Sqoop to export Oracle data into Parquet files on HDFS for later analysis with Impala and other tools. However, we observed suboptimal performance when running Sqoop jobs: each job delivered low throughput, on the order of 2 MB/second. To increase the total throughput, we have to increase the number of mappers. This is not an ideal solution and has several unwanted consequences on both the source and destination systems. One of them is that by increasing the number of mappers we also increase the number of cores utilized on the Hadoop cluster, reducing the CPU capacity available to other applications. Initial investigations did not bring to light any problems on the database side nor on the HDFS side. As a result, we started investigating the behaviour of Sqoop itself and decided to implement a distributed profiler in order to get a detailed view of what Sqoop is actually doing.
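To see why the per-mapper rate matters, a quick back-of-the-envelope calculation helps; the 1 TB volume and one-hour window below are hypothetical illustration values, not figures from our workload:

```python
# Back-of-the-envelope: how many Sqoop mappers are needed to move a
# given data volume within a time window, at a fixed per-mapper rate.
def mappers_needed(total_mb, window_seconds, mb_per_sec_per_mapper):
    required_rate = total_mb / window_seconds  # aggregate MB/s needed
    # Round up: a fractional mapper still occupies a whole mapper slot.
    return int(-(-required_rate // mb_per_sec_per_mapper))

# Hypothetical example: 1 TB in one hour at the observed 2 MB/s per mapper.
print(mappers_needed(1_000_000, 3600, 2.0))  # -> 139
```

At 2 MB/s per mapper, even a modest transfer window requires well over a hundred concurrent mappers, which is exactly the core-usage pressure on the cluster described above.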
A profiler tool for Hadoop.
Hprofiler (Hadoop Profiler) is a tool we have developed based on the ideas of stack profiling and Flame Graph visualization (as described in the original work of Brendan Gregg). The goal is to make the tool as versatile as possible, so that it can investigate generic Hadoop applications and answer the question: “Where is my distributed application spending CPU time?”.
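Flame graphs are rendered from stack samples collapsed into a “folded” text format, where each line is a semicolon-joined stack followed by its sample count. The following is a minimal sketch of that collapsing step; the stack samples are invented for illustration and this is not our profiler’s actual code:

```python
from collections import Counter

def collapse(samples):
    """Collapse raw stack samples (lists of frames, root first) into the
    folded format consumed by flamegraph.pl: 'root;child;leaf count'."""
    counts = Counter(";".join(stack) for stack in samples)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]

# Hypothetical stack samples from a profiled JVM process.
samples = [
    ["java", "Sqoop.run", "JdbcWritable.readFields"],
    ["java", "Sqoop.run", "JdbcWritable.readFields"],
    ["java", "Sqoop.run", "ParquetWriter.write"],
]
for line in collapse(samples):
    print(line)
# java;Sqoop.run;JdbcWritable.readFields 2
# java;Sqoop.run;ParquetWriter.write 1
```

In the rendered flame graph, the width of each box is proportional to these counts, which is what makes the hottest code paths stand out visually.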
A major problem in distributed computing is identifying the bottlenecks of a user’s application, and in a highly distributed environment this is often not easy. Furthermore, profiling computing frameworks like Hadoop and Spark is complicated by the fact that these are JVM-based architectures. The reason for this additional complexity is that there is no symbol table available for the methods executed on the JVM. Also, the JVM uses the frame pointer register of the CPU as a general-purpose register, thus breaking traditional stack walking. In the following we build on top of the work of Brendan Gregg as described in ‘Java Mixed-Mode Flame Graphs at Netflix, JavaOne 2015’. In particular, we use the Java 8 option -XX:+PreserveFramePointer (available on update 60 build 19 and higher).
In our architecture, we make use of perf-map-agent. This agent generates such a mapping for Java applications: when attached to a running Java application, it instructs the JVM to report the code blobs it generates at runtime for various purposes, including JIT-compiled methods and other metadata. However, this approach has its limitations: by default, the JIT compiler does not emit symbol information for small methods that it inlines, which can prevent the profiler from obtaining valuable information for identifying the correct code path. The solution we used, as found in the literature, is to specify another JVM option (-XX:InlineSmallCode=n). Using this option, we tell the JVM to include the meta-information of the compiled method (think of the method name and class) and dispatch it to the attached Java agent, which in turn can write the information to a symbol table, as discussed above.
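The mapping produced for perf follows the simple text format of the Linux perf JIT interface (a /tmp/perf-&lt;pid&gt;.map file): one entry per line, with the start address, code size, and symbol name, addresses in hexadecimal. The sketch below shows how such a map lets a profiler resolve a raw instruction address back to a JIT-compiled Java method; the addresses and method names are invented for illustration:

```python
def resolve(map_lines, address):
    """Resolve an instruction address to a symbol name using the
    perf map-file format: 'STARTADDR SIZE symbolname' per line (hex)."""
    for line in map_lines:
        start_s, size_s, symbol = line.split(maxsplit=2)
        start, size = int(start_s, 16), int(size_s, 16)
        if start <= address < start + size:
            return symbol
    return "[unknown]"

# Invented map entries mimicking JIT-compiled methods from a Sqoop mapper.
perf_map = [
    "7f3a80001000 4c0 Lorg/apache/sqoop/mapreduce/TextImportMapper;::map",
    "7f3a800014c0 180 Lorg/apache/hadoop/io/Text;::set",
]
print(resolve(perf_map, 0x7f3a80001100))
# -> Lorg/apache/sqoop/mapreduce/TextImportMapper;::map
```

Without such a map, perf can only report these addresses as unknown frames, which is why the samples from JIT-compiled Java code would otherwise be unusable.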
Since this blog post is mostly dedicated to how we resolved the issue with Sqoop, we will not discuss further how we built the profiler.