Multi Cluster Hadoop Job Monitoring
I spend a lot of time tracking and monitoring Hadoop jobs running across multiple clusters in my current project. Typically, I navigate between several JobTracker web admin consoles. Although the JobTracker web console gives some basic system-level statuses and metrics for the Hadoop daemons, it leaves a lot to be desired. What's missing is a monitoring platform at the application level.
In my Hadoop job, I may have a counter that gets incremented when a certain kind of exception is thrown. I may want to see the counter value in a dashboard and also have an alarm raised when the value exceeds a threshold. Counters are also useful for tracking other error conditions, such as invalid data and missing data.
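As a rough illustration of that idea, here is a minimal sketch that assumes the job runs through Hadoop Streaming, where a task updates a counter by writing a `reporter:counter:<group>,<counter>,<amount>` line to stderr. The group and counter names, the tab-separated record layout, and the `breached` threshold check are all hypothetical and only stand in for whatever a real dashboard would do.

```python
import sys

def counter_update(group, name, amount=1):
    """Format a Hadoop Streaming counter update line (meant for stderr)."""
    return f"reporter:counter:{group},{name},{amount}"

def parse_record(line, err=sys.stderr):
    """Parse a hypothetical 'key<TAB>integer' record.

    On a malformed record, bump an error counter instead of failing the task.
    """
    try:
        key, value = line.rstrip("\n").split("\t")
        return key, int(value)
    except ValueError:
        # Wrong number of fields or a non-numeric value.
        print(counter_update("AppErrors", "INVALID_RECORD"), file=err)
        return None

def breached(counters, thresholds):
    """Return the sorted names of counters whose value exceeds its threshold."""
    return sorted(name for name, limit in thresholds.items()
                  if counters.get(name, 0) > limit)
```

A mapper would call `parse_record` on each input line. A separate monitoring process, given the collected counter values, could call `breached` to decide which alarms to raise.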
Another serious limitation of the JobTracker console is the maximum number of jobs it retains. Older jobs simply disappear from the job list in the console.
