Serverless Scaling for Ingesting, Aggregating, and Visualizing Apache Logs with Amazon Kinesis Firehose, AWS Lambda, and Amazon Elasticsearch Service | AWS Database Blog
In 2016, AWS introduced the EKK stack (Amazon Elasticsearch Service, Amazon Kinesis, and Kibana, an open source plugin from Elastic) as an alternative to ELK (Amazon Elasticsearch Service, the open source tool Logstash, and Kibana) for ingesting and visualizing Apache logs. One of the main features of the EKK stack is that the data transformation is handled via the Amazon Kinesis Firehose agent. In this post, we describe how to optimize the EKK solution—by handling the data transformation in Amazon Kinesis Firehose through AWS Lambda.
In the ELK stack, the Logstash cluster handles the parsing of the Apache logs. However, the Logstash cluster must be designed and maintained for scale management. This type of server management requires a lot of heavy lifting on the user’s part. The EKK solution eliminates this work with Amazon Kinesis Firehose, AWS Lambda, and Amazon Elasticsearch Service (Amazon ES).
Let’s look at the components and architecture of the EKK optimized solution.
Amazon Kinesis Firehose
Amazon Kinesis Firehose provides the easiest way to load streaming data into AWS. In this solution, Firehose helps capture and automatically load the streaming log data to Amazon ES, and backs it up in Amazon S3. For more information about Firehose, see What is Amazon Kinesis Firehose?
AWS Lambda lets you run code without provisioning or managing servers. It automatically scales your application by running code in response to each trigger. Your code runs in parallel and processes each trigger individually, scaling precisely with the size of the workload. In the EKK solution, Amazon Kinesis Firehose invokes the Lambda function to transform incoming source data and deliver the transformed data to the managed Amazon ES cluster. For more information about AWS Lambda, see the AWS Lambda documentation.
Amazon Elasticsearch Service
Amazon ES is a popular search and analytics engine that provides real-time application monitoring and log and clickstream analytics. In this solution, the Apache logs are stored and indexed in Amazon ES. As a managed service, Amazon ES is easy to deploy, operate, and scale in the AWS Cloud. Using a managed service eliminates administrative overhead, including patch management, failure detection, node replacement, backups, and monitoring. Because Amazon ES includes built-in integration with Kibana, it eliminates having to install and configure that platform—simplifying your process even more. For information about Amazon ES, see What Is Amazon Elasticsearch Service?
Amazon Kinesis Data Generator
This solution uses the Amazon Kinesis Data Generator (KDG) to produce the Apache access logs. The KDG makes it easy to simulate Apache access logs and demonstrate the processing pipeline and scalability of the solution.
The following diagram shows the architecture of the EKK optimized stack.