Unite Real-Time and Batch Analytics Using the Big Data Lambda Architecture, Without Servers! | AWS Big Data Blog
The Big Data Lambda Architecture seeks to provide data engineers and architects with a scalable, fault-tolerant data processing architecture and framework using loosely coupled, distributed systems. At a high level, the Lambda Architecture is designed to handle both real-time and historically aggregated batched data in an integrated fashion. It separates the duties of real-time and batch processing so purpose-built engines, processes, and storage can be used for each, while serving and query layers present a unified view of all of the data.
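The division of duties can be shown with a minimal, library-free Python sketch. It illustrates the general idea only and is not the AWS implementation described in this post. It assumes a hypothetical stream of `(key, timestamp)` events and a hypothetical `cutoff` that marks the end of the last batch run. The batch layer recomputes counts for everything before the cutoff, the speed layer counts only the newer events, and the serving layer merges the two views at query time. The function names `batch_view`, `speed_view`, and `serve` are illustrative.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical event record: (key, timestamp in seconds)
Event = Tuple[str, int]

def batch_view(events: Iterable[Event], cutoff: int) -> Counter:
    """Batch layer (illustrative): recompute counts over all events before the cutoff."""
    return Counter(key for key, ts in events if ts < cutoff)

def speed_view(events: Iterable[Event], cutoff: int) -> Counter:
    """Speed layer (illustrative): count only events at or after the cutoff."""
    view = Counter()
    for key, ts in events:
        if ts >= cutoff:
            view[key] += 1
    return view

def serve(batch: Counter, realtime: Counter) -> dict:
    """Serving layer (illustrative): merge both views into one unified result."""
    return dict(batch + realtime)

# Usage with made-up sample data; the last batch run covered timestamps < 5.
events = [("a", 1), ("b", 2), ("a", 3), ("a", 10), ("c", 11)]
print(serve(batch_view(events, 5), speed_view(events, 5)))
# → {'a': 3, 'b': 1, 'c': 1}
```

Keeping the two views separate means the batch layer can be recomputed from scratch for accuracy, while the speed layer only has to cover the small window of data that the latest batch run has not yet absorbed.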
Historically, the Lambda Architecture required several complex systems to unite batch and real-time views. Data platform engineers and architects had to implement services running on Amazon EC2 for data collection and ingestion, batch processing, stream processing, serving layers, and dashboards/reporting. Over time, AWS customers have continued to ask for managed solutions that scale seamlessly and require less infrastructure management, so that teams can focus on what really matters: the data and the insights it produces.
In this post, I show you how to use AWS services such as AWS Glue to build a Lambda Architecture without managing any servers. Through a practical demonstration, I examine the tight integration between serverless services on AWS and use them to build a robust data processing system based on the Lambda Architecture.