Implement Serverless Log Analytics Using Amazon Kinesis Analytics | AWS Big Data Blog

Applications log a large amount of data that—when analyzed in real time—provides significant insight into your applications. Real-time log analysis can be used to ensure security compliance, troubleshoot operation events, identify application usage patterns, and much more. Ingesting and analyzing this data in real time can be accomplished by using a variety of open source tools on Amazon EC2. Alternatively, ...

Powering Amazon Redshift Analytics with Apache Spark and Amazon Machine Learning | AWS Big Data Blog

Air travel can be stressful due to the many factors that are simply out of airline passengers’ control. As passengers, we want to minimize this stress as much as we can. We can do this by using past data to predict how likely a flight is to be delayed based on the time of day or the airline carrier. In this post, we generate a predictive model for flight delays that can be used to help us pick ...

This company is using Amazon Snowmobile to transfer petabytes of data to the cloud

One of the most dramatic announcements from Amazon Web Services at its 2016 re:Invent conference was Snowmobile: a 45-foot semi truck that trailers a data center on wheels. Customers can load up to 100 petabytes of data per Snowmobile, which is then driven to an AWS data center and loaded into the company’s cloud. It raises the question: Who’s actually using this? DigitalGlobe ...

Data Wrangling at Slack

For a company like Slack that strives to be as data-driven as possible, understanding how our users use our product is essential. The Data Engineering team at Slack works to provide an ecosystem to help people in the company quickly and easily answer questions about usage, so they can make better, data-informed decisions: “Based on a team’s activity within its first week, what is the probability that it ...

Low-Latency Access on Trillions of Records: FINRA’s Architecture Using Apache HBase on Amazon EMR with Amazon S3 | AWS Big Data Blog

The Financial Industry Regulatory Authority (FINRA) is a private sector regulator responsible for analyzing 99% of the equities and 65% of the option activity in the US. In order to look for fraud, market manipulation, insider trading, and abuse, FINRA’s technology group has developed a robust set of big data tools in the AWS Cloud to support these activities. One particular application, which requires low- ...

Apache Impala (incubating) vs. Amazon Redshift: S3 Integration, Elasticity, Agility, and Cost-Performance Benefits on AWS – Cloudera Engineering Blog

As measured across multiple dimensions (see analysis below), Impala provides a better cloud-native experience than Redshift for a number of common use cases. Impala 2.6 brings read/write support on Amazon S3, which provides cloud capabilities such as direct querying of data from S3, elastic scaling of compute, and seamless data portability and flexibility that are unique amongst cloud-based analytic databas ...

Encrypt Data At-Rest and In-Flight on Amazon EMR with Security Configurations

Customers running analytics, stream processing, machine learning, and ETL workloads on personally identifiable information, health information, and financial data have strict requirements for encryption of data at-rest and in-transit. The Apache Spark and Hadoop ecosystems lend themselves to these big data use cases, and customers have asked us to provide a quick and easy way to encrypt data at-rest and dat ...

Supercharge SQL on Your Data in Apache HBase with Apache Phoenix – AWS Big Data Blog

With today’s launch of Amazon EMR release 4.7, you can now create clusters with Apache Phoenix 4.7.0 for low-latency SQL and OLTP workloads. Phoenix uses Apache HBase as its backing store (HBase 1.2.1 is included on Amazon EMR release 4.7.0), using HBase scan operations and coprocessors for fast performance. Additionally, you can map Phoenix tables and views to existing HBase tables, giving you SQL access o ...

JOIN Amazon Redshift AND Amazon RDS PostgreSQL WITH dblink – AWS Big Data Blog

When it comes to choosing a SQL-based database in AWS, there are many options. Sometimes it can be difficult to know which one to choose. For example, when would you use Amazon Aurora instead of Amazon RDS PostgreSQL or Amazon Redshift? To answer this question, you must first understand the nature of the data workload and then evaluate other factors such as the quantity of data and query access patterns. Th ...

Distributed Deep Learning with Caffe Using a MapR Cluster | MapR

We have experimented with CaffeOnSpark on a 5-node MapR 5.1 cluster running Spark 1.5.2 and will share our experience, difficulties, and solutions in this blog post. Deep Learning and Caffe: Deep learning is getting a lot of attention recently, with AlphaGo beating a top world player at a game that was thought so complicated as to be out of reach of computers just five years ago. Deep learning is not just b ...

2015 © Big Data Cloud Inc. All Rights Reserved.

Hadoop, the Hadoop elephant logo, and Spark are trademarks of the Apache Software Foundation.
