Google’s new cloud service eases data preparation for machine learning | Computerworld

BigQuery gets a bunch of updates for big data, too. One of the challenges that data scientists face when running machine learning workloads is processing information before it’s ready for use. Google unveiled a new cloud service Thursday aimed at easing that pain. Google Cloud Dataprep will automatically detect data schemas, joins, and ano ...

Working with UDFs in Apache Spark – Cloudera Engineering Blog

User-defined functions (UDFs) are a key feature of most SQL environments, used to extend the system’s built-in functionality. UDFs allow developers to enable new functions in higher-level languages such as SQL by abstracting their lower-level language implementations. Apache Spark is no exception, and offers a wide range of options for integrating UDFs with Spark SQL workflows. In this blog post, we’ll review s ...

Apache Impala (Incubating) on Amazon: Performance and Cost Considerations for S3 vs. EBS – Cloudera Engineering Blog

The benchmark testing results detailed below can help you make an informed decision about AWS storage options for Impala. In a recent post, you learned how Impala 2.6 on S3 delivers cloud-native features unmatched by other analytic databases in the cloud. With support to read/write data from Amazon S3, Impala provides cloud capabilities such as direct querying of data from S3, elastic scaling of compute, an ...

Writing SQL on Streaming Data with Amazon Kinesis Analytics – Part 2 – AWS Big Data Blog

Amazon Kinesis Analytics allows you to easily write SQL on streaming data, providing a powerful way to build a stream processing application in minutes. The service allows you to connect to streaming data sources, process the data with sub-second latencies, and continuously emit results to downstream destinations for use in real-time alerts, dashboards, or further analysis. This post introduces you to th ...

Uber’s case for incremental processing on Hadoop – O’Reilly Media

Uber’s mission is to provide “transportation as reliable as running water, everywhere, for everyone.” To fulfill this promise, Uber relies on making data-driven decisions at every level, and most of these decisions can benefit from faster data processing. Examples include using data to understand areas for growth, or giving the city operations team access to fresh data to debug each city. Needless to say, the cho ...

How-to: Analyze Fantasy Sports with Apache Spark and SQL (Part 2: Data Exploration) – Cloudera Engineering Blog

Learn how analyzing stats from professional sports leagues is an instructive use case for data analytics using Apache Spark with SQL. Covered in this installment: data exploration with Apache Impala (incubating) and Hue. In Part 1 of this series, I introduced the topic of using fantasy sports analytics as an instructive use case for exploring the Apache Hadoop ecosystem. In that installment, we focused on d ...

How-to: Analyze Fantasy Sports using Apache Spark and SQL – Cloudera Engineering Blog

In the United States, many diehard sports fans morph into amateur statisticians to get an edge over the competition in their fantasy sports leagues. Depending on one’s technical chops, this “edge” is usually no more sophisticated than simple spreadsheet analysis, but some particularly intense people go to the extent of creating their own player rankings and projection systems. Online tools can provide simil ...

Supercharge SQL on Your Data in Apache HBase with Apache Phoenix – AWS Big Data Blog

With today’s launch of Amazon EMR release 4.7, you can now create clusters with Apache Phoenix 4.7.0 for low-latency SQL and OLTP workloads. Phoenix uses Apache HBase as its backing store (HBase 1.2.1 is included on Amazon EMR release 4.7.0), using HBase scan operations and coprocessors for fast performance. Additionally, you can map Phoenix tables and views to existing HBase tables, giving you SQL access o ...

Apache Spark powers live SQL analytics in SnappyData | InfoWorld

The team behind Pivotal's GemFire in-memory transactional data store recently unveiled a new database solution powered by GemFire and Apache Spark, called SnappyData. SnappyData is another recent example of Spark employed as a component in a larger database solution, with or without other pieces from Apache Hadoop. SnappyData -- the name of both the new database and the organization producing it -- was buil ...

MemSQL raises $36M Series C round for its in-memory database platform | TechCrunch

In-memory database platform MemSQL today announced that it has raised a $36 million Series C funding round. The round was led by new investors REV and Caffeinated Capital; existing investors Accel Partners, Khosla Ventures, Data Collective, IA Ventures and First Round Capital also participated. MemSQL, which graduated from Y Combinator back in 2011, plays in the same real-time big data analysis market th ...

2015 © Big Data Cloud Inc. All Rights Reserved.

Hadoop, the Hadoop elephant logo, and Spark are trademarks of the Apache Software Foundation.
