A Technical Overview of Azure Databricks – The Databricks Blog

Today at Microsoft Connect(); we introduced Azure Databricks, an exciting new service in preview that brings together the best of the Apache Spark analytics platform and the Azure cloud. Built through a close partnership between Databricks and Microsoft, Azure Databricks brings unique benefits not present in other cloud platforms. This blog post introduces the technology and new capabilities available for data scientists, ...

Apache Kafka and the four challenges of production machine learning systems – O’Reilly Media

Machine learning has become mainstream, and suddenly businesses everywhere are looking to build systems that use it to optimize aspects of their product, processes or customer experience. The cartoon version of machine learning sounds quite easy: you feed in training data made up of examples of good and bad outcomes, and the computer automatically learns from these and spits out a model that can make simila ...

Running Streaming Jobs Once a Day For 10x Cost Savings – The Databricks Blog

This is the sixth post in a multi-part series about how you can perform complex streaming analytics using Apache Spark. Traditionally, when people think about streaming, terms such as “real-time,” “24/7,” or “always on” come to mind. You may have cases where data arrives only at fixed intervals, for example every hour or once a day. For these use cases, it is still beneficial to perform incrementa ...
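
The core idea of incremental processing, where each scheduled run handles only the data that arrived since the previous run, can be sketched in plain Python. The `Checkpoint` class and `run_once` function below are hypothetical stand-ins for illustration: in Spark Structured Streaming the comparable pieces are the query's checkpoint location and a one-shot trigger such as `trigger(once=True)`, but this sketch neither uses nor imitates the Spark API.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """Hypothetical checkpoint: last processed offset plus the running aggregate."""
    offset: int = 0
    counts: Counter = field(default_factory=Counter)


def run_once(log: list[str], ckpt: Checkpoint) -> Checkpoint:
    """Process only the records appended since the last run, then 'commit'."""
    new_records = log[ckpt.offset:]      # incremental: skip already-processed data
    updated = Counter(ckpt.counts)       # copy so the previous checkpoint is untouched
    updated.update(new_records)          # fold the new data into the running counts
    # Commit the new offset together with the updated state.
    return Checkpoint(offset=len(log), counts=updated)


log = ["click", "view"]
ckpt = run_once(log, Checkpoint())       # first scheduled run
log.append("click")                      # new data arrives before the next run
ckpt = run_once(log, ckpt)               # next run sees only the one new record
print(ckpt.offset, dict(ckpt.counts))    # prints: 3 {'click': 2, 'view': 1}
```

Because the offset is committed together with the aggregate, rerunning the job when no new data has arrived leaves the state unchanged, so an infrequent scheduled run does no redundant work.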

Announcing real-time Geospatial Analytics in Azure Stream Analytics | Blog | Microsoft Azure

We recently announced the general availability of Geospatial Functions in Azure Stream Analytics to enable real-time analytics on streaming geospatial data. This makes it possible to build scenarios such as fleet monitoring, asset tracking, geofencing, phone tracking across cell sites, connected manufacturing, and ridesharing solutions with production-grade quality in a few lines of code. The conn ...
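
Geofencing, one of the scenarios listed above, comes down to testing whether a reported position lies inside a polygon. Azure Stream Analytics provides this through its built-in geospatial functions (for example `ST_WITHIN`); purely to illustrate the underlying idea, the sketch below implements the classic even-odd ray-casting test in Python. It assumes planar x/y coordinates and a simple (non-self-intersecting) polygon, ignores the earth's curvature, and makes no claim about how the service itself computes containment. The function name and the `depot` fence are hypothetical.

```python
def point_in_polygon(x: float, y: float, polygon: list[tuple[float, float]]) -> bool:
    """Even-odd ray casting: toggle on every polygon edge crossed by a ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]      # wrap around to close the polygon
        # Only edges that straddle the horizontal line through y can be crossed
        # (this also skips horizontal edges, avoiding division by zero).
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical rectangular geofence around a depot (planar coordinates).
depot = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
print(point_in_polygon(5.0, 5.0, depot))   # True
print(point_in_polygon(15.0, 5.0, depot))  # False
```

Points lying exactly on an edge are classified arbitrarily by this test, which a production geofence would need to handle explicitly.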

4 challenges Artificial Intelligence must address

If news, polls and investment figures are any indication, Artificial Intelligence and Machine Learning will soon become an inherent part of everything we do in our daily lives. Backing up the argument are a slew of innovations and breakthroughs that have brought the power and efficiency of AI into various fields including medicine, shopping, finance, news, f ...

Working with UDFs in Apache Spark – Cloudera Engineering Blog

User-defined functions (UDFs) are a key feature of most SQL environments for extending the system’s built-in functionality. UDFs let developers expose new functions in higher-level languages such as SQL while abstracting away their lower-level language implementations. Apache Spark is no exception, and offers a wide range of options for integrating UDFs with Spark SQL workflows. In this blog post, we’ll review s ...
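
The general mechanism, registering a function written in a host language so that SQL queries can call it, can be shown with Python's standard `sqlite3` module, whose `Connection.create_function` makes a Python callable available inside SQL. This is an analogy rather than Spark code: the `to_fahrenheit` function and the `readings` table are made up for illustration, and in PySpark the comparable registration step is `spark.udf.register`.

```python
import sqlite3


def to_fahrenheit(celsius):
    """Plain Python logic exposed to SQL as a UDF; passes NULL (None) through."""
    if celsius is None:
        return None
    return celsius * 9 / 5 + 32


conn = sqlite3.connect(":memory:")
# Register the callable as a one-argument SQL function named TO_F.
conn.create_function("TO_F", 1, to_fahrenheit)

conn.execute("CREATE TABLE readings (sensor TEXT, temp_c REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 0.0), ("b", 100.0), ("c", None)],
)

# The UDF is now usable like any built-in SQL function.
rows = conn.execute(
    "SELECT sensor, TO_F(temp_c) FROM readings ORDER BY sensor"
).fetchall()
print(rows)  # [('a', 32.0), ('b', 212.0), ('c', None)]
```

The UDF handles `NULL` explicitly because the engine passes missing values straight through to the Python function as `None`.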

Yahoo supercharges TensorFlow with Apache Spark 

Yahoo, a model Apache Spark citizen and the developer of CaffeOnSpark (which made it easier for developers building deep learning models in Caffe to scale with parallel processing), is open-sourcing a new project called TensorFlowOnSpark. The pairing of Spark and TensorFlow should make the deep learning framework more attractive to developers who are creating models that need to run on large computing clusters. Fo ...

Distributed Deep Learning with Apache Spark and Keras | Databases at CERN

In the following blog posts we study the topic of Distributed Deep Learning, or rather, how to parallelize gradient descent using data-parallel methods. We start by laying out the theory, while supplying you with some intuition into the techniques we applied. At the end of this blog post, we conduct some experiments to evaluate how different optimization schemes perform in identical situations. We also intr ...
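
The data-parallel idea can be sketched in a few lines: the training set is split into shards, each worker computes a gradient on its own shard, and the gradients are combined into one synchronous update of a shared model. The sketch below is a single-process simulation under stated assumptions (a one-variable linear model with squared-error loss, round-robin sharding, and hypothetical names such as `data_parallel_gd`); it illustrates synchronous gradient averaging only, not the Keras/Spark setup or the specific optimization schemes evaluated in the CERN post.

```python
def shard_gradient(w, b, shard):
    """Summed gradient of 0.5 * (w*x + b - y)^2 over one worker's shard."""
    gw = gb = 0.0
    for x, y in shard:
        err = w * x + b - y
        gw += err * x
        gb += err
    return gw, gb


def data_parallel_gd(data, num_workers=4, lr=0.1, steps=2000):
    # Round-robin partition; each shard stands in for one worker's partition.
    shards = [data[i::num_workers] for i in range(num_workers)]
    n = len(data)
    w = b = 0.0
    for _ in range(steps):
        # "Map": every worker computes a gradient on its own shard.
        grads = [shard_gradient(w, b, s) for s in shards]
        # "Reduce": sum shard gradients, divide by the total example count.
        gw = sum(g[0] for g in grads) / n
        gb = sum(g[1] for g in grads) / n
        # Synchronous update of the single shared model.
        w -= lr * gw
        b -= lr * gb
    return w, b


data = [(i / 10, 2 * (i / 10) + 1) for i in range(20)]  # synthetic points on y = 2x + 1
w, b = data_parallel_gd(data, num_workers=4)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```

Because the shard gradients are summed and divided by the total number of examples, each update equals the full-batch gradient step no matter how many workers are used.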

Powering Amazon Redshift Analytics with Apache Spark and Amazon Machine Learning | AWS Big Data Blog

Air travel can be stressful due to the many factors that are simply out of airline passengers’ control. As passengers, we want to minimize this stress as much as we can. We can do this by using past data to predict how likely a flight is to be delayed based on the time of day or the airline carrier. In this post, we generate a predictive model for flight delays that can be used to help us pick ...

Playing with 80 Million Amazon Product Review Ratings Using Apache Spark

Amazon product reviews and ratings are a very important business. Customers on Amazon often make purchasing decisions based on those reviews, and a single bad review can cause a potential purchaser to reconsider. A couple of years ago, I wrote a blog post titled A Statistical Analysis of 1.2 Million Amazon Reviews, which was well-received. Back then, I was limited to only 1.2M reviews because attempting to pro ...

2015 © Big Data Cloud Inc. All Rights Reserved.

Hadoop, the Hadoop elephant logo, and Spark are trademarks of the Apache Software Foundation.