How FICO scores big with cloud-based collaboration and data solutions – Microsoft Enterprise

Data rules everything around us. From traffic lights to medical records, those ones and zeros quickly dictate everything from the ads we see to the music we hear. As more and more organizations adopt big data strategies, we as consumers see exciting new innovations and solutions that leverage these possibilities. Smart phones, driver-less cars—this wave of data analytics brings the future to life in excitin ...

Working with UDFs in Apache Spark – Cloudera Engineering Blog

User-defined functions (UDFs) are a key feature of most SQL environments to extend the system’s built-in functionality. UDFs allow developers to enable new functions in higher-level languages such as SQL by abstracting their lower-level language implementations. Apache Spark is no exception, and offers a wide range of options for integrating UDFs with Spark SQL workflows. In this blog post, we’ll review s ...

Ethics — the next frontier for artificial intelligence | TechCrunch

AI’s next frontier requires ethics built through policy. Will Donald Trump deliver? With one foot in its science fiction past and the other in the new frontier of science and tech innovations, AI occupies a unique place in our cultural imagination. Will we live into a future where machines are as intelligent — or frighteningly, more so — than humans? We have already witnessed AI predict the outcome of the l ...

This company is using Amazon Snowmobile to transfer petabytes of data to the cloud

One of the most dramatic announcements from Amazon Web Services at its 2016 re:Invent conference was Snowmobile: a 45-foot semi-truck trailer that serves as a data center on wheels. Customers can load each Snowmobile with up to 100 petabytes of data, which is then driven to an AWS data center and loaded into the company’s cloud. This raises the question: Who’s actually using this? DigitalGlobe ...

Nebula as a Storage Platform to Build Airbnb’s Search Backends – Airbnb Engineering & Data Science – Medium

Last year Airbnb grew to the point where a scalable, distributed storage system was required to store data for some applications. For example, personalization data for search grew larger than what a single machine can hold. While we could have rebuilt just the personalization service to scale up, we foresaw that other services would have similar requirements and decided to build a common platform to simplify such tasks ...

Apache Impala (incubating) vs. Amazon Redshift: S3 Integration, Elasticity, Agility, and Cost-Performance Benefits on AWS – Cloudera Engineering Blog

As measured across multiple dimensions (see analysis below), Impala provides a better cloud-native experience than Redshift for a number of common use cases. Impala 2.6 brings read/write support on Amazon S3, which provides cloud capabilities such as direct querying of data from S3, elastic scaling of compute, and seamless data portability and flexibility that are unique amongst cloud-based analytic databas ...

Apache Kudu 1.0 is Released – Cloudera VISION

This week, the Apache Kudu team announced the release of Kudu 1.0. This release marks the one-year anniversary of Kudu’s public debut, and is the culmination of much hard work by a growing team of developers and community members. In this blog post, I’ll recap the original vision for Kudu, review our accomplishments over the last year, and share where I see the project going in the future. The Origins of Ku ...

Skool: An Open Source Data Integration Tool for Apache Hadoop from BT Group – Cloudera Engineering Blog

In this guest post, Skool’s architects at BT Group explain its origins, design, and functionality. With increased adoption of big data comes the challenge of integrating existing data sitting in various relational and file-based systems with Apache Hadoop infrastructure. Although open source connectors (such as Apache Sqoop) and utilities (such as Httpfs/Curl on Linux) make it easy to exchange data, data en ...

Hadoop Still Beats Spark In These Cases | Bowen Gong | Pulse | LinkedIn

Summary: There has been much talk about Spark replacing Hadoop in the big data space due to its speed and ease of use. While there are major benefits to using Spark (I am one of its advocates), it is far from a replacement for Hadoop for two reasons. One, Spark does not have the HDFS component. Two, Spark is not more scalable or fault-tolerant than Hadoop. Spark Strengths: Although this article is to show yo ...

JOIN Amazon Redshift AND Amazon RDS PostgreSQL WITH dblink – AWS Big Data Blog

When it comes to choosing a SQL-based database in AWS, there are many options. Sometimes it can be difficult to know which one to choose. For example, when would you use Amazon Aurora instead of Amazon RDS PostgreSQL or Amazon Redshift? To answer this question, you must first understand the nature of the data workload and then evaluate other factors such as the quantity of data and query access patterns. Th ...

2015 © Big Data Cloud Inc. All Rights Reserved.

Hadoop, the Hadoop elephant logo, and Spark are trademarks of the Apache Software Foundation.
