
Google’s new cloud service eases data preparation for machine learning | Computerworld

BigQuery gets a bunch of updates for big data, too. One of the challenges that data scientists face when running machine learning workloads is processing information before it’s ready for use. Google unveiled a new cloud service Thursday aimed at easing that pain. Google Cloud Dataprep will automatically detect data schemas, joins, and ano ...


Solar, wind, storage and big data: Why energy may soon be free : Renew Economy

By Giles Parkinson on 1 August 2016. Global investment bank Citi is predicting that the combination of near-zero variable-cost energy sources such as solar and wind, along with smart analytics and “big data”, may deliver what the nuclear industry promised nearly half a century ago: free energy. “The notion of free energy came to prominence in th ...


Stop overdoing it when cleaning your big data – TechRepublic

Enough is enough: your big data might actually be getting too clean. Find out why it can be useful to keep bad, garbage data. When you got a job as a data scientist, I bet you didn't imagine you'd spend so much time cleaning up bad data. Don't feel bad; none of us did. When data science rolled onto the scene, many of us who were already in the data warehousing and b ...

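The article's full argument is cut off here. One way to avoid over-cleaning is to flag suspect records rather than delete them, so the "garbage" stays available for later analysis. The sketch below assumes hypothetical field names (`user_id`, `age`) and an arbitrary 0 to 120 validity range for age; none of these come from the article.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Record:
    user_id: str
    age: Optional[int]
    flags: List[str] = field(default_factory=list)


def flag_record(rec: Record) -> Record:
    """Annotate a record with quality flags instead of dropping it."""
    if not rec.user_id:
        rec.flags.append("missing_user_id")
    if rec.age is None:
        rec.flags.append("missing_age")
    elif not 0 <= rec.age <= 120:  # hypothetical plausibility rule
        rec.flags.append("implausible_age")
    return rec


records = [Record("u1", 34), Record("", 51), Record("u3", 240), Record("u4", None)]
flagged = [flag_record(r) for r in records]

# Both views survive: a clean set for modeling and a suspect set for inspection.
clean = [r for r in flagged if not r.flags]
suspect = [r for r in flagged if r.flags]
print(len(clean), len(suspect))  # 1 3
```

Because nothing is deleted, the flags themselves can later be counted or used as features, which is the kind of signal that aggressive cleaning throws away.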

How FICO scores big with cloud-based collaboration and data solutions – Microsoft Enterprise

Data rules everything around us. From traffic lights to medical records, those ones and zeros quickly dictate everything from the ads we see to the music we hear. As more and more organizations adopt big data strategies, we as consumers see exciting new innovations and solutions that leverage these possibilities. Smartphones and driverless cars: this wave of data analytics brings the future to life in excitin ...


Working with UDFs in Apache Spark – Cloudera Engineering Blog

User-defined functions (UDFs) are a key feature of most SQL environments for extending the system’s built-in functionality. UDFs let developers expose new functions in higher-level languages such as SQL while abstracting away their lower-level language implementations. Apache Spark is no exception, and offers a wide range of options for integrating UDFs with Spark SQL workflows. In this blog post, we’ll review s ...

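The Spark-specific details of the post are truncated. To show the general UDF idea, here is a minimal sketch using Python's built-in `sqlite3` module, whose `Connection.create_function` registers a Python function so that SQL queries can call it by name. This is SQLite, not Spark; in PySpark the analogous step is registering a function through `spark.udf.register`. The `normalize_city` function and `visits` table are hypothetical.

```python
import sqlite3


def normalize_city(name):
    """Python-level implementation that SQL will call as NORMALIZE_CITY(...)."""
    if name is None:  # SQL NULL arrives as None; pass it through unchanged
        return None
    return name.strip().title()


conn = sqlite3.connect(":memory:")
# Register the Python function under the SQL name NORMALIZE_CITY, taking one argument.
conn.create_function("NORMALIZE_CITY", 1, normalize_city)

conn.execute("CREATE TABLE visits (city TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?)",
    [("  new york ",), ("SAN FRANCISCO",), (None,)],
)

# The query is plain SQL; the string cleanup runs in Python behind the scenes.
rows = conn.execute("SELECT NORMALIZE_CITY(city) FROM visits ORDER BY rowid").fetchall()
print(rows)  # [('New York',), ('San Francisco',), (None,)]
conn.close()
```

The SQL author never sees the Python body, which is the abstraction the paragraph describes: the function lives in one language and is invoked from another.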

Ethics — the next frontier for artificial intelligence | TechCrunch

AI’s next frontier requires ethics built through policy. Will Donald Trump deliver? With one foot in its science fiction past and the other in the new frontier of science and tech innovations, AI occupies a unique place in our cultural imagination. Will we live to see a future where machines are as intelligent as humans, or, frighteningly, more so? We have already witnessed AI predict the outcome of the l ...


This company is using Amazon Snowmobile to transfer petabytes of data to the cloud

One of the most dramatic announcements from Amazon Web Services at its 2016 re:Invent conference was Snowmobile: a 45-foot semi-trailer truck carrying a data center on wheels. Customers can load up to 100 petabytes of data into each Snowmobile, which is then driven to an AWS data center and loaded into the company’s cloud. That raises the question: who’s actually using this? DigitalGlobe ...

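The teaser does not explain why shipping data by truck can make sense. A back-of-envelope calculation helps. The sketch below assumes decimal units (1 PB = 10^15 bytes) and a hypothetical, fully saturated 10 Gbit/s network link with no protocol overhead; real links are rarely that clean, so actual times would be longer.

```python
# Assumptions: decimal units and an idealized link with no overhead or downtime.
PETABYTE_BYTES = 10**15
SECONDS_PER_DAY = 86_400


def transfer_days(petabytes: float, link_gbps: float) -> float:
    """Days needed to send `petabytes` of data over a link of `link_gbps` gigabits/s."""
    bits = petabytes * PETABYTE_BYTES * 8
    seconds = bits / (link_gbps * 10**9)
    return seconds / SECONDS_PER_DAY


days = transfer_days(100, 10)
print(f"{days:.0f} days (~{days / 365:.1f} years)")  # 926 days (~2.5 years)
```

At those assumed rates, one full Snowmobile's worth of data would tie up the link for roughly two and a half years, which is the kind of gap that makes a physical transfer attractive.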

Nebula as a Storage Platform to Build Airbnb’s Search Backends – Airbnb Engineering & Data Science – Medium

Last year Airbnb grew to the point where a scalable, distributed storage system was required to store data for some applications. For example, personalization data for search grew larger than a single machine can hold. While we could have rebuilt just the personalization service to scale up, we foresaw that other services would have similar requirements and decided to build a common platform to simplify such tasks ...

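The teaser does not describe how Nebula spreads data across machines. A common general technique for splitting a dataset that outgrows one machine is consistent hashing, sketched minimally below. This is not a description of Nebula's design; the `HashRing` class, node names, and virtual-node count are all hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring mapping keys to storage nodes (illustrative only)."""

    def __init__(self, nodes, vnodes_per_node=100):
        ring = []
        for node in nodes:
            # Several virtual points per node smooth out the key distribution.
            for i in range(vnodes_per_node):
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring
        self._hashes = [h for h, _ in ring]

    @staticmethod
    def _hash(key: str) -> int:
        # SHA-256 is used only to spread keys uniformly, not for security.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def node_for(self, key: str) -> str:
        """Return the first node clockwise from the key's position on the ring."""
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._hashes)
        return self._ring[idx][1]


ring = HashRing(["store-a", "store-b", "store-c"])
keys = [f"user:{n}" for n in range(10_000)]
before = {k: ring.node_for(k) for k in keys}

# Adding a fourth node only moves keys onto the new node.
bigger = HashRing(["store-a", "store-b", "store-c", "store-d"])
moved = [k for k in keys if before[k] != bigger.node_for(k)]
print(len(moved) / len(keys))
```

The design choice here is that growing the cluster relocates only the keys that the new node takes over, so the moved fraction should land near one quarter, rather than reshuffling almost everything as a plain `hash(key) % num_nodes` scheme would.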

Apache Impala (incubating) vs. Amazon Redshift: S3 Integration, Elasticity, Agility, and Cost-Performance Benefits on AWS – Cloudera Engineering Blog

As measured across multiple dimensions (see analysis below), Impala provides a better cloud-native experience than Redshift for a number of common use cases. Impala 2.6 brings read/write support on Amazon S3, which provides cloud capabilities such as direct querying of data from S3, elastic scaling of compute, and seamless data portability and flexibility that are unique amongst cloud-based analytic databas ...


Apache Kudu 1.0 is Released – Cloudera VISION

This week, the Apache Kudu team announced the release of Kudu 1.0. This release marks the one-year anniversary of Kudu’s public debut, and is the culmination of much hard work by a growing team of developers and community members. In this blog post, I’ll recap the original vision for Kudu, review our accomplishments over the last year, and share where I see the project going in the future. The Origins of Ku ...


2015 © Big Data Cloud Inc. All Rights Reserved.

Hadoop, the Hadoop elephant logo, and Spark are trademarks of the Apache Software Foundation.
