Data Warehousing

Google Spanner: Beginning of the End of the NoSQL World? – ACM SIGMOD Blog

Google recently announced that Spanner, its flagship wide-area database, is now available on Google Cloud. Spanner is Google's next-generation, globally distributed database, built inside Google and introduced to the world in a paper published at OSDI 2012 [1]. This article explores the implications of Google Spanner, in particular for the NoSQL world. CAP Theorem: A Quick Recap The t ...


The Seven Essentials of AI-Based Predictive Selling

Having a complete 360-degree view of each customer is imperative for predictive sales success. Where does the data for this comprehensive customer profile come from? And when should you start creating your profiles? Last week, we looked at what AI-based predictive selling, also known as predictive sales, is doing right now. It’s making sales teams more eff ...


Stop overdoing it when cleaning your big data – TechRepublic

Enough is enough: your big data might actually be getting too clean. Find out why it can be useful to keep bad, garbage data. When you got a job as a data scientist, I bet you didn't imagine you'd spend so much time cleaning up bad data. Don't feel bad; none of us did. When data science rolled onto the scene, many of us who were already in the data warehousing and b ...


Working with UDFs in Apache Spark – Cloudera Engineering Blog

User-defined functions (UDFs) are a key feature of most SQL environments for extending the system’s built-in functionality. UDFs let developers expose new functions in higher-level languages such as SQL by abstracting their lower-level language implementations. Apache Spark is no exception, and offers a wide range of options for integrating UDFs with Spark SQL workflows. In this blog post, we’ll review s ...


Nebula as a Storage Platform to Build Airbnb’s Search Backends – Airbnb Engineering & Data Science – Medium

Last year Airbnb grew to the point where a scalable, distributed storage system was required to store data for some applications. For example, personalization data for search grew larger than what a single machine can hold. While we could rebuild just the personalization service to scale up, we foresaw that other services would have similar requirements and decided to build a common platform to simplify such tasks ...


Data Wrangling at Slack

For a company like Slack that strives to be as data-driven as possible, understanding how our users use our product is essential. The Data Engineering team at Slack works to provide an ecosystem that helps people in the company quickly and easily answer questions about usage, so they can make better, data-informed decisions: “Based on a team’s activity within its first week, what is the probability that it ...


Uber’s case for incremental processing on Hadoop – O’Reilly Media

Uber’s mission is to provide “transportation as reliable as running water, everywhere, for everyone.” To fulfill this promise, Uber relies on making data-driven decisions at every level, and most of these decisions can benefit from faster data processing. Examples include using data to understand areas for growth, or giving the city operations team access to fresh data to debug each city. Needless to say, the cho ...


JOIN Amazon Redshift AND Amazon RDS PostgreSQL WITH dblink – AWS Big Data Blog

When it comes to choosing a SQL-based database in AWS, there are many options. Sometimes it can be difficult to know which one to choose. For example, when would you use Amazon Aurora instead of Amazon RDS PostgreSQL or Amazon Redshift? To answer this question, you must first understand the nature of the data workload and then evaluate other factors such as the quantity of data and query access patterns. Th ...


HBase: The database big data left behind | InfoWorld

As the default database for Hadoop, you'd expect HBase to be more popular than it is, but its time may already have passed. A few years ago, HBase looked set to become one of the dominant databases in big data. The primary pairing for Hadoop, HBase saw adoption skyrocket, but it has since plateaued, especially compared to NoSQL peers MongoDB, Cassandra, and Redis, as measured by general database popularity. ...


Tom Siebel’s C3 IoT looks to expand, slay giants

Tom Siebel's C3 IoT has 20 customers, an Internet of Things platform that is operating at scale, and a penchant for taking on giants such as General Electric's Predix. Siebel, CEO of C3 IoT, has experience landing big accounts and taking on giants. At Siebel Systems, Siebel popularized CRM and then sold his company to Oracle. Before starting that effort, Siebel was among Oracle's best sales leaders. Those st ...


2015 © Big Data Cloud Inc. All Rights Reserved.

Hadoop, the Hadoop elephant logo, and Spark are trademarks of the Apache Software Foundation.
