How-to: Fuzzy Name Indexing in Apache Hadoop with Rosette and Cloudera Search – Cloudera Engineering Blog

In this guide, learn how to use Cloudera Search with Basis Technology’s Rosette® to perform fuzzy name searches in multiple languages and scripts. Our thanks to the Basis Technology team (Jeanne Le Garrec, Hannah MacKenzie-Margulies, and Brian Sawyer) for their support in writing this how-to. Cloudera Search, powered by Apache Solr, brings full-text, interactive search and scalable indexing to Apache Hadoop by m ...

Data Wrangling at Slack

For a company like Slack that strives to be as data-driven as possible, understanding how our users use our product is essential. The Data Engineering team at Slack works to provide an ecosystem that helps people in the company quickly and easily answer questions about usage, so they can make better, data-informed decisions: “Based on a team’s activity within its first week, what is the probability that it ...

Skool: An Open Source Data Integration Tool for Apache Hadoop from BT Group – Cloudera Engineering Blog

In this guest post, Skool’s architects at BT Group explain its origins, design, and functionality. With increased adoption of big data comes the challenge of integrating existing data sitting in various relational and file-based systems with Apache Hadoop infrastructure. Although open source connectors (such as Apache Sqoop) and utilities (such as Httpfs/Curl on Linux) make it easy to exchange data, data en ...

Hadoop Still Beats Spark In These Cases | Bowen Gong | Pulse | LinkedIn

Summary: There has been much talk about Spark replacing Hadoop in the big data space due to its speed and ease of use. While there are major benefits to using Spark (I am one of its advocates), it is far from a replacement for Hadoop for two reasons. One, Spark does not have the HDFS component. Two, Spark is not more scalable or fault-tolerant than Hadoop. Spark Strengths: Although this article is to show yo ...

HBase: The database big data left behind | InfoWorld

As the default database for Hadoop, you'd expect HBase to be more popular than it is, but its time may already have passed. A few years ago, HBase looked set to become one of the dominant databases in big data. The primary pairing for Hadoop, HBase saw adoption skyrocket, but it has since plateaued, especially compared to NoSQL peers MongoDB, Cassandra, and Redis, as measured by general database popularity. ...

Hadoop performance troubleshooting with stack tracing, an introduction. | Databases at CERN

This post is about profiling and performance tuning of distributed workloads, in particular Hadoop applications. You will learn about a profiler application we have developed and how it has been successfully applied to tuning Sqoop to improve the throughput of data transfer from Oracle to Hadoop. Where is my Sqoop job spending CPU time? One of the data feeds into our Hadoop service is from Oracle databases. ...

Why Ford And Microsoft Are Betting On Pivotal Software At A $2.8 Billion Valuation

Ford wants to be known for mobility as much as its cars, and it’s willing to write software companies outsized checks to prove it. Ford announced Thursday it had led a $253 million investment in Pivotal Software, joined by Microsoft, in a deal that values the EMC and VMware spin-out at $2.8 billion. The investment is part of a broader strategy for Ford to invest in its mobil ...

Tom Siebel’s C3 IoT looks to expand, slay giants

Tom Siebel's C3 IoT has 20 customers, an Internet of things platform that is operating at scale and a penchant for taking on giants such as General Electric's Predix. Siebel, CEO of C3 IoT, has experience landing big accounts and taking on giants. At Siebel Systems, Siebel popularized CRM and then sold his company to Oracle. Before starting that effort, Siebel was among Oracle's best sales leaders. Those st ...

Data Warehousing With Google BigQuery

Data warehousing and the resulting business intelligence are basic necessities of business today. And today’s technologies make it possible to have a sophisticated data warehouse up and running in the cloud at a price and scale that was never possible before. This webinar showcases the reasons, ways, and means of developing such modern-day data warehouses using Google BigQuery. ...

Open Sourcing Dr. Elephant: Self-Serve Performance Tuning for Hadoop and Spark | LinkedIn Engineering

We are proud to announce today that we are open sourcing Dr. Elephant, a powerful tool that helps users of Hadoop and Spark understand, analyze, and improve the performance of their flows. We first presented Dr. Elephant to the community last year during the eighth annual Hadoop Summit, a leading conference for the Apache Hadoop community. Our Motivation Hadoop is a framework that facilitates the distribute ...

2015 © Big Data Cloud Inc. All Rights Reserved.

Hadoop, the Hadoop elephant logo, and Spark are trademarks of the Apache Software Foundation.