
Introducing Apache Arrow: A Fast, Interoperable In-Memory Columnar Data Structure Standard – Cloudera Engineering Blog

Engineers from across the Apache Hadoop community are collaborating to establish Arrow as a de facto standard for columnar in-memory processing and interchange. Here’s how it works. Apache Arrow is an in-memory data structure specification for use by engineers building data systems. It has several key benefits: A columnar memory layout permitting O(1) random access. The layout is highly cache-efficient in a ...


Global Hadoop Market will reach USD 87.14 billion by 2022

According to a Zion Market Research report, the global Hadoop market was valued at approximately USD 7.69 billion in 2016 and is expected to reach approximately USD 87.14 billion by 2022, growing at a CAGR of around 50% between 2017 and 2022. Sarasota, FL, Feb. 15, 2017 (GLOBE NEWSWIRE) -- Zion Market Research has published a new report titled “Ha ...


Stop overdoing it when cleaning your big data – TechRepublic

Enough is enough: your big data might actually be getting too clean. Find out why it can be useful to keep bad, garbage data. When you got a job as a data scientist, I bet you didn't imagine you'd spend so much time cleaning up bad data. Don't feel bad; none of us did. When data science rolled on the scene, many of us who were already in the data warehousing and b ...


How-to: Fuzzy Name Indexing in Apache Hadoop with Rosette and Cloudera Search – Cloudera Engineering Blog

In this guide, learn how to use Cloudera Search with Basis Technology’s Rosette® to perform fuzzy name searches in multiple languages and scripts. Our thanks to the Basis Technology team (Jeanne Le Garrec, Hannah MacKenzie-Margulies, and Brian Sawyer) for their support in writing this how-to. Cloudera Search, powered by Apache Solr, brings full-text, interactive search and scalable indexing to Apache Hadoop by m ...


Data Wrangling at Slack

For a company like Slack that strives to be as data-driven as possible, understanding how our users use our product is essential. The Data Engineering team at Slack works to provide an ecosystem to help people in the company quickly and easily answer questions about usage, so they can make better, data-informed decisions: “Based on a team’s activity within its first week, what is the probability that it ...


Skool: An Open Source Data Integration Tool for Apache Hadoop from BT Group – Cloudera Engineering Blog

In this guest post, Skool’s architects at BT Group explain its origins, design, and functionality. With increased adoption of big data comes the challenge of integrating existing data sitting in various relational and file-based systems with Apache Hadoop infrastructure. Although open source connectors (such as Apache Sqoop) and utilities (such as Httpfs/Curl on Linux) make it easy to exchange data, data en ...


Hadoop Still Beats Spark In These Cases | Bowen Gong | Pulse | LinkedIn

Summary: There has been much talk about Spark replacing Hadoop in the big data space due to its speed and ease of use. While there are major benefits to using Spark (I am one of its advocates), it is far from a replacement for Hadoop, for two reasons. One, Spark does not have the HDFS component. Two, Spark is not more scalable or fault-tolerant than Hadoop. Spark Strengths: Although this article is to show yo ...


HBase: The database big data left behind | InfoWorld

As the default database for Hadoop, you'd expect HBase to be more popular than it is, but its time may already have passed. A few years ago, HBase looked set to become one of the dominant databases in big data. The primary pairing for Hadoop, HBase saw adoption skyrocket, but it has since plateaued, especially compared to NoSQL peers MongoDB, Cassandra, and Redis, as measured by general database popularity. ...


Hadoop performance troubleshooting with stack tracing, an introduction. | Databases at CERN

This post is about profiling and performance tuning of distributed workloads and in particular Hadoop applications. You will learn of a profiler application we have developed and how it has successfully been applied to tuning Sqoop to improve the throughput of data transfer from Oracle to Hadoop. Where is my Sqoop job spending CPU time? One of the data feeds into our Hadoop service is from Oracle databases. ...


Why Ford And Microsoft Are Betting On Pivotal Software At A $2.8 Billion Valuation

Ford wants to be known for mobility as much as its cars, and it’s willing to write software companies outsized checks to prove it. Ford announced Thursday it had led a $253 million investment in Pivotal Software, joined by Microsoft, in a deal that values the EMC and VMware spin-out at $2.8 billion. The investment is part of a broader strategy for Ford to invest in its mobil ...


2015 © Big Data Cloud Inc. All Rights Reserved.

Hadoop, the Hadoop elephant logo, and Spark are trademarks of the Apache Software Foundation.
