
How can I learn to use Hadoop to analyze big data?


Apache Hadoop is an open-source software framework that has become a popular tool for working with big data sets. It was designed to distribute the storage and processing of large volumes of data across clusters of machines, an approach that can make certain kinds of data projects more efficient. That said, Hadoop is only one of many tools for handling large data sets.

One of the first and most basic steps in learning big data analysis with Hadoop is understanding its top-level components and what each one does. These include Hadoop YARN, a resource-management platform that schedules jobs and allocates cluster resources; Hadoop MapReduce, a programming model for processing large data sets in parallel; and the Hadoop Distributed File System (HDFS), which stores data across the machines of a cluster so that it can be quickly and reliably accessed.

Beyond this, those who want to become more familiar with Hadoop can look at resources published by practitioners who explain the software in relatable terms. One example, from Chris Stucchio's personal blog, offers an excellent set of points about Hadoop and data scale. A basic takeaway is that Hadoop may be used more often than necessary and may not be the best solution for a given project; reviewing these kinds of resources helps professionals judge whether Hadoop fits a particular scenario. Stucchio also uses metaphors to relate Hadoop's functions to physical tasks. One such example is counting the books in a library: a Hadoop-style approach would break the library into sections, count each section independently, and then blend the individual counts into one aggregate result.
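The library metaphor maps directly onto MapReduce's two phases. As a rough illustration, here is a minimal map/reduce sketch in plain Python — no Hadoop installation required. The section names and shelf counts are invented for the example; in a real Hadoop job, each "section" would be a data split handled by a separate mapper on the cluster.

```python
from collections import defaultdict

# Hypothetical library: sections mapped to per-shelf book counts.
library = {
    "fiction": [120, 98, 143],
    "history": [87, 64],
    "science": [110, 132, 95, 70],
}

def map_phase(section, shelf_counts):
    # Map: each section is counted independently, emitting
    # (section, count) key-value pairs.
    return [(section, count) for count in shelf_counts]

def reduce_phase(pairs):
    # Reduce: blend the individual counts into per-section totals.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# Run the map phase over every section, then reduce the results.
mapped = []
for section, shelves in library.items():
    mapped.extend(map_phase(section, shelves))

totals = reduce_phase(mapped)
grand_total = sum(totals.values())
```

Because each mapper works on its own section, the counting can run in parallel across many machines, which is the core idea behind Hadoop's approach to scale.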

A more in-depth way for professionals to learn about Hadoop and its application to big data is through dedicated training resources and programs. For example, Cloudera, a prominent Hadoop vendor that also offers remote training sessions, has a number of options covering Hadoop use and similar types of data handling.


Techopedia Staff

At Techopedia, we aim to provide insight and inspiration to IT professionals, technology decision-makers and anyone else who is proud to be called a geek. From defining complex tech jargon in our dictionary, to exploring the latest trend in our articles or providing in-depth coverage of a topic in our tutorials, our goal is to help you better understand technology - and, we hope, make better decisions as a result. 


