How can I learn to use Hadoop to analyze big data?
Apache Hadoop, an open-source software framework, is becoming a very popular resource for dealing with big data sets. The framework was built to store and process large data sets across clusters of machines, a design that can make some kinds of data projects more efficient. That said, Hadoop is only one of many tools for handling large data sets.
One of the most basic ways to start learning big data analysis with Hadoop is to understand its top-level components and what each one does. These include Hadoop YARN, a "resource management platform" that allocates cluster resources and schedules jobs, and Hadoop MapReduce, a programming model for processing big data sets in parallel. There's also the Hadoop Distributed File System (HDFS), which stores data across the machines in a cluster so that it can be read and processed efficiently.
Beyond this, those who want to become more familiar with Hadoop can look at published resources in which professionals explain the software on a relatable level. For example, a post by Chris Stucchio on his personal blog makes an excellent set of points about Hadoop and data scale. One of the basic takeaways is that Hadoop may be used more often than is necessary, and may not be the best solution for an individual project. Stucchio also provides metaphors that relate Hadoop's functions to specific physical tasks. In one example, the task is counting the number of books in a library: a Hadoop function might break that library up into sections and count each section separately, then blend the individual counts into one aggregate result. Reviewing these kinds of resources will help professionals become more familiar with the details of using Hadoop in any given scenario.
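The library metaphor can be sketched in a few lines of plain Python. This is only an illustration of the map-then-reduce idea, not Hadoop code: it runs on one machine, and the `library` data and the `map_section` and `reduce_counts` names are hypothetical, invented for this example.

```python
from functools import reduce

# Hypothetical library: each section holds a list of book titles.
library = {
    "fiction": ["Dune", "Emma", "Ulysses"],
    "science": ["Cosmos", "Chaos"],
    "history": ["SPQR"],
}


def map_section(section_name, books):
    """Map step: count the books in one section, independently of the others."""
    return (section_name, len(books))


def reduce_counts(total, section_result):
    """Reduce step: fold one section's count into the running total."""
    _, count = section_result
    return total + count


# Each section could be counted by a different worker; here they run in a loop.
section_counts = [map_section(name, books) for name, books in library.items()]

# Blend the per-section counts into one aggregate result.
total_books = reduce(reduce_counts, section_counts, 0)

print(section_counts)
print(total_books)
```

In a real Hadoop job, the sections would correspond to chunks of data stored on different machines, and the framework would handle distributing the map work and collecting the results.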
A more in-depth way for professionals to learn about Hadoop and its application to big data is through dedicated training resources and programs. For example, Cloudera, a vendor of Hadoop-based software that is also a prominent provider of remote training sessions, has a number of options covering Hadoop use and similar types of data handling.