[WEBINAR] The New Normal: Dealing with the Reality of an Unsecure World


Definition - What does MapReduce mean?

MapReduce is a programming model introduced by Google for processing and generating large data sets on clusters of computers.

Google first formulated the framework for the purpose of serving Google’s Web page indexing, and the new framework replaced earlier indexing algorithms. Beginner developers find the MapReduce framework beneficial because library routines can be used to create parallel programs without any worries about infra-cluster communication, task monitoring or failure handling processes.

MapReduce runs on a large cluster of commodity machines and is highly scalable. It has several forms of implementation provided by multiple programming languages, like Java, C# and C++.

Techopedia explains MapReduce

The MapReduce framework has two parts:

  1. A function called "Map," which allows different points of the distributed cluster to distribute their work
  2. A function called "Reduce," which is designed to reduce the final form of the clusters’ results into one output

The main advantage of the MapReduce framework is its fault tolerance, where periodic reports from each node in the cluster are expected when work is completed.

A task is transferred from one node to another. If the master node notices that a node has been silent for a longer interval than expected, the main node performs the reassignment process to the frozen/delayed task.

The MapReduce framework is inspired by the "Map" and "Reduce" functions used in functional programming. Computational processing occurs on data stored in a file system or within a database, which takes a set of input key values and produces a set of output key values.

Each day, numerous MapReduce programs and MapReduce jobs are executed on Google's clusters. Programs are automatically parallelized and executed on a large cluster of commodity machines. The runtime system deals with partitioning the input data, scheduling the program's execution across a set of machines, machine failure handling and managing required intermachine communication. Programmers without any experience with parallel and distributed systems can easily use the resources of a large distributed system.

MapReduce is used in distributed grep, distributed sort, Web link-graph reversal, Web access log stats, document clustering, machine learning and statistical machine translation.

Techopedia Deals

Connect with us

Techopedia on Linkedin
Techopedia on Linkedin
"Techopedia" on Twitter

Sign up for Techopedia's Free Newsletter!

Email Newsletter

Join thousands of others with our weekly newsletter

Free Whitepaper: The Path to Hybrid Cloud
Free Whitepaper: The Path to Hybrid Cloud:
The Path to Hybrid Cloud: Intelligent Bursting To Amazon Web Services & Microsoft Azure
Free E-Book: Public Cloud Guide
Free E-Book: Public Cloud Guide:
This white paper is for leaders of Operations, Engineering, or Infrastructure teams who are creating or executing an IT roadmap.
Free Tool: Virtual Health Monitor
Free Tool: Virtual Health Monitor:
Virtual Health Monitor is a free virtualization monitoring and reporting tool for VMware, Hyper-V, RHEV, and XenServer environments.
Free 30 Day Trial – Turbonomic
Free 30 Day Trial – Turbonomic:
Turbonomic delivers an autonomic platform where virtual and cloud environments self-manage in real-time to assure application performance.