MapReduce

Why Trust Techopedia

What Does MapReduce Mean?

MapReduce is a programming model introduced by Google for processing and generating large data sets on clusters of computers.

Advertisements

Google first formulated the framework for the purpose of serving Google’s Web page indexing, and the new framework replaced earlier indexing algorithms. Beginner developers find the MapReduce framework beneficial because library routines can be used to create parallel programs without any worries about infra-cluster communication, task monitoring or failure handling processes.

MapReduce runs on a large cluster of commodity machines and is highly scalable. It has several forms of implementation provided by multiple programming languages, like Java, C# and C++.

Techopedia Explains MapReduce

The MapReduce framework has two parts:

  1. A function called “Map,” which allows different points of the distributed cluster to distribute their work
  2. A function called “Reduce,” which is designed to reduce the final form of the clusters’ results into one output

The main advantage of the MapReduce framework is its fault tolerance, where periodic reports from each node in the cluster are expected when work is completed.

A task is transferred from one node to another. If the master node notices that a node has been silent for a longer interval than expected, the main node performs the reassignment process to the frozen/delayed task.

The MapReduce framework is inspired by the “Map” and “Reduce” functions used in functional programming. Computational processing occurs on data stored in a file system or within a database, which takes a set of input key values and produces a set of output key values.

Each day, numerous MapReduce programs and MapReduce jobs are executed on Google’s clusters. Programs are automatically parallelized and executed on a large cluster of commodity machines. The runtime system deals with partitioning the input data, scheduling the program’s execution across a set of machines, machine failure handling and managing required intermachine communication. Programmers without any experience with parallel and distributed systems can easily use the resources of a large distributed system.

MapReduce is used in distributed grep, distributed sort, Web link-graph reversal, Web access log stats, document clustering, machine learning and statistical machine translation.

Advertisements

Related Terms

Margaret Rouse
Technology Expert
Margaret Rouse
Technology Expert

Margaret is an award-winning technical writer and teacher known for her ability to explain complex technical subjects to a non-technical business audience. Over the past twenty years, her IT definitions have been published by Que in an encyclopedia of technology terms and cited in articles by the New York Times, Time Magazine, USA Today, ZDNet, PC Magazine, and Discovery Magazine. She joined Techopedia in 2011. Margaret's idea of a fun day is helping IT and business professionals learn to speak each other’s highly specialized languages.