Google MapReduce

Definition - What does Google MapReduce mean?

MapReduce is a programming model, introduced by Google, for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key.
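As a rough illustration of the definition above, the following Python sketch counts words with a user-supplied map function and reduce function. It is a single-process toy, not Google's implementation: the names map_fn, reduce_fn and run_mapreduce, and the sample documents, are assumptions made for this example.

```python
from collections import defaultdict

def map_fn(key, value):
    """Map: key is a document name (unused here), value is the document text.
    Emits one intermediate (word, 1) pair per word."""
    for word in value.split():
        yield word.lower(), 1

def reduce_fn(key, values):
    """Reduce: merge all intermediate values that share the same key."""
    return key, sum(values)

def run_mapreduce(inputs, map_fn, reduce_fn):
    """Hypothetical single-process driver: run map, group by key, run reduce."""
    intermediate = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            intermediate[ikey].append(ivalue)
    return dict(reduce_fn(k, vs) for k, vs in sorted(intermediate.items()))

# Made-up sample input: (document name, document text) pairs.
docs = [("a.txt", "the quick brown fox"), ("b.txt", "the lazy dog the end")]
counts = run_mapreduce(docs, map_fn, reduce_fn)
print(counts)
```

The user writes only map_fn and reduce_fn; grouping intermediate values by key is the job of the framework, which the small driver stands in for here.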

Techopedia explains Google MapReduce

MapReduce is a patented software framework from Google that is designed to support distributed computing on large data sets across clusters of computers. The computation runs on data stored in a file system or a database. MapReduce libraries have been written in Java, C++, Python, Perl and other programming languages.

In the map step, the master node takes the input, partitions it into smaller sub-problems and distributes these to worker nodes. A worker node may repeat this partitioning in turn, which produces a multilevel tree structure. The worker nodes then process the smaller problems and pass their answers back to the master node.
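The map step described above might be sketched as follows. This is a minimal illustration under stated assumptions: a thread pool stands in for the worker machines, the partitioning is a single level rather than a tree, and split_input, map_worker and map_step are hypothetical names rather than part of any MapReduce library.

```python
from concurrent.futures import ThreadPoolExecutor

def split_input(records, num_splits):
    """Master: partition the input into roughly equal sub-problems."""
    return [records[i::num_splits] for i in range(num_splits)]

def map_worker(split):
    """Worker: apply the map function (word -> 1) to every record in one split."""
    pairs = []
    for line in split:
        for word in line.split():
            pairs.append((word, 1))
    return pairs

def map_step(records, num_workers=3):
    """Master: distribute the splits to workers and collect their answers."""
    splits = split_input(records, num_workers)
    # Threads are only a stand-in for separate worker nodes in this sketch.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(map_worker, splits))

lines = ["a b", "b c", "c d", "a a"]
per_worker = map_step(lines, num_workers=2)
print(per_worker)
```

Because each split is processed independently of the others, the order in which workers finish does not affect the result.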

In the reduce step, the master node takes the answers to all the sub-problems and combines them to get the answer to the problem it was originally trying to solve. MapReduce allows the map and reduce operations to be processed in a distributed way. Provided each mapping operation is independent of the others, all maps can be performed in parallel, although in practice this is limited by the data source. For every map and reduce task, the master stores the task's state and the identity of the worker machine assigned to it.
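The reduce step and the master's bookkeeping could look roughly like the sketch below. The state names ("idle", "in-progress", "completed"), the Task and Master classes, the worker names and the sample map outputs are all assumptions chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

@dataclass
class Task:
    kind: str                     # "map" or "reduce"
    state: str = "idle"           # assumed states: idle, in-progress, completed
    worker: Optional[str] = None  # identity of the assigned worker, if any

class Master:
    """Hypothetical master that records state and worker for every task."""
    def __init__(self, num_map, num_reduce):
        self.tasks = {("map", i): Task("map") for i in range(num_map)}
        self.tasks.update({("reduce", i): Task("reduce") for i in range(num_reduce)})

    def assign(self, task_id, worker):
        task = self.tasks[task_id]
        task.state, task.worker = "in-progress", worker

    def complete(self, task_id):
        self.tasks[task_id].state = "completed"

    def all_done(self, kind):
        return all(t.state == "completed" for t in self.tasks.values() if t.kind == kind)

def reduce_step(map_outputs):
    """Combine the answers to all sub-problems: group by key, then sum."""
    grouped = defaultdict(list)
    for pairs in map_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}

master = Master(num_map=2, num_reduce=1)
map_outputs = [[("a", 1), ("b", 1)], [("b", 1), ("c", 1)]]  # made-up map results
for i in range(2):
    master.assign(("map", i), f"worker-{i}")
    master.complete(("map", i))

result = None
if master.all_done("map"):  # reduce starts only after every map task finishes
    master.assign(("reduce", 0), "worker-0")
    result = reduce_step(map_outputs)
    master.complete(("reduce", 0))
print(result)
```

Keeping the state and worker identity per task is what would let a master notice unfinished work and hand it to another machine, although rescheduling itself is not shown here.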
