What Does Job Chaining Mean?
Job chaining is a MapReduce term for running several MapReduce jobs in sequence as a single workflow. With job chaining, the first job sends its output to the next job, which sends its output to the job after it, and so on until the whole chain is complete. It is a form of pipelining that breaks a large MapReduce computation into more manageable steps.
Techopedia Explains Job Chaining
Job chaining in MapReduce means linking multiple map and reduce steps so that they run as one workflow rather than as separate, manually launched jobs.
For example, a job chain might consist of:
Map1 > Reduce1 > Map2 > Reduce2
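The chain above can be illustrated with a minimal in-memory sketch in plain Java. It does not use Hadoop; the class name JobChainSketch and the two example jobs (a word count followed by grouping words by their count) are assumptions chosen only to show how the output of one map/reduce pair becomes the input of the next.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: an in-memory stand-in for two chained jobs.
final class JobChainSketch {

    // Job 1 (Map1 > Reduce1): Map1 emits (word, 1); Reduce1 sums the ones per word.
    static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Job 2 (Map2 > Reduce2): Map2 emits (count, word); Reduce2 collects words per count.
    static Map<Integer, List<String>> wordsByCount(Map<String, Integer> counts) {
        Map<Integer, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            grouped.computeIfAbsent(entry.getValue(), k -> new ArrayList<>())
                   .add(entry.getKey());
        }
        return grouped;
    }

    // The chain: the output of job 1 is passed directly as the input of job 2.
    static Map<Integer, List<String>> runChain(List<String> lines) {
        return wordsByCount(wordCount(lines));
    }

    public static void main(String[] args) {
        List<String> input = List.of("to be or not to be", "to chain or not");
        System.out.println(runChain(input)); // prints {1=[chain], 2=[be, not, or], 3=[to]}
    }
}
```

In a real cluster each stage would read and write distributed storage rather than Java collections, but the flow of data from one stage to the next is the same.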
The advantage of job chaining is that the developer does not have to handle the intermediate data between the steps of a pipeline by hand. In that sense, job chaining is similar to input/output redirection in the Unix shell: output from one link in the chain flows to the input of the next. MapReduce also lets developers specify dependencies, that is, which jobs must complete before a later job in the chain may start, through the addDependingJob() method.
Chaining jobs and declaring their dependencies in this way makes it easier for a developer to write a MapReduce program that processes large amounts of data in several stages.
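The idea of declaring which jobs must finish first can be sketched without Hadoop as follows. ToyJob and runAll are hypothetical stand-ins written for this example, not Hadoop classes; only the method name addDependingJob() is borrowed from the text above, and the scheduling loop is a simple assumption about how a job controller might pick the next runnable job.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: a toy dependency-aware job runner.
final class DependencySketch {

    // Stand-in for a chained job; not a Hadoop class.
    static final class ToyJob {
        final String name;
        final List<ToyJob> dependsOn = new ArrayList<>();

        ToyJob(String name) {
            this.name = name;
        }

        // Mirrors the idea of addDependingJob(): this job must wait for 'job'.
        void addDependingJob(ToyJob job) {
            dependsOn.add(job);
        }
    }

    // Repeatedly "runs" every job whose dependencies have all finished
    // and returns the job names in the order they were run.
    static List<String> runAll(List<ToyJob> jobs) {
        Set<ToyJob> finished = new LinkedHashSet<>();
        while (finished.size() < jobs.size()) {
            boolean progressed = false;
            for (ToyJob job : jobs) {
                if (!finished.contains(job) && finished.containsAll(job.dependsOn)) {
                    finished.add(job);
                    progressed = true;
                }
            }
            if (!progressed) {
                throw new IllegalStateException("cyclic or missing dependency");
            }
        }
        List<String> order = new ArrayList<>();
        for (ToyJob job : finished) {
            order.add(job.name);
        }
        return order;
    }

    static List<String> demo() {
        ToyJob count = new ToyJob("count");
        ToyJob sort = new ToyJob("sort");
        sort.addDependingJob(count); // "sort" may only start after "count"
        // The jobs are submitted out of order on purpose.
        return runAll(List.of(sort, count));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [count, sort]
    }
}
```

Even though "sort" is submitted first, it runs second because its declared dependency has not yet finished, which is the ordering guarantee that dependency declarations are meant to provide.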