What Does JobTracker Mean?
JobTracker is a daemon at the core of Apache Hadoop's classic MapReduce (MRv1) engine. It is an essential service that farms out all MapReduce tasks to the different nodes in the cluster, ideally to nodes that already contain the data or, at the very least, to nodes located in the same rack as the nodes containing the data.
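As a minimal client-side sketch, assuming the classic (Hadoop 1.x) mapred API: a client locates the JobTracker through the mapred.job.tracker property, which is normally set in mapred-site.xml. The class name and the host:port used below are placeholders.

```java
import org.apache.hadoop.mapred.JobConf;

public class PointAtJobTracker {
    public static void main(String[] args) {
        // Classic (MRv1) clients find the JobTracker through the
        // mapred.job.tracker property, normally set in mapred-site.xml.
        JobConf conf = new JobConf();
        conf.set("mapred.job.tracker", "jobtracker.example.com:8021"); // placeholder host:port
        System.out.println("JobTracker address: " + conf.get("mapred.job.tracker"));
    }
}
```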
Techopedia Explains JobTracker
JobTracker is the service within Hadoop that accepts MapReduce job requests from clients. It assigns the resulting tasks to TaskTrackers running on the DataNodes where the required data is stored locally. If that is not possible, JobTracker tries to assign the tasks to TaskTrackers in the same rack as the data. If this also fails, JobTracker assigns the tasks to any TaskTracker on a node that holds a replica of the data. In Hadoop, data blocks are replicated across DataNodes to provide redundancy, so that if one node in the cluster fails, the job does not fail with it.
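The fallback order described above (data-local, then rack-local, then any node holding a replica) can be pictured with a small, hypothetical sketch. The LocalityChooser, Tracker, and Replica names below are illustrative only and are not Hadoop's actual scheduler code.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified model of the locality preference described above;
// this is NOT Hadoop's real scheduler.
public class LocalityChooser {

    /** A TaskTracker described only by the node and rack it runs on. */
    record Tracker(String node, String rack) {}

    /** A data block replica, described by the node and rack that hold it. */
    record Replica(String node, String rack) {}

    /**
     * Pick a TaskTracker for one task: prefer a tracker on a node holding a
     * replica (data-local), then a tracker in the same rack (rack-local),
     * and finally fall back to any available tracker.
     */
    static Optional<Tracker> choose(List<Tracker> trackers, List<Replica> replicas) {
        Optional<Tracker> dataLocal = trackers.stream()
                .filter(t -> replicas.stream().anyMatch(r -> r.node().equals(t.node())))
                .findFirst();
        if (dataLocal.isPresent()) return dataLocal;

        Optional<Tracker> rackLocal = trackers.stream()
                .filter(t -> replicas.stream().anyMatch(r -> r.rack().equals(t.rack())))
                .findFirst();
        if (rackLocal.isPresent()) return rackLocal;

        return trackers.stream().findFirst(); // off-rack fallback
    }

    public static void main(String[] args) {
        List<Tracker> trackers = List.of(
                new Tracker("node3", "rack2"), new Tracker("node7", "rack1"));
        List<Replica> replicas = List.of(
                new Replica("node1", "rack1"), new Replica("node5", "rack3"));
        // No tracker sits on a replica node, but node7 shares rack1 with a
        // replica, so the rack-local choice wins here.
        System.out.println(choose(trackers, replicas).orElseThrow());
    }
}
```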
The JobTracker process works as follows:
- The JobTracker receives job requests from client applications.
- JobTracker consults the NameNode in order to determine the location of the required data.
- JobTracker locates TaskTracker nodes that contain the data or at least are near the data.
- The job's tasks are submitted to the selected TaskTracker nodes.
- The TaskTracker performs its tasks while being closely monitored by the JobTracker through periodic heartbeats. If a task fails, the JobTracker simply resubmits it to another TaskTracker. However, the JobTracker itself is a single point of failure, meaning that if it fails, the whole MapReduce system goes down.
- The JobTracker updates the job's status when the job completes.
- The client can then poll the JobTracker for the job's status and results, as sketched in the example after this list.
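A rough end-to-end sketch of the client side of this process, using the classic org.apache.hadoop.mapred API: the class name, job name, and input/output paths below are placeholders, and the identity mapper/reducer are used only to keep the example short.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class SubmitToJobTracker {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SubmitToJobTracker.class);
        conf.setJobName("identity-pass-through"); // placeholder job name

        // Pass-through mapper/reducer so the example stays short.
        conf.setMapperClass(IdentityMapper.class);
        conf.setReducerClass(IdentityReducer.class);
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));  // HDFS input path
        FileOutputFormat.setOutputPath(conf, new Path(args[1])); // HDFS output path

        // Step 1: the client hands the job to the JobTracker.
        JobClient client = new JobClient(conf);
        RunningJob job = client.submitJob(conf);

        // Final step: the client polls the JobTracker for progress and status.
        while (!job.isComplete()) {
            System.out.printf("map %.0f%%  reduce %.0f%%%n",
                    job.mapProgress() * 100, job.reduceProgress() * 100);
            Thread.sleep(5000);
        }
        System.out.println(job.isSuccessful() ? "Job succeeded" : "Job failed");
    }
}
```

Here submitJob hands the job to the JobTracker and returns immediately, so the loop that follows is the client polling the JobTracker for progress until the job completes.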