JobTracker

Definition - What does JobTracker mean?

JobTracker is the master daemon of Apache Hadoop's original MapReduce engine (MRv1). It is an essential service that farms out MapReduce tasks to specific nodes in the cluster, ideally to nodes that already hold the data, or at least to nodes in the same rack as the data.

Techopedia explains JobTracker

JobTracker is the service within Hadoop that accepts job requests from clients. It assigns the job's tasks to TaskTrackers running on the DataNodes that hold the required data locally. If no such TaskTracker is available, JobTracker tries a TaskTracker in the same rack as a node that holds the data. If that also fails, it assigns the task to any available TaskTracker, which then reads the data over the network from a node holding a replica. In Hadoop, data blocks are replicated across DataNodes for redundancy, so that the failure of a single node in the cluster does not cause the job to fail.
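The locality preference described above can be illustrated with a minimal sketch. This is not Hadoop's scheduler code: the function `pick_tracker`, the `rack_of` mapping, and the host and rack names are hypothetical, and TaskTrackers are represented simply by their host names.

```python
def pick_tracker(replica_hosts, free_trackers, rack_of):
    """Pick a TaskTracker host for one task, preferring data locality.

    replica_hosts: set of hosts holding a replica of the task's input block.
    free_trackers: list of TaskTracker hosts that can accept a task now.
    rack_of:       dict mapping host name -> rack name (hypothetical topology).
    Returns (host, locality), or (None, None) if no tracker is free.
    """
    # 1. Node-local: a free tracker on a host that stores the block.
    for host in free_trackers:
        if host in replica_hosts:
            return host, "node-local"

    # 2. Rack-local: a free tracker in the same rack as some replica.
    replica_racks = {rack_of[h] for h in replica_hosts if h in rack_of}
    for host in free_trackers:
        if rack_of.get(host) in replica_racks:
            return host, "rack-local"

    # 3. Off-rack: any free tracker; it must read the block over the network.
    if free_trackers:
        return free_trackers[0], "off-rack"

    return None, None


# Example topology (made-up names).
rack_of = {"a1": "rackA", "a2": "rackA", "b1": "rackB", "c1": "rackC"}
print(pick_tracker({"a1", "b1"}, ["c1", "a2"], rack_of))
```

In this example neither replica host is free, so the sketch falls back to `a2`, which shares a rack with the replica on `a1`.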

JobTracker process:

  1. Job requests from client applications are received by the JobTracker.
  2. JobTracker consults the NameNode in order to determine the location of the required data.
  3. JobTracker locates TaskTracker nodes that contain the data or at least are near the data.
  4. The job's tasks are submitted to the selected TaskTrackers.
  5. Each TaskTracker runs its tasks while being closely monitored by JobTracker. If a task fails, JobTracker resubmits it, possibly to another TaskTracker. However, JobTracker itself is a single point of failure: if it fails, all running jobs are halted.
  6. JobTracker updates its status when the job completes.
  7. The client can then poll JobTracker for information about the job.
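The submit, monitor, resubmit and poll steps can be sketched as a toy model. Everything here is an assumption made for illustration: `ToyJobTracker`, its method names, and the status strings are hypothetical and not Hadoop's API, the NameNode lookup and locality choice are left out, and execution is synchronous, whereas a real cluster runs tasks in parallel.

```python
class ToyJobTracker:
    """Toy model of the job lifecycle; not Hadoop's actual implementation."""

    def __init__(self, trackers):
        # trackers: dict of tracker name -> callable(task); the callable
        # raises an exception to simulate a failed task attempt.
        self.trackers = dict(trackers)
        self.status = {}    # job_id -> "RUNNING" / "SUCCEEDED" / "FAILED"
        self.attempts = {}  # (job_id, task) -> tracker names tried, in order

    def submit(self, job_id, tasks):
        """Receive a job and run its tasks, updating the job status."""
        self.status[job_id] = "RUNNING"
        for task in tasks:
            if not self._run_task(job_id, task):
                self.status[job_id] = "FAILED"
                return
        self.status[job_id] = "SUCCEEDED"

    def _run_task(self, job_id, task):
        """Try the task on each tracker in turn until one succeeds."""
        tried = self.attempts.setdefault((job_id, task), [])
        for name, run in self.trackers.items():
            tried.append(name)
            try:
                run(task)
                return True
            except Exception:
                continue  # task failed here: resubmit to the next tracker
        return False

    def poll(self, job_id):
        """Let a client ask for the current status of a job."""
        return self.status.get(job_id, "UNKNOWN")


def failing_tracker(task):
    raise RuntimeError("simulated task failure")


def healthy_tracker(task):
    return task.upper()


jt = ToyJobTracker({"tt1": failing_tracker, "tt2": healthy_tracker})
jt.submit("job-1", ["split-0", "split-1"])
print(jt.poll("job-1"), jt.attempts[("job-1", "split-0")])
```

Because there is only one `ToyJobTracker` object holding all job state, the sketch also shows why the real JobTracker is a single point of failure: if it is lost, so is the record of every running job.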
