Apache Pig

Definition - What does Apache Pig mean?

Apache Pig is a platform for analyzing large data sets. It consists of a high-level language for expressing data analysis programs, along with the infrastructure for evaluating those programs. One of Pig's most significant features is that the structure of its programs lends itself to substantial parallelization.

Pig operates on the Hadoop platform, writing data to and reading data from the Hadoop Distributed File System (HDFS) and performing processing by means of one or more MapReduce jobs. Apache Pig is available as open source.

Apache Pig is also known as Pig Programming Language or Hadoop Pig.

Techopedia explains Apache Pig

Apache Pig has two parts: the Pig Latin language and the Pig engine. Pig Latin is a scripting language that lets users describe how data from one or more inputs should be read and processed, and where the results should be stored.
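
As a rough illustration of this data-flow style, the sketch below mimics a load, filter, group, aggregate and store sequence in plain Python. It is not Pig Latin and does not run on Hadoop. The field names (user, bytes), the sample records and the threshold are hypothetical, and the comments show roughly comparable Pig Latin statements for orientation only.

```python
from collections import defaultdict

# Hypothetical input: (user, bytes) pairs standing in for a file read from HDFS.
# Comparable Pig Latin: records = LOAD 'input' AS (user:chararray, bytes:long);
records = [("alice", 120), ("bob", 40), ("alice", 300), ("carol", 150), ("bob", 500)]

# Keep only the rows above an assumed threshold.
# Comparable Pig Latin: big = FILTER records BY bytes > 100;
big = [(user, size) for user, size in records if size > 100]

# Collect the remaining rows under their user key.
# Comparable Pig Latin: grouped = GROUP big BY user;
grouped = defaultdict(list)
for user, size in big:
    grouped[user].append(size)

# Sum the sizes within each group.
# Comparable Pig Latin: totals = FOREACH grouped GENERATE group, SUM(big.bytes);
totals = {user: sum(sizes) for user, sizes in grouped.items()}

# Write the result; printing stands in for storing to an output location.
# Comparable Pig Latin: STORE totals INTO 'output';
for user in sorted(totals):
    print(user, totals[user])
```

Each step names an intermediate result and feeds it to the next step, which is the sense in which a Pig Latin script reads as a data flow from inputs to stored output.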

Some of the key properties of Pig Latin are as follows:

  • Easy to program: Intricate tasks consisting of various interconnected data transformations are clearly encoded as data flow sequences. This makes them simple to write, understand and maintain.
  • Optimization possibilities: The manner in which the tasks are encoded allows the system to optimize their execution automatically. This lets the user concentrate on semantics rather than efficiency.
  • Extensibility: Users can create their own functions to carry out special-purpose processing.

The Pig engine is responsible for executing data flows written in Pig Latin. Much like a standard relational database management system (RDBMS) design, Apache Pig consists of a parser, optimizer and type checker, in addition to operators that carry out data processing. Unlike an RDBMS, Pig does not provide transactions or a data catalog, and it does not directly manage data storage or implement the execution framework; it relies on HDFS and MapReduce for those.
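
The extensibility property described above can be pictured as a user-written function applied to every record in a data flow. The sketch below is plain Python and does not use Pig's actual function-registration API. The function name normalize_domain and the sample URLs are hypothetical.

```python
from urllib.parse import urlparse

def normalize_domain(url):
    """Hypothetical special-purpose function: return the lowercase host of a URL,
    without a leading 'www.' prefix."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

# Hypothetical records flowing through one processing step.
urls = [
    "http://www.Example.com/a",
    "https://example.com/b",
    "http://pig.apache.org/docs",
]

# Apply the user-written function to each record, as a data flow step would.
domains = [normalize_domain(u) for u in urls]
print(domains)
```

The point of the sketch is only that the per-record logic is supplied by the user while the surrounding flow stays unchanged.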