Definition - What does Apache Pig mean?
Apache Pig is a platform for analyzing large data sets. It consists of a high-level language for expressing data analysis programs, along with the infrastructure for evaluating these programs. One of Pig's most significant features is that the structure of its programs is amenable to substantial parallelization, which is what allows them to handle very large data sets.
Pig operates on the Hadoop platform, writing data to and reading data from the Hadoop Distributed File System (HDFS) and performing processing by means of one or more MapReduce jobs. Apache Pig is available as open source.
Apache Pig is also known as Pig Programming Language or Hadoop Pig.
Techopedia explains Apache Pig
Apache Pig has two parts: the Pig Latin language and the Pig engine. Pig Latin is a scripting language that lets users describe how data from one or more inputs should be read and processed, and where the results should be stored.
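The read, process and store flow described above can be sketched in plain Python. This is only an illustration of the data flow style, not Pig itself: the function names mirror the Pig Latin operators LOAD, FILTER, GROUP, FOREACH and STORE, while the sample records, field names and the 100-byte threshold are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical input: (user, bytes) pairs standing in for a file read from HDFS.
RECORDS = [("alice", 120), ("bob", 80), ("alice", 300), ("carol", 150), ("bob", 500)]


def load(rows):
    """Analogue of LOAD: read the input and give its fields names."""
    return [{"user": user, "bytes": size} for user, size in rows]


def filter_by(rows, predicate):
    """Analogue of FILTER: keep only the records that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def group_by(rows, key):
    """Analogue of GROUP: collect records that share the same key value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def sum_per_group(groups, field):
    """Analogue of FOREACH ... GENERATE with SUM: one total per group."""
    return {key: sum(row[field] for row in rows) for key, rows in groups.items()}


def store(result):
    """Analogue of STORE: here the result is returned as a sorted list of pairs."""
    return sorted(result.items())


# Each step names its result, so the script reads as a sequence of data flow steps.
records = load(RECORDS)
large = filter_by(records, lambda row: row["bytes"] > 100)
grouped = group_by(large, "user")
totals = sum_per_group(grouped, "bytes")
output = store(totals)
print(output)
```

In Pig, each of these steps would be a single Pig Latin statement, and the engine, rather than the script author, would decide how to run them as MapReduce jobs.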
Some of the key properties of Pig Latin are as follows:
- Easy to program: Intricate tasks consisting of various interconnected data transformations are clearly encoded as data flow sequences. This makes them simple to write, understand and maintain.
- Optimization possibilities: The way in which tasks are encoded allows the system to optimize their execution automatically. This lets the user focus on semantics rather than efficiency.
- Extensibility: Users can create their own functions to carry out special-purpose processing.
The Pig engine is responsible for executing data flows written in Pig Latin. Much like a standard relational database management system (RDBMS) design, Apache Pig consists of a parser, optimizer and type checker, in addition to operators that carry out data processing. Unlike an RDBMS, Pig does not include transactions or a data catalog, and it neither manages data storage directly nor implements the execution framework; it relies on HDFS and MapReduce for those.
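The extensibility property listed above can be illustrated with a small Python sketch in which a user-written function is plugged into a per-record step. It is an analogy under stated assumptions, not Pig's actual user-defined function interface: `foreach_generate`, `domain_of` and the sample URLs are all hypothetical names and data introduced for this example.

```python
def foreach_generate(rows, func):
    """Apply a user-supplied function to every record, in the spirit of FOREACH."""
    return [func(row) for row in rows]


def domain_of(url):
    """Hypothetical user-defined function: extract a lower-cased host from a URL."""
    # Drop an optional scheme such as "http://", then cut at the first slash.
    without_scheme = url.split("://", 1)[-1]
    return without_scheme.split("/", 1)[0].lower()


# Made-up input records for the illustration.
urls = ["http://Example.com/a", "https://pig.apache.org/docs", "example.com/b"]
domains = foreach_generate(urls, domain_of)
print(domains)
```

The point of the design is that the data flow step stays generic while the special-purpose logic lives in a function the user supplies.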