Definition - What does Apache Pig mean?
Apache Pig is a platform for analyzing large data sets. It consists of a high-level language for expressing data analysis programs, along with the infrastructure for evaluating them. One of Pig's most significant features is that the structure of its programs is amenable to substantial parallelization, which is what allows it to handle very large data sets.
Pig operates on the Hadoop platform, writing data to and reading data from the Hadoop Distributed File System (HDFS) and performing processing by means of one or more MapReduce jobs. Apache Pig is available as open source.
Apache Pig is also known as Pig Programming Language or Hadoop Pig.
Techopedia explains Apache Pig
Pig's high-level language is called Pig Latin. Some of its key properties are as follows:
- Easy to program: Complex tasks made up of multiple interrelated data transformations are explicitly encoded as data flow sequences. This makes them simple to write, understand and maintain.
- Optimization possibilities: The way in which tasks are encoded allows the system to optimize their execution automatically. This lets the user focus on semantics rather than efficiency.
- Extensibility: Users can create their own functions to carry out special-purpose processing.
The Pig engine is responsible for executing data flows written in Pig Latin. Much like a standard relational database management system (RDBMS), Apache Pig consists of a parser, an optimizer and a type checker, in addition to operators that carry out data processing. Unlike an RDBMS, Pig does not include transactions or a data catalog, and it does not directly manage data storage or implement the execution framework itself; it relies on HDFS and MapReduce for those.
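To make the "data flow sequence" idea concrete, here is a minimal sketch in plain Python (not Pig itself) of the kind of step-by-step transformation a Pig Latin script expresses. The records, the field names and the is_large function are hypothetical and chosen only for illustration; the comments show roughly what each step might look like as a Pig Latin statement. In real Pig, each step would be compiled into MapReduce jobs rather than run in memory.

```python
from collections import defaultdict

# Hypothetical input records: (user, bytes_transferred).
# In Pig these would typically come from a LOAD of a file in HDFS.
records = [
    ("alice", 120),
    ("bob", 40),
    ("alice", 300),
    ("carol", 150),
    ("bob", 500),
]


def is_large(record):
    """User-defined predicate, standing in for a Pig user-defined function.

    The threshold of 100 is an arbitrary, illustrative choice.
    """
    _, size = record
    return size > 100


# Step 1, roughly: B = FILTER A BY bytes > 100;
large = [r for r in records if is_large(r)]

# Step 2, roughly: C = GROUP B BY user;
grouped = defaultdict(list)
for user, size in large:
    grouped[user].append(size)

# Step 3, roughly: D = FOREACH C GENERATE group, SUM(B.bytes);
totals = {user: sum(sizes) for user, sizes in grouped.items()}

print(totals)
```

Each step names an intermediate result and feeds it to the next, which is what makes such a program easy to read and leaves the engine free to decide how to execute and parallelize it.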