Definition - What does Apache Nutch mean?
Apache Nutch is an open-source web crawler that fetches pages and aggregates data from the web. It is often used in conjunction with other Apache tools, such as Hadoop, for large-scale data processing and analysis.
Techopedia explains Apache Nutch
Apache Nutch is an open-source project developed under the Apache Software Foundation and released under the Apache License. The foundation's developer community maintains a range of Apache software tools that can store, sort and analyze data. One of the central technologies is Apache Hadoop, a framework for distributed storage and processing of big data that is very popular in the business community.
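Where tools like Apache Hadoop provide file storage, analysis and more, the role of Nutch is to collect and store data from the web. It does this with web crawling algorithms: starting from a list of seed URLs, it fetches each page, extracts the links it contains and follows them to discover further pages.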
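To make the idea of a web crawling algorithm concrete, here is a minimal sketch of a breadth-first crawl in Python. It is an illustration only and not how Nutch itself is implemented. The in-memory FAKE_WEB table and the fetch_links helper are hypothetical stand-ins for real HTTP fetching and HTML parsing, so the example runs without network access.

```python
from collections import deque
from urllib.parse import urljoin, urldefrag

# Hypothetical in-memory "web": page URL -> links found on that page.
# A real crawler would download and parse each page instead.
FAKE_WEB = {
    "http://example.com/": ["/a", "/b"],
    "http://example.com/a": ["/b", "/c#top"],
    "http://example.com/b": ["/"],
    "http://example.com/c": [],
}


def fetch_links(url):
    """Return the raw links on a page (stand-in for fetch + parse)."""
    return FAKE_WEB.get(url, [])


def crawl(seeds, max_depth=2):
    """Visit pages breadth-first from the seed URLs, up to max_depth hops."""
    frontier = deque((seed, 0) for seed in seeds)  # URLs waiting to be fetched
    seen = set(seeds)                              # avoids fetching a URL twice
    visited = []
    while frontier:
        url, depth = frontier.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        for href in fetch_links(url):
            # Resolve relative links and drop "#fragment" parts.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                frontier.append((absolute, depth + 1))
    return visited


if __name__ == "__main__":
    for page in crawl(["http://example.com/"]):
        print(page)
```

The queue of pending URLs (the frontier) and the set of already-seen URLs are the two pieces of state that most crawlers share. A production crawler adds politeness delays, robots.txt handling and distributed storage on top of this loop.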
Users can run simple commands in Apache Nutch to crawl a set of URLs and collect the content found there. Users typically pair Apache Nutch with another open-source tool, a search platform called Apache Solr, which indexes the data collected by Apache Nutch and makes it searchable.
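Once crawled pages have been indexed in Solr, they can be searched over HTTP through Solr's select handler. The sketch below only builds such a query URL and does not contact a server. The host, the core name "nutch" and the field names are assumptions for illustration, and the solr_select_url helper is hypothetical; an actual installation may use different values.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, core, query, rows=10, fields=("url", "title")):
    """Build a Solr select URL; core and field names are assumed, not fixed."""
    params = {
        "q": query,               # Solr query string, e.g. field:term
        "rows": rows,             # maximum number of results to return
        "fl": ",".join(fields),   # fields to include in each result
        "wt": "json",             # response format
    }
    return f"{base_url.rstrip('/')}/solr/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Assumed local Solr instance with a core named "nutch".
    print(solr_select_url("http://localhost:8983", "nutch", "content:hadoop", rows=5))
```

The resulting URL can be opened in a browser or requested with any HTTP client against a running Solr instance.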