Apache Nutch

Definition - What does Apache Nutch mean?

Apache Nutch is a web crawler software product that can be used to aggregate data from the web. It is used in conjunction with other Apache tools, such as Hadoop, for data analysis.

Techopedia explains Apache Nutch

Apache Nutch is an open-source project developed under the Apache Software Foundation and released under the Apache License. The foundation's developer community maintains a range of Apache software tools that can sort and analyze data. One of the central technologies is Apache Hadoop, a framework for distributed storage and processing of large data sets that is very popular in the business community.

Alongside tools like Apache Hadoop, which provide features for file storage, analysis and more, the role of Nutch is to collect and store data from the web using web crawling algorithms.
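
To make the idea of a web crawling algorithm concrete, here is a minimal Python sketch of a breadth-first crawl: start from seed URLs, fetch each page, store it, and queue any newly discovered links. It is not Nutch's implementation or API. The names `crawl`, `LinkExtractor` and `FAKE_WEB` are hypothetical, and a small in-memory dictionary stands in for the real web so the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def crawl(seed_urls, fetch, max_pages=10):
    """Breadth-first crawl; `fetch` maps a URL to HTML text or None."""
    frontier = deque(seed_urls)   # URLs waiting to be fetched
    seen = set(seed_urls)         # URLs already queued, to avoid repeats
    store = {}                    # fetched pages, keyed by URL
    while frontier and len(store) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:          # page could not be fetched; skip it
            continue
        store[url] = html
        parser = LinkExtractor(url)
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return store


# Hypothetical stand-in for the web: URL -> HTML body.
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a>',
    "http://example.test/b": '<a href="/">home</a> <a href="/missing">x</a>',
}

pages = crawl(["http://example.test/"], FAKE_WEB.get)
print(sorted(pages))
```

A production crawler such as Nutch adds much more on top of this loop, for example politeness rules, scheduling and distributed storage, none of which are modeled here.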

Users can run simple commands in Apache Nutch to collect information from a list of URLs. Users typically pair Apache Nutch with another open-source tool, a search platform called Apache Solr, which can index the data collected with Apache Nutch and act as a searchable repository for it.
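
As a rough illustration of what handing crawled data to Solr can look like, the Python sketch below builds (but does not send) an HTTP request for Solr's JSON update handler. Nutch has its own indexing step for this, so the code is only a conceptual stand-in. The function name `build_solr_update`, the core name `nutch`, the local address and the field names `id`, `url` and `content` are all assumptions chosen for the example.

```python
import json
from urllib.request import Request


def build_solr_update(pages,
                      solr_url="http://localhost:8983/solr/nutch/update?commit=true"):
    """Build a POST request carrying crawled pages as a JSON list of documents.

    `pages` maps URL -> extracted text. The URL and field names are
    assumptions; a real Solr core defines its own schema.
    """
    docs = [
        {"id": url, "url": url, "content": text}
        for url, text in sorted(pages.items())
    ]
    body = json.dumps(docs).encode("utf-8")
    return Request(
        solr_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Build the request only; sending it would need a running Solr instance.
req = build_solr_update({"http://example.test/": "hello"})
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would submit the documents, assuming a Solr server is actually listening at that address with a matching schema.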
