In the context of the Internet, a spider is specialized software designed to systematically crawl and browse the World Wide Web, usually to index Web pages so they can be served as results for user search queries. The best known of these spiders is Googlebot, Google's main crawler, which helps ensure that relevant results are returned for search queries.
Spiders are also known as Web crawlers, search bots, or simply bots.
A spider is essentially a program used to harvest information from the World Wide Web. It crawls through the pages of websites, extracting information and indexing it for later use, usually in search engine results. The spider discovers websites and their pages by following the links between them, so a page with no links pointing to it is difficult to index and may rank very low in search results. Conversely, a large number of links pointing to a page signals that the page is popular, so it appears higher in the results.
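The link-popularity idea above can be sketched with a toy example. The pages and the link graph here are hypothetical, and counting raw inbound links is a deliberately crude stand-in for the far more sophisticated signals real search engines use:

```python
from collections import Counter

# Hypothetical miniature link graph: each page maps to the pages it links to.
link_graph = {
    "home.html": ["about.html", "blog.html"],
    "about.html": ["blog.html"],
    "blog.html": ["about.html", "post1.html"],
    "post1.html": ["blog.html"],
    "orphan.html": [],  # no page links TO this one, so a spider never finds it
}

def inbound_counts(graph):
    """Count how many pages link to each page (a crude popularity signal)."""
    counts = Counter()
    for links in graph.values():
        counts.update(links)
    return counts

def rank_pages(graph):
    """Order pages by inbound-link count, most linked-to first."""
    counts = inbound_counts(graph)
    return sorted(graph, key=lambda page: counts[page], reverse=True)

ranking = rank_pages(link_graph)
```

In this sketch, "blog.html" ranks first because three other pages link to it, while "orphan.html" has no inbound links at all and would never be discovered by a spider that only follows links.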
Steps involved in Web crawling:
1. The spider starts from a list of seed URLs.
2. It fetches a page from the frontier (the queue of URLs waiting to be visited).
3. It extracts the page's content and indexes it.
4. It extracts the links on the page and adds any not-yet-seen URLs to the frontier.
5. It repeats the process until the frontier is empty or a crawl limit is reached.
Spiders or Web crawlers are just programs and, as such, follow the systematic rules set by their programmers. Website owners can also take part by telling the spider which portions of the site should be indexed and which should not. This is done by publishing a "robots.txt" file containing instructions for the spider about which portions to index and links to follow, and which to ignore. The most significant spiders are those operated by major search engines such as Google, Bing and Yahoo, along with those built for data mining and research, but there are also malicious spiders written to harvest email addresses for sale to advertisers or to probe for vulnerabilities in Web security.
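A well-behaved spider checks robots.txt before fetching each URL. Python's standard library ships a parser for this, shown here against a hypothetical robots.txt that a site owner might publish (the bot name "MyBot" and the example paths are illustrative):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block every crawler from /private/, allow the rest.
robots_txt = """\
User-agent: *
Disallow: /private/
Allow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# A polite spider asks before fetching; "MyBot" is an illustrative user agent.
allowed = rp.can_fetch("MyBot", "https://example.com/blog/post1")
blocked = rp.can_fetch("MyBot", "https://example.com/private/data")
```

Note that robots.txt is purely advisory: reputable crawlers honor it, but the malicious spiders mentioned above simply ignore it, so it is not a security mechanism.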