Content Scraping


What Does Content Scraping Mean?

Content scraping is the unauthorized copying of original content from a legitimate website and reposting it to another site without the knowledge or permission of the content's owner, a practice that is frequently illegal. Content scrapers often try to pass off the stolen content as their own and fail to credit its owners.


Content scraping can be done by manually copying and pasting, or with more sophisticated techniques such as dedicated scraping software, HTTP programming, or HTML and DOM parsers.
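To show why parser-based scraping takes so little effort, here is a minimal sketch that uses Python's standard-library `html.parser` to pull the paragraph text out of an HTML page. It works on an HTML string that is already in memory (no network access), and the class name `ParagraphExtractor` and function `extract_paragraphs` are hypothetical, made up for this illustration.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects the text of every <p> element in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.paragraphs: list[str] = []
        self._in_p = False           # True while inside a <p> element
        self._buf: list[str] = []    # text fragments of the current <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            # Join fragments so inline tags such as <b> do not split the text.
            self.paragraphs.append("".join(self._buf).strip())
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buf.append(data)


def extract_paragraphs(html: str) -> list[str]:
    """Return the text of each paragraph in `html`, in document order."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```

Run on `'<h1>Title</h1><p>First <b>bold</b> point.</p><div>ad</div><p>Second.</p>'`, it returns `['First bold point.', 'Second.']`. The heading, the ad block, and all markup are dropped. Only the article text is left, ready to be reposted elsewhere.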

Much of the content that falls prey to scraping is copyrighted material, and reposting it without the copyright owner's permission is copyright infringement, which can carry legal penalties. However, scraper sites are hosted all over the world, and scrapers who are asked to remove copyrighted content may simply switch domains or disappear.

Techopedia Explains Content Scraping

Content scrapers drive traffic to their own websites by scraping high-quality, keyword-dense content from other sites. Bloggers are particularly susceptible, probably because individual bloggers are unlikely to take legal action against scrapers. Scrapers are encouraged to continue because search engines have not yet found an effective way to distinguish original content from scraped copies, which lets scrapers keep benefiting.

Website administrators can protect themselves against scraping through simple measures, such as adding links to their own site within the content. This will at least allow them to get some traffic from scraped content. More sophisticated methods of dealing with scraping by bots include:
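Internal links only bring traffic back if they still point to the original site once the content is copied. A relative link such as `/about` would resolve against the scraper's domain, so one option is to publish absolute URLs. The sketch below assumes the site's HTML uses relative `href` values. It rewrites them against a hypothetical base URL (`SITE_BASE`) using the standard library's `urljoin`. The regex is a simplification that only handles double-quoted `href` attributes and is not a full HTML parser.

```python
import re
from urllib.parse import urljoin

# Hypothetical base URL of the site that owns the content.
SITE_BASE = "https://example.com/blog/"

# Matches double-quoted href attributes only (a simplification).
HREF_RE = re.compile(r'href="([^"]*)"')


def absolutize_links(html: str, base: str = SITE_BASE) -> str:
    """Rewrite relative href values as absolute URLs on the owner's site.

    Links that are already absolute are left unchanged by urljoin.
    """
    return HREF_RE.sub(lambda m: f'href="{urljoin(base, m.group(1))}"', html)
```

For example, `absolutize_links('<a href="/about">About</a>')` returns `'<a href="https://example.com/about">About</a>'`. A link to another site, such as `https://other.org/x`, passes through untouched.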

  • Commercial anti-bot applications
  • Catching bots with a honeypot and blocking their IP addresses
  • Blocking bots with JavaScript code
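The honeypot approach can be sketched as follows. The assumed setup is a trap URL that is linked invisibly from normal pages and disallowed in `robots.txt`, so human visitors and well-behaved crawlers should never request it. Any client that does request it is treated as a scraper bot, and its IP address is blocked. The trap path and the `HoneypotBlocker` class are hypothetical, and the code models only the decision logic rather than a real web server.

```python
from dataclasses import dataclass, field

# Hypothetical trap path: hidden from humans and disallowed in robots.txt.
TRAP_PATH = "/trap/do-not-follow"


@dataclass
class HoneypotBlocker:
    blocked_ips: set[str] = field(default_factory=set)

    def handle_request(self, ip: str, path: str) -> int:
        """Return an HTTP status code for a request from `ip` for `path`."""
        if ip in self.blocked_ips:
            return 403  # caught earlier: refuse every further request
        if path == TRAP_PATH:
            # Only a client following hidden links should reach this path.
            self.blocked_ips.add(ip)
            return 403
        return 200  # ordinary request: serve the page
```

A real deployment would usually enforce the block in the web server or firewall. Because many users can share one IP address, blocklist entries may also need to expire.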


Margaret Rouse
Technology Expert

Margaret is an award-winning technical writer and teacher known for her ability to explain complex technical subjects to a non-technical business audience. Over the past twenty years, her IT definitions have been published by Que in an encyclopedia of technology terms and cited in articles by the New York Times, Time Magazine, USA Today, ZDNet, PC Magazine, and Discovery Magazine. She joined Techopedia in 2011. Margaret's idea of a fun day is helping IT and business professionals learn to speak each other’s highly specialized languages.