Content Scraping

Definition - What does Content Scraping mean?

Content scraping is an illicit, and often illegal, way of stealing original content from a legitimate website and posting it to another site without the knowledge or permission of the content's owner. Content scrapers often attempt to pass off the stolen content as their own and fail to attribute it to its owners.

Content scraping can be done by manually copying and pasting, or with more sophisticated techniques such as dedicated scraping software, HTTP programming, or HTML and DOM parsers.
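To illustrate what "HTML parsing" means here, the following is a minimal sketch using Python's standard-library `html.parser` module to pull the paragraph text out of a page's markup. The `ParagraphExtractor` class, the `extract_paragraphs` helper and the sample markup are hypothetical and exist only for illustration; no network access is involved. Scraping tools automate essentially this step across many pages.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Hypothetical example: collects the text of every <p> element."""

    def __init__(self) -> None:
        super().__init__()
        self.paragraphs: list[str] = []
        self._in_p = False
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Start buffering text when a paragraph opens.
        if tag == "p":
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        # On </p>, store the joined text of that paragraph.
        if tag == "p" and self._in_p:
            self.paragraphs.append("".join(self._buffer).strip())
            self._in_p = False

    def handle_data(self, data):
        # Text inside nested inline tags (e.g. <b>) is still captured.
        if self._in_p:
            self._buffer.append(data)


def extract_paragraphs(html: str) -> list[str]:
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


sample = """
<html><body>
  <h1>My Post</h1>
  <p>First paragraph of <b>original</b> content.</p>
  <p>Second paragraph.</p>
</body></html>
"""

print(extract_paragraphs(sample))
# ['First paragraph of original content.', 'Second paragraph.']
```

Because the parser works on markup rather than on how the page looks, anything present in the HTML, including text a human might never notice, is picked up.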

Much of the content that falls prey to scraping is copyrighted material, and reposting it without the copyright owner’s permission can constitute copyright infringement, which may carry legal penalties. However, scraper sites are hosted all over the world, and scrapers who are asked to remove copyrighted content may simply switch domains or disappear.

Techopedia explains Content Scraping

Content scrapers drive traffic to their websites by scraping high-quality, keyword-dense content from other sites. Bloggers are particularly susceptible, probably because individual bloggers are unlikely to take legal action against scrapers. Scrapers are encouraged to continue the practice because search engines have not yet found an effective way to distinguish original content from scraped copies, so scrapers continue to benefit.

Website administrators can protect themselves against scraping through simple measures, such as including links to their own site within the content, so that they at least receive some traffic from scraped copies. More sophisticated methods of dealing with scraping bots include:

  • Commercial anti-bot applications
  • Catching bots with a honeypot and blocking their IP addresses
  • Blocking bots with JavaScript code
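The honeypot approach can be sketched as follows. This is a minimal, framework-independent illustration assuming a hypothetical trap path that is linked invisibly from the site's pages and disallowed in robots.txt; the `HONEYPOT_PATH` value and the `HoneypotGuard` class are illustrative names, not part of any real library. A human visitor never sees the link and a compliant crawler obeys robots.txt, so any client that requests the trap is treated as a scraper and its IP address is blocked.

```python
from dataclasses import dataclass, field

# Hypothetical trap URL: linked invisibly from pages and disallowed in
# robots.txt, so humans never see it and well-behaved crawlers skip it.
HONEYPOT_PATH = "/trap-do-not-follow/"

# robots.txt content that tells compliant crawlers to stay out of the trap.
ROBOTS_TXT = f"User-agent: *\nDisallow: {HONEYPOT_PATH}\n"


@dataclass
class HoneypotGuard:
    """Hypothetical in-memory IP blocklist driven by a honeypot path."""

    blocked_ips: set[str] = field(default_factory=set)

    def handle_request(self, client_ip: str, path: str) -> int:
        """Return an HTTP status code: 200 to serve the page, 403 to refuse."""
        if client_ip in self.blocked_ips:
            return 403
        if path.startswith(HONEYPOT_PATH):
            # Only a bot ignoring robots.txt (or parsing hidden links) gets here.
            self.blocked_ips.add(client_ip)
            return 403
        return 200


guard = HoneypotGuard()
print(guard.handle_request("203.0.113.7", "/articles/1"))    # 200
print(guard.handle_request("203.0.113.7", HONEYPOT_PATH))    # 403, now blocked
print(guard.handle_request("203.0.113.7", "/articles/2"))    # 403
print(guard.handle_request("198.51.100.4", "/articles/2"))   # 200
```

Keeping the blocklist in memory keeps the sketch short; a real deployment would more likely persist it or push it to a firewall, and would need to consider that several legitimate users can share one IP address behind a proxy or NAT.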
