The Internet Archive Hack: Why Is the Web’s Digital Library Under Constant Attack?

The Internet Archive has spent the last few weeks being breached, hacked, and hit with DDoS attacks, all while facing lawsuits that prevent it from distributing certain “banned” books.

The American non-profit organization, which preserves and lends out digital and physical books and offers snapshots of websites from the past through the Wayback Machine, seems to be under a relentless onslaught.

Techopedia sat down with experts to understand the recent attacks against the Internet Archive and its role in the era of AI and misinformation.

Key Takeaways

  • The Internet Archive, a non-profit digital library, faces constant pressure, including DDoS attacks, data breaches, and lawsuits, due to its role in preserving information and providing free access.
  • This month’s hack saw personal data stolen from 31 million Internet Archive users.
  • Techopedia speaks to experts to highlight the Internet Archive’s critical role in preserving historical data and combating misinformation through the Wayback Machine.
  • Copyright lawsuits, AI scraping, and attacks against the Internet Archive represent a larger trend against free access to information.
  • Protecting digital libraries like the Internet Archive is crucial to ensure factual content is preserved and accessible for future generations.

DDoS Hit, Site Defaced, and Data From 31 Million Users Stolen

On October 10, Brewster Kahle, the founder of the Internet Archive, confirmed via X/Twitter that DDoS attacks against the organization had resumed.

The attackers flooded the site with malicious traffic and managed to momentarily shut down the website.

By October 15, the Internet Archive was back online and fully operational. However, the organization then suffered a major breach. A threat actor stole personal data from 31 million users.

The compromised data includes email addresses, screen names, password change timestamps, Bcrypt-hashed passwords, and other internal data from the Internet Archive and the Wayback Machine.

The hacker left a message on the archive.org site which read:

“Have you ever felt like the Internet Archive runs on sticks and is constantly on the verge of suffering a catastrophic security breach? It just happened. See 31 million of you on HIBP!”

The acronym ‘HIBP’ refers to the site Have I Been Pwned, which lets people check whether their data has been exposed in a breach. Techopedia interviewed its founder earlier this year.

One of the Internet Archive’s physical book storage facilities. (The Internet Archive)

Speaking to BleepingComputer, Troy Hunt, the founder of HIBP, confirmed that the threat actor had shared the data with the service and that it included 31 million unique email addresses. Other security researchers also verified the data and confirmed that it had genuinely been extracted from the Internet Archive.

David Redekop, CEO at ADAMnetworks, a Zero-Trust security ecosystem, spoke to Techopedia about the breach and the work that the Internet Archive does.

“(The attack) It is simply senseless. An analogy of it being like a terrorist group attacking a library.”

We asked Redekop what his opinion was on the work of preservation (of books, information, and content) that the Internet Archive and the Wayback Machine do. Redekop said:

“This reminds me of the famous Oppenheimer quote, ‘History teaches man that man learns nothing from history’ — which happens when we don’t know history. For humanity to flourish, we need to preserve history.”

“Without historical data, it is impossible to determine the truth.”

Court Rules Against Access to Information as Misinformation and AI Become the Norm

The recent DDoS attack against the Internet Archive is just one obstacle on a long, difficult road.

The organization, which holds more than 916 billion saved web pages, 44 million physical and digitized books, and 10.6 million videos of films and television programs, recently lost a court case over distributing digital books without publishers’ consent.

As a library, the Internet Archive acts as a gateway to information, but the U.S. court did not see it that way and prioritized paywalls and profit over digital libraries.

On the web side of things, the role of the Wayback Machine is more relevant than ever. According to Pew Research, 25% of all webpages that existed at some point between 2013 and 2023 are no longer accessible. The Wayback Machine is the last line of defense against the constant churn of online information.

The Internet Archive’s Wayback Machine saves snapshots of websites and online content that are constantly being removed and changed. (The Internet Archive)

Kapil Raina, Data Security Evangelist at Bedrock Security, a cloud, generative AI, and data security company, told Techopedia that with modern concerns about legitimate knowledge and “truth,” it’s even more critical to preserve historical data and information published on the internet.

“Otherwise, it becomes easier to face the concern of revised narratives of historical events and information in our digital world.”

The Internet Archive has also been praised by experts for its ability to help combat disinformation, keeping the historical record straight with its Wayback Machine. Raina from Bedrock Security explained it also helps balance AI bias:

“With the advent of OpenAI and related companies, these sites can be helpful in balancing out knowledge and bias as they arise on current sites.

“And, if we look further into the future, the current generation is more reliant on internet information — and thus, having historical information is critical to adjusting bias.

“Currently, with the rise of disinformation, especially by some political leaders, and social media tech companies being slow or refusing to address it, the timing of the recent attacks (against the Internet Archive) seems to align with the upcoming elections.”

Raina said that we need to support these services and equip them with the tools to protect their data as much as possible to minimize future impacts from such attacks.

A Library Fighting Off AI Bots

The rise of artificial intelligence has been problematic for an organization like the Internet Archive, to say the least. The organization has been calling for regulations while fending off bots that scrape content to train generative AI products. Additionally, as generative AI picks up speed, the online content created and compiled by humans is rapidly being rewritten by genAI tools.

John Price, CEO at SubRosa, a cybersecurity company, told Techopedia that the Wayback Machine plays a vital role in combating disinformation by providing snapshots of websites over time, creating a reliable record of what was published.

“This is a powerful tool for journalists, researchers, and the public to verify facts and track the evolution of information.”
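
For readers who want to try this kind of verification themselves, the lookup can be sketched in a few lines of Python. This is a minimal sketch, not part of the experts' comments: it assumes the Wayback Machine's public availability endpoint at archive.org/wayback/available and the JSON shape that endpoint is generally documented to return, and the function names (`build_availability_url`, `parse_closest_snapshot`, `find_snapshot`) are hypothetical helpers introduced here for illustration.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(page_url: str, timestamp: str = "") -> str:
    """Build a query asking for the snapshot closest to `timestamp` (e.g. YYYYMMDD)."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def parse_closest_snapshot(payload: dict) -> Optional[dict]:
    """Pull the closest available snapshot out of a response, or return None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return {"url": closest["url"], "timestamp": closest["timestamp"]}


def find_snapshot(page_url: str, timestamp: str = "") -> Optional[dict]:
    """Query the live service (requires network access)."""
    with urlopen(build_availability_url(page_url, timestamp), timeout=10) as response:
        return parse_closest_snapshot(json.load(response))
```

Calling `find_snapshot("example.com", "20060101")` would ask for the capture closest to January 1, 2006. If the service returns a snapshot, the resulting URL can be compared against the live page to see what has changed since then.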

Price added that the Internet Archive’s work in preserving books and other digital content is invaluable.

“As a digital public library, it ensures that historical, cultural, and educational materials are protected for future generations, particularly as digital content can quickly disappear,” Price said.

“The Internet Archive’s ongoing challenges (copyright cases, DDoS attacks, and AI scraping) indicate a growing movement against digital preservation efforts.”

Price explained that the movement stems from industries protecting proprietary data and from fears around AI rewriting historical information, and he warned about what lies ahead.

“As AI and automation evolve, protecting archives will be essential to ensure factual content is preserved for the future.”

Chris Dukich, Founder & CEO at Display NOW, a digital transformation company, told Techopedia that the attacks and actions against the Internet Archive represent a bigger trend against free access to information.

“Considering the copyright lawsuits faced by the Internet Archive, the DDoS, and the AI scraping activities, one understands this as something larger than just free access to information.

“With the development of AI tools which clearly require a lot of data, we witness efforts to limit the availability of exactly that data which must be accessible if one wants to maintain transparency in the digital society.”

The Bottom Line

Whether or not the attacks against the Internet Archive are coordinated, they represent a movement to erase the historical archive and to prioritize paid and AI-generated content over free, public access to information created by humans.

Like a Library of Alexandria for modern times, the Internet Archive holds tremendous knowledge. Let’s just hope no one ‘accidentally’ burns it down, little by little.

Ray Fernandez
Senior Technology Journalist

Ray is an independent journalist with 15 years of experience, focusing on the intersection of technology with various aspects of life and society. He joined Techopedia in 2023 after publishing in numerous media, including Microsoft, TechRepublic, Moonlock, HackerNoon, VentureBeat, Entrepreneur, and ServerWatch. He holds a degree in Journalism from Oxford Distance Learning and two specializations from FUNIBER in Environmental Science and Oceanography. When Ray is not working, you can find him making music, playing sports, and traveling with his wife and three kids.