Amnesty’s Warning: Generative AI May Be Built on Mass Privacy Violations

Written by Maria-Diandra Opre | Jul 2, 2026 11:19:06 AM

The mass extraction of text, images, videos, posts, artwork, code, research, and the ordinary fragments of human expression scattered across the web, which AI companies have used to train their systems, may constitute privacy violations, Amnesty International says.

In Unlawful by Design, a critique of the generative AI industry’s data practices, the global human rights organization contends that “companies across the world are supplying generative AI products under the veneer of efficiency and sophistication, but in reality, these systems perpetuate mass invasions of privacy through unlawful web scraping: an automated process for extracting data from websites, including personal data, such as images and social media activity, to train AI models.”

While “training data” sounds neutral and technical, “extraction” is a reminder that something was taken, processed, and converted into commercial value. Amnesty’s critique holds that generative AI was built by reclassifying the internet itself, recasting communication, memory, creativity, identity, and public participation as raw material.

The False Equivalence Between Visibility and Consent

AI companies have scraped vast amounts of publicly available online material to train their models, often without asking the people who created it, appeared in it, or were described by it. Amnesty says these systems rely on “extracting information from billions of public online posts and images often without the explicit consent of the individuals appearing in or creating them.”

The industry’s implied defense has long been that public data is fair game. Yet that argument weakens the closer one looks. A person who posts a family photo, writes a political comment, uploads a portfolio, shares a health story, or publishes a personal essay has made something visible within a context. They have not necessarily agreed to feed a commercial machine capable of reproducing, analyzing, imitating, classifying, or monetizing patterns drawn from that material. Visibility, access, and scale do not amount to consent, permission, or legitimacy.

Why the Damage Starts in the Data Pipeline

Amnesty is not only saying that generative AI can produce biased, harmful, or misleading outputs. It argues that the problem begins much earlier, inside the pipeline that makes the technology possible. If the data used to train these models was collected through non-consensual mass scraping, then the ethical problem is not a product glitch. It is part of the product’s foundation.

That is why the phrase “unlawful by design” lands so sharply. It suggests that the issue cannot be fixed simply through better disclaimers, friendlier chat interfaces, or post-launch safety patches. A model trained through invasive data practices carries those practices into the model itself. The harm is upstream.

Amnesty International’s Likhita Banerji, head of the Algorithmic Accountability Lab and Deputy Director at Amnesty Tech, describes the industry as operating “under the veneer of efficiency and sophistication,” while relying on systems that “perpetuate mass invasions of privacy through unlawful web scraping.” The phrase cuts through the mythology surrounding AI. For all its futuristic branding, much of the industry still depends on a very old business habit: take first, justify later.

Human Expression as Industrial Input, and How Scale Launders Bias

A decade ago, a blog post, image caption, comment thread, or public profile might have seemed economically marginal on its own. In aggregate, these fragments became the substance from which multibillion-dollar systems were built. The internet became a mine, and human expression became the ore.

Amnesty’s briefing further warns that “the extractive data pipeline, inherent design choices made by tech companies, and exploitative supply chains, to build generative AI systems have enabled a paradigm of technology development that opens up a risk of mass abuse of human rights.”

The people whose data powered that transformation were rarely invited into the bargain. There was no meaningful negotiation, no clear consent process, no practical ability to understand how personal material might be absorbed, weighted, transformed, and reused. In such an economy, human beings supply the cultural, social, and intellectual material, while companies convert it into proprietary systems.

The same logic applies to bias. AI companies often frame discrimination as a technical problem to be solved through better tuning and guardrails. But when models are trained on the web at scale, they inherit the web’s hierarchies, stereotypes, exclusions, and cruelties. Racist, sexist, and culturally distorted patterns do not appear out of nowhere. They are learned from data environments already shaped by unequal power.

The larger the dataset, the stronger the temptation to pretend it represents the world.

In reality, the open web isn’t the entire world. It’s a chaotic record of who had access, who appeared, who was recorded, who was ridiculed, who was misrepresented, and who vanished. Treating this archive as an unbiased source of knowledge risks perpetuating old biases through new technology.

Against the Myth of Technological Inevitability

Generative AI appears weightless to users, who see only a text box, not the data centers, chips, energy contracts, cooling systems, water use, mineral supply chains, or land demands behind it. The cloud is a metaphor, and AI strains it: as models grow, they need more resources. That growth drives demand for energy-intensive chips and large data centers, prompting resistance from communities already facing water and electricity shortages. Amnesty highlights opposition in Chile, Mexico, and Arizona, where data centers raise questions about who bears the environmental cost of digital convenience.

The industry presents generative AI as a tool that could help solve climate challenges, improve healthcare, democratize knowledge, and accelerate discovery. Some of that may prove true. Yet the same industry is expanding through methods that raise serious concerns about privacy, labor, water, energy, and local environmental stress. The promise is global, but the costs are often local.

The most useful part of Amnesty’s critique is that it rejects inevitability. AI companies often speak as though the current development path is the only possible one: more data, more compute, more scale, more deployment. But technology is never inevitable in that way. It is shaped by incentives, regulation, investment, legal interpretation, and public pressure.

Banerji makes the same point: these choices are not inevitable. It is perhaps the most important argument in the critique, because it shifts the discussion from awe to accountability, focusing not only on what AI can do but on the kind of technological economy we are prepared to accept.

The AI Ownership Question

A different AI future would begin by treating data rights as real rights, rather than an inconvenience to be engineered around. It would distinguish between public visibility and commercial permission. It would require companies to explain what data they use, how they obtain it, who is affected, and what remedies exist when rights are violated. It would also force a more honest accounting of the environmental burden created by endless model scaling.

None of this means rejecting AI outright. It means refusing the idea that innovation requires a blank check.

The generative AI industry has benefited from a remarkable asymmetry. Its products are presented as collective progress, while its foundations are enclosed as private advantage. It learns from the public internet, but the resulting systems are owned by corporations. It draws from human creativity, but often gives little back to the people whose work, likeness, language, and lives made the systems possible.

Maybe Amnesty is right. If words, images, choices, jokes, arguments, faces, labor, and memories helped build the machine, why were the people who created them never asked?