Privacy Alert: Report Exposes Tech Giants Using Private Data for AI Training

A new Al Jazeera report highlights how major tech platforms are quietly updating terms to use 'private' user data for training AI models, sparking fresh privacy concerns.

November 24, 2025 at 1:19 PM

A comprehensive new report released today by Al Jazeera has reignited the debate over data privacy in the age of artificial intelligence. The investigation details how several major technology companies have quietly updated their terms of service or privacy policies to allow the use of user-generated content—often assumed to be private—for training their large language models (LLMs) and other AI systems.

The report specifically flags recent policy shifts by platforms such as LinkedIn (owned by Microsoft), Meta, and others. For instance, it highlights that while private messages on platforms like WhatsApp remain end-to-end encrypted, metadata and interaction patterns are increasingly treated as fair game. Of greater concern, content on platforms like LinkedIn, including posts, articles, and even profile data, is now used by default to train generative AI models unless users manually opt out. The report notes that finding these opt-out settings is often a labyrinthine process designed to discourage users from disabling data collection.

“The consent model is broken,” the report argues. “Users are clicking ‘agree’ to terms that fundamentally change the value exchange. It is no longer just about serving you ads; it is about harvesting your intellectual output to build products that may eventually replace you.” The investigation cites privacy experts who warn that this aggregation of personal data creates a “permanent digital memory” that can be queried and repurposed in ways users never intended.

Meta’s upcoming policy change, set for December, was also scrutinized. While the company says it does not use private messages from Messenger or Instagram for training, it does use public posts, comments, and photos. The report clarifies that “public” often includes content shared with large groups of friends or followers, a distinction many users fail to appreciate. Furthermore, once data has been fed into a model, removing it afterwards is technically close to impossible; the research field working on that problem, known as ‘machine unlearning’, is still in its infancy.
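To make the unlearning problem concrete, consider a minimal Python sketch (the record fields and corpus below are hypothetical illustrations, not drawn from the report): removing a user's records from a training corpus is trivial, but a model already trained on them still encodes their influence, and today the only guaranteed remedy is retraining on the filtered data.

```python
from typing import Iterable

def drop_user_records(corpus: Iterable[dict], user_id: str) -> list[dict]:
    """Return a copy of the corpus with every record owned by user_id removed."""
    return [rec for rec in corpus if rec.get("owner") != user_id]

if __name__ == "__main__":
    # Hypothetical training corpus for illustration only.
    corpus = [
        {"owner": "alice", "text": "public post"},
        {"owner": "bob", "text": "profile summary"},
    ]
    cleaned = drop_user_records(corpus, "bob")
    # A model already trained on the original corpus still reflects bob's data;
    # only retraining on `cleaned` (or an approximate-unlearning method, still
    # an active research area) can remove its influence.
    print(len(cleaned))  # -> 1
```

Approximate unlearning techniques aim to avoid that full retrain, but, as the report notes, they remain far from production-ready.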

For software engineers and data scientists, this report serves as a critical reminder of the ethical and legal complexities surrounding dataset curation. The ‘black box’ nature of commercial datasets is becoming a liability. As regulations like the EU’s AI Act come into force, companies will face increasing pressure to prove that their models were not trained on illicitly obtained or non-consensual data.
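What that pressure could look like in practice is sketched below: a minimal, hypothetical example of consent-aware dataset curation with an audit trail. The field names, licence allow-list, and workflow are assumptions made for the example, not a description of any company's actual pipeline or of the EU AI Act's specific requirements.

```python
import json
from dataclasses import dataclass, asdict

# Illustrative licence allow-list; real compliance rules would be far richer.
ALLOWED_LICENSES = {"cc0", "cc-by", "public-domain"}

@dataclass
class Record:
    source: str    # where the content came from (platform, crawl, upload)
    consent: bool  # did the author opt in to AI training?
    license: str   # licence string attached to the content
    text: str

def is_trainable(rec: Record) -> bool:
    """Keep a record only if its author opted in or its licence permits reuse."""
    return rec.consent or rec.license in ALLOWED_LICENSES

def curate(records: list[Record], audit_path: str) -> list[Record]:
    """Split records into kept and dropped, logging exclusions for later audits."""
    kept = [r for r in records if is_trainable(r)]
    dropped = [r for r in records if not is_trainable(r)]
    with open(audit_path, "w") as fh:
        json.dump([asdict(r) for r in dropped], fh, indent=2)
    return kept

if __name__ == "__main__":
    sample = [
        Record("forum", consent=False, license="cc-by", text="how-to answer"),
        Record("social", consent=False, license="all-rights-reserved", text="shared post"),
    ]
    usable = curate(sample, "excluded_records.json")
    print(len(usable))  # -> 1; the non-consensual, all-rights-reserved post is excluded and logged
```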

The report concludes with a call for a “Data Dividend” or stricter “opt-in by default” regulations, arguing that the current “opt-out” paradigms are insufficient to protect user agency. As AI models grow hungrier for data to overcome the ‘data wall’ (the dwindling supply of high-quality public text), the tension between user privacy and model performance is set to become the defining legal battle of the next decade.
