The New York Times Sues OpenAI and Microsoft for Training Generative AI Models With Copyrighted Work
The New York Times has filed a lawsuit against OpenAI and Microsoft, accusing them of copyright infringement for using millions of Times articles without permission to train their generative AI models. Court documents filed this week in Manhattan’s Federal District Court contend that the two companies copied substantial portions of the Times’ archive to develop the large language models powering products like ChatGPT and Copilot.
The newspaper claims that OpenAI and Microsoft’s acts constitute willful copyright infringement under federal law. The Times is seeking an injunction that would require Microsoft and OpenAI to stop using its content, along with monetary compensation potentially in the billions of dollars based on the widespread use of the companies’ AI services. That use, according to the complaint, harmed the Times by allowing the AI systems to provide Times content for free.
“[OpenAI and Microsoft] seek to free-ride on the Times’s massive investment in its journalism by using it to build substitutive products without permission or payment,” the lawsuit states. “If The Times and other news organizations cannot produce and protect their independent journalism, there will be a vacuum that no computer or artificial intelligence can fill. Less journalism will be produced, and the cost to society will be enormous.”
The core issue is whether web scraping done by generative AI companies to obtain training data from news sites and other sources qualifies as fair use under copyright law. Tech firms argue it does, while publishers contend their content cannot legally be copied at scale without permission or payment. Hundreds of media organizations, including The Times, have implemented technical measures to block their sites from being scraped without consent.
The Times alleges that OpenAI and Microsoft copied millions of articles dating back to the 1980s, before those measures were implemented, to train LLMs and writing tools that directly compete with its journalism. In addition, the hallucinations that the models produce can include made-up quotes and facts attributed to The New York Times, which might damage the factual integrity of its reporting. The lawsuit cites several such instances from Microsoft’s Bing generative AI chatbot.
“We respect the rights of content creators and owners and are committed to working with them to ensure they benefit from AI technology and new revenue models. Our ongoing conversations with The New York Times have been productive and moving forward constructively, so we are surprised and disappointed with this development,” an OpenAI spokesperson said in a statement. “We’re hopeful that we will find a mutually beneficial way to work together, as we are doing with many other publishers.”
This isn’t the first lawsuit OpenAI has faced over where it obtained generative AI training data, and that legal pressure has prompted the partnerships the company referenced in its statement. For instance, the Associated Press and OpenAI have struck a deal giving OpenAI access to the AP’s archive of text, images, and video for training ChatGPT. OpenAI has also just announced a deal with Berlin-based publishing giant Axel Springer to integrate articles from its publications into ChatGPT, providing answers to users and a data pipeline for training OpenAI’s LLMs.