Sarah Silverman Headlines Lawsuit Accusing OpenAI and Meta of Being ‘Industrial-Strength Plagiarists’
Comedian Sarah Silverman and other writers are suing OpenAI and Meta, claiming they illegally used copyrighted material to train their generative AI models ChatGPT and LLaMA. Silverman and authors Christopher Golden and Richard Kadrey claim the companies violated intellectual property law to build their respective large language models.
The lawsuit against Meta and OpenAI is centered on the information used to train their LLMs. The complaint alleges that books by the authors were among many copyrighted works in the training datasets. Meta and OpenAI did not credit or pay them and did not have permission from the copyright holders to use their work for that purpose. This would violate unfair competition laws and the Digital Millennium Copyright Act, according to the filing. Silverman, Golden, and Kadrey are represented in the lawsuit by Joseph Saveri and Matthew Butterick at the Joseph Saveri Law Firm. The lawyers have a previous class action suit against OpenAI on behalf of authors Paul Tremblay and Mona Awad and have even set up a website designed to help non-lawyers understand the cases as they see them.
“We’ve filed lawsuits challenging ChatGPT and LLaMA, industrial-strength plagiarists that violate the rights of book authors,” Saveri and Butterick wrote in their introduction to the website. “Because AI needs to be fair & ethical for everyone.”
As evidence, the lawsuit points to how ChatGPT can summarize the books written by the authors, and its supposed training on BookCorpus, a training dataset that included copyrighted material. The lawyers also found the plaintiffs’ works among the book pirating websites scraped for ‘The Pile,’ one of the datasets Meta has acknowledged it used to train LLaMA. The lawsuit asks for jury trials and wants the court to issue injunctions that could require OpenAI and Meta to make major changes to ChatGPT and LLaMA.
“Since the release of OpenAI’s ChatGPT system in March 2023, we’ve been hearing from writers, authors, and publishers who are concerned about its uncanny ability to generate text similar to that found in copyrighted textual materials, including thousands of books,” Saveri and Butterick wrote. “[W]e’ve filed a class-action lawsuit against OpenAI challenging ChatGPT and its underlying large language models, GPT-3.5 and GPT-4, which remix the copyrighted works of thousands of book authors—and many others—without consent, compensation, or credit.”
The debate over how intellectual property rules apply to synthetic media and generative AI has exploded this year. Lawsuits and attempts to avoid them are a consistent aspect of any new LLM or synthetic media tool. Getty Images has a suit against Stability AI over whether its text-to-image model, Stable Diffusion, breaks those rules. And the Saveri Law Firm popped up earlier this year representing a group of artists in a class action lawsuit against Stability AI, along with Stable Diffusion platforms Midjourney and DeviantArt, with the same general complaint. In both cases, the issues arise from the copyrighted images among the billions of pictures used to train Stable Diffusion. That includes the open-source LAION-5B dataset and the images Stability scraped from the web, including Getty’s servers, without their creators’ awareness.
Companies looking to skip the courtroom drama are circumscribing their training and sometimes backing it up with their checkbook. Both Shutterstock and Adobe have independently said that if a client’s use of their generative AI tools leads to accusations of copyright violation, they will take up the legal costs. The point is that they are confident that their respective synthetic media generators don’t violate any IP rules.