Oct 23, 2024

Rights Holders vs. Generative AI: Latest Lawsuits and Licensing Developments

The rapid rise of generative AI technologies has shifted the dynamics between content creators and AI companies, igniting lawsuits and prompting new licensing agreements. As AI companies like OpenAI, Google DeepMind, and Anthropic scale their models, they rely heavily on vast amounts of training data scraped from online content, raising legal questions about the boundaries of copyright law. Rights holders, including authors, artists, musicians, and publishers, are increasingly taking a stand, alleging infringement of their intellectual property. At the same time, new licensing deals are emerging that seek a more equitable balance between creators and AI companies.

Active Generative AI Data Licensing Lawsuits 

As generative AI technologies continue to expand, they are increasingly facing legal scrutiny over the use of copyrighted material in training their models. Below are some of the key ongoing lawsuits and complaints that highlight the growing tension between rights holders and AI companies.

The Authors Guild vs. OpenAI

The Authors Guild and 17 prominent authors, including John Grisham, Jodi Picoult, David Baldacci, and George R.R. Martin, filed a class-action suit against OpenAI in September 2023 for copyright infringement. The complaint alleges that OpenAI took the authors’ books, gathered from online repositories or through web crawling, and used them without permission to train the large language models GPT-3.5 and GPT-4 that power ChatGPT. The authors argue that this scraping violates copyright law because it involves the unauthorised reproduction of their work.

Getty Images vs. Stability AI

The visual media and stock image company Getty Images sued Stability AI in January 2023, alleging that Stability AI scraped millions of copyrighted photographs and illustrations from Getty’s library and used them without permission to train its Stable Diffusion image-generation models. According to Getty, these images were used as training data without any form of licensing agreement.

Stability AI has generally responded by pointing to the fair use doctrine, a defence commonly cited in copyright cases, and has applied for the claims to be struck out before trial. However, the UK High Court rejected these applications, meaning the claims will proceed to a full trial in the coming months.

Sarah Silverman vs. OpenAI & Meta

Comedian and author Sarah Silverman, together with authors Christopher Golden and Richard Kadrey, filed class-action lawsuits against OpenAI and Meta in a California federal court in July 2023, alleging that both companies used their books and other written content without authorisation to train their models. Specifically, Silverman argues that her memoir “The Bedwetter” was among the works scraped from the internet and fed into datasets used to train models like ChatGPT and Meta’s LLaMA. Like the other suits, these cases test the boundaries of "fair use" under copyright law, particularly when works are scraped for training data without clear consent from rights holders. As of 2024, both cases remain at an early stage.

The New York Times vs. OpenAI & Microsoft

The New York Times (NYT) filed a complaint against OpenAI and Microsoft for copyright infringement in late December 2023, alleging that the two companies trained their generative AI models on the publication’s articles. The NYT accuses OpenAI of scraping millions of its articles from the internet and using them to train AI models without a proper licence. As a news company, the NYT relies heavily on subscriptions for its revenue. If an AI tool like ChatGPT can summarise or paraphrase its articles, users may no longer need to visit the NYT website, potentially undermining that business model.

Major Record Labels vs. AI Music Companies (Suno & Udio)

AI music startups Suno and Udio are currently facing lawsuits from major record labels, including Universal, Sony, and Warner. The record companies claim that these startups utilised extensive catalogues of copyrighted music without authorisation to train their AI models, which are capable of generating new songs based on user prompts. According to the labels, this infringes on the intellectual property rights of artists and poses a risk to the music industry by flooding the market with AI-generated tracks, which could devalue and overshadow human-created music.

Like the other AI companies facing copyright lawsuits, Suno and Udio argue that their use of copyrighted material falls under the doctrine of "fair use", contending that the lawsuits are an attempt by the labels to suppress innovation and stifle competition from AI-generated music. They assert that their AI models enhance the creative process rather than replace it, acting as tools for artists and producers. The cases highlight critical questions about how current copyright law applies to the training of AI models and whether legal frameworks need to evolve to address the challenges posed by emerging AI technologies.

Scarlett Johansson’s OpenAI Dispute

Scarlett Johansson's dispute with OpenAI centres on the company's release of a voice assistant that sounded similar to her. In May 2024, OpenAI demonstrated an updated ChatGPT voice assistant, and one of its voices, "Sky," quickly attracted attention for sounding strikingly similar to Johansson's portrayal of the AI character Samantha in the 2013 film Her. The controversy intensified when OpenAI’s CEO, Sam Altman, tweeted "her" after the demo, further fuelling speculation that Johansson’s voice had been used as a model.

Johansson's frustration deepened because of prior interactions with OpenAI. Altman had personally reached out to her twice, first in September 2023 and again shortly before the demo, asking her to voice the assistant. She had declined the first offer, and the demo went ahead before she could respond to the second, so she was shocked to find that Sky’s voice still resembled hers. In response, Johansson and her legal team demanded clarity from OpenAI about how Sky’s voice was developed. OpenAI paused the Sky voice but denied intentionally modelling it on Johansson. Nonetheless, the situation sparked a wider debate about the use of celebrity likeness in AI-generated content and the legal boundaries surrounding it.

Latest Licensing Developments Between Generative AI and Rights Holders

There have been several significant developments in the licensing landscape between rights holders and generative AI companies in 2024. Key players like OpenAI have signed licensing deals with major media companies, and Apple has reportedly pursued similar agreements. This marks a shift toward more formalised relationships between publishers and AI companies as demand for high-quality content to train AI models continues to grow.

OpenAI and Condé Nast

OpenAI signed a multi-year partnership deal with Condé Nast, the media company behind titles like The New Yorker, GQ, and Vogue. The partnership allows OpenAI to use Condé Nast's text content in training its AI models and formalises that use, with Condé Nast compensated for its journalism and other media. The deal is one of many through which OpenAI has sought formal relationships with publishers to avoid legal disputes over the use of copyrighted material.

Perplexity AI and the Publishers’ Program

Perplexity AI, the company behind a free AI-powered search engine, recently launched the “Publishers’ Program”, which allows publishers to earn a share of revenue when their content is referenced in Perplexity’s AI-powered answers. Initial partners include Time, Fortune, Der Spiegel, The Texas Tribune, and WordPress. The program also gives publishers access to Perplexity’s tools, APIs, and analytics so they can track how their content is being used. The initiative is part of Perplexity’s effort to build sustainable relationships with content creators while exploring how generative AI can support the news industry.

Microsoft and Informa

Microsoft struck a major deal with Informa, a UK-based publishing and exhibitions company, focused on integrating AI into both companies’ operations. The deal gives Microsoft access to Informa’s substantial data, particularly from its publishing arm Taylor & Francis, which includes academic and business research content. In return, Informa will use Microsoft's AI technology to drive innovation, improve productivity, and expand its capabilities. The partnership is set to run until 2027, with Microsoft paying an initial $10 million data access fee plus recurring payments over three years.

Apple and Major News Publishers

Apple has reportedly been negotiating with news publishers such as The New York Times, Condé Nast, and IAC (whose brands include People, The Daily Beast, and Better Homes & Gardens) to access their archives for training its AI models. These deals are part of Apple's push to develop its own generative AI features, branded "Apple Intelligence." Apple has reportedly offered major publishers deals worth millions of dollars to secure the rights to use their extensive content libraries, such as articles and multimedia. This approach contrasts with the data scraping methods employed by competitors like Google and OpenAI, as Apple aims to build more transparent, formal partnerships with content creators. The company emphasises ethical AI development and direct relationships with publishers.

Conclusion

As generative AI continues to evolve, relationships between AI companies and rights holders are becoming increasingly complex. Recent lawsuits, such as those brought by authors against OpenAI and Meta, underscore the legal tensions surrounding copyright and AI training practices. Concurrently, licensing agreements such as those between OpenAI and major publishers like Condé Nast and Hearst reflect a growing recognition of the need for fair compensation and formalised partnerships. Initiatives like Perplexity AI's Publishers’ Program and Microsoft's collaboration with Informa further illustrate the industry's shift towards transparent compensation and revenue-sharing models that prioritise the interests of content creators.

---

Cover Image by Google DeepMind from Pexels.

Subscribe to our newsletter!

Valyu is a data provenance and licensing platform that connects data providers with ML engineers looking for diverse, high-quality datasets for training models.  

#WeBuild 🛠️
