OpenAI in Talks With Dozens of Publishers to License Content

OpenAI said it’s talking to dozens of publishers about striking deals to license their articles, a broader effort than was previously known as the startup looks for content to train its artificial intelligence models.

“We are in the middle of many negotiations and discussions with many publishers. They are active. They are very positive. They’re progressing well,” Tom Rubin, OpenAI’s chief of intellectual property and content, told Bloomberg News. “You’ve seen deals announced, and there will be more in the future.”

OpenAI recently inked a multiyear licensing deal with Politico's parent company, Axel Springer SE, for tens of millions of dollars, a person familiar with the matter previously told Bloomberg. In July, OpenAI announced an agreement with The Associated Press for an undisclosed amount. These deals are key to OpenAI's future as it balances the need for up-to-date, accurate data to build its models against growing scrutiny of where that data comes from.

But last week, one of the companies it had been in talks with, The New York Times Co., sued OpenAI and Microsoft Corp. for using the publication’s articles without permission.

The suit poses an existential challenge to OpenAI’s business. If the Times wins the case, OpenAI may not only owe billions of dollars, but could also be forced to destroy any of its training data that includes work from the Times, a costly and complicated task. More immediately, however, the lawsuit complicates OpenAI’s deal-making efforts with the media industry.

“The current situation is vastly different than the situations that the publishers faced in the past with search engines and social media,” Rubin said. “Here, the content is used for training a model. It’s not used to reproduce the content. It’s not used to replace the content.”

The Times, however, disagrees with OpenAI's stance, arguing that ChatGPT is copying its journalists' work outright without paying for it. In its lawsuit, the publisher showed examples in which ChatGPT spit out entire paragraphs of nearly verbatim text from The New York Times (although some have pointed out that in certain examples, the Times had specifically prompted ChatGPT to reproduce its content). The publisher argues that this is proof OpenAI used New York Times data.

“If Microsoft and OpenAI want to use our work for commercial purposes, the law requires that they first obtain our permission,” The New York Times said in a statement. “They have not done so.”

Photo: David Paul Morris/Bloomberg