Recordings of internal meetings at Meta, the parent company of Facebook and Instagram, show managers, lawyers, and engineers discussing a potential acquisition of Simon & Schuster as a way to obtain books for training the company’s artificial intelligence (AI) tools.
The recordings, shared with the New York Times by a Meta employee, offer insight into deliberations over using the publishing house’s extensive catalog for AI training, and they raise ethical and legal questions.
According to the recordings spanning March to April 2023, Meta personnel convened on a near-daily basis to explore avenues for acquiring additional data to train AI models. Discussions included the possibility of purchasing Simon & Schuster, with some participants contemplating paying $10 per book for licensing rights to new titles.
Simon & Schuster is one of the “Big Five” English-language publishers, alongside Penguin Random House, HarperCollins, Hachette, and Macmillan, and its authors include Stephen King, Colleen Hoover, and Bob Woodward.
The prospect of Meta acquiring Simon & Schuster arose after Paramount Global (then called ViacomCBS) announced in March 2020 that it intended to divest the publishing house. After a planned sale to Penguin Random House fell through, Simon & Schuster was ultimately sold to private equity firm KKR in August 2023.
Ahmad Al-Dahle, Meta’s vice president of generative AI, reportedly informed executives that the company had exhausted nearly all available English-language literary content on the internet for AI training purposes, prompting the search for new data sources.
Employees acknowledged using text sources without permission and contemplated expanding these practices despite potential legal ramifications. Concerns raised by a lawyer regarding the ethical implications of using copyrighted intellectual property were met with silence.
Additionally, discussions revealed Meta’s employment of contractors in Africa to aggregate summaries of copyrighted fiction and non-fiction texts, raising further ethical and legal questions regarding data collection practices.
Maria A. Pallante, president of the Association of American Publishers, expressed skepticism that Simon & Schuster would have entertained such a sale, questioning Meta’s intentions and the potential impact on authors and their contractual agreements.
In a related development, California federal judge Vince Chhabria dismissed a portion of a copyright lawsuit filed by comedian Sarah Silverman and other authors against Meta over the use of copyrighted books in training its AI system LLaMA. Chhabria cast doubt on claims that the AI models’ outputs significantly resembled the authors’ works, underscoring ongoing debates surrounding AI and intellectual property rights.