Potential evidence may have been deleted by OpenAI

Last December, the precedent-setting lawsuit was launched by the New York Times, one of the world’s largest newspapers, against Microsoft and OpenAI because, according to the publisher, the companies violated copyright when the newspaper articles were used to train large language models.

According to documents filed in federal district court in Manhattan, OpenAI may have illegally used millions of articles to train models running ChatGPT and Microsoft Copilot services without the paper’s consent, prompting The Times to call on OpenAI to remove training materials derived from the infringing material. data, he also expressed his demand for unquantified compensation for the “several billions of dollars” in damages.

The initiation of the legal procedure was preceded by a negotiation process that lasted for many months, but in the end the parties did not reach an agreement on the basic issues. The lawsuit is still in the early stages as the two sides seek and provide evidence to begin negotiations. But that could take longer than planned, as according to a New York Times court document filed on Wednesday, OpenAI engineers may have “accidentally” deleted evidence that took weeks to track down and compile.

Machine recruiting: you may not like it

AI has become irreversibly integrated into the recruitment process.

Machine recruiting: you may not like it
AI has become irreversibly integrated into the recruitment process.

OpenAI needed to provide access to the training data, which it solved by creating a sandbox environment. NYT’s legal team then conducted more than 150 hours of research and data sifting for evidence of copyright infringement on two virtual machines before OpenAI allegedly carelessly deleted search data stored on one of the virtual machines.

The data recovery was partially successful, but the folder structure and file names were lost, along with a lot of usable evidence. According to OpenAI, it was a mistake, while a representative of the New York Times said they have no reason to believe that the incident was intentional. The work essentially has to be started from scratch.

The outcome of the lawsuit may have precedential value in the future, as it may determine the relationship between other publishers and the developers of press products and large language models. The main concern of content producers regarding the training process of artificial intelligence models is that operators “feed” copyrighted material to the AI ​​without permission.

The Times has already spent more than a million dollars fighting OpenAI in court, which few publishers can afford. Meanwhile, the ChatGPT developer has already signed several deals with major publishers such as Axel Springer, Conde Nast and The Verge’s parent company, Vox Media, suggesting that many publishers prefer partnerships over legal battles.

Source: www.hwsw.hu