marry 02.09.2024, 13:30
A number of prominent news outlets and social networks, including The New York Times, Wired and Instagram, have opted out of having their data used to train Apple’s AI models in the three months since Apple unveiled a tool that lets publishers do so.
AI systems have to be trained on content copied from the web in order to work. The troubling part is that the material used for training is copyrighted, since it was created by humans.
Tech giants, including Apple, have already been caught using content, for example from YouTube, to train their AI models without permission.
Less than three months ago, Apple gave publishers a way to opt out of this so-called scraping, and many websites have used it to exclude themselves from Apple Intelligence training. Among them are Facebook, Instagram, Craigslist, Tumblr, The New York Times, The Financial Times, The Atlantic, Vox Media and many others.
Apple’s new tool, Applebot-Extended, is an add-on to Apple’s web crawler that lets website owners tell Apple not to use their data to train artificial intelligence. Apple calls this “controlling data usage” in a blog post explaining how it works. The original Applebot, introduced in 2015, initially crawled the web to power Siri and Spotlight. Recently, however, Applebot’s mission has expanded: the data it collects can also be used to train Apple’s AI models.
Such training practices have already been called into question because of the way AI works, by taking in copyrighted content. It has been noted that such tools often return that material in a form that has hardly been changed.
For this reason, Apple lets publishers opt out of training while keeping the search indexing that surfaces their content in Siri and Spotlight searches.
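As a rough illustration of how such an opt-out can look in practice, the sketch below assumes the publisher expresses it in the site's robots.txt file, addressing Applebot-Extended by name. The article itself does not spell out the mechanism, so the rules, the example.com address and the variable names here are hypothetical, and the check uses Python's standard robots.txt parser rather than anything Apple provides.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative site (example.com).
# Assumption: the AI-training opt-out is a rule addressed to the
# "Applebot-Extended" user agent, while every other agent, including the
# regular Applebot used for search, remains allowed.
ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/articles/some-story"  # hypothetical article URL

# Would these rules permit use of the page for AI training?
training_allowed = parser.can_fetch("Applebot-Extended", url)

# Would these rules still permit ordinary search crawling?
search_allowed = parser.can_fetch("Applebot", url)

print("AI training allowed:", training_allowed)
print("Search indexing allowed:", search_allowed)
```

With these rules, the training check should come back negative while the search check stays positive, which mirrors the split described above. One caveat of this sketch: Python's parser matches user-agent names by substring, so a separate "Applebot" group placed before the "Applebot-Extended" group would also catch the latter. Listing the more specific name first avoids that.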
Photo: Kartikey Das | Pexels
Source: www.informacija.rs