(Image credit: Future)

This week: I've been listening to Mustafa Suleyman's The Coming Wave: AI, Power and the 21st Century's Greatest Dilemma, a book in which the DeepMind co-founder sets out his thoughts on AI and the "technological revolution" he argues is already under way.

When a generative AI system creates an image or a piece of text, it all starts with training. A generative AI can't successfully recreate anything without an understanding of how words statistically relate to one another, or knowledge of what an image depicts. An AI-generated image may be influenced by real works made by millions of people (millions upon millions of works), yet it can still be a new work in its own right, an entirely original one.

How AI companies, or the firms building the datasets used by AI systems, go about collecting data is the source of much controversy: an uncomfortable truth hanging over AI's exponential growth. Many AI firms have quietly taken the position of acting as if they're allowed to use data from the web freely, whether images, videos, or text. Without that justification, they would actually have to pay for the content they use, and that would threaten said growth. Meanwhile, artists, content creators, journalists, bloggers, producers, novelists, coders, developers, musicians and many more argue this is complete nonsense.

This divide is best exemplified by comments Microsoft AI CEO Mustafa Suleyman made in an interview with CNBC at the Aspen Ideas Festival (via The Verge).

Suleyman is at the centre of AI development today. Not only is he leading Microsoft's AI efforts, he co-founded DeepMind, which was later bought by Google, and drove Google's AI efforts, too. He's had a large part to play in how two of the largest tech firms on the planet deliver their AI systems. I've spent the past few weeks listening to the audiobook of Suleyman's book, The Coming Wave, as he's well informed and has a lot to say about how AI has impacted, and will continue to impact, our daily lives.

So, I say this with the utmost respect to a pioneer in his field: I believe his idea of a “social contract” for the internet is complete nonsense.

Asked by CNBC's Andrew Ross Sorkin whether AI companies have "effectively stolen the world's IP", Suleyman had this to say:


“It’s a very fair argument. With respect to content that is already on the open web, the social contract of that content since the ’90s has been that it is fair use. Anyone can copy it, recreate with it, reproduce with it. That has been freeware, if you like. That’s been the understanding.”

Except that isn't the understanding. At least not mine, anyway, and if you've been taking content freely from anywhere on the internet this whole time, I have some very bad news for you.

If we ignore the fact that freeware is already a thing and, no, not everything on the internet is freeware—just think of the ramifications for a moment if it were so, especially for Suleyman’s own employer, Microsoft—there’s further legalese to prevent a free-for-all online.

There’s something called copyright, which here in the UK was enshrined into law through the Copyright, Designs and Patents Act 1988. As a journalist, I have to be very conscious of the right I have to use anything on the internet, otherwise I may (rightly) be forced to pay a very large sum of money to the copyright holder.

Let's not get too into the weeds with this (he says, not even halfway through a 2,000-word column), but generally copyright law covers "original literary, dramatic, musical, or artistic works." That includes all manner of text, too, not just novels or short stories, and protection usually lasts for 70 years after the creator's death. The rights are initially assigned to the "first owner", or creator, of that work.

Copyright is automatically applied, meaning someone need not register to get it, but only applies to original works.

Some argue the creations of generative AI are original works, and therefore qualify for automatic copyright. Who receives that automatic copyright is a tricky question: when animals have taken photos of themselves (search 'monkey selfie'), our very human laws haven't quite known what to make of it. We actually ended up with a ruling in 2014 by the United States Copyright Office stating that works by non-humans are not copyrightable (PDF). That's despite a human playing a pivotal role in setting up the entire thing, which could have implications for AI-generated art, not least because that same ruling applies similar constraints to works created by a computer.

A crudely drawn monkey on a green background, with 'Satan' written in the top corner.

Here’s a monkey. It was meant to be taking a selfie but hey ho—Andy Edser, 2024. (Image credit: Future)

Whether you own the copyright to art you prompted through a generative AI system, even if you finessed those prompts to get it just right, is an ongoing debate. However, US courts currently rule against granting copyright in these instances, and have even denied copyright to award-winning artwork.

But this is a tangent. Let’s focus back on the use of copyrighted works for training purposes because clearly copyright has something to do with the mass collection and use of images, videos, and text, without permission, for an AI system likely run by a private business for commercial gain.

Under UK law, the copyright owner (automatically the author or creator, or the employer of said author or creator) gets to say who can use their images and how. It's easy to give someone permission to use your images. I might see you've posted an image of a fun PC mod and message you to ask if I can use it on PC Gamer, for example. If you say yes, provided I give you sufficient attribution for your work, everyone is happy and life moves on.

If I don't ask your permission and take the image, or a "substantial" part of it (which some do, no doubt about that), then once you find out I've encroached on your copyright, you could demand I remove the offending material, sue for damages, or even get an injunction banning me from publishing it or repeating the offence.

This has been the case since the Act was introduced in the UK in 1988, which, I'd add, was before the internet was a big deal. Similar protections exist around the world, including in the US and EU.

So there's really no excuse for saying we've all been living in some kind of wild west where anything goes on the internet. It doesn't. AI companies just want it to be that way, and they are fighting to protect their own interests.

UK law offers a few defences for using copyrighted works without permission. Most of them come under something called fair dealing. Fair use in the US is a similar concept but differs in practice and applicability, and as a UK national, it's mostly fair dealing that covers my actions. There are a few versions of fair dealing: one covers reporting of current events, another covers review or criticism, and quotation and parody are also covered. Unless AI is actually a big joke, that last one won't offer much of a defence.

A crude drawing of the PC Gamer logo.

I can’t be sued for copyright by my own company! Wait, can I?—Jacob Ridley, 2024 (Image credit: Future)

Neither will the rest. For one, they don't cover photographs, which the law specifically protects. They also require that a user not take unfair commercial advantage of the copyright owner's work and use only what's necessary for the stated purpose. They frequently require sufficient acknowledgement, too. None of that is the done thing in generative AI.

Suleyman does seem to agree that some publishers have the right not to share their content, and that this right has already been disregarded, as he explained to CNBC (which, by the way, I can quote thanks to fair dealing):


“There’s a separate category where a website or a publisher or a news organisation had explicitly said do not scrape or crawl me for any other reason than indexing me so other people can find that content. That’s a grey area and I think that’s going to work its way through the courts.”

“So far, some people have taken that information. I don’t know who, who hasn’t, but that’s going to get litigated and I think that’s rightly so.”

Except that the one form of content that generally doesn't come under copyright law is the news itself: the facts reported in news articles.

I’m frustrated by the moves from Google and Microsoft to use AI to summarise my articles into little regurgitated bites that threaten to destroy the business of the internet, but I wouldn’t want to argue that’s copyright infringement in court. It’s known as “lifting” a story when you take key information from something published by another and republish it yourself. Providing you don’t use the same words and layout—you don’t take the piss, basically—it’s legally fine to do under existing law.

Plenty of publishers will argue against AI systems on the finer points of what constitutes lifting and what's simply taking without asking and without fair recompense. See the New York Times vs. OpenAI case. I'll leave that to the lawyers. My argument is that, legal or not, an AI summarising stories with no kickback for the people working to create those stories will do a lot more harm than good in the long run.

A crudely drawn AI, with 'ooo, art' written in the bottom right.

You can have this one for free, AI—Jacob Ridley, 2024 (Image credit: Future)

Simply put, I don’t understand the argument from Suleyman here. Maybe it’s a degree of wishful thinking from someone inside the AI inner circle looking out, or maybe he’s looking around the internet and seeing some sort of wild west without any rules? But that’s not the case, even considering the common exceptions to copyright law we’ll get to in a moment.

Copyright infringement happens all the time on the web, and it debases both our rights as creators not to have our stuff nicked and the value of the content itself. Does that mean we should just lie down, admit defeat, and let an AI system or dataset crawler rewrite the rules so that copyright need not apply to them? I don't think so.

Some measures are coming into place to defend copyright in a world obsessed with AI. The EU has introduced the Artificial Intelligence Act (AI Act), which includes a transparency requirement for "publishing summaries of copyrighted data used in training" and rules on compliance with EU copyright law, much of which is similar to the UK's.

The EU also includes some get-outs, though, allowing data mining of copyrighted works in some instances. One permits it for research and by cultural heritage institutions. Under the other, rights holders can opt out of further use of their works by other organisations, including for machine learning. How exactly one opts out is, uh, not entirely clear (PDF).

The UK has something similar in place: an exception to the 1988 Act that allows data mining for non-commercial purposes. This is generally not considered a viable defence for large AI firms with public, commercial generative AI systems. The UK Government had also planned another exception in the wake of AI's sudden popularity, though that has since fallen through. That's probably to the benefit of people in the UK, who are technically safe from data mining for commercial purposes, but not to the benefit of AI firms hoping to scrape data from within the UK's borders.

Exactly how companies hope to get around these restrictions, or what these laws look like in practice, is something lawyers, civil servants and politicians will have to thrash out in the years ahead. But broadly, I want to make it clear that these debates exist because of copyright law, not because of a lack of it.

A crudely drawn data miner.

It speaks for itself—Jacob Ridley, 2024. (Image credit: Future)

By assuming these rules don't apply to them, and by pressuring governments to give AI a pass because of the huge sums of money it promises to deliver, AI firms have largely gotten away with it so far. I mostly think they've been operating on an "it's easier to ask for forgiveness than permission" strategy, and have been for a few years now. They may well keep doing so. By the time we've dealt with copyright claims, and with whether those claims even apply to AI, will it be possible to undo the training of AI systems built on datasets brimming with copyrighted content? Oh boy, we're not very good at that.

"What a shame," the AI executive might say.

As someone who creates content for the internet, and who makes no claim to be a copyright lawyer, I think that by creating any loophole for data mining, we could end up with one rule for big AI companies and another for ordinary people like you and me. The supposed benefits of AI, deemed so valuable to human existence, can't be held back by petty copyright infringements, even when that means artwork trained on the hard graft of your own creative work. Or so it might feel. Alternatively, we could hold billion-dollar AI companies accountable for the copyrighted content they profit from.

If copyright holders fail to fend off AI, what becomes of the internet, or the "open web", as we know it? Will an artist want to post anything online? Will social media platforms spring up promising to be 'AI-proof'? Will the internet become more siloed as a result, split into smaller communities away from the main thoroughfares and out of sight of the prying eyes of crawlers sent out by Google, Microsoft, and dataset companies?

Because ultimately, it's not only my words in an article, or even someone's publicly available artwork, that an AI might look upon fondly. It could be your wedding photos or your smutty fan fiction. And what about an AI-generated advert that looks or sounds like you, for something you never agreed to? What about your ability to have it taken down, or to make sure your likeness was never trained on in the first place? Now, that sounds a lot more like the chaos and free-for-all Suleyman believes we've been living in all this time.

