The AI giants are going to be writing some big checks. But they may never be big enough.
The image above, “AI eating copyright,” was generated on Jasper.ai.
(This is the opening essay from this week's edition of my Sunday Note, which is produced with Zach Peterson.)
THE DECODER PODCAST is one of the best places to go to understand AI. Host Nilay Patel, co-founder and EIC of The Verge, and a former lawyer, has always been one of the first people I turn to for perspective on the tech universe. A recent episode with The Verge features editor Sarah Jeong, also a former lawyer, is all about copyright law, the AI industry, and the ins and outs of the colossal confrontation brewing between the people who create things and the companies whose machines are learning from them.
The show starts off perfectly—basically, the lawyers think the new wave of copyright suits against AI companies could actually signal an extinction-level event for AI, but the AI CEOs don’t. According to both Patel and Jeong, the CEOs are of the view that money will end up solving this problem. This may be true, but I think the idea that a few million dollars here and there will be the solution is completely misplaced.
Fair use is written right into the Copyright Act, and it says that certain kinds of copies are okay. Since the law can’t predict what everyone might want to do, it has a four-factor test written into it that courts can use to determine if a copy is fair use. But the legal system is not deterministic or predictable. Any court gets to run that test any way they want, and one court’s fair use determination isn’t actually precedent for the next court. That means fair use is a very vibes-based situation[…]
It’s a coin flip on a case-by-case basis. In December, The New York Times joined the fray when it filed suit against OpenAI and Microsoft for copyright infringement. The NYT claims in the suit that OpenAI’s models are essentially stealing the Times’s work, generating billions of dollars in revenue for OpenAI and a net loss for NYT journalists and shareholders alike. This is essentially the first major case in this space (though some comedians and authors had sued OpenAI earlier), and there are untold billions of dollars at stake.
Here is a PDF of the full filing. We’d love to hear from any attorneys with some thoughts on the whole thing—feel free to comment or send an email!
Here’s an illustrative excerpt:
Publicly, Defendants [eds: OpenAI and Microsoft] insist that their conduct is protected as “fair use” because their unlicensed use of copyrighted content to train GenAI models serves a new “transformative” purpose. But there is nothing “transformative” about using The Times’s content without payment to create products that substitute for The Times and steal audiences away from it. Because the outputs of Defendants’ GenAI models compete with and closely mimic the inputs used to train them, copying Times works for that purpose is not fair use. The law does not permit the kind of systematic and competitive infringement that Defendants have committed.
For comparison’s sake, here’s what the AI companies have been saying to date. Excerpts from public comments submitted to the U.S. Copyright Office:
Google:
If training could be accomplished without the creation of copies, there would be no copyright questions here. Indeed, that act of “knowledge harvesting,” to use the Court’s metaphor from Harper & Row, like the act of reading a book and learning the facts and ideas within it, would not only be non-infringing, it would further the very purpose of copyright law. The mere fact that, as a technological matter, copies need to be made to extract those ideas and facts from copyrighted works should not alter that result.
Stability AI:
A range of jurisdictions including Singapore, Japan, the European Union, the Republic of Korea, Taiwan, Malaysia, and Israel have reformed their copyright laws to create safe harbors for AI training that achieve similar effects to fair use. In the United Kingdom, the Government Chief Scientific Advisor has recommended that “if the government’s aim is to promote an innovative AI industry in the UK, it should enable mining of available data, text, and images (the input) and utilise [sic] existing protections of copyright and IP law on the output of AI.”
Andreessen Horowitz:
Over the last decade or more, there has been an enormous amount of investment—billions and billions of dollars—in the development of AI technologies, premised on an understanding that, under current copyright law, any copying necessary to extract statistical facts is permitted. A change in this regime will significantly disrupt settled expectations in this area. Those expectations have been a critical factor in the enormous investment of private capital into U.S.-based AI companies which, in turn, has made the U.S. a global leader in AI. Undermining those expectations will jeopardize future investment, along with U.S. economic competitiveness and national security.
This won’t be a slam-dunk case, and the next major one like it won’t be either. The Supreme Court has yet to weigh in, of course—a very interesting discussion point in the Decoder episode.
The NYT is asking for “billions of dollars” in compensation, and I support them wholeheartedly, but I would take it one step further. The companies, unions and others filing similar suits should not just take the cash that will inevitably be on the table. The big media companies have done this dance with Big Tech’s next big thing before, and one hopes that they learned the appropriate lessons from the rise of the social media age. I’m sure many of the people in the upper echelons of the AI startup-verse arrived after some sort of tenure with one of the social media giants, so there should be some familiar faces.
For a decade, media companies rode the coattails of social media companies that, in the end, paid very little to kill off small and mid-sized media outlets. Sure, there were “pivots to video,” and then the “pivot to live video,” and then the pivot to algorithmic feeds that were trained on years of personal data willingly handed over by billions of people around the world. Given that data transfer and the amount of data it encompasses, it’s hard not to see that AI—powerful AI—was inevitable.
I hope that instead of taking a check for what at the time feels like an almost impossible amount of money, the media companies, and the creative community more broadly, get meaningful stakes in these companies and the opportunity to share in that financial success.
I always thought it was crazy that, during the steel company bankruptcies of the 1990s and airline bankruptcies in the decade or so that followed the 9/11 attacks, the U.S. government canceled debts, negotiated buyouts, and pumped cash into entire industries, but it never took permanent stakes in that recovery. Companies in the creative industries lost even more by not getting stakes in the social media companies that were slowly killing them, and they shouldn’t let it happen again.
There’s no reason the New York Times should settle for anything less than significant rolling payments from, and a meaningful equity stake in, OpenAI.
What’s more (and likely a topic for a later edition of this newsletter), unions have a chance here at a real reawakening in the U.S. A big reason there aren’t more suits like this is the lack of potential filers who can afford to pay the legal fees necessary to settle these matters in court. Collective action is the only thing that will get the journalists at the NYT, the writers of the next great Netflix series, and anyone else who brings that touch of creativity that only a human can, what they deserve—and that process should have started yesterday.
These world-beating AI models should be powerful enough to figure out how much of their learning comes from whose work; maybe we should put them to use to make sure that the people—real, actual human beings—who write the first draft of history, document corruption, and…paint lovely landscapes get the compensation they deserve for the consumption of their work, whether by human or machine.
Interestingly, there is a toggle in the Substack settings to opt out of AI training (the default is to allow this data transfer). The little info blurb tells you all you need to know, emphasis mine:
This setting indicates to AI tools like ChatGPT and Google Bard that their models should not be trained on your published content. This will only apply to AI tools which respect this setting, and blocking training may limit your publication's discoverability in tools and search engines that return AI-generated results.
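For what it’s worth, this kind of preference is usually expressed elsewhere on the web the same way sites already talk to crawlers: through robots.txt. Here is a minimal sketch of what a site-wide block on AI-training crawlers looks like, assuming the publicly documented crawler tokens OpenAI and Google use for training data (GPTBot and Google-Extended):

```
# robots.txt: ask AI-training crawlers to skip this site entirely
User-agent: GPTBot           # OpenAI's training-data crawler
Disallow: /

User-agent: Google-Extended  # Google's token for AI training (e.g., Bard/Gemini)
Disallow: /
```

As with the Substack toggle, this is purely advisory: it only applies to crawlers that choose to respect robots.txt.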
It’s time to get ahead of AI regulation—all facets of it—before it’s too late.