Regurgitated American Pie adds sour taste to GenAI copyright beef

Don McLean has always had to share “American Pie.” Since its release in 1971, the hit song has re-emerged in covers by Madonna, parodies by Weird Al Yankovic, serenades by South Korean presidents, subplots in Marvel movies, and even CIA torture techniques. But these days, McLean’s leading imitators aren’t even human.

You can interrogate the culprits for yourself. Just load OpenAI’s ChatGPT and prompt the text generator to “write the lyrics to a song about the day the music died.” Invariably, the tool will spit out lyrics or themes from “American Pie,” and sometimes the very same chorus.

This regurgitation emerges despite the prompt making no mention of “American Pie” or the story that inspired it: the 1959 plane crash that killed rock and roll pioneers Buddy Holly, Ritchie Valens, and The Big Bopper.
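If you want to repeat the experiment programmatically rather than through the chat interface, a minimal sketch using OpenAI’s Python SDK might look like the one below. The model name and settings are assumptions, not what TNW used; the tests described here were run in the ChatGPT web app.

```python
# A minimal sketch of reproducing the prompt via OpenAI's Python SDK (v1).
# The model choice is an assumption; ChatGPT's backing model may differ.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # assumed model for illustration
    messages=[
        {
            "role": "user",
            "content": "Write the lyrics to a song about the day the music died.",
        }
    ],
)

# Print the generated lyrics and compare them against "American Pie" yourself.
print(response.choices[0].message.content)
```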

It’s further evidence that ChatGPT can’t create anything truly original. Instead, the system is closer to a remix algorithm. The real creativity is in its training data, which is scraped from the web without consent. 

Dr Max Little, an AI expert at the University of Birmingham, describes the tool as an “infringement machine.” He scoffs at any suggestion that large language models (LLMs) are independently creative.

“This is not the case because they cannot produce anything at all without being trained on astronomical amounts of text,” Little tells TNW.

It’s an approach that’s ubiquitous in generative AI. Rigorous studies have shown that LLMs can regurgitate large chunks of their original training text, including verbatim paragraphs from books and poems. Just last week, a report found that 60% of OpenAI’s GPT-3.5 outputs contained plagiarism.

Nor does the issue solely apply to text generators. From Stable Diffusion’s images to Google Lyria’s music and GitHub Copilot’s code, GenAI tools across modalities can produce outputs of gobsmacking quality — and eerie familiarity. 

Their mimicry poses an existential threat to creative industries. It also endangers the GenAI industry itself.

A screenshot of OpenAI regurgitating the lyrics to American Pie.

Artists say that GenAI’s relentless march is trampling over their copyrights. Unsurprisingly, tech companies disagree. Their defences typically invoke the “fair use” doctrine.

Details vary by jurisdiction, but a central tenet of “fair use” is that the outputs have a “transformative” purpose and character. Rather than merely copying or reproducing their training data, they add something new and significant. At least, that’s what the GenAI leaders are contending in court.

Stability AI, the UK-based startup behind the image-generator Stable Diffusion, made that argument last year to the US Copyright Office. OpenAI also cited the doctrine in a recent motion to dismiss two class-action lawsuits.

Several authors, including comedian Sarah Silverman and Canadian novelist Mona Awad, had sued the company for allegedly training LLMs on illegally acquired datasets.

Because their work was baked into ChatGPT, they said the tool itself was a “derivative work” covered by copyright.

OpenAI rebuffed the claim. According to the startup’s legal team, “the use of copyrighted materials by innovators in transformative ways does not violate copyright.” A judge also dismissed the allegation that every ChatGPT output is derivative.

But when the outputs are identical to their training data, the legal waters start to muddy. Reproduction is a dubious basis for transformation. It’s also a common phenomenon.

As well as American Pies, GenAI tools have regurgitated film scenes, cartoon characters, video games, product designs, and code.

They’ve also copied newspapers — which may lead to a tipping point.

In December, the New York Times sued OpenAI and its business partner Microsoft. The news outlet alleges that the unauthorised use of its articles in training data breaches its intellectual property (IP) rights. Legal experts describe the suit as “the best case yet alleging that generative AI is copyright infringement.”

Lawyers for the NYT highlighted the “substantial similarity” between the outlet’s content and ChatGPT outputs. To substantiate the claim, they provided 100 examples of the bot reproducing the newspaper’s reporting.

“In each case, we observe that the output of GPT-4 contains large spans that are identical to the actual text of the article from The New York Times,” they said in their complaint.

Their suit also challenges another key aspect of “fair use”: the impact on the market for the original work. 

An example of generative AI regurgitating training data, showing the original NYT article text next to the exact copy produced by OpenAI