• gian @lemmy.grys.it
    1 day ago

    What a bad judge.

    Why? Basically he simply stated that you can use whatever material you want to train your model, as long as you ask the author (or copyright holder) for permission to use it (and presumably pay for it).

    • patatahooligan@lemmy.world
      1 day ago

      “Fair use” is the exact opposite of what you’re saying here. It means that you don’t need to ask for any permission. The judge ruled that obtaining illegitimate copies was unlawful, but that use without the creator’s consent is perfectly fine.

    • LifeInMultipleChoice@lemmy.world
      1 day ago

      If I understand correctly, they are ruling that you can buy a book once and redistribute the information to as many people as you want without consequences. Aka one student should be able to buy a textbook and redistribute it to all other students for free. (Yet the rules only work for companies, apparently, as the students would still be committing a crime.)

      They may be trying to put safeguards in place so it isn’t directly happening, but here is an example where the text is there word for word:

      • VoterFrog@lemmy.world
        23 hours ago

        If I understand correctly, they are ruling that you can buy a book once and redistribute the information to as many people as you want without consequences. Aka one student should be able to buy a textbook and redistribute it to all other students for free. (Yet the rules only work for companies, apparently, as the students would still be committing a crime.)

        A student can absolutely buy a textbook and then teach the other students the information in it for free. That’s not redistribution. Redistribution would mean making copies of the book to hand out. That’s illegal for people and companies alike.

        • LifeInMultipleChoice@lemmy.world
          23 hours ago

          The language model isn’t teaching anything; it is changing the wording of something and spitting it back out. And in some cases it is not changing the wording at all, just spitting the information back out without paying the copyright source. It is not alive; it has no thoughts. It has no words of its own (as seen by the judgement that its output cannot be copyrighted). It only has other people’s words. Every word it spits out is, by definition, plagiarism, whether the work was copyrighted before or not.

          People wonder why works such as journalism are getting worse. Well, how could they ever get better if anything a journalist writes can be absorbed in real time, reworded, and regurgitated without paying any dues to the original source? One journalist’s article, displayed in 30 versions, divides the original work’s worth into 30 portions; the original work is now worth 1/30th of its value. Maybe one can argue it is twice as good, so 1/15th.

          Long term it means all original creations… are devalued and therefore not nearly worth pursuing. So we will only get shittier and shittier information. Every research project… physics, chemistry, psychology, all technological advancements… slowly degraded as language models get better and original sources see diminishing returns.

          • booly@sh.itjust.works
            5 hours ago

            just spitting the information back out, without paying the copyright source

            The court made its ruling under the factual assumption that it isn’t possible for a user to retrieve copyrighted text from that LLM, and explained that if a copyright holder does develop evidence that it is possible to get significant chunks of their copyrighted text out of that LLM, then they’d be able to sue them under those facts and that evidence.

            It relies heavily on the analogy to Google Books, which scans in entire copyrighted books to build the database, but where users of the service simply cannot retrieve more than a few snippets from any given book. That way, Google cannot be said to be redistributing entire books to its users without the publisher’s permission.

          • VoterFrog@lemmy.world
            20 hours ago

            The language model isn’t teaching anything; it is changing the wording of something and spitting it back out. And in some cases it is not changing the wording at all, just spitting the information back out without paying the copyright source.

            You could honestly say the same about most “teaching” that a student without a real comprehension of the subject does for another student. But ultimately, that’s beside the point. Because changing the wording, structure, and presentation is all that is necessary to avoid copyright violation. You cannot copyright the information. Only a specific expression of it.

            There’s no special exception for AI here. That’s how copyright works for you, me, the student, and the AI. And if you’re hoping that copyright is going to save you from the outcomes you’re worried about, it won’t.

      • FaceDeer@fedia.io
        1 day ago

        That’s not at all what this ruling says, or what LLMs do.

        Copyright covers a specific concrete expression. It doesn’t cover the information that the expression conveys. So if I paint a portrait of myself, that portrait is covered by copyright. If someone looks at the portrait and says “this is a portrait of a tall, dark, handsome deer-creature of some sort with awesome antlers” they haven’t violated that copyright even if they’re accurately conveying the same information that the portrait is conveying.

        The ruling does cover the assumption that the LLM “contains” the training text, which was asserted by the Authors and was not contested by Anthropic. The judge ruled that even if this assertion is true it doesn’t matter. The LLM is sufficiently transformative to count as a new work.

        If you have an LLM reproduce a copyrighted text, the text is still copyrighted. That doesn’t change. Just like if a human re-wrote it word-for-word from memory.

      • gian @lemmy.grys.it
        1 day ago

        If I understand correctly, they are ruling that you can buy a book once and redistribute the information to as many people as you want without consequences. Aka one student should be able to buy a textbook and redistribute it to all other students for free. (Yet the rules only work for companies, apparently, as the students would still be committing a crime.)

        Well, it would be interesting if this case were used as precedent in a case involving a single student who did the same thing. But you are right.

        • fum@lemmy.world
          1 day ago

          This was my understanding also, and why I think the judge is bad at their job.

          • LifeInMultipleChoice@lemmy.world
            1 day ago

            I suppose someone could develop an LLM that digests textbooks, rewords the text, and spits it back out, then distributes it for free, page for page. You can’t copyright the math problems, I don’t think… so if the text’s wording is what gives it credence, that would have been changed.

              • LifeInMultipleChoice@lemmy.world
                1 day ago

                Oh, I agree it should be, but following the judge’s ruling, I don’t see how it could be. You trained an LLM on textbooks that were purchased, not pirated, and the LLM distributed the responses.

                (Unless you mean the human reworded them, then yeah, we aren’t special apparently)

                • WraithGear@lemmy.world
                  1 day ago

                  Yes, on the second part. Just rearranging or replacing words in a text is not transformative, which is a requirement. There is an argument that “AI”s are capable of doing transformative work, but the tokenizing and weighting process is not magic, and in my use of multiple LLMs they do not have an understanding of the material any more than a dictionary understands the material printed on its pages.

                  An example was the wine glass problem. Art “AI”s were unable to render a wine glass filled to the brim. No matter how it was prompted, or what style it aped, it would fail to do so and report back that the glass was full. But it could render a full glass of water. It didn’t understand what a full glass was, not even for the water. How was this possible? Well, there was very little art of a completely full wine glass, because society has an unspoken rule that a full wine glass is the epitome of gluttony; wine is to be savored, not drunk. Whereas references of full glasses of water were abundant. It doesn’t know what “full” means, just that pictures of full glasses of water are tied to the phrases “full,” “glass,” and “water.”
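                  The dictionary analogy can be made concrete with a toy sketch (purely illustrative, not any real model’s tokenizer): the model never sees words, only integer ids, so any apparent “understanding” has to come from statistical co-occurrence in training data, not from the ids themselves.

```python
# Toy vocabulary: words map to arbitrary integer ids with no meaning attached.
vocab = {"full": 0, "glass": 1, "of": 2, "water": 3, "wine": 4}

def tokenize(text):
    """Turn a phrase into the id sequence a model would actually see."""
    return [vocab[word] for word in text.lower().split()]

print(tokenize("full glass of water"))  # [0, 1, 2, 3]
print(tokenize("full glass of wine"))   # [0, 1, 2, 4]
# To the model these phrases differ only in the final id; nothing in the
# ids themselves encodes what "full" looks like for wine versus water.
```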

                  • LifeInMultipleChoice@lemmy.world
                    1 day ago

                    Yeah, we had a fun example a while ago, let me see if I can still find it.

                    We would ask to create a photo of a cat with no tail.

                    And then tell it there was indeed a tail, and ask it to draw an arrow to point to it.

                    It just points to where the tail most commonly is, or was said to be in a picture it was not referencing.

                    Edit: granted now, it shows a picture of a cat where you just can’t see the tail in the picture.

        • WraithGear@lemmy.world
          1 day ago

          It can; the only thing stopping it is being specifically told not to, and that check being applied successfully. It is completely capable of plagiarizing otherwise.

          • FaceDeer@fedia.io
            1 day ago

            For the purposes of this ruling it doesn’t actually matter. The Authors claimed that this was the case and the judge said “sure, for purposes of argument I’ll assume that this is indeed the case.” It didn’t change the outcome.

            • WraithGear@lemmy.world
              1 day ago

              I mean, they can assume fantasy, and it will hold weight because laws are interpreted by the court, not because the court is correct.

              • FaceDeer@fedia.io
                1 day ago

                It made the ruling stronger, not weaker. The judge was accepting the most extreme claims that the Authors were making and still finding no copyright violation from training. Pushing back those claims won’t help their case, it’s already as strong as it’s ever going to get.

                As far as the judge was concerned, it didn’t matter whether the AI did or did not “memorize” its training data. He said it didn’t violate copyright either way.

                • VoterFrog@lemmy.world
                  23 hours ago

                  Makes sense to me. Search indices tend to store large amounts of copyrighted material yet they don’t violate copyright. What matters is whether or not you’re redistributing illegal copies of the material.

    • j0ester@lemmy.world
      1 day ago

      Huh? Didn’t Meta skip asking for any permission, and pirate a lot of books to train their model?