• gian @lemmy.grys.it
    1 day ago

    What a bad judge.

    Why? Basically he simply stated that you can use whatever material you want to train your model, as long as you ask the author (or copyright holder) for permission to use it (and presumably pay for it).

    • patatahooligan@lemmy.world
      1 day ago

      “Fair use” is the exact opposite of what you’re saying here. It means that you don’t need to ask for any permission. The judge ruled that obtaining illegitimate copies was unlawful, but that use without the creator’s consent is perfectly fine.

    • LifeInMultipleChoice@lemmy.world
      1 day ago

      If I understand correctly, they are ruling that you can buy a book once and redistribute the information to as many people as you want without consequences. Aka one student should be able to buy a textbook and redistribute it to all other students for free. (Yet the rules only work for companies, apparently, as the students would still be committing a crime.)

      They may be trying to put safeguards in place so it isn’t directly happening, but here is an example where the text is there word for word:

      • VoterFrog@lemmy.world
        23 hours ago

        If I understand correctly, they are ruling that you can buy a book once and redistribute the information to as many people as you want without consequences. Aka one student should be able to buy a textbook and redistribute it to all other students for free. (Yet the rules only work for companies, apparently, as the students would still be committing a crime.)

        A student can absolutely buy a textbook and then teach the other students the information in it for free. That’s not redistribution. Redistribution would mean making copies of the book to hand out. That’s illegal for people and companies alike.

        • LifeInMultipleChoice@lemmy.world
          23 hours ago

          The language model isn’t teaching anything; it is changing the wording of something and spitting it back out. And in some cases it is not changing the wording at all, just spitting the information back out without paying the copyright source. It is not alive; it has no thoughts. It has no words of its own (as seen by the judgement that its output cannot be copyrighted). It only has other people’s words. Every word it spits out is, by definition, plagiarism, whether the work was copyrighted before or not.

          People wonder why works such as journalism are getting worse. Well, how could they ever get better if anything a journalist writes can be absorbed in real time, reworded, and regurgitated without paying any dues to the original source? One journalist’s article, displayed in 30 versions, divides the original work’s worth into 30 portions; the original work is now worth 1/30th of its value. Maybe one can argue it is twice as good, so 1/15th.

          Long term it means all original creations… are devalued and therefore not nearly worth pursuing. So we will only get shittier and shittier information. Every research project… physics, chemistry, psychology, all technological advancements… slowly degraded as language models get better and original sources see diminishing returns.

          • booly@sh.itjust.works
            5 hours ago

            just spitting the information back out, without paying the copyright source

            The court made its ruling under the factual assumption that it isn’t possible for a user to retrieve copyrighted text from that LLM, and explained that if a copyright holder does develop evidence that it is possible to get significant chunks of their copyrighted text out of that LLM, then they’d be able to sue them under those facts and that evidence.

            It relies heavily on the analogy to Google Books, which scans in entire copyrighted books to build the database, but where users of the service simply cannot retrieve more than a few snippets from any given book. That way, Google cannot be said to be redistributing entire books to its users without the publisher’s permission.

          • VoterFrog@lemmy.world
            20 hours ago

            The language model isn’t teaching anything; it is changing the wording of something and spitting it back out. And in some cases it is not changing the wording at all, just spitting the information back out without paying the copyright source.

            You could honestly say the same about most “teaching” that a student without a real comprehension of the subject does for another student. But ultimately, that’s beside the point. Because changing the wording, structure, and presentation is all that is necessary to avoid copyright violation. You cannot copyright the information. Only a specific expression of it.

            There’s no special exception for AI here. That’s how copyright works for you, me, the student, and the AI. And if you’re hoping that copyright is going to save you from the outcomes you’re worried about, it won’t.

      • FaceDeer@fedia.io
        1 day ago

        That’s not at all what this ruling says, or what LLMs do.

        Copyright covers a specific concrete expression. It doesn’t cover the information that the expression conveys. So if I paint a portrait of myself, that portrait is covered by copyright. If someone looks at the portrait and says “this is a portrait of a tall, dark, handsome deer-creature of some sort with awesome antlers” they haven’t violated that copyright even if they’re accurately conveying the same information that the portrait is conveying.

        The ruling does cover the assumption that the LLM “contains” the training text, which was asserted by the Authors and was not contested by Anthropic. The judge ruled that even if this assertion is true it doesn’t matter. The LLM is sufficiently transformative to count as a new work.

        If you have an LLM reproduce a copyrighted text, the text is still copyrighted. That doesn’t change. Just like if a human re-wrote it word-for-word from memory.

      • gian @lemmy.grys.it
        1 day ago

        If I understand correctly, they are ruling that you can buy a book once and redistribute the information to as many people as you want without consequences. Aka one student should be able to buy a textbook and redistribute it to all other students for free. (Yet the rules only work for companies, apparently, as the students would still be committing a crime.)

        Well, it would be interesting if this case were used as precedent in a case involving a single student who did the same thing. But you are right.

        • fum@lemmy.world
          1 day ago

          This was my understanding also, and why I think the judge is bad at their job.

          • LifeInMultipleChoice@lemmy.world
            1 day ago

            I suppose someone could develop an LLM that digests textbooks, rewords the text, and spits it back out, then distributes it for free, page for page. You can’t copyright the math problems, I don’t think… so if the text’s wording is what gives it credence, that would have been changed.

              • LifeInMultipleChoice@lemmy.world
                1 day ago

                Oh, I agree it should be, but following the judge’s ruling, I don’t see how it could be. You trained an LLM on textbooks that were purchased, not pirated, and the LLM distributed the responses.

                (Unless you mean the human reworded them, then yeah, we aren’t special apparently)

                • WraithGear@lemmy.world
                  1 day ago

                  Yes, on the second part. Just rearranging or replacing words in a text is not transformative, which is a requirement. There is an argument that “AI”s are capable of doing transformative work, but the tokenizing and weighting process is not magic, and in my use of multiple LLMs they do not have an understanding of the material any more than a dictionary understands the material printed on its pages.

                  An example was the wine glass problem. Art “AI”s were unable to render a wine glass filled to the brim. No matter how it was prompted, or what style it aped, it would fail to do so and report back that the glass was full. But it could render a full glass of water. It didn’t understand what a full glass was, not even for the water. How was this possible? Well, there was very little art of a completely full wine glass, because society has an unspoken rule that a full wine glass is the epitome of gluttony; wine is to be savored, not drunk. Whereas references of full glasses of water were abundant. It doesn’t know what “full” means, just that pictures of full glasses of water are tied to the phrases “full,” “glass,” and “water.”
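                  The dictionary analogy can be made concrete with a toy sketch (purely illustrative, not any real model’s tokenizer): the model never sees words, only integer ids, so any apparent “understanding” has to come from statistical co-occurrence in training data, not from the ids themselves.

```python
# Toy vocabulary: words map to arbitrary integer ids with no meaning attached.
vocab = {"full": 0, "glass": 1, "of": 2, "water": 3, "wine": 4}

def tokenize(text):
    """Turn a phrase into the id sequence a model would actually see."""
    return [vocab[word] for word in text.lower().split()]

print(tokenize("full glass of water"))  # [0, 1, 2, 3]
print(tokenize("full glass of wine"))   # [0, 1, 2, 4]
# To the model these phrases differ only in the final id; nothing in the
# ids themselves encodes what "full" looks like for wine versus water.
```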

                  • LifeInMultipleChoice@lemmy.world
                    1 day ago

                    Yeah, we had a fun example a while ago, let me see if I can still find it.

                    We would ask to create a photo of a cat with no tail.

                    And then tell it there was indeed a tail, and ask it to draw an arrow to point to it.

                    It just points to where the tail most commonly is, or was said to be in a picture it was not referencing.

                    Edit: granted now, it shows a picture of a cat where you just can’t see the tail in the picture.

        • WraithGear@lemmy.world
          1 day ago

          It can; the only thing stopping it is being specifically told not to, and that check being applied successfully. It is completely capable of plagiarizing otherwise.

          • FaceDeer@fedia.io
            1 day ago

            For the purposes of this ruling it doesn’t actually matter. The Authors claimed that this was the case and the judge said “sure, for purposes of argument I’ll assume that this is indeed the case.” It didn’t change the outcome.

            • WraithGear@lemmy.world
              1 day ago

              I mean, they can assume fantasy, and it will hold weight because laws are interpreted by the court, not because the court is correct.

              • FaceDeer@fedia.io
                1 day ago

                It made the ruling stronger, not weaker. The judge was accepting the most extreme claims that the Authors were making and still finding no copyright violation from training. Pushing back those claims won’t help their case, it’s already as strong as it’s ever going to get.

                As far as the judge was concerned, it didn’t matter whether the AI did or did not “memorize” its training data. He said it didn’t violate copyright either way.

                • VoterFrog@lemmy.world
                  23 hours ago

                  Makes sense to me. Search indices tend to store large amounts of copyrighted material yet they don’t violate copyright. What matters is whether or not you’re redistributing illegal copies of the material.

    • j0ester@lemmy.world
      1 day ago

      Huh? Didn’t Meta skip asking for any permission, and pirate a lot of books to train their model?