• acastcandream@beehaw.org
    arrow-up
    47
    ·
    1 year ago

    It’s a good first step, but we also need to address the way they gather the material to train these LLMs. That’s the core issue here, and it spans multiple industries. It’s stealing work in a way that functionally launders it, and then they want to claim it as original work while replacing the very people they’re pulling from. It’s a multi-variable issue once you really dig into the ethics.

    • mcgravier@kbin.social
      arrow-up
      25
      ·
      1 year ago

      It’s stealing work in a way that functionally launders it

      Actually, in the age of basically permanent copyright, this brings at least some balance.

      • acastcandream@beehaw.org
        arrow-up
        17
        ·
        1 year ago

        You would have an argument if it wasn’t the exact same corporations protecting themselves with copyright abuse that are mostly benefiting from this new system. 

          • acastcandream@beehaw.org
            English
            arrow-up
            9
            ·
            edit-2
            1 year ago

            I know pretty much anyone can technically run them. But come on, man. You do not have the same resources to bring to bear as the companies you are at least indirectly competing with. It doesn’t even come close. For instance, your usage has no bearing on how it is deployed in Hollywood.

            • Even_Adder@lemmy.dbzer0.com
              English
              arrow-up
              8
              ·
              edit-2
              1 year ago

              I’m not sure what you mean. FOSS generative image models are already better than the corpo paid for ones, and it isn’t even close. They’re more flexible and have way more features and tools than what you can get out of a discord bot or cloud computing subscription.

              • acastcandream@beehaw.org
                English
                arrow-up
                6
                ·
                edit-2
                1 year ago

                You keep narrowing the scope of the discussion.

                The combined corporate investment and interest in LLMs massively dwarfs what individuals are currently capable of doing. Yes, individuals can participate. Yes, sometimes the results are better. But to act like we have primary ownership of the situation, which is what you are heavily implying, is kind of ridiculous. The playing field simply isn’t level and, if things don’t change, we will be the ones left holding the bag when it comes to social, cultural, and financial cost.

                • Even_Adder@lemmy.dbzer0.com
                  English
                  arrow-up
                  1
                  ·
                  1 year ago

                  Thanks for sharing your thoughts. I think I can see your side, but correct me if I misunderstood.

                  I don’t think we need primary ownership here. This site doesn’t have primary ownership in the social media market, yet it benefits users. Having our own spaces and tools is always something worth fighting for. Implying we need “primary ownership” is a straw man and emotional language like “massively”, “ridiculous”, and “left holding the bag” can harm this conversation. This is a false dilemma between two extremes: either individuals have primary ownership, or we have no control.

                  You also downplay the work of the vibrant community of researchers, developers, activists, and artists who are working on FOSS software and models for anyone to use. It isn’t individuals merely participating; it’s a worldwide network working for the public, oftentimes leading research and development, for free.

                  One thing I’m certain of is that no one can put a lid on this. What we can do is make it available, effective, and affordable to the public. Mega-corps will have their own models, no matter the cost. The web, personal computers, and smartphones were made by big corporations or governments, yet we were the ones who turned them into something that enables social mobility, creativity, communication, and collaboration. It got to the point they tried jumping on our trends.

        • Jaded@lemmy.dbzer0.com
          arrow-up
          4
          ·
          1 year ago

          The corporations already have all the data; users literally gave it to them by uploading it. Open source only has scraped data. If you start regulating, you kill open source, but the big players will literally just shrug it off.

          Traditional artists already lost. It sucks but now we get to find out if the winner is all of society or only just Adobe and Shutterstock.

    • ninjan@lemmy.mildgrim.com
      arrow-up
      8
      ·
      1 year ago

      Yes, absolutely. They want AI to be people such that copyright applies and such that they can claim the AI was inspired just like a human artist is by the art they’re exposed to.

      We need a license model such that AI is only allowed to be trained on content where the license explicitly permits it, and where no mention of training counts as it being disallowed.

      • donuts@kbin.social
        arrow-up
        4
        ·
        1 year ago

        We need a license model such that AI is only allowed to be trained on content where the license explicitly permits it, and where no mention of training counts as it being disallowed.

        That is the default model behind copyright, which basically says that the only things people can use your copyrighted work for without a license are those which are determined to be “fair use”.

        I don’t see any way in which today’s AI ought to be considered fair use of other people’s writings, artwork, etc.

    • flatbield@beehaw.org
      arrow-up
      6
      ·
      edit-2
      1 year ago

      I feel two ways about it. Absolutely it is recorded in a retrieval system and doing some sort of complicated lookup. So derivative.

      On the other hand, the whole idea of copyright or other so-called IP, except maybe trademarks and trade dress in the most limited way, is perverse and should not exist. Not that I have a better idea.

      • FaceDeer@kbin.social
        arrow-up
        4
        ·
        1 year ago

        Absolutely it is recorded in a retrieval system and doing some sort of complicated lookup.

        It is not.

        Stable Diffusion’s model was trained using the LAION-5B dataset, which describes five billion images. I have the resulting AI model on my hard drive right now, and I use it with a local AI image generator. It’s about 5 GB in size. So unless StabilityAI has come up with a compression algorithm that’s able to fit an entire image into a single byte, there is no way that it’s possible for this process to be “doing some sort of complicated lookup” of the training data.
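
        The arithmetic behind that claim can be checked in a few lines. This is only a rough sketch that takes the round figures above at face value (a 5 GB model, five billion images); the function name and the 100 KB comparison image are assumptions made up for illustration, not measured values.

```python
# Back-of-the-envelope check using the comment's own round figures.
MODEL_SIZE_BYTES = 5 * 10**9   # "about 5 GB" model file
TRAINING_IMAGES = 5 * 10**9    # "five billion images" described by the dataset


def bytes_per_image(model_bytes: int, image_count: int) -> float:
    """Average number of model bytes available per training image."""
    return model_bytes / image_count


budget = bytes_per_image(MODEL_SIZE_BYTES, TRAINING_IMAGES)
print(f"{budget:.2f} bytes of model per training image")

# Hypothetical comparison point: assume a compressed image is about 100 KB.
ASSUMED_IMAGE_BYTES = 100 * 1000
ratio = ASSUMED_IMAGE_BYTES / budget
print(f"That budget is {ratio:,.0f} times smaller than one such image")
```

        With these inputs the per-image budget works out to roughly one byte, which is the point being made: far too little to hold a retrievable copy of each image.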

        What’s actually happening is that the model is being taught high-level concepts through repeatedly showing it examples of those concepts.

        • flatbield@beehaw.org
          arrow-up
          4
          ·
          edit-2
          1 year ago

          I would disagree. It is just a big table lookup of sorts with some complicated interpolation/extrapolation algorithm. Training is recording the data into the net. Anything that comes out is derivative of the data that went in.

          • FaceDeer@kbin.social
            arrow-up
            2
            ·
            1 year ago

            You think it’s “recording” five billion images into five billion bytes of space? On what basis do you think that? There have been efforts by researchers to pull copies of the training data back out of neural nets like these and only in the rarest of cases where an image has been badly overfitted have they been able to get something approximately like the original. The only example I know of offhand is this paper which had a lot of problems and isn’t applicable to modern image AIs where the training process does a much better job of avoiding overfitting.

            • flatbield@beehaw.org
              arrow-up
              2
              ·
              edit-2
              1 year ago

              Step back for a moment. You put the data in, say images. The output you got depended on putting in the data. It is derivative of it. It is that simple. Does not matter how you obscure it with mumbo jumbo, you used the images.

              On the other hand, is that fair use without some license? That is a different question, and one about current law and what the law should be. Maybe it should depend on the nature of the training. For example, reproducing images from other images seems less fair. Classifying images by type seems more fair. A lot of stuff still has to be worked out.