• flatbield@beehaw.org · 1 year ago

    I feel two ways about it. Absolutely it is recorded in a retrieval system and doing some sort of complicated lookup. So the output is derivative.

    On the other hand, the whole idea of copyright, or of so-called IP in general, except maybe trademarks and trade dress in the most limited way, is perverse and should not exist. Not that I have a better idea.

    • FaceDeer@kbin.social · 1 year ago

      Absolutely it is recorded in a retrieval system and doing some sort of complicated lookup.

      It is not.

      Stable Diffusion’s model was trained using the LAION-5B dataset, which describes five billion images. I have the resulting AI model on my hard drive right now, I use it with a local AI image generator. It’s about 5 GB in size. So unless StabilityAI has come up with a compression algorithm that’s able to fit an entire image into a single byte, there is no way that it’s possible for this process to be “doing some sort of complicated lookup” of the training data.
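      The arithmetic is easy to check. A quick sketch, using the ~5 GB model size and the five-billion-image figure from the comment above (both approximate):

      ```python
      # Back-of-the-envelope check: how many bytes of model weights exist
      # per training image, if a ~5 GiB checkpoint "stored" LAION-5B.
      model_size_bytes = 5 * 1024**3       # ~5 GiB Stable Diffusion checkpoint (approximate)
      num_training_images = 5_000_000_000  # LAION-5B: ~5 billion images

      bytes_per_image = model_size_bytes / num_training_images
      print(f"{bytes_per_image:.2f} bytes per image")  # about 1 byte per image
      ```

      Roughly one byte of weights per training image, which is nowhere near enough to store even a heavily compressed copy of each image.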

      What’s actually happening is that the model is being taught high-level concepts through repeatedly showing it examples of those concepts.

      • flatbield@beehaw.org · 1 year ago

        I would disagree. It is just a big table lookup of sorts, with some complicated interpolation/extrapolation algorithm on top. Training is recording the data into the net. Anything that comes out is derivative of the data that went in.

        • FaceDeer@kbin.social · 1 year ago

          You think it’s “recording” five billion images into five billion bytes of space? On what basis do you think that? Researchers have tried to pull copies of the training data back out of neural nets like these, and only in the rarest of cases, where an image has been badly overfitted, have they been able to recover something approximately like the original. The only example I know of offhand is this paper, which had a lot of problems and isn’t applicable to modern image AIs, where the training process does a much better job of avoiding overfitting.

          • flatbield@beehaw.org · 1 year ago

            Step back for a moment. You put the data in, say images. The output you got depended on putting in that data, so it is derivative of it. It is that simple. It does not matter how you obscure it with mumbo jumbo: you used the images.

            On the other hand, is that fair use without some license? That is a different question, one about current law and what the law should be. Maybe it should depend on the nature of the training: reproducing images from other images seems less fair, while classifying images by type seems more fair. Lots of stuff to be worked out.