

    [https://www.theguardian.com/books/2023/jul/05/authors-file-a-lawsuit-against-openai-for-unlawfully-ingesting-their-books](https://www.theguardian.com/books/2023/jul/05/authors-file-a-lawsuit-against-openai-for-unlawfully-ingesting-their-books)

    Google did actually [win a legal case in 2013](https://www.edsurge.com/news/2017-08-10-what-happened-to-google-s-effort-to-scan-millions-of-university-library-books) allowing them to continue scanning books.

    by Plastic-Lettuce-7150

    25 Comments

    1. Yep, based on the Google Books trial, I wonder if it’d be found to be fair use and/or transformative, based on how neural nets learn. Japan recently said that AI model training doesn’t violate copyright.

    2. Not only is this case not going anywhere, the argument doesn’t make any sense.

      Scanning a book for an LLM doesn’t harm sales of the book unless you’ve made one that will reproduce the book for you, which is just a chatbot that does piracy. Copyright lets you sell a book to people who want it. It doesn’t let you stop people from putting that book into an algorithm.

    3. Plastic-Lettuce-7150

      I’ve just asked Google Bard to print out the first chapter of Sapiens: A Brief History of Humankind by Yuval Noah Harari. It said it wasn’t programmed to help with that. I then asked it to print out the first paragraph of chapter one, and it duly did, together with an analysis, and likewise when I asked for the second paragraph. Google’s AI machine, it would seem, has copied the entire book as an electronic copy.

      Also in the news today, [Google wants a robots.txt equivalent for AI training](https://9to5google.com/2023/07/06/google-ai-robots-txt/).

    4. IIIllllIllIllIllllI

      Damn, I hope they don’t sue me next for “ingesting” their books.

    5. Lmao. Imagine thinking a human reading a book on loan from the library is a crime. This whole debate doesn’t make sense. If I take every pixel from an image someone else took, and store these pixels, then create another image from the exact same pixels configured in a completely different way, did I commit some sort of copyright infringement? Of course not.

      I understand the frustration that people are feeling but this is just a phoney position to hold. And it’s unenforceable.

    6. Copyright doesn’t protect texts from being read by a machine, but from being ***copied***. Their claim has to include the accusation that the AI output includes copies of their work. So far, at least in terms of laypeople finding the claim convincing, the case against text-generating AI seems to be much harder to make than the case against image generators.

      For all the public outcry we have had since last year about how Midjourney and Stable Diffusion users are “not real artists” because it’s all stolen collages scraped from the internet, no one has been very receptive to the parallel idea that ChatGPT *isn’t really writing new texts*, it is just stealing existing ones.

      I think it mostly boils down to everyone already knowing how to read and write, so the idea that a machine can do it is interesting, but believable.

      After all, you wouldn’t say that this post I’m writing here is not new but stolen, just because all of the words I am using in it were copied from texts I read earlier. I just rearranged them, and even their arrangement is weighted by my past learning experiences.

      But if you don’t know how to paint or draw, like most people, then it makes a lot more intuitive sense that the AI can’t either: it is just grabbing chunks of existing artworks from Google Images and throwing a shitty filter effect on top of them. After all, that’s what I would do if I were forced to create a new visual.

      We can accept that a computer can write as well as a random literate person, but we are offended by the idea that it can draw brand new images ***better than*** the random person.

    7. Considering the books are not being resold or given out for free, merely used in an algorithm, I do not think the authors have a case.

    8. falling_fire

      Hmm nothing to add to this discussion but a hatred for the phrase “unlawfully ingesting” >:(

    9. MongolianMango

      Am I on crazy pills? Why are so many comments here supportive of OpenAI?

      It’s true that OpenAI’s model doesn’t actually contain the book itself (it contains a statistical mapping of sentence structures and the words associated with them), but I’d argue that it effectively *does* contain these books in compressed form. You can retrieve public domain and popular works word for word, and when you ask the AI to do “creative writing” based on the title of an already published work, the plot and characters will be suspiciously familiar…

      As for the “AI training is the same as reading!” argument… if AI training really is the same as human reading, then it doesn’t need to be trained on a million (probably pirated) books. Public domain works should be sufficient, no? That’d certainly be enough for a human to learn to write after all.

      If AI training isn’t the same as a human reading a work, then the human who wrote it deserves to be compensated for having it vacuumed into an algorithm…

    10. KiraTheKittyCat3411

      Bro, AI created Percy Jackson Written by a Three Year Old in the Style of Dr. Seuss. Best thing that will ever be created by AI. Don’t worry, authors.

    11. frogandbanjo

      The very fact that the complaint uses the word “ideas” is a red flag that these cases have serious flaws.

      It’s already fair use for people to read books and discuss them in academic contexts. Assuming *arguendo* that somebody paid for a copy of these books somewhere along the way, it’s going to be very awkward if generating “fairly accurate summaries” of books suddenly falls out of fair use. That’s part and parcel of active academic discussion, for one, and, frankly, of *all* discussion.

      It’s easy to demonize the companies behind these “AI” projects (and deservedly so, to an extent), but these lawsuits feel an awful lot like professional sports leagues’ “you’re not even allowed to talk to other people about the game without our permission, peasant” restrictions. That’s traveling in the wrong direction. It opens the door to even more bad-faith copyright trolling via byzantine and draconian EULAs for everything from video games to individual hard-copy books. “Warning: even though you are being supplied with a hard copy of the entirety of the text, you have only purchased a license to read every third word on every second page, and you can’t tell anybody about any of it, and you’re obligated to forget what you read in exactly five days, and you most definitely can’t take any inspiration from those words to write your own shit, or we’ll sue you for statutory damages.”

      Irony upon irony: the most effective way for these chat programs to generate summaries of texts isn’t to “ingest” the texts. It’s to ingest all the stuff online already written *about* them, which they then regurgitate with far less faux-synthesis (which arguably isn’t even happening anyway).

    12. Lots of cryptobros and NFTbots here defending the idea that it’s perfectly fine for an AI to scrape the work of professional writers without licensing or compensation.

      Zero surprises, bunch of leeches.

      Want to train an AI using books as data? Acquire the rights to do so.

      Why is the concept of fairly paying people for their work so unacceptable for those tech-dudes?

      They always turn around with: *“Why are you against technology and progress?”*

      No, you dumbfuck. The problem isn’t the technology. The problem is you not wanting to pay people what they are owed.

    13. onceuponalilykiss

      All lovers of art, that is, people who actually love art beyond “omg a pretty picture” or “I love badass fight scene”, who see art as something meaningful and worthwhile for the human race, are better served by literally all cases against LLMs winning.

    14. Twokindsofpeople

      As an author myself, I really don’t think we should be stopping progress of any type. I do not see any reasonable distinction between a person reading a book and a computer reading a book.

      I do think consumers should know exactly what they’re buying, and any work with any AI content should be labeled as such, but trying to flat-out stop this is going to cause more harm than good.

      I’m old enough to remember the DMCA and the absolute garbage policies that followed, when old ladies were being sued for hundreds of thousands of dollars because their kids downloaded something from LimeWire. AI technology will be adopted on a small scale, and we’ll see a repeat of aggressive legal attacks against kids and their parents.

      The world would be better without the DMCA and it would be better without fear mongering coming from certain members of my profession.

      If they want to put in edge-case protections so AI can’t print out entire books, fine. Although if someone wants a book, there are a hell of a lot easier ways to obtain it than asking AIs to copy each book paragraph by paragraph.

    15. CodexRegius

      In the end this will be Asimov’s World with robots being outlawed on Earth. For copyright issues.

    16. georgios_rizos

      I was wondering when this would happen. You can ask the damn thing to write something in the style of an author and it does it.

      And OpenAI monetises the data of others.

    17. Corporations are people. They have as much right to ingest media as anyone else. /s (kind of).

    18. CaptainBayouBilly

      We need a meta tag to exclude content from being ingested into data sets. We already have one to prevent search indexing.

      The data sets that LLMs use should be public rosters, with the ability for creators to remove their content.
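      Two comments in this thread touch on opt-out signals: Google wanting a robots.txt equivalent for AI training, and the call for a tag like the one that already keeps pages out of search indexes (`<meta name="robots" content="noindex">`). As a rough sketch of how a crawler-level opt-out works, the snippet below uses Python's standard `urllib.robotparser` to check a robots.txt that blocks two AI-training user-agent tokens, `GPTBot` (OpenAI) and `Google-Extended` (Google), while still allowing ordinary search crawling. The robots.txt contents, the `crawler_allowed` helper, and the example.com URL are hypothetical, and robots.txt is only honored by crawlers that choose to respect it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an author's site: block two AI-training
# crawler tokens while leaving every other crawler (e.g. search) allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Hypothetical helper: True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text, no network
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/books/chapter-1.html"
    for agent in ("GPTBot", "Google-Extended", "Googlebot"):
        print(agent, crawler_allowed(ROBOTS_TXT, agent, page))
```

      Run as a script, this prints `False` for `GPTBot` and `Google-Extended` and `True` for `Googlebot`. Note that an opt-out like this only affects future crawling; it does nothing about data sets already collected, which is the gap the public-roster idea above is aimed at.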

    19. Grammar_Natsee_

      So if I buy a book, read it and then tell other people about what I read, will I be sued by the authors? No.

      What if I automate the way I tell other people what’s in the book, for efficiency: then will I be sued? Yes.

      So they will sue me for my efficiency.

      Fuck those idiots, they stand no chance.
