There’s been a lot of buzz on Bluesky over the last couple of days about an article and search tool published by The Atlantic. In brief, there is a site, LibGen, which has pirated over 7.5 million books and research articles. It is not the only pirating site out there, but what bubbled this one to the surface is that Meta and other AI companies have used it for AI training, in part or fully. Rather than duplicate one author’s summary (Jason Sanford’s) and a useful post from an advocacy group (The Authors Guild), I have added links to both below. You don’t need to be on Bluesky to view Jason’s thread. I should note that he is one of many authors posting on this topic; I found his thread to have additional useful information. The article referred to in their links is behind a paywall, but The Atlantic allows viewing and use of the LibGen search box portion of it.
If you’ve been following the topic in the news, a number of AI training companies have been lobbying governments around the world to loosen or abandon copyright laws covering printed and artistic (e.g. photography and artwork) materials. Sam Altman, CEO of OpenAI, the company behind ChatGPT, which has confirmed plans to move to a pay-for-use model, has repeatedly stated that his program will not be able to make a profit if existing laws remain as they are. In essence, he is asking for theft to be allowed so he can make money. Other companies, including Meta, have been slightly less obvious about their core motivation, pointing instead to countries like China, which consistently ignore copyright rules and are thereby beating “us” in the AI race.
On that last point… Look, you could probably make a successful case to me for the importance of AI in fields such as scientific advancement, national security, and technological achievement, but you will not succeed when it comes to the applications these companies are aggressively promoting for profit. Specifically, I mean apps and services sold to generate books, artwork, and music, or to handhold a user writing an email or Twitter post.
As Sanford mentions, last year Meta posted a profit of $62 billion, yet claims that the cost of paying for rights would be prohibitive. Bear in mind also that $62 billion is current profit, not the expected gains over time from their AI tools once they are rolled out. And if anyone wishes to check out the range of user output from one paid-for app, I suggest you search on what people have done with the pay-for-use tool Grok, much of which can be rated at the cesspool level. And when you do, remind yourselves that this is the kind of output that we are “losing the race in.”
The links are as follows:
Jason Sanford’s Bluesky post (containing The Atlantic’s article and search tool): https://bsky.app/profile/jasonsanford.bsky.social/post/3lkte7equxc2s
The Authors Guild’s response on its site: https://authorsguild.org/news/meta-libgen-ai-training-book-heist-what-authors-need-to-know/

Well, according to that search tool, my books aren’t included. But … I have a question. Unless these books and articles are already out there in the public domain, how can AI access them?
From what I remember of when pirating started, hacked access to digital books and music was one source.
By the way, I found books by Carrie Rubin and Audrey Driscoll listed there.
Aaaah … I have found my books on those websites that offer them for free.
Looks like I’m in the clear… for now. I fear it’s only a matter of time.
I’ve convinced myself this is why Fate has kept me an unknown. Once the protections are in place to avoid more lawsuits, my books will be discovered. 😉
Yes, I found 5 of my books there. I know some of them have shown up on pirated book sites, so maybe that’s where they were picked up.
Most likely. The difference now is that it’s confirmed content on pirated sites has been used for training by Meta and others.