Codeberg was asking about this. The toot a commenter linked points to:
These are CC-BY-SA 4.0 remixes of the Stack Exchange Creative Commons Data Dumps. 100% Unendorsed by Stack Exchange, Inc.
They are minimal. They provide the data you probably care about and the data you need to comply with the original license in SQLite format.
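Since the dumps ship as SQLite files, a quick way to see what is in one is to list its tables and columns. This is only a sketch: the post does not describe the schema, so the code assumes nothing about table names, and `list_tables` and the file name `dump.sqlite` are placeholders made up for illustration.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path):
    """Return {table_name: [column names]} for every table in the SQLite file.

    Hypothetical helper: it reads the schema from sqlite_master, so it does
    not depend on how the dump is actually laid out.
    """
    with closing(sqlite3.connect(db_path)) as conn:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        return {
            name: [col[1] for col in conn.execute(f'PRAGMA table_info("{name}")')]
            for name in names
        }
```

Calling `list_tables("dump.sqlite")` on a downloaded file (placeholder name) would show which tables hold the content and which hold the attribution data the license requires.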
How could anybody stop the AI robbers from stealing content from the fediverse?
robots.txt may help: https://neil-clarke.com/block-the-bots-that-feed-ai-models-by-scraping-your-website. Blocking by IP address is another option.
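For anyone who wants to try the robots.txt route, here is a rough sketch that generates the rules and checks them with Python's standard `urllib.robotparser`. The crawler names in `AI_CRAWLERS` are just a few well-known examples, not a complete list, and `example.social` is a made-up host. Keep in mind that robots.txt is only a request: a crawler that ignores it is not stopped by it.

```python
from urllib.robotparser import RobotFileParser

# Example crawler tokens only; a real list would be longer and needs upkeep.
AI_CRAWLERS = ["GPTBot", "CCBot", "Google-Extended"]


def build_robots_txt(agents):
    """Build robots.txt text that disallows the whole site for each agent."""
    blocks = [f"User-agent: {agent}\nDisallow: /" for agent in agents]
    return "\n\n".join(blocks) + "\n"


def is_allowed(robots_txt, agent, url):
    """Check whether a rule-abiding crawler named `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


ROBOTS_TXT = build_robots_txt(AI_CRAWLERS)

if __name__ == "__main__":
    print(ROBOTS_TXT)
    # Hypothetical instance URL, used only to exercise the rules.
    print(is_allowed(ROBOTS_TXT, "GPTBot", "https://example.social/post/1"))
```

The generated text would be served at `/robots.txt` on the instance. Agents not listed stay unaffected, so ordinary browsers and other crawlers are still allowed.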