Bluesky may have said it won’t use user data to train generative AI, but someone else just published a dataset of one million Bluesky posts for “machine learning research”. The dataset is already very popular, and your data may have been scraped.

Without paywall

  • ladicius@lemmy.world
    15 days ago

    Is that a problem for a proper scraper? Give the machine a list of domains and some hints about the relevant protocols, and then the computer runs until the end of the list.
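
    To make that concrete, here is a rough sketch of the loop described above: a list of domains, each with a protocol hint, processed until the list runs out. Everything in it is hypothetical. The `crawl` function, the example domains, and the stub fetchers are made up for illustration, and the stubs stand in for real network clients so the snippet runs offline.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical type: a fetcher takes a domain and returns the records it found.
Fetcher = Callable[[str], List[str]]


def crawl(
    targets: Iterable[Tuple[str, str]],
    fetchers: Dict[str, Fetcher],
) -> Dict[str, List[str]]:
    """Visit each (domain, protocol_hint) pair in order and collect the results."""
    results: Dict[str, List[str]] = {}
    for domain, protocol in targets:
        fetcher = fetchers.get(protocol)
        if fetcher is None:
            continue  # no fetcher for this protocol hint, so skip the domain
        try:
            results[domain] = fetcher(domain)
        except Exception:
            results[domain] = []  # one failing host does not stop the run
    return results


# Stub fetchers (assumptions): real ones would speak the actual protocols.
def fake_atproto(domain: str) -> List[str]:
    return [f"post from {domain}"]


def fake_activitypub(domain: str) -> List[str]:
    return [f"note from {domain}"]


targets = [
    ("bsky.example", "atproto"),
    ("lemmy.example", "activitypub"),
    ("old.example", "gopher"),  # unknown hint, gets skipped
]
collected = crawl(
    targets,
    {"atproto": fake_atproto, "activitypub": fake_activitypub},
)
```

    The point is only that the control flow is trivial. The per-protocol fetchers are where the real work would be.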