Training "AI" On Public Data Is Totally Fine And Not Stealing.

31337@sh.itjust.works · 3 months ago

MajorHavoc@programming.dev · edited 3 months ago

This falls squarely into the trap of treating corporations as people.

People have a right to public data.

Corporations should continue to be tolerated only while they carefully walk an ever-tightening line of acceptable behavior.

jordanlund@lemmy.world · 3 months ago

Generally the argument isn’t public vs. private, it’s public domain vs. copyright.

You want to train an LLM using the contents of Project Gutenberg? Great, go for it!

You want to train an LLM using bootlegged epubs stolen from Amazon? Now that’s a different deal.

troed@fedia.io · 3 months ago

Sure - they’d need to at least borrow the epubs, just like a human would if they wanted to read them.