cantankerous_cashew@lemmy.world to Technology@lemmy.world · English · 4 days ago
Meta Secretly Trained Its AI on a Notorious Piracy Database, Newly Unredacted Court Docs Reveal (www.wired.com)
29 comments · cross-posted to: technology@lemmy.world

rumba@lemmy.zip · 4 days ago
The notorious piracy database in question is Library Genesis.
Cached article: https://web.archive.org/web/20250110075821/https://www.wired.com/story/new-documents-unredacted-meta-copyright-ai-lawsuit/

CriticalMiss@lemmy.world · 4 days ago
Earlier reports suggested they trained it on books from Bibliotik. What changed?

BetaDoggo_@lemmy.world · 4 days ago
The LLaMA 1 paper acknowledged the use of the books dataset; LibGen isn't mentioned in any of the papers, so this is new info.

halcyoncmdr@lemmy.world · 4 days ago
Probably both, honestly.