The Open Source Initiative has defined what it believes constitutes “open source AI” (https://opensource.org/ai/open-source-ai-definition). This includes detailed descriptions of the training data, with an explanation of how it was obtained, selected, labeled, processed, and filtered. As long as a company uses a model trained on unspecified data, I will assume the data is either stolen or otherwise unlawfully obtained from non-consenting users.
To be clear, I have not read up on DeepSeek yet, but I have a hard time believing their training data is specified according to the OSI definition, since no major model has done so yet. Releasing the model source code means little for AI compared to releasing all of its training data.
No AI org of any significant size will ever disclose its full training set, and it’s foolish to expect that standard to be met. There is just too much liability. No matter how clean your data collection procedure is, there’s no way to guarantee that a data set with billions of samples won’t contain at least one thing a lawyer could zero in on and drag you into a lawsuit over.
What DeepSeek did, namely full disclosure of methods in a scientific paper, release of the weights under the MIT license, and release of some auxiliary code, is as much as one can expect.
Indeed. This is what I was thinking of, except I couldn’t remember whether it was the OSI or the FSF pushing for it and, well, I’m too lazy to check lol
Thanks