The US companies already scraped the data while they could. If anything, data scraping is now far, far more difficult for everyone, for technical reasons.
Most of the new models are trained on synthetic data, on higher-quality data, or with RLHF. DeepSeek is likely able to perform well because LLMs are still very new, so there is a lot of low-hanging fruit. It's no longer just about the data; we hit that limit quite some time ago.