AI that learns from the internet

Ben Thompson at Stratechery points out that new deep learning models do not require the kind of curated data that would give large companies an advantage:

If not just data but clean data was presumed to be a prerequisite, then it seemed obvious that massively centralized platforms with the resources to both harvest and clean data — Google, Facebook, etc. — would have a big advantage.

. . . . .

To the extent that large language models (and I should note that while I’m focusing on image generation, there are a whole host of companies working on text output as well) are dependent not on carefully curated data, but rather on the Internet itself, is the extent to which AI will be democratized, for better or worse.

The AI Unbundling

This means the new AI models are relatively cheap, and they more directly reflect internet content itself, “for better or worse.”