Dropsitenews published a list of websites Facebook uses to train its AI. Multiple Lemmy instances are on the list, as noticed by user BlueAEther.

Hexbear is on there too. Also Facebook is very interested in people uploading their massive dongs to lemmynsfw.
Full article here.
Link to the full leaked list download: Meta leaked list pdf


That’s good and also somewhat disappointing, as they were the first to release model weights, and the means to run them, as open weights.
A lot of fully open source (and “ethically trained”, depending on your opinion of that entire idea) models still use major portions of the code they open sourced.
A lot of relatively “good” LLMs run on top of Llama.cpp.
Meta pays for PyTorch development as well!
Llama.cpp will be fine of course, it technically has nothing to do with Meta.
But yeah, it’s mostly disappointing IMO…
And kinda stupid. These are literally experimental models; they release one experiment with mixed results, and admittedly catastrophic marketing for it, and Zuck pulls the rug?