Jordan Meyer (@jordanmeyer)

“They said it could not be done”. We’re releasing Pleias 1.0, the first suite of models trained on open data (either permissively licensed or uncopyrighted): Pleias-3b, Pleias-1b and Pleias-350m, all based on the two-trillion-token set from Common Corpus.

05.12.2024 16:39 👍 249 🔁 85 💬 11 📌 19

███░░░░░░░░░ ~25% trained

"A painting of a mountain lake with a boat in the foreground, surrounded by lush green grass, trees, and rocks. The sky is filled with white, fluffy clouds, creating a peaceful atmosphere."

06.12.2024 22:28 👍 13 🔁 3 💬 2 📌 0

Our analysis of the 1st draft of the General-Purpose AI Code of Practice
In this blogpost, we highlight some of COMMUNIA's responses to the EU survey on the first draft of the GPAI Code of Practice, as well as some of the concerns expressed by other stakeholders at the mee...

Last week we submitted to the #EU AI Office our comments on the 1st draft of the #AI Code of Practice, focusing on #copyright. ©️

On our blog, Teresa Nobre explains our responses and also the concerns expressed by other stakeholders:
communia-association.org/2024/12/04/o...

04.12.2024 12:49 👍 3 🔁 2 💬 0 📌 0

Great study on misinformation. Just want to point out that this kind of work is impossible without the fair use doctrine. Massive copying, computational analysis, ...

29.11.2024 22:44 👍 33 🔁 12 💬 2 📌 1

Hi, so I've spent the past almost-decade studying research uses of public social media data, like e.g. ML researchers using content from Twitter, Reddit, and Mastodon.

Anyway, buckle up this is about to be a VERY long thread with lots of thoughts and links to papers. 🧵

27.11.2024 15:33 👍 964 🔁 452 💬 59 📌 123

Making a bsky dataset is a bit like breaking glaze. It's in users' best interests to know how easy it is, but they'll hate you for it.

27.11.2024 04:10 👍 2 🔁 0 💬 0 📌 0

Sincerely do not tell anyone in the replies what the fire hose is lmao

15.11.2024 22:14 👍 18 🔁 6 💬 3 📌 0

Source.Plus | Print, book-illustration (BM 1...
Search, curate, and enrich media collections for AI training using the Source.Plus marketplace. Safe, consenting, high-quality training data. Public domain datasets for model fine-tuning. Source Descr...

Monkeys! 😀 https://source.plus/item/2d8d3be976f5fd753d5a2bdb23361e6e-8f39d178e94c72a2_1713547010

23.11.2024 15:59 👍 1 🔁 0 💬 1 📌 1

Source.Plus | Grondige Onderrichtinge in de ...
Search, curate, and enrich media collections for AI training using the Source.Plus marketplace. Safe, consenting, high-quality training data. Public domain datasets for model fine-tuning. Source Descr...

I found a few fun 'I's using source.plus. You can use the "more like this" panel on the right side to explore others. source.plus/item/8fd37f5...

I hadn't tried this before, thank you for the fun idea!

23.11.2024 15:55 👍 1 🔁 0 💬 2 📌 0

100%. And I think the challenge is real not because it requires complicated technology, but because both AI orgs and rights holders see opt-outs as a compromise that they'd need to be forced into.

14.11.2024 03:04 👍 2 🔁 0 💬 1 📌 0
