Arij Riabi

@arijriabi

PhD student working on NLP for low-resource, non-standardized language varieties

Followers: 124
Following: 428
Posts: 7
Joined: 03.07.2023

Latest posts by Arij Riabi @arijriabi

Thrilled to release Gaperon, an open LLM suite for French, English and Coding 🧀

We trained 3 models - 1.5B, 8B, 24B - from scratch on 2-4T tokens of custom data

(TLDR: we cheat and get good scores)

@wissamantoun.bsky.social @rachelbawden.bsky.social @bensagot.bsky.social @zehavoc.bsky.social

07.11.2025 21:11 👍 35 🔁 18 💬 1 📌 4
Can We Fix Social Media? Testing Prosocial Interventions using Generative Social Simulation
Social media platforms have been widely linked to societal harms, including rising polarization and the erosion of constructive debate. Can these problems be mitigated through prosocial interventions?...

We built the simplest possible social media platform. No algorithms. No ads. Just LLM agents posting and following.

It still became a polarization machine.

Then we tried six interventions to fix social media.

The results were… not what we expected.

arxiv.org/abs/2508.03385

06.08.2025 08:24 👍 301 🔁 106 💬 14 📌 45

I am stuck at just hot summer haha

20.06.2025 16:42 👍 2 🔁 0 💬 1 📌 0

ModernBERT or DeBERTaV3?

What's driving performance: architecture or data?

To find out we pretrained ModernBERT on the same dataset as CamemBERTaV2 (a DeBERTaV3 model) to isolate architecture effects.

Here are our findings:

14.04.2025 15:41 👍 44 🔁 15 💬 3 📌 0
PhD defence of Arij Riabi, 18 March 2025

Congratulations to @arijriabi.bsky.social who successfully defended her PhD “Small is Beautiful: Addressing Resource Scarcity, Language Variation, & Transfer Challenges for Automatic Detection of Harmful Language” last Tuesday, supervised by @zehavoc.bsky.social & @openlaurent.bsky.social 👩‍🎓🎉

25.03.2025 10:46 👍 21 🔁 3 💬 0 📌 0

Haha no, still didn't get my yoyo (yet)

20.03.2025 09:20 👍 2 🔁 0 💬 0 📌 0

Hahahah yes, I arrived at 1 am. They were all half asleep but we still celebrated.

20.03.2025 09:14 👍 1 🔁 0 💬 1 📌 0
ALT: a man wearing a tie and a blue shirt is screaming in a kitchen
20.03.2025 09:03 👍 0 🔁 0 💬 1 📌 0

A special thank you to my colleagues at ALMAnaCh @inriaparisnlp.bsky.social and everyone who has been part of this journey.

#PhD #NLP #research

20.03.2025 08:44 👍 4 🔁 0 💬 1 📌 0

I am deeply grateful to my supervisors, @zehavoc.bsky.social and @openlaurent.bsky.social, as well as my committee members, Elena Cabrio, Sara Tonelli, Benjamin Piwowarski and @marinecarpuat.bsky.social, for their valuable feedback and support.

20.03.2025 08:44 👍 3 🔁 0 💬 1 📌 0
I am excited to share that I have successfully defended my PhD, "Addressing Resource Scarcity, Language Variation, and Transfer Challenges for Automatic Detection of Harmful Language." 🎉
👩‍🎓👩‍🎓🎉
@inriaparisnlp.bsky.social
@sorbonne-universite.fr

20.03.2025 08:44 👍 32 🔁 0 💬 4 📌 1

🎉 🌍✍️ I'm thrilled to announce that our paper, "Common Ground, Diverse Roots: The Difficulty of Classifying Common Examples in Spanish Varieties", co-authored with @arijriabi.bsky.social and @zehavoc.bsky.social, has been accepted for the #VarDial2025 workshop during #COLING2025! 🎉 1/5

27.12.2024 17:02 👍 6 🔁 2 💬 1 📌 0

most people want a quick and simple answer to why AI systems encode/exacerbate societal and historical bias/injustice and due to the reductive but common thinking of "bias in, bias out," the obvious culprit often is training data but this is not entirely true

1/

24.11.2024 16:26 👍 598 🔁 217 💬 26 📌 42
HTR-United
HTR-United is a catalog and an ecosystem for sharing and finding ground truth for optical character or handwritten text recognition (OCR/HTR).

Now that I am on Bluesky, let me take you again on a threaded tour of HTR-United (#HTR_United), a project founded and led by @ponteineptique.bsky.social and me since September 2021. Its main goal is to facilitate finding and sharing open datasets to train HTR and OCR models!

htr-united.github.io

30.10.2023 10:48 👍 4 🔁 5 💬 1 📌 0