Accepted at #ICLR2026! 🇧🇷 Deep learning models often fail on specific subgroups. Group DRO was designed to help, but it fails when group losses aren't comparable. This is common in speech. We introduce CTC-DRO: up to 47.1% lower worst-language errors in multilingual ASR.
29.01.2026 09:06 · Likes: 4 · Reposts: 1 · Replies: 0 · Quotes: 0
✨Meet OLMoASR✨ By pairing our curated 1M-hour dataset with a powerful architecture, we've built open ASR models that achieve performance competitive with models like Whisper. We're open-sourcing data, code, and models to help the community build more robust and transparent ASR.
29.08.2025 16:21 · Likes: 12 · Reposts: 1 · Replies: 0 · Quotes: 0
Now that school is starting for lots of folks, it's time for a new release of Speech and Language Processing! Jim and I added all sorts of material for the August 2025 release! With slides to match! Check it out here: web.stanford.edu/~jurafsky/sl...
24.08.2025 19:28 · Likes: 150 · Reposts: 59 · Replies: 3 · Quotes: 4
Big THANK YOU to the amazing #Interspeech2025 Organizing Committee!
- Odette Scharenborg, Catharine Oertel, Khiet Truong
- Martijn Bartelds
- Dragoș Bălan
- Saskia Peters
- Ginny Ruiter, Marie Louise Verhagen, Natascha Voskuijl
14.07.2025 14:26 · Likes: 10 · Reposts: 3 · Replies: 1 · Quotes: 0
Congratulations!! That's wonderful!!
02.07.2025 17:18 · Likes: 1 · Reposts: 0 · Replies: 0 · Quotes: 0
Congrats!!!
29.04.2025 22:46 · Likes: 1 · Reposts: 0 · Replies: 0 · Quotes: 0
CTC-DRO can be applied to ASR at minimal computational cost, and it offers the potential to reduce group disparities in other domains with similar challenges.
📄 Read our paper: arxiv.org/pdf/2502.017...
💻 Get the code: github.com/Bartelds/ctc...
12.03.2025 15:29 · Likes: 0 · Reposts: 0 · Replies: 0 · Quotes: 0
The result:
Worst-language error ↓ up to 47.1%
Average error ↓ up to 32.9%
CTC-DRO works seamlessly with existing self-supervised speech models through ESPnet.
12.03.2025 15:29 · Likes: 0 · Reposts: 0 · Replies: 1 · Quotes: 0
We present CTC-DRO, which addresses the shortcomings of the group DRO objective by:
✅ Input length-matched batching to mitigate CTC's scaling issues
✅ Smoothing the group weight update to prevent overemphasis on consistently high-loss groups
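A minimal sketch of both ideas in plain Python. This is an illustration, not the paper's implementation: the function names, the `num_frames` field, and the exact form of the smoothed update are assumptions; see the paper and repo for the real formulation.

```python
import numpy as np

def length_matched_batches(utterances, frame_budget=16_000):
    """Pack utterances into batches capped by total input length, so
    each batch's summed CTC loss sits on a comparable scale across
    language groups. `num_frames` is an assumed field name."""
    batches, batch, used = [], [], 0
    for utt in sorted(utterances, key=lambda u: u["num_frames"]):
        if batch and used + utt["num_frames"] > frame_budget:
            batches.append(batch)
            batch, used = [], 0
        batch.append(utt)
        used += utt["num_frames"]
    if batch:
        batches.append(batch)
    return batches

def smoothed_group_weights(q, losses, baselines, eta=0.1, c=1.0):
    """Exponentiated update on losses normalized by a smoothed
    per-group baseline (e.g., a running average of each group's past
    losses). A group that is *consistently* high-loss stops dominating;
    only a loss that is high relative to the group's own history
    increases its weight."""
    scaled = np.asarray(losses) / (c + np.asarray(baselines))
    q = q * np.exp(eta * scaled)
    return q / q.sum()
```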
12.03.2025 15:29 · Likes: 0 · Reposts: 0 · Replies: 1 · Quotes: 0
Why? Group DRO needs comparable training losses between languages. But in ASR, CTC-based losses vary due to differences in speech length, speakers, and acoustics. This creates spurious differences across language groups.
Result? Worse performance.
We need a new approach.
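To make the failure concrete, here is a toy sketch (not the paper's code) of the standard group DRO weight update from Sagawa et al. (2020); all numbers are made up. Because the update exponentiates raw losses, a language whose CTC loss is inflated by longer utterances soaks up almost all the training weight:

```python
import numpy as np

def group_dro_weights(q, group_losses, eta=0.1):
    """One exponentiated-gradient step of standard group DRO:
    each group's weight is scaled by exp(eta * its current loss),
    then the weights are renormalized to sum to one."""
    q = q * np.exp(eta * np.asarray(group_losses))
    return q / q.sum()

# Three language groups with equal initial weight. The third group's
# CTC loss is larger only because its utterances are longer, not
# because its ASR quality is worse (illustrative numbers).
q = np.ones(3) / 3
losses = [4.0, 4.2, 12.0]
for _ in range(20):
    q = group_dro_weights(q, losses)
print(q.round(3))  # ~[0., 0., 1.]: the long-utterance group dominates
```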
12.03.2025 15:29 · Likes: 0 · Reposts: 0 · Replies: 1 · Quotes: 0
CTC-based fine-tuning has been successful on multilingual ASR benchmarks, but it doesn't close the performance gaps between languages. Group DRO could help by focusing on the worst-performing languages, but it does not work.
12.03.2025 15:29 · Likes: 1 · Reposts: 0 · Replies: 1 · Quotes: 0
Speech recognition is great - if you speak the right language.
Our new @stanfordnlp.bsky.social paper introduces CTC-DRO, a training method that reduces worst-language errors by up to 47.1%.
Work w/ Ananjan, Moussa, @jurafsky.bsky.social, Tatsu Hashimoto and Karen Livescu.
Here's how it works 🧵
12.03.2025 15:29 · Likes: 11 · Reposts: 3 · Replies: 1 · Quotes: 1
I am excited to announce that I will join the University of Zurich as an assistant professor in August this year! I am looking for PhD students and postdocs starting in the fall.
My research interests include optimization, federated learning, machine learning, privacy, and unlearning.
06.03.2025 02:17 · Likes: 28 · Reposts: 5 · Replies: 1 · Quotes: 1
Join us for the Conversational AI Reading Group meeting on Thursday, January 16th, 11 AM-12 PM EST.
Martijn Bartelds will present "Improving Universal Access to Modern Speech Technology".
Details here: poonehmousavi.github.io/rg
13.01.2025 16:19 · Likes: 2 · Reposts: 3 · Replies: 0 · Quotes: 0
Happy New Year everyone! Jim and I just put up our January 2025 release of Speech and Language Processing! Check it out here: web.stanford.edu/~jurafsky/sl...
12.01.2025 20:44 · Likes: 150 · Reposts: 50 · Replies: 1 · Quotes: 1
Group picture of people in the Stanford NLP Group gathered on the shores of Lake Tahoe.
Natural Language Processing (artificial intelligence that uses human language) has been on a roll lately. You've probably noticed! So the Stanford NLP Group has been growing, and diversifying into lots of new topics, including agents, language model programs, and socially aware #NLP.
nlp.stanford.edu
04.12.2024 17:14 · Likes: 53 · Reposts: 8 · Replies: 1 · Quotes: 0
Excited to announce the launch of our ML-SUPERB 2.0 challenge @interspeech.bsky.social 2025! Join us in pushing the boundaries of multilingual ASR and LID!
💻 multilingual.superbbenchmark.org
04.12.2024 18:09 · Likes: 8 · Reposts: 3 · Replies: 0 · Quotes: 0
Multimodal Information Based Speech Processing (MISP) 2025 Challenge
Hi speech people, super exciting news here!
We are running another "Multimodal information based speech (MISP)" Challenge at @interspeech.bsky.social
Participate!
Spread the word!
More info:
mispchallenge.github.io/mispchalleng...
25.11.2024 11:25 · Likes: 15 · Reposts: 7 · Replies: 0 · Quotes: 0
made this thing, reply to be added
go.bsky.app/AKGJ82V
22.11.2024 00:26 · Likes: 12 · Reposts: 1 · Replies: 6 · Quotes: 0
🙋
22.11.2024 00:27 · Likes: 1 · Reposts: 0 · Replies: 0 · Quotes: 0
Mentioning this post from @cjziems.bsky.social, listing some starter packs: bsky.app/profile/cjzi...
20.11.2024 19:02 · Likes: 2 · Reposts: 0 · Replies: 0 · Quotes: 0
I've started putting together a starter pack with people working on Speech Technology and Speech Science: go.bsky.app/BQ7mbkA
(Self-)nominations welcome!
19.11.2024 11:13 · Likes: 82 · Reposts: 34 · Replies: 44 · Quotes: 3
🙋
20.11.2024 15:28 · Likes: 1 · Reposts: 0 · Replies: 0 · Quotes: 0
I wanted to contribute to "Starter Pack Season" with one for Stanford NLP+HCI: go.bsky.app/VZBhuJ5
Here are some other great starter packs:
- CSS: go.bsky.app/GoEyD7d + go.bsky.app/CYmRvcK
- NLP: go.bsky.app/SngwGeS + go.bsky.app/JgneRQk
- HCI: go.bsky.app/p3TLwt
- Women in AI: go.bsky.app/LaGDpqg
15.11.2024 19:20 · Likes: 25 · Reposts: 10 · Replies: 2 · Quotes: 2
👍
17.11.2024 18:30 · Likes: 1 · Reposts: 0 · Replies: 0 · Quotes: 0