Junya Koguchi, Tomoki Koriyama: Voting-based Pitch Estimation with Temporal and Frequential Alignment and Correlation Aware Selection https://arxiv.org/abs/2602.01727 https://arxiv.org/pdf/2602.01727 https://arxiv.org/html/2602.01727
We released the inference code and a model from AESCA (Yamamoto+, #ASRU2025), the top-performing system in AudioMOS Challenge 2025 Track 2, for predicting audio aesthetics scores (AES).
Paper: arxiv.org/abs/2512.05592
Code: github.com/CyberAgentAI...
At the IEEE #ASRU2025, we presented our automatic evaluation system for generated audio, which won first place in the AudioMOS Challenge 2025 Track 2🥇. At the start of the session, an award ceremony was held, and I accepted the certificate on behalf of the team.
Today’s poster presentation at #ASRU2025 🥳
Preprint: arxiv.org/abs/2512.05592
It's the best system in AudioMOS Challenge 2025 Track 2👑
sites.google.com/view/voicemo...
On Dec 9th, 4:00 PM, we will be giving a poster presentation titled “The T12 System for AudioMOS Challenge 2025: Audio Aesthetics Score Prediction System Using KAN- and VERSA-based Models” at ASRU2025 in Honolulu. #ASRU2025
Preprint: arxiv.org/abs/2512.05592
We are attending #ASRU2025 in Honolulu!!!🏝️🌺 The conference center is very close to Waikiki beach 🌊🏄🌈
Thank you for attending my talk. I'm happy to contribute to the special session on spectrotemporal modulation!
eppro02.ativ.me/web/index.ph...
On Dec 3rd, 4:00 PM, I will be giving an invited talk titled "Towards Machine Learning-Driven Speech Intelligibility Prediction Models: Examining Relationships with Spectrotemporal Modulation" at the 6th ASA/ASJ joint meeting in Honolulu🏝️🌺 #ASAASJ25
eppro02.ativ.me//web/index.p...
Had such a great time presenting our tutorial on Interpretability Techniques for Speech Models at #Interspeech2025! 🔍
For anyone looking for an introduction to the topic, we've now uploaded all materials to the website: interpretingdl.github.io/speech-inter...
I finished my presentation. Thank you for attending the session and discussion! #Interspeech2025
🇳🇱🌷🐨🇦🇺
#Interspeech2026KoalaCompetition
#Interspeech2026
#Interspeech2025
Banquet at Stadshaven Brouwerij & Gastropub🍻🎸🥁🎺🎹⛴️ #Interspeech2025
I'll be presenting my paper at #Interspeech2025 :
Area6-Oral6-1330-1 “Non-Intrusive Binaural Speech Intelligibility Prediction Using Mamba for Hearing-Impaired Listeners”
www.isca-archive.org/interspeech_...
#Interspeech2025 opens!!🌷💃🕺🌷
I'm attending #Interspeech2025 in Rotterdam 🇳🇱
😍 Check out the #Interspeech2025 Proceedings!
www.interspeech2025.org/abstract-boo...
Our paper has been accepted for #Interspeech2025
Non-Intrusive Binaural Speech Intelligibility Prediction Using Mamba for Hearing-Impaired Listeners 🦻🐍
See you in Rotterdam🇳🇱
Our team's year-end party was held in Shibuya, Tokyo🍶 My colleagues gave me a wedding gift🎁 Thanks!
Views from the office window. Photo taken just now.
Do you want to work with me for a few months? Two internship positions are available on the Music Team at Sony AI in Barcelona!
👇
Unfortunately, my paper for ICASSP 2025 was rejected🥺 Thanks to the reviewers and AC for the peer review🙏 I will work hard on my next submission, reflecting the useful comments I received on my research.
📣Amazing opportunity for #speech researchers!
Postdoc Position: Computational Modelling of Speech Recognition at the Donders Centre for Cognition, Radboud University, Nijmegen, the Netherlands
More info: www.ru.nl/en/working-a...
👀🦻 > Multi-objective non-intrusive hearing-aid speech assessment model
pubs.aip.org/asa/jasa/art...
🤖👂 > SPS SLTC/AASP TECHNICAL COMMITTEE WEBINAR
Audio Signal Enhancement: A Weakly Supervised Deep Learning Approach
15 January 2025
Presented by Dr. Nobutaka Ito & Dr. Yoshiaki Bando
landing.signalprocessingsociety.org/jan-15-2024
A paper explaining that, to successfully train a CLIP-like contrastive VL model, the alignment between the image and text encoders should be maintained
arxiv.org/abs/2412.04616
👀👂 > OHHR – The Oldenburg Hearing Health Repository [Dataset]
zenodo.org/records/1417...
Donated to arXiv for open science🕊️
🦋🎓👀 > Altmetric introduces Bluesky as a new social media tracking source - Altmetric
www.altmetric.com/altmetric-ne...