#AudioML

Latest posts tagged with #AudioML on Bluesky

Figure: Overall architecture of AlphaFlowTSE. Given a mixture waveform y and an enrollment utterance e, we compute complex STFT features and form the mixture feature Y and enrollment feature E (real/imaginary concatenation). During training, the backbone takes the current state feature z_t; during inference we initialize z_0 = Y. The enrollment feature is concatenated as a temporal prefix, yielding [E‖z_t] (or [E‖z_0] at inference), which is fed to the UDiT backbone. The backbone is conditioned via AdaLN on the absolute time t and the interval length Δ = r − t (with r = 1 at inference), and predicts the mean velocity for finite-interval transport, denoted u_θ(t, r, [E‖z_t]). One-step inference (NFE = 1) produces an estimated complex STFT Ŝ = (Ŝ_Re, Ŝ_Im), which is converted to the target waveform ŝ by iSTFT. The dashed module is an optional mixing-ratio predictor used only in the background-to-target ablation to predict the start coordinate.
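The inference path in the caption above can be sketched in a few lines. This is only an illustration of the data flow (STFT → real/imag stacking → one Euler step with the predicted mean velocity → iSTFT), not the paper's code; the `backbone` here is a hypothetical stand-in, and the feature layout is an assumption.

```python
import numpy as np
from scipy.signal import stft, istft

def complex_stft_features(wave, nfft=512, hop=128):
    # Complex STFT -> stacked real/imag feature matrix (2F x T), as in the caption.
    _, _, S = stft(wave, nperseg=nfft, noverlap=nfft - hop)
    return np.concatenate([S.real, S.imag], axis=0)

def features_to_waveform(feat, nfft=512, hop=128):
    # Undo the real/imag stacking and invert with iSTFT.
    F = feat.shape[0] // 2
    S = feat[:F] + 1j * feat[F:]
    _, wave = istft(S, nperseg=nfft, noverlap=nfft - hop)
    return wave

def one_step_inference(Y, E, backbone):
    # NFE = 1: start at z_0 = Y and take a single Euler step over [t, r] = [0, 1]
    # using the predicted mean velocity u_theta(t=0, r=1, [E || z_0]).
    z0 = Y
    u = backbone(t=0.0, r=1.0, x=np.concatenate([E, z0], axis=1))  # enrollment as time prefix
    return z0 + (1.0 - 0.0) * u  # estimated complex-STFT feature S_hat
```

With a trained backbone, `features_to_waveform(one_step_inference(Y, E, backbone))` would yield the target estimate ŝ; the point of the one-step formulation is that this single Euler update replaces a multi-step ODE solve.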

Imagine a noisy group call where 3 people talk at once.

This paper builds a model that can focus on a single speaker (using a short voice sample) and extract that voice.

This cleanup results in better, faster audio transcription.

Summary and full paper 👇

#AudioML #SpeechToText


🎶 New paper out!
Diffusion Timbre Transfer via Mutual Information Guided Inpainting

Training-free timbre transfer with diffusion models: preserve melody & rhythm, edit timbre at inference time using MI-guided noise and clamping.

📄 arxiv.org/abs/2601.01294

#DiffusionModels #AudioML #GenAI #MIR


Hi! I'll be at #NeurIPS2025 in San Diego this coming week, excited to meet new friends and learn new things :) #AudioML

Contrastive Vocal Similarity Modeling Boosts Audio AI Learning

Contrastive Vocal Similarity Modeling (CVSM) aligns short vocal excerpts with full mixes and exceeds baselines on artist identification and vocal similarity tasks. Read more: getnews.me/contrastive-vocal-simila... #cvsm #audioml
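The post says CVSM aligns short vocal excerpts with full mixes contrastively. A minimal InfoNCE-style sketch of that idea, assuming paired batches of vocal and mix embeddings (this is a generic formulation, not the CVSM code; all names are illustrative):

```python
import numpy as np

def info_nce(vocal_emb, mix_emb, temperature=0.1):
    # Contrastive loss over a batch: each vocal-excerpt embedding should score
    # highest against its own full-mix embedding (the diagonal) and lower
    # against every other mix in the batch (the in-batch negatives).
    v = vocal_emb / np.linalg.norm(vocal_emb, axis=1, keepdims=True)
    m = mix_emb / np.linalg.norm(mix_emb, axis=1, keepdims=True)
    logits = v @ m.T / temperature               # (B, B) cosine-similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # cross-entropy toward the diagonal
```

Perfectly aligned pairs drive the loss toward zero; mismatched pairs drive it up, which is what lets such an objective transfer to artist identification and vocal-similarity retrieval.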

Generative Machine Listener (GMLv2) Boosts Audio Quality Prediction

GMLv2, a reference‑based ML model, uses a Beta‑distribution loss and additional neural audio coding datasets, and outperforms PEAQ and ViSQOL in predicting MUSHRA scores. getnews.me/generative-machine-liste... #gmlv2 #audioml #mushra
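The post mentions a Beta-distribution loss for predicting MUSHRA scores. A generic sketch of that loss, assuming scores rescaled from 0–100 into (0, 1) and a model that outputs the Beta parameters (α, β); this is the textbook negative log-likelihood, not the GMLv2 implementation:

```python
import math

def beta_nll(alpha, beta, y):
    # Negative log-likelihood of a rescaled MUSHRA score y in (0, 1) under a
    # Beta(alpha, beta) distribution whose parameters the model predicts.
    log_b = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return -((alpha - 1) * math.log(y) + (beta - 1) * math.log(1 - y) - log_b)
```

Modeling the full Beta distribution rather than a point estimate lets the model express listener uncertainty, since the score is naturally bounded.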

TISDiSS Introduces Scalable Framework for Source Separation

New TISDiSS framework offers state‑of‑the‑art source separation with fewer parameters and adjustable inference repetitions. Read more: getnews.me/tisdiss-introduces-scala... #tisdiss #sourcesep #audioml


🎙️ Natalia Ziemba Jankowska shared how audio-based machine learning is opening new doors in vocal health analysis, from detecting strain to long-term voice tracking.

🔗  Visit Website: https://f.mtr.cool/ieihqtstyf

#MLcon #DevmioNYC #AudioML #MachineLearning #NYC

Internship: ML Optimization | Notion. Location: Paris preferred (remote within France/EU possible)

🚀 We’re looking for a Master’s student to join our research team for a 6-month internship at AudioShake!

Deep dive into PyTorch, optimize our SOTA audio models, and help make ML sound better (and faster) 🎶

Based in Paris or remote 🇫🇷 → audioshake.notion.site/Internship-M... #AudioML #Internship


#bioacoustics #audioml #conservationtech
