#speechai

5 days ago

Want to run speech‑AI locally? I walk through generating a Hugging Face read token and exporting it for PersonaPlex, NVIDIA‑powered speech‑to‑speech. Grab the steps and start experimenting! #HuggingFace #SpeechAI #PersonaPlex

🔗 aidailypost.com/news/how-cre...

Matteo Negri

@matteo-negri.bsky.social

2 weeks ago

🚀 Call for Participation: @iwslt Subtitling 2026

Turn speech into ready-to-watch subtitles 🎬 across TV, News & YouTube!

📅 Evaluation: Apr 1–15
iwslt.org/2026/subtitl...

#IWSLT2026 #SpeechAI #MultimodalAI

Matteo Negri

@matteo-negri.bsky.social

2 weeks ago

🚀 Call for Participation: @iwslt Model Compression 2026

Make large multilingual foundation models small ⚡ without losing power in EN→DE/ZH speech-to-text translation.

📅 Evaluation: Apr 1–15
iwslt.org/2026/compres...

#IWSLT2026 #SpeechAI #Qwen2 #EfficientAI

Matteo Negri

@matteo-negri.bsky.social

2 weeks ago

🚀 Call for Participation: @iwslt Offline Speech Translation 2026

Break language barriers with new languages & real-world scenarios + a brand new source-language agnostic speech translation track 🌍

📅 Evaluation: Apr 1–15
👉 iwslt.org/2026/offline

#IWSLT2026 #SpeechAI

TechGlimmer.io

@techglimmer.bsky.social

1 month ago

Voxtral Transcribe 2 by Mistral AI is giving serious "control your audio, control your data" vibes 🎛️. Open-weight, long-context speech understanding, sharp transcription and diarization, plus multilingual support.
#Voxtral #Transcribe2 #MistralAI #SpeechAI #AITranscription #OpenSourceAI

Hacker News Companion

@hncompanion.com

1 month ago

For speed & efficiency, specific models like Parakeet (STT) and Pocket-TTS (TTS) are recommended. Various Whisper models are also favored. Prioritizing low resource usage is crucial for effective local deployments. #SpeechAI 4/5

dwulf69

@dwulf69.bsky.social

2 months ago

Meet Nina: my open-source, real-time speech-to-speech AI agent.

PyTorch-powered • LoRA/PEFT fine-tuning on tokenized audio • sub-200 ms E2E latency • edge-optimized (RTX 3090 + Redis caching).

Repo: gitlab.com/dwulf/nina-a...

Who's building voice AI? Let's talk. 🐺🔊
#AI #SpeechAI #PyTorch #MLOps

AI Assistant Store

@aiassistantstore.bsky.social

3 months ago

AI News Wrap-Up: 22nd November 2025

Africa gets $1B AI push; Greece pilots classroom ChatGPT; insiders warn on safety; speech AI fails accents; chip prices surge on AI demand.

www.aiassistantstore.com/blogs/latest...

#AINews #ArtificialIntelligence #AIInEducation #AIEthics #TechPolicy #SpeechAI

Calipia

@calipia.bsky.social

4 months ago

Microsoft rolls out its live interpretation API

Microsoft has just unveiled Live Interpreter API, a technology building block integrated into Azure Speech Translation. The stated goal: to turn real-time translation into a seamless experience, free of the usual constraints such as manually choosing the source language. Behind this launch lies a clear ambition: to bring speech translation into a new phase in which artificial intelligence matches, or even competes with, human interpreters.

Microsoft releases its Live Interpreter API: simultaneous translation with no drop-down menus and almost no latency. An ambitious promise for CIOs and IT architects.
#TraductionAutomatique #Azure #Innovation #SpeechAI #Multilingue

Donna E

@edwardsdna.bsky.social

5 months ago

Understand your customers better with constrained speech recognition

In today’s voice-first world, it’s not enough for systems to simply hear what users say. They need to understand it with precision. In high-stakes environments like healthcare, finance, or enterprise IT, voice interfaces must balance natural conversation with strict control over vocabulary and intent. Many organizations rely on voice AI agents to collect critical information. […] Originally published on the Microsoft Dynamics 365 Blog.

@MSFTDynamics365 #VoiceRecognition #CustomerExperience #SpeechAI

GetNews.me

@getnews-me.bsky.social

5 months ago

MOSS-Speech: New Speech-to-Speech Model Bypasses Text Intermediates

MOSS‑Speech is a speech‑to‑speech LLM that skips text, preserving tone. It reaches top performance on spoken QA benchmarks, matching text‑guided systems. Read more: getnews.me/moss-speech-new-speech-t... #mossspeech #speechai

GetNews.me

@getnews-me.bsky.social

5 months ago

WildSpeech-Bench Introduces a Dedicated Benchmark for End-to-End Speech LLMs

WildSpeech-Bench, a large‑scale open benchmark for end‑to‑end speech language models, evaluates prosody, homophones, background noise and stuttering in real‑world inputs. getnews.me/wildspeech-bench-introdu... #wildspeechbench #speechai

GetNews.me

@getnews-me.bsky.social

5 months ago

Speaker‑Agnostic AI Detects Gender‑Based Violence Indicators in Speech

On Sep 26, 2025, researchers released a speaker‑agnostic AI that detects gender‑based violence indicators in speech, cutting bias by 26.95% and lifting classification accuracy by 6.37%. Read more: getnews.me/speaker-agnostic-ai-dete... #gbv #speechai

GetNews.me

@getnews-me.bsky.social

5 months ago

Progressive Alignment Boosts Multilingual Speech-to-Text with LLMs

The PART framework improves multilingual speech‑to‑text, outperforming frozen‑LLM baselines on Common Voice 15, FLEURS, WenetSpeech and CoVoST2. getnews.me/progressive-alignment-bo... #multilingualspeech #speechai

Maria Teleki

@mariateleki.bsky.social

5 months ago

#arxiv #speechAI #LLM

GetNews.me

@getnews-me.bsky.social

5 months ago

Minimal Pair Probing Shows Speech Models Favor Grammar Over Meaning

Probing 71 minimal-pair tasks shows that speech transformers encode grammar earlier and more strongly than semantics; the study will be presented orally at EMNLP 2025. Read more: getnews.me/minimal-pair-probing-sho... #speechai #nlp #emnlp2025

GetNews.me

@getnews-me.bsky.social

5 months ago

Chunk-Based Speech Pre‑Training Improves Streaming AI

Chunk SSL, a chunk‑based self‑supervised framework, uses an FSQ codebook with several million tokens and achieves competitive performance on the LibriSpeech and MuST‑C benchmarks in both streaming and offline modes. getnews.me/chunk-based-speech-pre-t... #chunkssl #speechai

GetNews.me

@getnews-me.bsky.social

5 months ago

New Speech Language Model Boosts Wolof ASR and Translation

A new speech language model trained on a large Wolof corpus outperforms the HuBERT base model in transcription and translation, with results released in September 2025. Read more: getnews.me/new-speech-language-mode... #wolof #speechai #opensource

GetNews.me

@getnews-me.bsky.social

5 months ago

SynParaSpeech Launches Automated Paralinguistic Dataset for Speech AI

SynParaSpeech releases a 118.75‑hour dataset covering six paralinguistic categories with exact timestamps, now available on GitHub. Read more: getnews.me/synparaspeech-launches-a... #paralinguistics #speechai #dataset

GetNews.me

@getnews-me.bsky.social

5 months ago

Whisper Model Reveals Built‑In Word Alignment Capability

OpenAI's Whisper can generate word‑level timestamps without extra training by using character‑based inputs and alignment‑focused attention heads, reaching precision within 20‑100 ms tolerance. Read more: getnews.me/whisper-model-reveals-bu... #whisper #speechai

ProgressOne

@progressone.bsky.social

7 months ago

Bring your voice apps to life with real-time speech-to-speech AI.

LiveKit and VAPI integration ensures instant, natural, and AI-powered communication.

go.fiverr.com/visit/?bta=2...

#AICommunication #SpeechAI #VoiceApps #RealtimeAI #TechInnovation

JR DeLaney

@jrdelaney.bsky.social

7 months ago

An AI voice-engine rewrote silence: teenage voice restored via tech, identity renewed. #AIInnovationsUnleashed #SpeechAI #EthicalAI
www.aiinnovationsunleashed.com/?p=2482

Hacker News Companion

@hncompanion.com

8 months ago

Main Theme 2: Pronunciation. Concerns raised about AI tutor pronunciation accuracy and user speech-to-text recognition. Users worried about internalizing errors and wanted better feedback on their own pronunciation. #SpeechAI 4/6

pinage404.rss :nixos:

@pinage404.mamot.fr.ap.brid.gy

10 months ago

As of today, my computer can nicely read aloud for me!

I'm lazy and I read slowly, so I don't like reading and I skip a lot of articles

I have been looking for a solution for several months

#Accessibility #A11y #Orca #WebBrowser #ZenBrowser #Firefox #Piper #Pied #SpeechAI #AI #Nix #NixOS

Winbuzzer

@winbuzzer.com

11 months ago

Amazon’s New Nova Sonic Voice Model Targets Voice AI Rivals With Real-Time Expressive Output

Amazon has launched Nova Sonic, a speech AI model that responds in real time with expressive synthetic voices and supports integration via Bedrock.

#AI #VoiceAI #NovaSonic #AmazonAI #AlexaPlus #AIModel #SpeechAI #RealTimeVoice #BedrockAI #AIassistant

Winbuzzer

@winbuzzer.com

11 months ago

ChatGPT’s Advanced Voice Mode Expands to Web and Improves Conversational Flow

ChatGPT’s voice assistant now feels more like a real conversation, thanks to OpenAI’s updates on latency, expressiveness, and turn-taking.

#AI #ChatGPT #VoiceAI #OpenAI #AIAssistants #Chatbots #SpeechAI #GenAI

AiBusinessList

@aibusinesslist.bsky.social

1 year ago

AssemblyAI Reviews & Ratings - Best AI Text Generators

Explore reviews, news, and ratings on AssemblyAI. Discover the best AI Text Generators and AI tools to enhance your business performance.

We’ve been exploring AssemblyAI, and it’s a game-changer for voice data! 🗣️ From transcribing calls to analyzing sentiment, its accuracy and versatility have impressed us.

www.aibusinesslist.com/ai-text-gene...

#SpeechAI #AIForBusiness #TechInnovation #VoiceRecognition

Slator- Language Industry Intelligence

@slator.bsky.social

1 year ago

RWS and BLEND Appoint New CEOs, Microsoft and Amazon Speech AI

New CEOs at RWS, Sorenson, BLEND, and Gridly; Teleperformance acquires ZP; Writer.com raises USD 200m; and new AI capabilities from Amazon and Microsoft.

RWS & BLEND Appoint New CEOs as Microsoft and Amazon Push Speech AI Boundaries

Explore the leadership shake-ups at RWS and BLEND, as well as the groundbreaking advancements in speech AI from Microsoft and Amazon.

#rws #blend #speechai #amazonai #languageservices #slatornews #slatorpod #slatorcon

Jeff Smith

@jeffsmith.tech

1 year ago

Multimodal content understanding is going the same way, starting (as always) with #CV and #NLP but rapidly expanding to other modalities like #speechAI.
