WAVLab | @CarnegieMellon @WavLab
Shinji Watanabe's Audio and Voice Lab | WAVLab @LTIatCMU @SCSatCMU | Speech Recognition, Speech Enhancement, Spoken Language Understanding, and more. wavlab.org Joined August 2021
Tweets 322 | Followers 2K | Following 145 | Likes 339
Excited to share that our work on ESPnet3 has been accepted to Interspeech 2026! We’ll be releasing it soon, stay tuned!
Masao Someki, et al., "ESPnet3: Infrastructure for Scalable Speech and Audio Research in the Foundation Model Era" arxiv.org/abs/2606.21854
Masao Someki, et al., "PlanRAG-Audio: Planning and Retrieval Augmented Generation for Long-form Audio Understanding," arxiv.org/abs/2605.20414
We are looking for a postdoctoral researcher in speech and audio processing, with a possible start in the Fall 2026 semester. If you are interested in working with us, please apply through the following form: forms.gle/gfENMMrRf1nmnT…
Accepted to ICML! See y’all in Korea 🇰🇷
What if you had nano-banana for audio? AudioChat is a multi-modal LM that performs fine-grained understanding, generation, and editing of multi-source scenes. By diffusing continuous latents, it generates 48 kHz stereo edits with great input adherence: wanchichen.github.io/audiochat/
WAVLab @ #ICASSP2026 We will present 8 papers at ICASSP in Barcelona. If you are attending, please stop by the talks/posters and chat with the authors. arXiv links and presentation info below. 1/5
1. ICASSP 2026 URGENT Speech Enhancement Challenge. Poster: Fri May 8, 14:00 to 16:00, Poster Area 43. arxiv.org/abs/2601.13531
2. SSVD-O: Parameter-Efficient Fine-Tuning with Structured SVD for Speech Recognition. Oral: Fri May 8, 10:00 to 10:20. arxiv.org/abs/2601.12600 2/5
3. Reasoning Beyond Majority Vote: An Explainable SpeechLM Framework for Speech Emotion Recognition. Oral: May 7, 15:00. arxiv.org/abs/2509.24187
4. 2025 URGENT Speech Enhancement Challenge Multilingual P.808 Listening Tests. Oral: May 6, 17:50. arxiv.org/abs/2507.11306 3/5
5. Full-Duplex-Bench V1.5: Evaluating Overlap Handling for Full-Duplex Speech Models. Poster: May 8, 14:00. arxiv.org/abs/2507.23159
6. CALM: Joint Contextual Acoustic-Linguistic Modeling for Personalization of Multi-Speaker ASR. Oral: May 8, 15:00. arxiv.org/abs/2601.22792 4/5
7. Phonological Tokenizer: Prosody-Aware Phonetic Token via Multi-Objective Fine-Tuning With Differentiable K-Means. Poster: May 6, 14:00. arxiv.org/abs/2601.19781
8. Online Register for Dual-Mode Self-Supervised Speech Models. Poster: May 7, 09:00. arxiv.org/abs/2602.23702 5/5
Congrats to Brian @brianyan918 on finishing his PhD defense today! It was great to see so many people show up for this big event and celebrate such an important milestone. Wishing you all the best in what comes next!
6 papers (4 main and 2 findings) were accepted at #ACL2026! All are speech papers :)
Shikhar Bharadwaj, Chin-Jou Li, Kwanghee Choi, Eunjung Yeo, William Chen, Shinji Watanabe, David R. Mortensen, "An Empirical Recipe for Universal Phone Recognition," arxiv.org/abs/2603.29042
Congratulations to Li-Wei @liweiche77 on successfully defending his PhD today! 🎉 Wishing him all the best in his next chapter!
Congratulations to Siddhant @Sid_Arora_18 on a successful PhD defense today! It was wonderful to celebrate this big milestone together. Wishing him all the best for the exciting journey ahead.
PRiSM: Benchmarking Phone Realization in Speech Models Shikhar Bharadwaj, Chin-Jou Li, Yoonjae Kim, Kwanghee Choi, Eunjung Yeo, Ryan Soh-Eun Shim, Hanyu Zhou, Brendon Boldt, Karen Rosero Jacome, Kalvin Chang, Darsh Agrawal, … arxiv.org/abs/2601.14046 [cs.CL cs.SD]
Chenda Li, Wei Wang, Marvin Sach, Wangyou Zhang, Kohei Saijo, Samuele Cornell, Yihui Fu, Zhaoheng Ni, Tim Fingscheidt, Shinji Watanabe, Yanmin Qian, "ICASSP 2026 URGENT Speech Enhancement Challenge," arxiv.org/abs/2601.13531
Pu Wang, Shinji Watanabe, Hugo Van hamme, "SSVD-O: Parameter-Efficient Fine-Tuning with Structured SVD for Speech Recognition," arxiv.org/abs/2601.12600
Shih-Heng Wang, Jiatong Shi, Jinchuan Tian, Haibin Wu, Shinji Watanabe, "Do Neural Codecs Generalize? A Controlled Study Across Unseen Languages and Non-Speech Tasks," arxiv.org/abs/2601.12205
Heading to NeurIPS 2025 in San Diego! I’ll present our spotlight poster, ARECHO, focusing on speech multi-metric estimation. 📍 Exhibit Hall C,D,E #2000 🗓️ Thu Dec 4, 11 a.m.–2 p.m. PST If you’re around, let’s say hi or grab a coffee!
This is exactly why we worked on ESPnet-Codec, but it is really hard to keep up because people move so fast nowadays. A similar issue arises in most speech tasks, from ASR and TTS to general speech LLMs. It's a bit of a sad time for driving scientific findings 🥲
Neural audio codec papers are all just comparing models trained on completely different data and claiming "our model is the strongest!! 😤😤😤", so my MOS score for 😩😩😩😩😩😩😩😩😩😩😩 has hit 1000000
Shinji Watanabe @shinjiw_at_cmu
5K Followers 370 Following I'm working at CMU (2021-). I was working at NTT (2001-2011), MERL (2012-2017), and JHU (2017-2020). Speech and Audio Processing is my main research topic.
Desh Raj @rdesh26
4K Followers 2K Following Speech + LLMs @nvidia | Previously: @Meta MSL, @jhuclsp, @IITGuwahati
arXiv Sound @ArxivSound
7K Followers 38 Following Sound-related articles (https://t.co/dxVYgWJGOw and https://t.co/b90N0Zzvjs) on https://t.co/HHqPequzVU
Jonathan Le Roux @JonathanLeRoux
2K Followers 309 Following Speech and audio research scientist at MERL. Opinions never really my own. 🦋https://t.co/6pSuhzw3fb
まっすー @ymas0315
2K Followers 2K Following
Siddharth Dalmia @siddalmia05
2K Followers 450 Following Voice AI @Meta | #SpeechProc and #NLProc | Previously @WaveformsAI @GoogleDeepmind | PhD @LTIatCMU @SCSatCMU
Mirco Ravanelli @mirco_ravanelli
4K Followers 2K Following Deep learning for Conversational AI. Creator of SpeechBrain.
Robin Scheibler @fakufakurevenge
888 Followers 941 Following Grower of cucumbers 🥒, tomatoes 🍅, and chilli peppers 🌶️. I ❤ audio, microphone arrays, IoT, Python, and data.
Delip Rao e/σ @deliprao
69K Followers 5K Following Busy inventing the shipwreck. @Penn. Past: @johnshopkins, @UCSC, @Amazon, @Twitter ||Art: #NLProc, Vision, Speech, #DeepLearning || Life: 道元, improv, running 🌈
laurent besacier @laurent_besacie
2K Followers 804 Following Principal Scientist at Naver Labs Europe & Professor at Univ. Grenoble Alpes - now on bluesky @lbesacie.bsky.social
Graham Neubig @gneubig
44K Followers 780 Following Associate professor @LTIatCMU. Co-founder/chief scientist @OpenHandsDev. I mostly work on modeling language.
Samuele Cornell @SamueleCornell
980 Followers 524 Following Post-doc @ CMU LTI. Audio and speech researcher.
Wen-Chin Huang @unilightwf
1K Followers 659 Following Assistant professor, Graduate School of Informatics, Nagoya University. Speech synthesis & evaluation. Trilingual, street dancer, golfer. Tweets are my own opinions.
yamakatz @kyama0321
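1K Followers 1K Following 🐻🐼👨🏻🎓🧑🏻💻🎧🦻🚘🟢 Research Scientist. Specializes in acoustics in general, including hearing and hearing-aid technology. Interested in the future of the mathematics, technology, devices, and environments that assist and extend human senses.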
Christian Steinmetz @csteinmetz1
6K Followers 2K Following Research Scientist @ Suno // working on generative music • audio fidelity • signal processing • ML
zhuo @zhuo316200
1 Followers 16 Following
Aparo Gauss @AparoGauss
5 Followers 63 Following
yasi @im_yasamiiin
0 Followers 40 Following
Sishir Kalita @SishirKalita
0 Followers 77 Following
Rajkumar rawal @raj_kumar_rawal
91 Followers 2K Following Building @SetuMind_ai |🌱 Life long learner...| Generative AI Engineer| Founder/CEO @TechParivartan|Everything started from "0", in hope to reach "1" someday|🚀
Hrishikesh D @rheeshee
1 Followers 99 Following the only difference there will be left between ai and humans is that we dream!
Sashank Macha @SashankMacha
2 Followers 59 Following
Jikanle - 時間乐 @JikanleOficial
2 Followers 152 Following Time that counts twice — where you learn a language, find your people, and let music do the bonding.
loopzy @yyt1246
0 Followers 26 Following
Xie Zhifei @XieZhifei14110
584 Followers 217 Following PhD @NUS & NTU, advised by Prof. Shuicheng Yan / Mini-Omni series (1st real-time speech LLM) · Audio-Reasoner · Mega-ASR / [email protected] Proactive AI Intelligence
Willow @Willowsnz
1 Followers 119 Following
JUNCHUAN ZHAO @Junchuan_0803
4 Followers 37 Following Hi! This is Junchuan, a PhD student at NUS working on speech & multimodal AI. Feel free to visit my website, and I’m always open to collaborations! 😁🤗
Jay Jain @JayLjain
166 Followers 5K Following Lead Backend Developer at Frnd, interested in backend, system design, AI, music, movies, travelling, and understanding how things work.
Tuan A Dinh @TuanADinh2
1 Followers 50 Following
- @desfantasiado
0 Followers 44 Following
J.A.R.V.I.S @YangYang16889
27 Followers 2K Following
sumset @sumir_30
25 Followers 641 Following dev. engg. | ai @puch_ai | join @marinlabshq | prev @ai_pixa
just @Bianniao123
44 Followers 3K Following
Alexander Polok @alexander_polok
1 Followers 7 Following
Habtamu Asefa @habt_asefa
12 Followers 698 Following Building TTS and ASR models for low resource languages. First-principles ML/DL engineer exploring native audio, and multimodal models.
yami dummy @DummyYami
1 Followers 57 Following
Mimimomo @Mayumi_fox
122 Followers 2K Following
Harper Wang @HarperWang1212
3 Followers 125 Following
A @agarruloushanar
0 Followers 88 Following
jingyic @jingyic01
5 Followers 38 Following
DWang @DWang18241597
1 Followers 65 Following
Francesco @fra_bonzi
60 Followers 229 Following PhD Student at Mila Quebec AI Institute | Concordia University, Montreal, Canada
Wy W @DuoShiWang
3 Followers 99 Following
Kory Mathewson @korymath
12K Followers 6K Following @GoogleDeepMind generative AI models + agents | getting great tech into the hands of great creative people
Aayush Sharma @Aayush9753
516 Followers 2K Following Making machines talk | AI/ML Researcher @SarvamAI | IITG
naoto nishida @Cvadogsan
1K Followers 3K Following UTokyo GSII PhD/currently visiting scholar at https://t.co/FR5vOb2NCP /@ishi96 lab/ex. @rkmt lab/HCI, NLP/language, movie, jiujitsu, climbing ENG→@nawta_hci
Ani Trghwi @logicrxt
106 Followers 4K Following
Yudong Yang @YudongY12126
0 Followers 3 Following
Tino A @Tinolvarez
519 Followers 3K Following Computer Vision Engineer and PostDoc Researcher at @UniOulu | Live Engineer | Musician. I love photography and painting. Galician (Spain) living in Finland.
Fingvo @fingvo
64 Followers 2K Following
Kate @0411Kate
4 Followers 45 Following
Son Nguyen @NgocNguyen91784
2 Followers 42 Following AI Research Resident at @fsoft_aicenter | Multimodal Learning, Generative Models, Audio-Visual Learning.
Shobhit Banga @shobhitbanga
3K Followers 508 Following Scorekeeper for voice AI. Building independent, real-world evals that help enterprises make buying decisions. Co-founder @voicearena_ai | prev @joshtalkslive
andrelucas @aaandrelucas
10 Followers 597 Following
T @telugupbl
15 Followers 1K Following
Asım Us @AsmUs855740
0 Followers 64 Following
AK @_akhaliq
508K Followers 3K Following AI research paper tweets, ML @Gradio (acq. by @HuggingFace 🤗) dm for promo ,submit papers here: https://t.co/UzmYN5XOCi
AI at Meta @AIatMeta
814K Followers 324 Following Together with the AI community, we are pushing the boundaries of what’s possible through open science to create a more connected world.
Heiga Zen (全 炳河... @heiga_zen
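11K Followers 155 Following Principal Scientist (Director) @GoogleDeepMind / GDM Tokyo site lead. Haze Elementary School ⇒ Ichishi Junior High School ⇒ NIT Suzuka College ⇒ Nagoya Institute of Technology (1-year 🇺🇸 IBM Watson Research internship) ⇒ 🇬🇧 Toshiba Research Europe ⇒ Google (🇬🇧 Speech ⇒ 🇯🇵 Brain) ⇒ 🇯🇵 Google DeepMind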
Shinnosuke Takamichi ... @forthshinji
5K Followers 395 Following Speech researcher. https://t.co/f8hJL8R1Lm
PyTorch @PyTorch
499K Followers 87 Following Tensors and neural networks in Python with strong hardware acceleration. PyTorch is an open source project at the Linux Foundation. #PyTorchFoundation
Wei-Ning Hsu @mhnt1580
2K Followers 145 Following Research Scientist @ Meta FAIR / audio generation, self-supervised learning, speech processing
Alexis Conneau @alex_conneau
34K Followers 206 Following Co-founder and CEO https://t.co/efv72CKpAG (@WaveFormsAI) - Ex @OpenAI GPT-4o/AVM Audio Research Lead - #Her #TARS - Ex @AIatMeta, @Polytechnique (X11)
Chien-yu Huang @cyhuang_tw
96 Followers 85 Following Ph.D. Student @ Carnegie Mellon University speech and language
Yen-Ju Lu @Yen_Ju_Lu
19 Followers 47 Following Research Scientist Intern @ Meta | Prev. Research Intern @ Apple | PhD Candidate (Speech & NLP, Multimodal LLMs)
Albert Gu @_albertgu
21K Followers 77 Following assistant prof @mldcmu. chief scientist @cartesia_ai. leading the ssm revolution.
Ye Jia @jiayephy
417 Followers 642 Following @OpenAI. Prev: @Google Brain; @Meta Llama; co-founded https://t.co/Ao04duAZgQ.
Shikhar @ShikharSSU
303 Followers 1K Following Turning noise into…slightly better noise. https://t.co/9gtrEjheT0
Berkeley Biological &... @BerkeleySCLab
1K Followers 508 Following Lab @UCBerkeley for biological and artificial language. PI @begusgasper
Lior Alexander @LiorOnAI
116K Followers 2K Following Founder @AlphaSignalAI → the Intelligence layer of AI (300k users) • MIT Lecturer • ex-MILA researcher • In ML since GANs
Felix @felix_red_panda
5K Followers 3K Following speech synthesis and LLM nerd, DMs open, working on LLM stuff
Chris Donahue @chrisdonahuey
6K Followers 1K Following GenAI for *human* creativity in music + more. Assistant prof at CMU CSD, 🎼 G-CLef lab. Part time Google DeepMind, Magenta (views my own)
Rafael Valle @RafaelValleArt
1K Followers 189 Following Research Manager and Scientist at NVIDIA. UC Berkeley alum. Love, music, set and setting!
Jungo Kasai (Kotoba) @jungokasai
2K Followers 668 Following Co-founder & CTO @kotoba_tech | PhD from @nlpnoah at @UW | IBM PhD Fellow | Masayoshi Son Foundation scholar | @Yale Undergraduate
Ankit Shah @ankits0052
2K Followers 8K Following Full Stack LLM Associate Director. Ph.D. @LTIatCMU. Sharing insights about AI research, LLMs, multimodal AI, coding & tech. 🚀 Views are my own
Antonis Anastasopoulo... @anas_ant
3K Followers 2K Following Assist. Prof at George Mason CS #nlproc MT, ASR, and documentation of endangered languages.
Ganesh Kini @gkayakg
106 Followers 1K Following PhD candidate at UCSB | Interested in machine learning, deep learning, signal processing | Masters from IISc
Loren Lugosch @lorenlugosch
2K Followers 995 Following Machine learning @ ; audio & language; Freigeisterei und Vielgeisterei; "at once a man of business and a man of rhyme"
Dong Zhang @dongzha35524835
674 Followers 743 Following Principal Researcher @XiaomiMiMo | Developing MiMo-Audio, SpeechGPT-Series | SuperIntelligence for Voice Agent
Rui Liu @RuiLiu60711141
372 Followers 327 Following Professor at Inner Mongolia University. working on speech synthesis, deep learning, natural language processing.
Yu-An (Andy) Chung @iamyuanchung
183 Followers 321 Following Studying representation learning, self-supervised learning, generative modeling methods for speech and audio
lester violeta @lesterphv
237 Followers 700 Following teaching computers to speak || research scientist @dubguild || phd from @nagoyauniv
Asst. Prof. Li Sheng ... @cs_lisheng
788 Followers 8K Following ◆Faculty (Science Tokyo + Kyoto Univ.) / Guest (RIKEN) ◆Spoken Language Processing ◆Welcome collaboration, discussion CV: https://t.co/naL0tJB3sI
Fabian Ritter-Gutierr... @Fabian_acoustic
185 Followers 695 Following Chilean doing PhD on Speech in Singapore. I rarely use this social media. Active on: https://t.co/JQxD1cDaZs
JIACHEN LIAN @LianJiachen
159 Followers 136 Following EECS PhD at UCB | Berkeley Artificial Intelligence Research(BAIR) | Snooker Lover
Andrew Rouditchenko ... @arouditchenko
465 Followers 569 Following PhD student at MIT working on multi-modal and multilingual speech. I was an intern at @AIatMeta and @Apple MLR.
Ian (Yi-Jen) Shih @yijenshih
165 Followers 253 Following CS Ph.D. @UTAustin @UTCompSci @utsaltlab, ex Meta Intern, NTUEE Undergraduate Interested in Music Information Retrieval, Speech processing and Deep learning.
Phillip Rust @rust_phillip
392 Followers 602 Following Research Scientist @AIatMeta (FAIR) • PhD @coastalcph
Shih-Lun (Sean) Wu @slseanwu
304 Followers 157 Following music/audio/speech proc, generative models PhD student (now), EECS MIT MSc '24, SCS CMU BSc '21, CS Nat'l Taiwan U casual classical pianist🎹 & violist🎻
Martijn Bartelds @BarteldsMartijn
771 Followers 432 Following Researcher @togethercompute | Formerly @stanfordnlp, @univgroningen, @tudelft and @Penn
Yuki Saito @ysaito_human
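812 Followers 588 Following Lecturer (Sr. Assistant Professor) @ UTokyo-SaruLab, Japan, Specially Appointed Fellow @ AIST (JST BOOST young researcher support, 2025 ~ 2030), Kodansha, "Introduction to Voice Conversion: Learn by Building a Voice Changer in Python" (the 🦜 book)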
Huck Yang @huckiyang
908 Followers 806 Following Sr. Staff Research Scientist | @GeorgiaTech | Past: @NvidiaAI @GoogleAI @Amazon @Hitachi TSMC | 🗣️ omni
Kwanghee Choi @juice500ml
221 Followers 168 Following PhD student, working on speech AI with David Harwath @utsaltlab (@UTAustin) and David R. Mortensen @dmort27 (@LTIatCMU).
Sravya Popuri @sravyapopuri388
155 Followers 386 Following Tech Lead Manager for mid-training, long context and synthetic data for Llama models at Meta Gen AI
Yung-Sung Chuang @YungSungChuang
2K Followers 691 Following Research @OpenAI | PhD @MIT_CSAIL | Prev @MetaAI @Microsoft @MITIBMLab | BS @NTU_SPML in #Taiwan
dongchao @dcyang98
87 Followers 101 Following A PhD student in The Chinese University of Hong Kong, focusing on large audio foundation models.
Heng-Jui Chang @hjchang87
180 Followers 181 Following 🎓 PhD Candidate @MIT_CSAIL 🧪 Research Scientist Intern @AIatMeta
宜函 吴 @yihanwu93398629
1 Followers 3 Following
DailyAudioPapers @mlsp4audio
767 Followers 632 Following Daily tweets on selected arXiv papers on audio (eess․AS/cs․SD) | Brief reviews of interesting papers | Machine learning | Signal processing
Liu Songxiang @shaunliu231
55 Followers 113 Following Focuses on general spoken language processing, speech and singing generation. Ph.D. from CUHK @CUHKofficial