WAVLab | @CarnegieMellon @WavLab

Shinji Watanabe's Audio and Voice Lab | WAVLab @LTIatCMU @SCSatCMU | Speech Recognition, Speech Enhancement, Spoken Language Understanding, and more.
wavlab.org
Joined August 2021

Tweets: 322
Followers: 2K
Following: 145
Likes: 339

Masao @mmiagshatoy

4 days ago

Excited to share that our work on ESPnet3 has been accepted to Interspeech 2026! We’ll be releasing it soon, stay tuned!

arXiv Sound @ArxivSound

6 days ago

Masao Someki, et al., "ESPnet3: Infrastructure for Scalable Speech and Audio Research in the Foundation Model Era" arxiv.org/abs/2606.21854

0 3 9 2K 4

0 8 22 2K 1


arXiv Sound @ArxivSound

a month ago

Masao Someki, et al., "PlanRAG-Audio: Planning and Retrieval Augmented Generation for Long-form Audio Understanding," arxiv.org/abs/2605.20414

0 2 11 923 7


Shinji Watanabe @shinjiw_at_cmu

2 months ago

We are looking for a postdoctoral researcher in speech and audio processing, with a possible start in the Fall 2026 semester. If you are interested in working with us, please apply through the following form: forms.gle/gfENMMrRf1nmnT…

1 24 58 9K 8


William Chen @chenwanch1

2 months ago

Accepted to ICML! See y’all in Korea 🇰🇷

William Chen @chenwanch1

4 months ago

What if you had nano-banana for audio? AudioChat is a multi-modal LM that performs fine-grained understanding, generation, and editing of multi-source scenes. By diffusing continuous latents, it generates 48 kHz stereo edits with great input adherence: wanchichen.github.io/audiochat/

6 24 166 14K 145

0 2 27 3K 6


WAVLab | @CarnegieMellon @WavLab

2 months ago

7. Phonological Tokenizer: Prosody-Aware Phonetic Token via Multi-Objective Fine-Tuning With Differentiable K-Means
Poster: May 6, 14:00
arxiv.org/abs/2601.19781
8. Online Register for Dual-Mode Self-Supervised Speech Models
Poster: May 7, 09:00
arxiv.org/abs/2602.23702
5/5

0 0 2 195 0


WAVLab | @CarnegieMellon @WavLab

2 months ago

WAVLab @ #ICASSP2026
We will present 8 papers at ICASSP in Barcelona. If you are attending, please stop by the talks/posters and chat with the authors. arXiv links and presentation info below.
1/5

4 3 23 2K 1


WAVLab | @CarnegieMellon @WavLab

2 months ago

5. Full-Duplex-Bench V1.5: Evaluating Overlap Handling for Full-Duplex Speech Models
Poster: May 8, 14:00
arxiv.org/abs/2507.23159
6. CALM: Joint Contextual Acoustic-Linguistic Modeling for Personalization of Multi-Speaker ASR
Oral: May 8, 15:00
arxiv.org/abs/2601.22792
4/5

0 0 1 144 0


WAVLab | @CarnegieMellon @WavLab

2 months ago

3. Reasoning Beyond Majority Vote: An Explainable SpeechLM Framework for Speech Emotion Recognition
Oral: May 7, 15:00
arxiv.org/abs/2509.24187
4. 2025 URGENT Speech Enhancement Challenge Multilingual P.808 Listening Tests
Oral: May 6, 17:50
arxiv.org/abs/2507.11306
3/5

0 0 0 152 0


WAVLab | @CarnegieMellon @WavLab

2 months ago

1. ICASSP 2026 URGENT Speech Enhancement Challenge
Poster: Fri May 8, 14:00 to 16:00, Poster Area 43
arxiv.org/abs/2601.13531
2. SSVD-O: Parameter-Efficient Fine-Tuning with Structured SVD for Speech Recognition
Oral: Fri May 8, 10:00 to 10:20
arxiv.org/abs/2601.12600
2/5

0 0 0 297 0


WAVLab | @CarnegieMellon @WavLab

2 months ago

Congrats to Brian @brianyan918 on finishing his PhD defense today! It was great to see so many people show up for this big event and celebrate such an important milestone. Wishing you all the best in what comes next!

0 1 18 951 0


Shinji Watanabe @shinjiw_at_cmu

3 months ago

6 papers (4 main and 2 findings) were accepted at #ACL2026! All are speech papers :)

1 10 98 5K 8


arXiv Sound @ArxivSound

3 months ago

Shikhar Bharadwaj, Chin-Jou Li, Kwanghee Choi, Eunjung Yeo, William Chen, Shinji Watanabe, David R. Mortensen, "An Empirical Recipe for Universal Phone Recognition," arxiv.org/abs/2603.29042

0 6 14 3K 5


WAVLab | @CarnegieMellon @WavLab

3 months ago

Congratulations to Li-Wei @liweiche77 on successfully defending his PhD today! 🎉 Wishing him all the best in his next chapter!

0 4 20 1K 0


WAVLab | @CarnegieMellon @WavLab

4 months ago

Congratulations to Siddhant @Sid_Arora_18 on a successful PhD defense today! It was wonderful to celebrate this big milestone together. Wishing him all the best for the exciting journey ahead.

4 5 54 4K 2


Natural Language Processing Papers @HEI

5 months ago

PRiSM: Benchmarking Phone Realization in Speech Models Shikhar Bharadwaj, Chin-Jou Li, Yoonjae Kim, Kwanghee Choi, Eunjung Yeo, Ryan Soh-Eun Shim, Hanyu Zhou, Brendon Boldt, Karen Rosero Jacome, Kalvin Chang, Darsh Agrawal, … arxiv.org/abs/2601.14046 [𝚌𝚜.𝙲𝙻 𝚌𝚜.𝚂𝙳]

0 4 6 467 1


arXiv Sound @ArxivSound

5 months ago

Chenda Li, Wei Wang, Marvin Sach, Wangyou Zhang, Kohei Saijo, Samuele Cornell, Yihui Fu, Zhaoheng Ni, Tim Fingscheidt, Shinji Watanabe, Yanmin Qian, "ICASSP 2026 URGENT Speech Enhancement Challenge," arxiv.org/abs/2601.13531

0 3 12 853 4


arXiv Sound @ArxivSound

5 months ago

Pu Wang, Shinji Watanabe, Hugo Van hamme, "SSVD-O: Parameter-Efficient Fine-Tuning with Structured SVD for Speech Recognition," arxiv.org/abs/2601.12600

0 2 5 414 3


arXiv Sound @ArxivSound

5 months ago

Shih-Heng Wang, Jiatong Shi, Jinchuan Tian, Haibin Wu, Shinji Watanabe, "Do Neural Codecs Generalize? A Controlled Study Across Unseen Languages and Non-Speech Tasks," arxiv.org/abs/2601.12205

0 3 16 855 8


jiatongshi @jiatongshi

7 months ago

Heading to NeurIPS 2025 in San Diego! I’ll present our spotlight poster, ARECHO, focusing on speech multi-metric estimation.
📍 Exhibit Hall C,D,E #2000
🗓️ Thu Dec 4, 11 a.m.–2 p.m. PST
If you’re around, let’s say hi or grab a coffee!

1 3 24 1K 1


jiatongshi @jiatongshi

7 months ago

This is exactly why we worked on ESPnet-Codec, but it is really hard to keep track because people move so fast nowadays. A similar issue arises in most speech tasks, from ASR and TTS to general speech LLMs. It's a bit of a sad time for driving scientific findings 🥲