@alphacep I had pretty much the same intuition as well. I think FAIR's approach with the Omnilingual models leads to great performance for REALLY low-resource languages, but not so much for pushing performance on high-resource and even mid-resource languages.
@unilightwf What does this mean? So they tested to see if evaluators would mistake the human voice for a synthetically generated one? And out of all the tests, only 78.33% of the human voices in LJSpeech were identified as real voices?
🎵 Introducing InspireMusic – an open-source music generation toolkit from Tongyi Lab, designed as an all-in-one AIGC toolkit for music, song, and audio creation.
Whether you're a researcher, developer, or music enthusiast, InspireMusic has something for you:
For researchers and developers: Train and fine-tune music/song/audio generation models with ease, optimizing the creative output.
For music lovers: An intuitive tool to create music, songs, or audio content using text descriptions or audio prompts.
🚀 What's special about InspireMusic?
·Unified Audio Generation Framework: Powered by advanced generative model technology, InspireMusic supports music, song, and audio generation, offering diverse possibilities.
·Flexible and Controllable Output: Generate music with precise style and structure by using text prompts and musical feature descriptions.
·Simple and User-Friendly: Streamlined tools for model fine-tuning and inference, ensuring efficient training and improvements.
✨ Try it out now!
🎵 GitHub Repository: github.com/FunAudioLLM/In…
🎶 Online Experience:
🤗HuggingFace Spaces: huggingface.co/spaces/FunAudi…
♪ Demo Page: iris2c.github.io/InspireMusic
Start creating your own musical masterpiece today! 🎶
@mhnt1580 Awesome! I am still wrapping my head around the "content usefulness" axis though. For sound especially, would it be right to say that if there are enough sounds to form a scene, it would be useful content?
@alphacep I'm wondering, given how performant Whisper is, are there still substantial benefits to pre-training and finetuning your own self-supervised model, or would we get better results just from finetuning Whisper?
Introducing Eagle-7B
Based on the RWKV-v5 architecture, bringing to the open-source space the strongest
- multi-lingual model
(beating even Mistral)
- attention-free transformer today
(10-100x+ lower inference cost)
With English performance comparable to the best 1T 7B models
We are organizing The Speaker and Language Recognition Workshop (Odyssey) 2024, which will be held in Canada. The theme, "No Speaker Left Behind", underscores our commitment to overcoming disparities that affect individuals with diverse accents, backgrounds, or speech variations.
Excited to share that you can now finetune 1100+ TTS models thanks to @AIatMeta's MMS and the library shared below!
In my experiments, getting an excellent finetuned version of any MMS checkpoint takes just 20 minutes, with as few as 80 to 150 samples, across all models.
A week ago one of our customers handed us 1000 pages of this (10,000 more to come), and asked us for a RAG solution.
We said yes - because we said yes before we saw the document. But we've solved it - and there's a chance it's a strong improvement on all RAG SoTA.
181K Followers 136 Following
Built by Moonshot AI to empower everyone to be superhuman.
⚡️API: https://t.co/XCrgjXAqMw
@KimiProduct where we share cool use cases.
@Kimidevs built for developers
151K Followers 7K Following
Compiling in real-time, the race towards AGI.
The Largest Show on X for AI.
🗞️ Get my daily AI analysis newsletter to your email 👉 https://t.co/6LBxO8215l
122K Followers 3K Following
Dream realized! Turned my love for AI into a career - sharing daily. Get my newsletter (225k+ subs): 🔗 https://t.co/jHMmImnfVg // 📧 [email protected]
1.2M Followers 787 Following
Professor at NYU & Executive Chairman at AMI Labs.
Ex-Chief AI Scientist at Meta.
Researcher in AI, Machine Learning, Robotics, etc.
ACM Turing Award Laureate.
1K Followers 659 Following
Assistant Professor, Graduate School of Informatics, Nagoya University. Speech synthesis & evaluation. Trilingual, street dancer, golfer. Tweets are my own opinions.
742 Followers 2 Following
New Audio and Speech Processing papers from https://t.co/mvy2Lc7qxl: processing signals representing audio. Thank you to arXiv for use of its open access interoperability.
2K Followers 88 Following
As part of Alibaba's Tongyi Lab, we focus on multimodal speech and language models like FunAudioLLM, FunASR, and CosyVoice. Explore our 200+ open-source models!
291K Followers 569 Following
Co-Founder of ByteByteGo | Author of the bestselling book series: ‘System Design Interview’ | YouTube: https://t.co/9gPSJSrtPU
263K Followers 182 Following
Co-founder of Thinking Machines Lab @thinkymachines; Ex-VP, AI Safety & robotics, applied research @OpenAI; Author of Lil'Log
5K Followers 21 Following
Hung-yi Lee is currently a professor at National Taiwan University. He owns a YouTube channel teaching deep learning in Mandarin.
5K Followers 1 Following
IEEE International Conference on Acoustics, Speech, and Signal Processing. #ICASSP2026 will be held 4-8 May 2026 in Barcelona, Spain.
991 Followers 2K Following
AI Research / Founder @ Red Dragon AI.
Co-organiser of Machine Learning Singapore MeetUp. @GoogleDevExpert (ML).
Fixed Income quant in NYC during AI winter
9K Followers 2K Following
Assoc. Professor @UCBerkeley | Interpretable AI and language @Interpret_AI | PI @BerkeleySCLab | Linguistics Lead @ProjectCETI🐳 | College Principal of Bowles🏰