- Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech (2021)
- Unet-TTS: Improving Unseen Speaker and Style Transfer in One-shot Voice Cloning (ICASSP 2022)
- One TTS Alignment To Rule Them All (ICASSP 2022)
- RAD-TTS: Parallel Flow-Based TTS with Robust Alignment Learning and Diverse Synthesis (2021)
- CLAP: Learning Audio Concepts From Natural Language Supervision (2022)
- Towards Learning Universal Audio Representations (ICASSP 2022)
- Language Modeling via Stochastic Processes (ICLR 2022)
- Exploring the Limits of Large Scale Pre-training (2021)
- Deep Learning Scaling is Predictable, Empirically (2017)
- On the duality between contrastive and non-contrastive self-supervised learning (2022)
- Learning Transferable Visual Models From Natural Language Supervision (2021) [repo]
- Representation Learning with Contrastive Predictive Coding (2018)
- Learning Representations by Maximizing Mutual Information Across Views (2019)
- SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization (ICML 2022)
- Towards a General Purpose CNN for Long Range Dependencies in ND (ICML 2022)