Research Scientist - Foundation Model - Speech & Audio

M.S. or Ph.D. in computer science, machine learning, or similar fields.
At least three years of relevant industry or research experience.
Strong grounding in both theoretical and empirical approaches to research problems.
Solid knowledge of and experience with at least one popular deep learning framework (e.g., PyTorch, TensorFlow) and familiarity with deep neural network architectures.
Good presentation and communication skills.
Experience with both neural and classical (non-neural) machine learning models and algorithms.

Contribute cutting-edge research to ByteDance product evolution (e.g., TikTok, CapCut) to impact billions of users worldwide.
Lead research to advance science and technology in audio processing and generation (e.g., speech synthesis, voice conversion, audio codec learning, audio language modeling).
Research, model, design, develop and evaluate novel machine learning models and algorithms.
Collaborate with globally based researchers and engineering teams in developing machine learning models and algorithms.

Expertise in one or more of the following fields: speech synthesis or recognition, natural language processing, computer vision, generative models.
Strong first-author publication record in top AI conferences or journals (e.g., NeurIPS, ICML, ICLR, ACL, EMNLP, NAACL, ICASSP).
Proficiency in C/C++, Python, and shell programming, with a deep understanding of data structures and algorithm design.
Work or internship experience in an AI research organization.