Research Scientist Graduate - Foundation Model Speech & Audio Generation

PhD graduate with a background in computer science, machine learning, or similar fields.
Good knowledge of theoretical and empirical research methods for addressing research problems
Solid knowledge and experience with at least one popular deep learning framework (e.g., PyTorch, TensorFlow) and familiarity with deep neural network architectures
Experience with both neural and classical (non-neural) machine learning models and algorithms
Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment.

Contribute cutting-edge research to ByteDance product evolution (e.g., TikTok, Douyin, CapCut) to impact billions of users worldwide.
Work on advanced science and technology in audio processing and generation (e.g., Speech Synthesis, Voice Conversion, Audio Codec Learning, Audio Language Modeling).
Research, model, design, develop and evaluate novel machine learning models and algorithms.
Collaborate with globally distributed research and engineering teams in developing machine learning models and algorithms.

Good presentation and communication skills
Research experience in one or more of the following fields: speech synthesis, audio generation, large language models, computer vision, generative models
Strong first-author publication record at top AI conferences or journals (e.g., NeurIPS, ICML, ICLR, ACL, EMNLP, NAACL)
Proficient in C/C++, Python, and shell programming, with a deep understanding of data structures and algorithm design
Internship experience in an AI research organization