
Member of Technical Staff – Image Generation
| Company | Captions |
| --- | --- |
| Location | New York, NY, USA |
| Salary | $175,000 – $275,000 |
| Type | Full-Time |
| Degrees | Master’s, PhD |
| Experience Level | Senior, Expert or higher |
Requirements
- Master’s or PhD in Computer Science, Machine Learning, or a related field, or equivalent practical experience.
- Demonstrated experience implementing and improving state‑of‑the‑art generative image models.
- Deep expertise in generative modeling approaches (flow matching / diffusion, autoregressive models, VAEs, GANs, etc.).
- Strong background in optimization techniques, sampling, and loss‑function design.
- Experience with empirical scaling studies and systematic architecture research.
- Track record of research contributions at top ML conferences (NeurIPS, CVPR, ICCV, ICML, ICLR).
- Strong proficiency in modern deep‑learning tooling (PyTorch, CUDA, Triton, FSDP, etc.).
- Experience training image diffusion models with billions of parameters.
- Deep understanding of attention, transformers, latent representations, and modern image‑text alignment techniques.
- Expertise in distributed training systems, model parallelism, and high‑throughput inference.
- Proven ability to implement and improve complex model architectures end to end.
- Ability to write clean, modular research code that scales from prototype to production.
- Strong software‑engineering practices including testing, code review, and CI/CD.
- Experience with rapid prototyping and experimental design under tight iteration loops.
- Strong analytical skills for debugging model behavior, numerical stability, and performance bottlenecks.
- Familiarity with profiling and optimization tools (Nsight, TensorBoard, PyTorch Profiler, etc.).
- Track record of bringing research ideas to production and maintaining high code quality in a research environment.
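The generative-modeling expertise above centers on objectives like flow matching. As a minimal sketch (all names here are illustrative, not from the posting), the conditional flow-matching loss regresses a model's velocity prediction onto the constant velocity of a straight noise-to-data path:

```python
import numpy as np

def flow_matching_loss(x0, x1, t, velocity_fn):
    """Conditional flow-matching loss for a linear interpolation path.

    x0: noise samples, x1: data samples, t: per-sample times in [0, 1],
    velocity_fn: stand-in for the model being trained.
    """
    t = t.reshape(-1, 1)                  # broadcast time over feature dim
    xt = (1.0 - t) * x0 + t * x1          # point on the straight path
    target = x1 - x0                      # true velocity along that path
    pred = velocity_fn(xt, t)
    return np.mean((pred - target) ** 2)  # simple MSE regression

# Toy check: an oracle that outputs the true velocity gives zero loss.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 4))
x1 = rng.standard_normal((8, 4))
t = rng.uniform(size=8)
oracle = lambda xt, t: x1 - x0
loss = flow_matching_loss(x0, x1, t, oracle)  # → 0.0
```

In practice `velocity_fn` would be a large transformer and the loss would be minimized with SGD over minibatches; the objective itself is this simple.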
Responsibilities
- Design and implement large‑scale image generation models (transformers, latent diffusion, flow matching, etc.).
- Develop new approaches to multimodal conditioning and generation (e.g., audio and video) and to controllability (editing, multi-frame consistency, script guidance, etc.).
- Research advanced image‑editing and -generation techniques such as content‑preserving edits, multi‑input conditioning, and reference‑based generation.
- Establish and validate scaling laws for image diffusion models across resolution and parameter count.
- Develop automated evaluation approaches for improved fidelity and consistency.
- Drive rapid experimentation with model architectures, sampling strategies, and training strategies.
- Validate research directly through product deployment and real user feedback.
- Derive insights from data and recommend architectures and training practices that will make meaningful impacts on our products.
- Train and optimize models at massive scale (10s–100s of billions of parameters) across multi‑node GPU clusters.
- Push the boundaries of efficiency and hardware utilization for training and deploying models in a cost-effective manner.
- Develop sophisticated distributed training approaches using FSDP, DeepSpeed, Megatron‑LM, Triton, and custom CUDA kernels where needed.
- Design and implement model‑compression techniques (pruning, distillation, quantization, etc.) for efficient serving.
- Create new approaches to memory optimization, gradient checkpointing, and mixed‑precision training.
- Research techniques for improving sampling speed (DDIM, PFGM++, SDE‑VE) and training stability at scale.
- Conduct systematic empirical studies to benchmark architecture and optimization choices.
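Among the sampling-speed techniques listed, DDIM is the most standard: it replaces the stochastic reverse diffusion with a deterministic update, allowing far fewer steps. A minimal sketch of one DDIM step (eta = 0; variable names are illustrative):

```python
import numpy as np

def ddim_step(x_t, eps_pred, ab_t, ab_prev):
    """One deterministic DDIM update (eta = 0).

    x_t: current noisy sample, eps_pred: the model's noise prediction,
    ab_t / ab_prev: cumulative alpha-bar at the current and previous steps.
    """
    # Recover the model's implied clean-image estimate ...
    x0_pred = (x_t - np.sqrt(1.0 - ab_t) * eps_pred) / np.sqrt(ab_t)
    # ... then re-noise it to the previous (less noisy) level.
    return np.sqrt(ab_prev) * x0_pred + np.sqrt(1.0 - ab_prev) * eps_pred

# Sanity check: with the exact noise that produced x_t, the step lands
# precisely on the less-noisy mixture of the same x0 and noise.
x0 = np.ones(3)
eps = np.full(3, 0.5)
ab_t, ab_prev = 0.5, 0.8
x_t = np.sqrt(ab_t) * x0 + np.sqrt(1.0 - ab_t) * eps
x_prev = ddim_step(x_t, eps, ab_t, ab_prev)
```

Because the update is deterministic given the noise prediction, step schedules can be subsampled aggressively, which is where the speedup comes from.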
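Establishing scaling laws, as the responsibilities describe, typically means fitting a power law to loss versus parameter count. A minimal sketch of the standard log-log regression (the exponent and constants below are toy values, not real measurements):

```python
import numpy as np

def fit_power_law(n_params, losses):
    """Fit loss ≈ a * N**(-b) by linear regression in log-log space."""
    slope, intercept = np.polyfit(np.log(n_params), np.log(losses), 1)
    return np.exp(intercept), -slope  # (coefficient a, exponent b)

# Toy data generated from an exact power law recovers its parameters.
n = np.array([1e8, 1e9, 1e10, 1e11])
loss = 3.0 * n ** -0.076
a, b = fit_power_law(n, loss)  # ≈ (3.0, 0.076)
```

Real studies fit across many (model size, compute, resolution) sweeps and use the fitted exponent to extrapolate how much a larger training run should help before committing the GPUs.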
Preferred Qualifications
- Familiarity with large language models or multimodal transformers is a plus.