
Member of Technical Staff – Image Generation
| Company | Captions |
| --- | --- |
| Location | New York, NY, USA |
| Salary | $175,000 – $275,000 |
| Type | Full-Time |
| Degrees | Master’s, PhD |
| Experience Level | Senior, Expert or higher |
Requirements
- Master’s or PhD in Computer Science, Machine Learning, or a related field, or equivalent practical experience.
- Demonstrated experience implementing and improving state‑of‑the‑art generative image models.
- Deep expertise in generative modeling approaches (flow matching / diffusion, autoregressive models, VAEs, GANs, etc.).
- Strong background in optimization techniques, sampling, and loss‑function design.
- Experience with empirical scaling studies and systematic architecture research.
- Track record of research contributions at top ML conferences (NeurIPS, CVPR, ICCV, ICML, ICLR).
- Strong proficiency in modern deep‑learning tooling (PyTorch, CUDA, Triton, FSDP, etc.).
- Experience training image diffusion models with billions of parameters.
- Deep understanding of attention, transformers, latent representations, and modern image‑text alignment techniques.
- Expertise in distributed training systems, model parallelism, and high‑throughput inference.
- Proven ability to implement and improve complex model architectures end to end.
- Ability to write clean, modular research code that scales from prototype to production.
- Strong software‑engineering practices including testing, code review, and CI/CD.
- Experience with rapid prototyping and experimental design under tight iteration loops.
- Strong analytical skills for debugging model behavior, numerical stability, and performance bottlenecks.
- Familiarity with profiling and optimization tools (Nsight, TensorBoard, PyTorch Profiler, etc.).
- Track record of bringing research ideas to production and maintaining high code quality in a research environment.
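The generative-modeling expertise above centers on objectives like flow matching. As a minimal sketch (all names here are illustrative, not from the posting), the conditional flow-matching loss regresses a model's velocity prediction onto the constant velocity of a straight noise-to-data path:

```python
import numpy as np

def flow_matching_loss(x0, x1, t, velocity_fn):
    """Conditional flow-matching loss for a linear interpolation path.

    x0: noise samples, x1: data samples, t: per-sample times in [0, 1],
    velocity_fn: stand-in for the model being trained.
    """
    t = t.reshape(-1, 1)                  # broadcast time over feature dim
    xt = (1.0 - t) * x0 + t * x1          # point on the straight path
    target = x1 - x0                      # true velocity along that path
    pred = velocity_fn(xt, t)
    return np.mean((pred - target) ** 2)  # simple MSE regression

# Toy check: an oracle that outputs the true velocity gives zero loss.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 4))
x1 = rng.standard_normal((8, 4))
t = rng.uniform(size=8)
oracle = lambda xt, t: x1 - x0
loss = flow_matching_loss(x0, x1, t, oracle)  # → 0.0
```

In practice `velocity_fn` would be a large transformer and the loss would be minimized with SGD over minibatches; the objective itself is this simple.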
Responsibilities
- Design and implement large‑scale image generation models (transformers, latent diffusion, flow matching, etc.).
- Develop new approaches to multimodal conditioning and generation (e.g., audio and video) and to controllability (editing, multi-frame consistency, script guidance, etc.).
- Research advanced image‑editing and -generation techniques such as content‑preserving edits, multi‑input conditioning, and reference‑based generation.
- Establish and validate scaling laws for image diffusion models across resolution and parameter count.
- Develop automated evaluation approaches for improved fidelity and consistency.
- Drive rapid experimentation with model architectures, sampling strategies, and training strategies.
- Validate research directly through product deployment and real user feedback.
- Derive insights from data and recommend architectures and training practices that will make meaningful impacts on our products.
- Train and optimize models at massive scale (10s–100s of billions of parameters) across multi‑node GPU clusters.
- Push the boundaries of efficiency and hardware utilization for training and deploying models in a cost-effective manner.
- Develop sophisticated distributed training approaches using FSDP, DeepSpeed, Megatron‑LM, Triton, and custom CUDA kernels where needed.
- Design and implement model‑compression techniques (pruning, distillation, quantization, etc.) for efficient serving.
- Create new approaches to memory optimization, gradient checkpointing, and mixed‑precision training.
- Research techniques for improving sampling speed (DDIM, PFGM++, SDE‑VE) and training stability at scale.
- Conduct systematic empirical studies to benchmark architecture and optimization choices.
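Among the sampling-speed techniques listed, DDIM is the most standard: it replaces the stochastic reverse diffusion with a deterministic update, allowing far fewer steps. A minimal sketch of one DDIM step (eta = 0; variable names are illustrative):

```python
import numpy as np

def ddim_step(x_t, eps_pred, ab_t, ab_prev):
    """One deterministic DDIM update (eta = 0).

    x_t: current noisy sample, eps_pred: the model's noise prediction,
    ab_t / ab_prev: cumulative alpha-bar at the current and previous steps.
    """
    # Recover the model's implied clean-image estimate ...
    x0_pred = (x_t - np.sqrt(1.0 - ab_t) * eps_pred) / np.sqrt(ab_t)
    # ... then re-noise it to the previous (less noisy) level.
    return np.sqrt(ab_prev) * x0_pred + np.sqrt(1.0 - ab_prev) * eps_pred

# Sanity check: with the exact noise that produced x_t, the step lands
# precisely on the less-noisy mixture of the same x0 and noise.
x0 = np.ones(3)
eps = np.full(3, 0.5)
ab_t, ab_prev = 0.5, 0.8
x_t = np.sqrt(ab_t) * x0 + np.sqrt(1.0 - ab_t) * eps
x_prev = ddim_step(x_t, eps, ab_t, ab_prev)
```

Because the update is deterministic given the noise prediction, step schedules can be subsampled aggressively, which is where the speedup comes from.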
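Establishing scaling laws, as the responsibilities describe, typically means fitting a power law to loss versus parameter count. A minimal sketch of the standard log-log regression (the exponent and constants below are toy values, not real measurements):

```python
import numpy as np

def fit_power_law(n_params, losses):
    """Fit loss ≈ a * N**(-b) by linear regression in log-log space."""
    slope, intercept = np.polyfit(np.log(n_params), np.log(losses), 1)
    return np.exp(intercept), -slope  # (coefficient a, exponent b)

# Toy data generated from an exact power law recovers its parameters.
n = np.array([1e8, 1e9, 1e10, 1e11])
loss = 3.0 * n ** -0.076
a, b = fit_power_law(n, loss)  # ≈ (3.0, 0.076)
```

Real studies fit across many (model size, compute, resolution) sweeps and use the fitted exponent to extrapolate how much a larger training run should help before committing the GPUs.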
Preferred Qualifications
- Familiarity with large language models or multimodal transformers is a plus.