Member of Technical Staff – Image Generation

Company: Captions
Location: New York, NY, USA
Salary: $175,000 – $275,000
Type: Full-Time
Degrees: Master’s, PhD
Experience Level: Senior, Expert or higher

Requirements

  • Master’s or PhD in Computer Science, Machine Learning, or a related field, or equivalent practical experience.
  • Demonstrated experience implementing and improving state‑of‑the‑art generative image models.
  • Deep expertise in generative modeling approaches (flow matching / diffusion, autoregressive models, VAEs, GANs, etc.); a minimal flow‑matching sketch follows this list.
  • Strong background in optimization techniques, sampling, and loss‑function design.
  • Experience with empirical scaling studies and systematic architecture research.
  • Track record of research contributions at top ML conferences (NeurIPS, CVPR, ICCV, ICML, ICLR).
  • Strong proficiency in modern deep‑learning tooling (PyTorch, CUDA, Triton, FSDP, etc.).
  • Experience training image diffusion models with billions of parameters.
  • Deep understanding of attention, transformers, latent representations, and modern image‑text alignment techniques.
  • Expertise in distributed training systems, model parallelism, and high‑throughput inference.
  • Proven ability to implement and improve complex model architectures end to end.
  • Ability to write clean, modular research code that scales from prototype to production.
  • Strong software‑engineering practices including testing, code review, and CI/CD.
  • Experience with rapid prototyping and experimental design under tight iteration loops.
  • Strong analytical skills for debugging model behavior, numerical stability, and performance bottlenecks.
  • Familiarity with profiling and optimization tools (Nsight, TensorBoard, PyTorch Profiler, etc.).
  • Track record of bringing research ideas to production and maintaining high code quality in a research environment.
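
As a flavor of the generative-modeling depth we look for, here is a minimal flow‑matching training step in PyTorch. It is a sketch under common rectified‑flow assumptions (a velocity‑prediction model and a linear noise path), not code from our stack; the `model` interface and the `flow_matching_loss` name are illustrative.

    import torch
    import torch.nn.functional as F

    def flow_matching_loss(model, x0):
        """One rectified-flow training step: regress the model onto the
        straight-line velocity between data x0 and Gaussian noise."""
        eps = torch.randn_like(x0)                     # noise endpoint of the path
        t = torch.rand(x0.shape[0], device=x0.device)  # per-sample time in [0, 1]
        t_ = t.view(-1, *([1] * (x0.dim() - 1)))       # broadcast t over image dims
        x_t = (1.0 - t_) * x0 + t_ * eps               # linear interpolation path
        v_target = eps - x0                            # constant velocity along the path
        v_pred = model(x_t, t)                         # model predicts the velocity
        return F.mse_loss(v_pred, v_target)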

Responsibilities

  • Design and implement large‑scale image generation models (transformers, latent diffusion, flow matching, etc.).
  • Develop new approaches to multimodal conditioning and generation (e.g., audio and video) and to controllability (editing, multi‑frame consistency, script guidance, etc.).
  • Research advanced image editing and generation techniques such as content‑preserving edits, multi‑input conditioning, and reference‑based generation.
  • Establish and validate scaling laws for image diffusion models across resolution and parameter count.
  • Develop automated evaluation approaches for measuring image fidelity and consistency.
  • Drive rapid experimentation with model architectures, sampling methods, and training strategies.
  • Validate research directly through product deployment and real user feedback.
  • Derive insights from data and recommend architectures and training practices that will have a meaningful impact on our products.
  • Train and optimize models at massive scale (tens to hundreds of billions of parameters) across multi‑node GPU clusters.
  • Push the boundaries of efficiency and hardware utilization for training and deploying models in a cost‑effective manner.
  • Develop sophisticated distributed training approaches using FSDP, DeepSpeed, Megatron‑LM, Triton, and custom CUDA kernels where needed (see the FSDP sketch after this list).
  • Design and implement model‑compression techniques (pruning, distillation, quantization, etc.) for efficient serving.
  • Create new approaches to memory optimization, gradient checkpointing, and mixed‑precision training.
  • Research techniques for improving sampling speed (DDIM, PFGM++, SDE‑VE) and training stability at scale; a minimal DDIM step is sketched after this list.
  • Conduct systematic empirical studies to benchmark architecture and optimization choices.
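
To make the distributed-training bullet concrete, below is a hedged sketch of wrapping a model with PyTorch FSDP and bf16 mixed precision. `TinyBackbone` is a stand‑in module, not a real architecture; a production setup would add sharding policies, activation checkpointing, and an optimizer loop.

    import torch
    import torch.nn as nn
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision

    class TinyBackbone(nn.Module):
        """Stand-in for a real diffusion transformer; illustrative only."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)
            )

        def forward(self, x):
            return self.net(x)

    dist.init_process_group("nccl")  # one process per GPU, e.g. launched via torchrun
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    model = FSDP(
        TinyBackbone().cuda(),
        mixed_precision=MixedPrecision(
            param_dtype=torch.bfloat16,   # shard and compute parameters in bf16
            reduce_dtype=torch.bfloat16,  # all-reduce gradients in bf16
            buffer_dtype=torch.bfloat16,
        ),
    )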
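
And as a reference point for the sampling‑speed bullet, here is a minimal deterministic (eta = 0) DDIM update. It assumes a noise‑prediction `model` and a 1‑D `alphas_cumprod` tensor holding the cumulative noise schedule; both names are illustrative.

    import torch

    @torch.no_grad()
    def ddim_step(model, x_t, t, t_prev, alphas_cumprod):
        """One eta = 0 DDIM update: predict x0 from the noise estimate,
        then jump deterministically to the earlier timestep t_prev."""
        a_t, a_prev = alphas_cumprod[t], alphas_cumprod[t_prev]
        eps = model(x_t, t)                                    # predicted noise
        x0 = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()       # implied clean image
        return a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps  # deterministic jump

Because the jump from t to t_prev can skip many timesteps, a few dozen such steps can replace the thousand‑step ancestral sampler, which is the core of the speedup.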

Preferred Qualifications

  • Familiarity with large language models or multimodal transformers.