Member of Technical Staff – Training Data Infrastructure
Field | Value |
---|---|
Company | Captions |
Location | New York, NY, USA |
Salary | $170,000 – $250,000 |
Type | Full-Time |
Degrees | Bachelor’s, Master’s |
Experience Level | Mid Level, Senior |

Requirements
- Bachelor’s or Master’s degree in Computer Science, Machine Learning, or related field.
- 3+ years of experience in ML infrastructure development or large-scale data engineering.
- Strong programming skills, particularly in Python and distributed computing frameworks.
- Expertise in building and optimizing high-throughput data pipelines.
- Proven experience with video/image data pre-processing and feature engineering.
- Deep knowledge of machine learning workflows, including model training and data loading systems.
- Track record in performance optimization and system scaling.
- Experience with cluster management and distributed computing.
- Background in MLOps and infrastructure monitoring.
- Demonstrated ability to build reliable, large-scale data processing systems.
Responsibilities
- Build performant pipelines for processing video and multimodal training data at scale.
- Design distributed systems that scale seamlessly with our rapidly growing video and multimodal datasets.
- Create efficient data loading systems optimized for GPU training throughput.
- Implement comprehensive telemetry for video processing and training pipelines.
- Create foundational data processing systems that intelligently cache and reuse expensive computations across the training pipeline.
- Build robust data validation and quality measurement systems for video and multimodal content.
- Design systems for data versioning and reproducing complex multimodal training runs.
- Develop efficient storage and compute patterns for high-dimensional data and learned representations.
- Own and improve end-to-end training pipeline performance.
- Build systems for efficient storage and retrieval of video training data.
- Build frameworks for systematic data and model quality improvement.
- Develop infrastructure supporting fast research iteration cycles.
- Build tools and systems for deep understanding of our training data characteristics.
- Build infrastructure enabling rapid testing of research hypotheses.
- Create systems for incorporating user feedback into training workflows.
- Design measurement frameworks that connect model improvements to user outcomes.
- Enable systematic experimentation with direct user feedback loops.