Software Engineer – Frontier Systems

Company: OpenAI
Location: San Francisco, CA, USA
Salary: $295,000 – $440,000
Type: Full-Time
Experience Level: Senior, Expert or higher

Requirements

  • 7+ years of industry experience in software engineering
  • Proficiency with Python and shell scripting
  • A high degree of comfort digging into noisy data with SQL, PromQL, Pandas, or any other tool necessary
  • Experience developing reproducible analyses
  • A balance of strengths in both building systems and operationalizing them

Responsibilities

  • Own and improve the system health checks that keep our hyperscale supercomputers stable during model training.
  • Lead deep dives into hardware failures and system-level bugs to understand how things break at scale.
  • Build automation that monitors and fixes issues across thousands of machines – so researchers can keep moving without interruption.

Preferred Qualifications

  • Experience with low-level details of hardware components, protocols, and associated Linux tooling (e.g., PCIe, InfiniBand, networking, power management, kernel performance tuning)
  • Experience with visualization of large data centers and networks
  • Expertise with network operations and tooling
  • Expertise with power management and stabilization