Senior Platform Software Engineer – PCIe

Company: NVIDIA
Location: Santa Clara, CA, USA
Salary: $148,000 – $287,500
Type: Full-Time
Degrees: Bachelor’s
Experience Level: Senior

Requirements

  • Deep understanding of server architecture, CPU design, PCI Express, and CXL at the platform level for enterprise systems.
  • Deep understanding of PCI Express, including its error handling (RAS) and performance.
  • Deep understanding of the Linux kernel.
  • Familiarity with PCIe switches and retimers and their associated firmware or configuration files.
  • Deep understanding of memory architecture, with a focus on memory RAS.
  • Solid experience with end-to-end delivery of high-end enterprise servers, from definition to customer deployment.
  • Experience modifying UEFI BIOS and Linux kernel source.
  • Experience writing scripts to assist or automate debugging.
  • Experience developing and debugging C/C++ software in Linux environments.
  • Excellent written and oral communication skills, a strong work ethic, a strong sense of teamwork, a love of producing quality work, and a commitment to finishing tasks every day.
  • Bachelor’s Degree in Electrical Engineering or Computer Science, or equivalent experience.
  • At least 7 years of experience as an individual contributor.

Responsibilities

  • Drive and deliver innovations for GPU-based AI servers, with a focus on PCIe architecture, system engineering, and software/firmware changes in line with the processor and I/O architecture.
  • Define system architecture to optimize I/O performance for various GPU applications.
  • Debug complex system issues involving the GPU, I/O buses (PCIe, etc.), and CPU.
  • Architect complex systems, including I/O error handling from the viewpoint of PCIe, other I/O buses, and the processor, and fault management for degraded-mode operation per datacenter requirements, to improve the resiliency of GPU-based systems.
  • Identify gaps in platform debuggability and drive solutions to improve speed and correctness of issue closure.
  • Identify new technologies and features that improve the performance, functionality, and uptime of GPU systems, making them the most performant, secure, and reliable servers for AI workloads.
  • Work across the industry to choose and enable new and required technologies and bring them to AI servers in the most efficient way.
  • Contribute to all phases of product development, from product definition, architecture, and design through implementation, debugging, testing, and early customer support.

Preferred Qualifications

  • Proven expertise in debugging complicated and time-critical issues in both development and production environments.
  • Experience with both x86 and Arm architectures.