Senior Platform Software Engineer – PCIe
| Company | NVIDIA |
|---|---|
| Location | Santa Clara, CA, USA |
| Salary | $148,000 – $287,500 |
| Type | Full-Time |
| Degrees | Bachelor’s |
| Experience Level | Senior |
Requirements
- Deep understanding of Server Architecture, CPU design, PCI Express, and CXL at platform level for enterprise systems.
- Deep understanding of PCI Express and associated Error Handling (RAS) and Performance.
- Deep understanding of the Linux kernel.
- Familiarity with PCIe switches and retimers and their associated firmware or configuration files.
- Deep understanding of Memory architecture with a focus on Memory RAS.
- Solid experience with end-to-end delivery of high-end enterprise servers, from definition to customer deployment.
- Experience modifying UEFI BIOS and Linux Kernel source.
- Experience writing scripts to assist or automate debug.
- C/C++ development and debugging experience in Linux environments.
- Excellent written and oral communication skills, a strong work ethic, a strong sense of teamwork, a love of producing quality work, and a commitment to finishing tasks every single day.
- Bachelor’s Degree in Electrical Engineering or Computer Science, or equivalent experience.
- At least 7 years of experience as an individual contributor.
Responsibilities
- Drive and deliver innovations for GPU-based AI servers, with a focus on PCIe architecture, system engineering, and software/firmware changes driven by processor and I/O architecture.
- Define system architecture to optimize I/O performance for various GPU applications.
- Debug complex system issues involving the GPU, I/O buses (PCIe, etc.), and the CPU.
- Architect complex systems, including I/O error handling from both the I/O bus (PCIe and others) and processor viewpoints, and fault management for degraded-mode operation per datacenter requirements, to improve the resiliency of GPU-based systems.
- Identify gaps in platform debuggability and drive solutions to improve speed and correctness of issue closure.
- Identify new technologies and features that improve the performance, functionality, and uptime of GPU systems, making them the most performant, secure, and reliable servers for AI workloads.
- Work across the industry to choose and enable new and required technologies, and bring them to AI servers in the most efficient way.
- Contribute to all phases of product development, from product definition, architecture, and design through implementation, debugging, testing, and early customer support.
Preferred Qualifications
- Proven expertise in debugging complicated, time-critical issues in both development and production environments.
- Experience with both x86 and Arm architectures.