Senior Resiliency and Safety Architect

Company: NVIDIA
Location: Santa Clara, CA, USA
Salary: $184,000 – $356,500
Type: Full-Time
Degrees: Master’s, PhD
Experience Level: Senior

Requirements

  • Master’s or PhD degree in Computer Science, Computer Engineering, Electrical Engineering, or a closely related field, or equivalent experience.
  • At least 5 years of relevant experience.
  • Familiarity with computer system architecture, microprocessors, and microcontroller fundamentals (caches, buses, direct memory access, etc.).
  • Proficiency in C/C++.
  • Scripting and automation with Python or similar.
  • Understanding of the software development process, from requirements to testing closure and maintenance.
  • Experience with resiliency and/or functional safety.
  • Excellent interpersonal skills and ability to collaborate with on-site and remote teams.
  • Strong debugging and analytical skills.
  • Self-driven and results-oriented.

Responsibilities

  • Collaborate with the Software and Hardware teams to architect new safety and resiliency features and guide future development.
  • Optimize hardware and software features to improve system robustness, performance, and security.
  • Model and analyze RAS metrics such as Failures in Time and Availability, and safety metrics such as Diagnostic Coverage and PMHF.
  • Run simulations to analyze Architectural Vulnerability Factor and Liveness of on-die memory.
  • Develop diagnostics software components for Resiliency and Safety to run on NVIDIA GPUs.
  • Participate in testing new and existing resiliency and safety hardware and software features.
  • Work on product compliance with functional safety standards (ISO 26262 and ASPICE (Automotive SPICE)). This includes defining requirements, architecture, and design with end-to-end traceability; performing safety analyses (FMEA, DFA, FTA); and ensuring software compliance with the MISRA and CERT C standards.

Preferred Qualifications

  • Familiarity with general hardware concepts, Verilog RTL coding and simulation/debug, GPU and SoC architectures, and Machine Learning/Deep Learning concepts.
  • Programming with CUDA.
  • Experience in embedded software development.