Senior Resiliency and Safety Architect
| Company | NVIDIA |
|---|---|
| Location | Santa Clara, CA, USA |
| Salary | $184,000 – $356,500 |
| Type | Full-Time |
| Degrees | Master’s, PhD |
| Experience Level | Senior |
Requirements
- Master’s or PhD degree in Computer Science, Computer Engineering, Electrical Engineering, or a closely related field, or equivalent experience.
- At least 5 years of relevant experience.
- Familiarity with computer system architecture, microprocessors, and microcontroller fundamentals (caches, buses, direct memory access, etc.).
- Proficiency in C/C++.
- Scripting and automation with Python or similar.
- Understanding of the software development process, from requirements to testing closure and maintenance.
- Experience with resiliency and/or functional safety.
- Excellent interpersonal skills and ability to collaborate with on-site and remote teams.
- Strong debugging and analytical skills.
- Self-driven and results-oriented.
Responsibilities
- Collaborate with the Software and Hardware teams to architect new safety and resiliency features and guide future development.
- Optimize hardware and software features to improve system robustness, performance, and security.
- Model and analyze RAS (Reliability, Availability, Serviceability) metrics such as Failures in Time (FIT) and Availability, and safety metrics such as Diagnostic Coverage and PMHF (Probabilistic Metric for random Hardware Failures).
- Run simulations to analyze Architectural Vulnerability Factor and Liveness of on-die memory.
- Develop diagnostics software components for Resiliency and Safety to run on NVIDIA GPUs.
- Participate in testing new and existing resiliency and safety hardware and software features.
- Work on product compliance with functional safety standards (ISO 26262 and ASPICE (Automotive SPICE)). This includes defining requirements, architecture, and design with end-to-end traceability; performing safety analyses (FMEA, DFA, FTA); and ensuring software compliance with the MISRA and CERT C standards.
Preferred Qualifications
- Familiarity with general hardware concepts, Verilog RTL coding and simulation/debug, GPU and SoC architectures, and machine learning/deep learning concepts.
- Programming with CUDA.
- Experience in embedded software development.