Staff Site Reliability Engineer

Bachelor’s degree in Computer Science, Engineering, or a related field, followed by 5 years of progressive, post-baccalaureate experience in the job offered or in a related systems engineer or systems/software administrator occupation; or
Master’s degree in Computer Science, Engineering, or a related field, and 2 years of experience in the job offered or in a related systems engineer or systems/software administrator occupation
Experience in Hadoop Platform environments
Experience in Hadoop administration
Experience in ZooKeeper
Experience in HDFS
Experience in YARN
Experience in Hive
Experience in Spark
Experience in shell and Python scripting
Experience in Unix and Linux

Monitor, troubleshoot, automate, and continuously develop software tools to improve the availability and resiliency of open-source big data platforms at Visa
Perform big data administration and engineering activities on multiple open-source clusters
Build and maintain relationships with customer teams, the user community, architects, and engineering teams and jointly work on key deliverables to ensure production scalability and stability
Perform root cause analysis of major production incidents and document lessons learned
Plan and perform capacity expansions and upgrades in a timely manner to avoid scaling issues and bugs
Automate repetitive tasks to reduce manual effort and avoid human error
Work closely with L-3 teams to review new cases, cluster hardening techniques, and performance problems
Leverage DevOps tools, disciplines (e.g., incident, problem, and change management), and standards in day-to-day operations

No preferred qualifications provided.