Senior Infrastructure Engineer

Company: Uber Freight
Location: Chicago, IL, USA
Salary: $131,300 – $160,150
Type: Full-Time
Experience Level: Senior, Expert or higher

Requirements

  • Strong communication and leadership abilities, including the capacity to mentor less-experienced engineers, facilitate cross-team collaboration, and serve as a technical thought leader.
  • Strong experience and expertise in automating provisioning, configuration management, and keep-the-lights-on (KTLO) tasks using Ansible and Terraform.
  • Strong experience and expertise in deploying Infrastructure as Code (IaC) for VM, container, network, and other components in both on-prem and cloud environments.
  • Demonstrated experience in at least one scripting language (e.g., Bash, Python) and familiarity with YAML to develop and maintain automation scripts and playbooks.
  • Strong understanding of DevOps tools and practices, including CI/CD pipelines and version control (e.g., Jenkins, Git).
  • Experience with DevOps tools beyond infrastructure automation, such as Docker, Kubernetes, and Vagrant.
  • Solid environment and configuration management skills, including the ability to maintain multiple environment baselines (e.g., Dev, Test, Prod).
  • Strong ability to troubleshoot system integration issues.
  • 5 years of hands-on experience with infrastructure automation (Ansible, Terraform).
  • At least 10 years of total work experience.

Responsibilities

  • Design and implement automation solutions using PowerShell, Ansible, Terraform, Jenkins, Python, or other tools to streamline and optimize operational processes such as provisioning, configuration, patching, backup, recovery, monitoring, and auditing.
  • Move the organization toward Infrastructure as Code (IaC) by eliminating existing manual steps and legacy system designs.
  • Provide technical leadership, guidance, and mentoring to a team of existing infrastructure engineers who lack significant automation experience.
  • Advance the adoption of DevOps principles by implementing CI/CD pipelines and automating infrastructure provisioning using tools like Jenkins, Ansible, or Terraform.
  • Plan, coordinate, and execute Windows and Linux infrastructure and automation projects, including scoping, budgeting, scheduling, resourcing, risk management, and stakeholder communication.
  • Streamline KTLO processes to reduce manual work.
  • Research and evaluate new technologies and solutions to improve the Windows and Linux infrastructure and automation capabilities of the company.
  • Diagnose and resolve issues with automation systems promptly and efficiently.
  • Integrate automation tools with existing systems, including APIs, databases, and enterprise software, ensuring seamless interoperability.
  • Ensure the accuracy, performance, and reliability of automation tools through rigorous testing and debugging processes.
  • Design and support automated jobs triggered from ticketing systems (ServiceNow and Jira) for new VM builds and changes to existing builds.
  • Streamline and mature the Linux and Windows VM server deployment process currently kicked off from AWX.
  • Establish and enforce Windows and Linux infrastructure automation standards, guidelines, and procedures, and conduct regular reviews and audits to ensure compliance.
  • Deploy, manage, and maintain virtualization platforms (VMware and Proxmox).
  • Escalate issues related to virtualization platform support and drive them to resolution.
  • Support and maintain hybrid virtualization and containerized environments, integrating with Kubernetes.
  • Administer and fine-tune Linux systems to support virtualized infrastructure and containerized workloads.
  • Work with application teams to assist in deploying virtual workloads as needed.
  • Engage in service capacity planning, demand forecasting, and system tuning.
  • Stay abreast of industry trends and emerging technologies to incorporate best practices and innovative solutions into the automation strategy.
  • Develop automated incident response mechanisms, track critical system metrics, and participate in an on-call rotation for high-priority issues.
  • Install, configure, and test high availability and disaster recovery solutions.
  • Join high-severity incident calls and lead troubleshooting efforts to restore service quickly.
  • Work collaboratively with other IT team members to resolve complex issues and implement improvements.
  • Troubleshoot and resolve complex issues related to Windows and Linux infrastructure and automation, and provide root cause analysis and remediation plans.
  • Create and maintain comprehensive documentation for tools, scripts, workflows, and processes, ensuring clarity, usability, and accessibility for all stakeholders.
  • Support vulnerability management by scanning, automating, and remediating vulnerabilities.
  • Improve collaboration with other teams on validating their systems before and after regular monthly patching cycles (an opportunity to introduce unit testing).
  • Participate in additional projects based on changing business needs and priorities.

Preferred Qualifications

  • 2 years of Linux systems administration experience.
  • 2 years of Windows systems administration experience.
  • 3 years of hands-on experience managing Kubernetes clusters and virtualization systems (VMware, Proxmox, Xen, or Hyper-V).
  • Knowledge of basic Windows technologies (AD, DNS, etc.).
  • Experience incorporating PCI, SOX, and SOC 2 controls related to OS hardening.
  • Experience working with multiple cloud providers (Azure, GCP, OCI).
  • Experience working on data center migration projects.
  • Experience working on application/platform migration projects.
  • Strong understanding of Linux systems and API integration.
  • Basic knowledge of network technology as it relates to virtualization.
  • Basic knowledge of storage technology as it relates to virtualization.
  • Experience in ITIL-based service management and agile methodologies.
  • Experience with development team management software, including Jira, GitHub, and Confluence.
  • Strong problem-solving, troubleshooting, and analytical skills, and attention to detail and quality.
  • Relevant OS, cloud, or automation certifications (e.g., Microsoft MCSA/MCSE or AZ-104, CompTIA Linux+, HashiCorp Terraform Associate, GCP Cloud Engineer/Architect); preferred but not required.