Technical Lead – Services Reliability & Management
Company | ServiceNow |
---|---|
Location | Addison, TX, USA |
Salary | Not provided |
Type | Full-Time |
Degrees | |
Experience Level | Expert or higher |
Requirements
- Experience in leveraging or critically thinking about how to integrate AI into work processes, decision-making, or problem-solving. This may include using AI-powered tools, automating workflows, analyzing AI-driven insights, or exploring AI’s potential impact on the function or industry.
- 10+ years of professional software delivery experience, with a focus on Microservices architecture and support.
- Strong proficiency in Java and/or Python.
- Extensive experience with containerization technologies such as Docker and orchestration tools like Kubernetes.
- Deep understanding of RESTful APIs and API gateway technologies.
- Strong knowledge of SQL, NoSQL, and in-memory databases.
- Familiarity with CI/CD tools such as Jenkins or GitLab CI.
- Knowledge of event-driven architectures and messaging systems such as Kafka or RabbitMQ.
- Excellent problem-solving skills and the ability to think critically and analytically.
- Experience with cloud platforms such as Azure or Google Cloud.
- Experience with monitoring and logging tools such as Prometheus, Grafana, ELK stack, or Splunk.
- Experience in defining and rolling out key support/operational processes is essential.
- Strong communication skills, with the ability to articulate complex technical concepts to both technical and non-technical stakeholders.
- Team leadership experience and a mindset that celebrates successes and acknowledges the team's hard work and dedication.
Responsibilities
- Oversee and ensure high-performance support for deployed microservices, including AI/ML, foundational, and integration services.
- Collaborate closely with different internal departments to understand their business requirements and manage expectations clearly.
- Engage with peers across various departments to comprehend their critical needs and provide reliable support services that enhance business efficiency.
- Proactively monitor and troubleshoot service performance, ensuring high availability and reliability to minimize business impact.
- Engage with partner teams to gather comprehensive details on service alerts.
- Analyze each incident thoroughly to identify the root cause, then develop and provide technical solutions as needed to resolve it.
- Prepare and communicate operational metrics both within the organization and to external stakeholders.
- Lead and mentor a team of engineers, fostering a culture of continuous improvement.
- Define and roll out best practices for technical support and operational activities.
- Stay updated with the latest industry trends and technologies, advocating for their adoption where appropriate.
Preferred Qualifications
No preferred qualifications provided.