Manager – Monitoring Operations

Company: OneMain Financial
Location: Baltimore, MD, USA
Salary: $0 – $120,000
Type: Full-Time
Degrees: Bachelor’s
Experience Level: Senior

Requirements

  • 6 – 8 total years of engineering experience
  • Up to three years of progressively responsible experience as an engineer on development teams directly responsible for building and delivering software-intensive systems
  • BS degree in a computer-related field
  • Excellent communication skills
  • Hands-on experience with at least four of the following tools/technologies: OpenTelemetry, Elastic, Grafana, OpsRamp, BigPanda, AppDynamics, PowerShell, and Python scripting
  • Strong understanding of instrumenting and configuring monitoring dashboards and alerts
  • Strong aptitude for problem-solving, system analysis, and troubleshooting, especially in complex environments
  • Proven ability to manage multiple competing priorities, structure work effectively, and communicate progress through visual metrics and reports
  • Ability to work cross-functionally with teams across different departments
  • Demonstrated success in building and maintaining positive customer and team member relationships
  • Proven ability to lead and drive change within an organization

Responsibilities

  • Lead and manage a team of 10-15 monitoring personnel, including full-time employees and contractors
  • Foster team development by mentoring staff on skills and professional growth, creating opportunities for learning and applying new knowledge
  • Recognize and celebrate team members who consistently deliver value
  • Hire and onboard personnel to fill important roles, ensuring 24×7 coverage across shifts
  • Create and maintain metrics to monitor team performance, providing regular trending data to reflect team efficiency and impact
  • Oversee the team’s 24×7 monitoring efforts, including reviewing monitoring dashboards, responding to alerts, and addressing potential issues before they impact business operations
  • Correlate alerts and create tickets to ensure follow-up actions are taken on issues that could cause significant impact
  • Partner with cross-functional teams to reduce alert noise, improve alert content, and refine alert thresholds
  • Work closely with application development teams to understand their services and features, identifying areas for monitoring improvement
  • Document monitoring and alerting requirements and share them with the Tools & Engineering team for instrumentation
  • Assist with the configuration of dashboards, alerts, and other monitoring tools to ensure efficient system performance
  • Accurately report monitoring status and issues to senior management, ensuring transparency and timely escalation of critical problems
  • Establish and maintain a Level 1 support function, ensuring the effective triage and escalation of incidents and anomalies
  • Work with the Helpdesk team to intake tickets that do not have a designated Technology owner and ensure proper ticket assignment
  • Escalate tickets that require urgency and coordinate with Level 2 and Level 3 support teams to develop effective engagement procedures during active incidents
  • Track and report on the performance of the Level 1 support function, using metrics and trending data to assess success and identify areas for improvement

Preferred Qualifications

  • IT certifications such as CompTIA A+, Network+, AWS Certified Cloud Practitioner, Microsoft Certified: Azure Fundamentals, or similar
  • Experience with cloud platforms and services, particularly AWS and Azure
  • Familiarity with Data Center Operations and batch processing