Manager – Monitoring Operations
Company | OneMain Financial |
---|---|
Location | Baltimore, MD, USA |
Salary | $0 – $120,000 |
Type | Full-Time |
Degrees | Bachelor’s |
Experience Level | Senior |

Requirements
- 6–8 years of total engineering experience
- Up to three years of progressive responsibility as an engineer on development teams directly responsible for building and delivering software-intensive systems
- BS degree in a computer-related field
- Excellent communication skills
- Hands-on experience with at least four of the following tools/technologies: OpenTelemetry, Elastic, Grafana, OpsRamp, BigPanda, AppDynamics, PowerShell, and Python scripting
- Strong understanding of application instrumentation and of configuring monitoring dashboards and alerts
- Strong aptitude for problem-solving, system analysis, and troubleshooting, especially in complex environments
- Proven ability to manage multiple competing priorities, structure work effectively, and communicate progress through visual metrics and reports
- Ability to work cross-functionally with teams across different departments
- Demonstrated success in building and maintaining positive customer and team member relationships
- Proven ability to lead and drive change within an organization
Responsibilities
- Lead and manage a team of 10–15 monitoring personnel, including full-time employees and contractors
- Foster team development by mentoring staff on skills and professional growth, creating opportunities for learning and applying new knowledge
- Recognize and celebrate team members who consistently deliver value
- Hire and onboard personnel to fill important roles, ensuring 24×7 coverage across shifts
- Create and maintain metrics to monitor team performance, providing regular trending data to reflect team efficiency and impact
- Oversee the team’s 24×7 monitoring efforts, including reviewing monitoring dashboards, responding to alerts, and addressing potential issues before they impact business operations
- Correlate alerts and create tickets to ensure follow-up actions are taken on issues that could cause significant impact
- Partner with cross-functional teams to reduce alert noise, improve alert content, and refine alert thresholds
- Work closely with application development teams to understand their services and features, identifying areas for monitoring improvement
- Document monitoring and alerting requirements and share them with the Tools & Engineering team for instrumentation
- Assist with the configuration of dashboards, alerts, and other monitoring tools to ensure efficient system performance
- Accurately report monitoring status and issues to senior management, ensuring transparency and timely escalation of critical problems
- Establish and maintain a Level 1 support function, ensuring the effective triage and escalation of incidents and anomalies
- Work with the Helpdesk team to intake tickets that do not have a designated Technology owner and ensure proper ticket assignment
- Escalate tickets that require urgency and coordinate with Level 2 and Level 3 support teams to develop effective engagement procedures during active incidents
- Track and report on the performance of the Level 1 support function, using metrics and trending data to assess success and identify areas for improvement
Preferred Qualifications
- IT certifications such as CompTIA A+, CompTIA Network+, AWS Certified Cloud Practitioner, Microsoft Certified: Azure Fundamentals, or similar
- Experience with cloud platforms and services, particularly AWS and Azure
- Familiarity with data center operations and batch processing