Lead Data Scientist
Company | Mastercard |
---|---|
Location | O’Fallon, MO, USA |
Salary | $138,000 – $221,000 |
Type | Full-Time |
Degrees | Master’s, PhD |
Experience Level | Senior, Expert or higher |
Requirements
- Graduate degree in CS, Data Science, Statistics, Machine Learning, AI or a related STEM field.
- Demonstrated ability to independently contribute to overall team objectives.
- Strong background in statistics, probability, and linear algebra as applied to machine learning models.
- Data science and data engineering experience.
- Proven experience with supervised and unsupervised learning techniques, including several of XGBoost/LightGBM/GBM, deep neural networks, Isolation Forest, and clustering.
- Strong grasp of data science and machine learning concepts.
- Experience with SQL and one or more of PySpark, Hadoop, Impala, Hive.
- Good knowledge of Linux / Bash environment.
- Proficiency in Python and PySpark.
- Knowledge of model optimization techniques.
- The ability to work closely with more senior data scientists implementing and optimizing models within big data pipelines.
- Good communication skills.
- Highly skilled problem solver.
- Exhibits a high degree of initiative.
Responsibilities
- Work closely with business owners to understand business requirements and the performance metrics for data quality and model performance of customer-facing products.
- Lead the development and advancement of fraud detection models, specifically credit and debit card transaction-level models.
- Lead efforts to enhance the modeling best practices that maintain the competitiveness of our fraud detection models.
- Oversee implementation of data and model development pipelines.
- Explore fraudulent patterns or trends for feature discovery and to enhance fraud detection model performance.
- Manage the testing of trained models to ensure their robustness and assess their readiness for deployment.
Preferred Qualifications
- Experience building payment fraud detection models.
- PhD in CS, Statistics, or a related quantitative STEM field.
- Experience with data engineering in PySpark on petabyte-scale data.
- Expertise in time series analysis and forecasting techniques, such as ARIMA, RNN, and LSTM networks, to detect anomalies in large-scale, time-sensitive datasets.
- Experience with active learning methods, particularly in situations where labeled data is scarce or expensive to obtain.
- Understanding of data privacy, algorithm bias, and developing fair, transparent, and accountable algorithms.
- Understands and implements methods to evaluate their own work and the work of others for errors.
- Loves working with error-prone, messy, disparate, unstructured data.