Lead Data Scientist

Company: Mastercard
Location: O’Fallon, MO, USA
Salary: $138,000 – $221,000
Type: Full-Time
Degrees: Master’s, PhD
Experience Level: Senior, Expert or higher

Requirements

  • Graduate degree in CS, Data Science, Statistics, Machine Learning, AI or a related STEM field.
  • Demonstrated ability to independently contribute to overall team objectives.
  • Strong background in statistics, probability, and linear algebra as applied to machine learning models.
  • Data science and data engineering experience.
  • Proven experience with several supervised and unsupervised learning techniques, such as XGBoost/LightGBM/GBM, deep neural networks, Isolation Forest, and clustering.
  • Strong grasp of data science and machine learning concepts.
  • Experience with SQL and one or more of PySpark, Hadoop, Impala, and Hive.
  • Good knowledge of the Linux/Bash environment.
  • Proficiency in Python and PySpark.
  • Knowledge of model optimization techniques.
  • The ability to work closely with more senior data scientists to implement and optimize models within big data pipelines.
  • Good communication skills.
  • Highly skilled problem solver.
  • Exhibits a high degree of initiative.

Responsibilities

  • Work closely with business owners to understand business requirements and the performance metrics for data quality and model performance of customer-facing products.
  • Lead the development of advanced fraud detection models, specifically credit and debit card transaction-level models.
  • Lead efforts to enhance the modeling best practices that maintain the competitiveness of our fraud detection models.
  • Oversee implementation of data and model development pipelines.
  • Explore fraudulent patterns and trends for feature discovery and to enhance fraud detection model performance.
  • Manage the testing of trained models to ensure their robustness and assess their readiness for deployment.

Preferred Qualifications

  • Experience building payment fraud detection models.
  • PhD in CS, Statistics, or a related quantitative STEM field.
  • Experience with data engineering in PySpark on petabyte-scale data.
  • Expertise in time series analysis and forecasting techniques, such as ARIMA, RNN, and LSTM networks, to detect anomalies in large-scale, time-sensitive datasets.
  • Experience with active learning methods, particularly in situations where labeled data is scarce or expensive to obtain.
  • Understanding of data privacy and algorithmic bias, and of how to develop fair, transparent, and accountable algorithms.
  • Understands and implements methods to evaluate their own work and the work of others for errors.
  • Loves working with error-prone, messy, disparate, unstructured data.