Lead Data Scientist

Graduate degree in CS, Data Science, Statistics, Machine Learning, AI or a related STEM field.
Demonstrated ability to independently contribute to overall team objectives.
Strong background in statistics, probability, and linear algebra as applied to machine learning models.
Data science and data engineering experience.
Proven experience with supervised and unsupervised learning techniques, such as several of the following: XGBoost/LightGBM/GBM, deep neural networks, Isolation Forest, and clustering.
Strong grasp of data science and machine learning concepts.
Experience with SQL and one or more of PySpark, Hadoop, Impala, and Hive.
Good knowledge of Linux/Bash environments.
Proficiency in Python and PySpark.
Knowledge of model optimization techniques.
Ability to work closely with more senior data scientists to implement and optimize models within big data pipelines.
Good communication skills.
Highly skilled problem solver.
Exhibits a high degree of initiative.

Work closely with business owners to understand business requirements and the performance metrics for data quality and model performance of customer-facing products.
Lead the development of advanced fraud detection models, specifically credit and debit card transaction-level models.
Lead efforts to enhance modeling best practices that maintain the competitiveness of our fraud detection models.
Oversee implementation of data and model development pipelines.
Explore fraudulent patterns and trends to discover new features and enhance fraud detection model performance.
Manage the testing of trained models to ensure their robustness and assess their readiness for deployment.

Experience building payment fraud detection models.
PhD in CS, Statistics, or a related quantitative STEM field.
Experience with data engineering in PySpark on petabyte-scale data.
Expertise in time-series analysis and forecasting techniques, such as ARIMA, RNNs, and LSTM networks, to detect anomalies in large-scale, time-sensitive datasets.
Experience with active learning methods, particularly in situations where labeled data is scarce or expensive to obtain.
Understanding of data privacy and algorithmic bias, and of developing fair, transparent, and accountable algorithms.
Understands and implements methods to evaluate their own and others' work for errors.
Loves working with error-prone, messy, disparate, unstructured data.