Sr. Data Engineer

Master’s degree or higher in Computer Science, Data Engineering, Information Systems, or a related technical field
5+ years of experience as a Data Engineer, DevOps Engineer, or in a similar role
Expert knowledge of AWS cloud computing, including hands-on experience with EC2, S3, Athena, Elastic Kubernetes Service (EKS), and Elastic Container Registry (ECR)
Strong proficiency in Python, including developing stand-alone libraries and deploying automated ETL pipelines
Knowledge of at least one testing framework
Hands-on experience with at least one modern data platform such as Databricks or Snowflake
Expertise in Apache Spark, Spark SQL, DataFrames, and PySpark
Knowledge of relational databases
Version control with Git, including setting up and managing remote repositories, implementing proper branch management, resolving merge conflicts locally, collaborating on remote repositories, working with protected branches, submitting and resolving pull requests, and adding automated tests with GitHub Actions

Architect & Develop Scalable Data Infrastructure: Design and implement robust, secure, and scalable data pipelines and infrastructure on AWS using EC2, S3, Athena, EKS, and other cloud-native services
Optimize Data Processing: Leverage Apache Spark and Databricks to process large-scale datasets efficiently for analytics, reporting, and machine learning applications
Automate Data Workflows: Build and maintain orchestration workflows using Apache Airflow to automate data pipelines
Monitor & Optimize Performance: Continuously improve system reliability, performance, and cost efficiency through monitoring, logging, and infrastructure optimization
Collaborate with Cross-Functional Teams: Work closely with computational biologists, experimental scientists, and colleagues in business development to provide accessible and high-quality data solutions