Big Data Engineer
| Company | Synechron |
| --- | --- |
| Location | Charlotte, NC, USA |
| Salary | $100,000 – $110,000 |
| Type | Full-Time |
| Degrees | Bachelor’s, Master’s, PhD |
| Experience Level | Expert or higher |
Requirements
- Bachelor’s, Master’s, or Ph.D. in Computer Science, Information Technology, or a related field.
- 10+ years of industry experience in big data engineering with proven expertise in Hadoop and Spark technologies.
- Extensive experience with Hadoop ecosystem components: HDFS, MapReduce, YARN, Hive, Pig, HBase, and Oozie.
- Strong proficiency in Apache Spark (Scala, Python, or Java) and Spark SQL (a short PySpark sketch follows this list).
- Experience with data ingestion tools such as Apache NiFi, Kafka, or Flume.
- Hands-on experience with cloud platforms (AWS, Azure, GCP) and their big data services integrated with Hadoop/Spark.
- Knowledge of data modeling, data warehousing, and database technologies (NoSQL, relational systems).
- Familiarity with containerization and orchestration tools like Docker and Kubernetes.
- Familiarity with data governance, security, and compliance standards.
- Excellent problem-solving, system architecture, and communication skills.
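
As a reference point for the Spark SQL proficiency listed above, here is a minimal PySpark sketch, assuming a hypothetical Parquet dataset of transactions; the path, view name, and columns (`account_id`, `event_time`, `amount`) are illustrative only.

```python
from pyspark.sql import SparkSession

# Start a local Spark session; production deployments on YARN or
# Kubernetes would add cluster-specific configuration.
spark = SparkSession.builder.appName("example-spark-sql").getOrCreate()

# Hypothetical input: a Parquet dataset of transaction records.
df = spark.read.parquet("/data/transactions")

# Register the DataFrame as a temporary view so it can be queried with SQL.
df.createOrReplaceTempView("transactions")

# Aggregate daily totals per account with Spark SQL.
daily_totals = spark.sql("""
    SELECT account_id,
           to_date(event_time) AS event_date,
           SUM(amount)         AS total_amount
    FROM transactions
    GROUP BY account_id, to_date(event_time)
""")

daily_totals.show()
spark.stop()
```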
Responsibilities
- Design, develop, and optimize big data pipelines and processing frameworks using Hadoop (HDFS, MapReduce, YARN) and Apache Spark.
- Build scalable data ingestion processes and data lakes for diverse data sources.
- Develop and maintain ETL workflows that handle processing of structured and unstructured data.
- Collaborate with Data Scientists, Analysts, and Business Teams to translate requirements into technical solutions.
- Tune and troubleshoot Spark and Hadoop jobs for efficiency and performance (see the tuning sketch after this list).
- Implement data security, privacy, and compliance best practices across all platforms.
- Mentor junior team members and foster best practices in big data development.
- Stay current with emerging trends and technologies related to Hadoop and Spark.
- Document architecture, workflows, and standards for maintainability and knowledge sharing.
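
To make the tuning responsibility above concrete, the sketch below shows a few common Spark levers (shuffle-partition sizing, broadcast joins, caching). The table paths, join key, and the 64-partition setting are assumptions for illustration, not recommended values.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = (
    SparkSession.builder
    .appName("example-tuning")
    # Hypothetical setting: size shuffle partitions to the cluster's
    # parallelism instead of the default 200.
    .config("spark.sql.shuffle.partitions", "64")
    .getOrCreate()
)

# Hypothetical datasets: a large fact table and a small dimension table.
facts = spark.read.parquet("/data/facts")
dims = spark.read.parquet("/data/dims")

# Broadcasting the small table avoids shuffling the large one.
joined = facts.join(broadcast(dims), on="dim_id")

# Cache a result that downstream steps reuse, then materialize it once.
joined.cache()
joined.count()

spark.stop()
```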
Preferred Qualifications
- Experience with Spark Streaming and real-time data processing (a streaming sketch follows this list).
- Knowledge of advanced analytics, machine learning pipelines, and integration with Spark MLlib.
- Experience with automation and orchestration tools such as Apache Airflow.
- Familiarity with version control and CI/CD practices for big data platforms.
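
For the real-time processing noted above, here is a minimal Spark Structured Streaming sketch that reads from Kafka, assuming the spark-sql-kafka connector is on the classpath; the broker address, topic, and checkpoint path are placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("example-streaming").getOrCreate()

# Hypothetical Kafka source: broker and topic are placeholders.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
)

# Kafka delivers values as binary; cast to string for downstream parsing.
payloads = events.select(col("value").cast("string").alias("payload"))

# Console sink for demonstration; a production job would write to a
# durable sink (Parquet, Kafka, etc.) with the same checkpointing.
query = (
    payloads.writeStream
    .format("console")
    .option("checkpointLocation", "/tmp/checkpoints/events")
    .start()
)

query.awaitTermination()
```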