Sustainability Data Engineers

Company: Barclays
Location: Parsippany-Troy Hills, NJ, USA
Salary: Not provided
Type: Full-Time
Degrees: Not provided
Experience Level: Senior, Expert or higher

Requirements

  • Extensive experience implementing Retrieval-Augmented Generation (RAG) architectures, including advanced techniques in document processing, semantic chunking, and vector embeddings for sustainability knowledge bases
  • Considerable expertise in developing AI/GenAI solutions that combine traditional ETL pipelines with large language models (LLMs) to automate sustainability reporting and environmental impact assessments
  • Proficiency in building data pipelines that integrate diverse environmental data sources, including company filings, regulatory databases, and open-source datasets
  • Demonstrated ability to implement advanced RAG functionalities such as hypothetical document embeddings, multi-vector retrieval, and recursive retrieval for complex sustainability queries
  • Deep understanding of data quality management, version control, and metadata management for maintaining reliable and auditable sustainability metrics

Responsibilities

  • Build and maintain data architectures and pipelines that enable the transfer and processing of durable, complete, and consistent data.
  • Design and implement data warehouses and data lakes that handle the appropriate data volumes and velocity and adhere to the required security measures.
  • Develop processing and analysis algorithms suited to the intended data complexity and volumes.
  • Collaborate with data scientists to build and deploy machine learning models.

Preferred Qualifications

  • Proficiency in the AI/GenAI tooling ecosystem, including AWS Bedrock, for building production-ready sustainability solutions, with containerization using Docker and Docker Compose
  • Experience with AWS services (Lambda, ECS, SageMaker, S3) and API development using Flask/FastAPI to create a scalable microservices architecture for environmental data processing and real-time sustainability metrics
  • Strong data manipulation and analysis skills using Pandas, NumPy, and specialized environmental libraries, with the ability to optimize processing of large-scale sustainability datasets and build efficient ETL pipelines