Sustainability Data Engineers
Company | Barclays |
---|---|
Location | Parsippany-Troy Hills, NJ, USA |
Salary | Not Provided |
Type | Full-Time |
Degrees | |
Experience Level | Senior, Expert or higher |
Requirements
- Extensive experience in implementing RAG (Retrieval-Augmented Generation) architectures, including advanced techniques in document processing, semantic chunking, and vector embeddings for sustainability knowledge bases
- Considerable expertise in developing AI/GenAI solutions that combine traditional ETL pipelines with large language models to automate sustainability reporting and environmental impact assessments
- Proficiency in building data pipelines that integrate diverse environmental data sources, including company filings, regulatory databases, and open-source datasets
- Demonstrated ability to implement advanced RAG functionalities such as hypothetical document embeddings, multi-vector retrieval, and recursive retrieval for complex sustainability queries
- Deep understanding of data quality management, version control, and metadata management for maintaining reliable and auditable sustainability metrics
Responsibilities
- Building and maintenance of data architecture pipelines that enable the transfer and processing of durable, complete and consistent data.
- Design and implementation of data warehouses and data lakes that manage the appropriate data volumes and velocity and adhere to the required security measures.
- Development of processing and analysis algorithms fit for the intended data complexity and volumes.
- Collaboration with data scientists to build and deploy machine learning models.
Preferred Qualifications
- Proficiency in the AI/GenAI tooling ecosystem, including AWS Bedrock, for building production-ready sustainability solutions, with containerization using Docker and Docker Compose
- Experience with AWS services (Lambda, ECS, SageMaker, S3) and API development using Flask/FastAPI to create a scalable microservices architecture for environmental data processing and real-time sustainability metrics
- Considerable data manipulation and analysis skills using Pandas, NumPy, and specialized environmental libraries, with the ability to optimize large-scale sustainability datasets and create efficient ETL pipelines