WANTED

Lead Data Engineer – REF 114-01

  • Full Time
  • Permanent
  • Remote (Europe and South America)


Purpose of the Role:
We are seeking a skilled Lead Data Engineer to join our dynamic team and contribute to the design and implementation of data-driven solutions. You will be responsible for developing and optimizing distributed data processing pipelines, enabling large-scale data analytics, and ensuring the efficient handling of big data. If you are passionate about working with cutting-edge technologies in a fast-paced environment, this role is for you.

Duties and Responsibilities:

  • Design, develop, and maintain data pipelines using AWS Glue, PySpark, and Apache Spark to process and transform large-scale datasets efficiently.
  • Collaborate with data scientists, analysts, and engineers to understand data requirements and translate them into scalable solutions.
  • Optimize data pipelines for performance and scalability in distributed environments.
  • Build and deploy big data solutions in cloud environments (e.g., AWS, Azure, GCP).
  • Implement streaming data pipelines using Spark Structured Streaming, Kinesis, or Kafka where business requirements demand real-time processing.
  • Develop and maintain data models, ensuring data integrity and consistency.
  • Troubleshoot and debug issues in existing pipelines, ensuring high reliability and availability of systems.
  • Document technical solutions, data flows, and pipeline architecture to ensure knowledge sharing.

Required Experience & Knowledge:

  • 3+ years of experience in data engineering with a proven track record of designing and implementing production data pipelines
  • Senior-level proficiency with AWS Glue, including Glue ETL jobs, Crawlers, Data Catalog, job orchestration, and Glue Studio
  • Strong experience with PySpark and Apache Spark for large-scale data processing in distributed environments
  • Deep knowledge of dimensional data modeling techniques, including star schema, snowflake schema, and slowly changing dimensions
  • Hands-on experience designing and optimizing data warehouses (Redshift, Snowflake, or similar platforms)
  • Proficiency in Python with strong understanding of data structures, algorithms, and software engineering best practices
  • Production experience with AWS services including S3, Glue, Redshift, Athena, EMR, Lambda, Step Functions, and CloudWatch
  • Experience implementing data quality frameworks, data validation, and monitoring solutions
  • Knowledge of ETL design patterns, error handling, retry logic, and pipeline orchestration
  • Proficiency working with data formats such as Parquet, Avro, ORC, JSON, and CSV
  • Strong understanding of data lake and lakehouse architectures, partitioning strategies, and schema evolution
  • Hands-on experience with infrastructure-as-code (Terraform, CloudFormation) and CI/CD pipelines for data workflows
  • Experience with dbt (data build tool) or similar SQL-based transformation frameworks is a strong plus
  • Familiarity with data governance, metadata management, and data lineage tools (AWS Glue Data Catalog, DataHub, etc.)
  • Experience mentoring engineers and leading technical design discussions

Skills and Attributes:

  • Strong analytical and problem-solving abilities, with attention to detail.
  • Excellent collaboration and communication skills to work in cross-functional teams.
  • Ability to adapt quickly to new technologies and a fast-paced work environment.
  • High level of ownership and accountability for deliverables.

Required Education & Qualifications:

  • Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field (or equivalent practical experience).
  • Advanced level of spoken and written English.
  • Relevant certifications in big data technologies, cloud platforms, or Spark are a plus.


Apply Online

A valid email address is required.
DOCX and PDF files are accepted, up to 5 MB.