Managing Data Pipelines:
Maintain and improve data pipelines on high-performance computing (HPC) clusters.
Ensure the pipelines can handle large amounts of data efficiently and grow as needed.
Improving Decision-Making Systems:
Manage and upgrade systems that support data analysis and decision-making.
Set up processes to organize, clean, and move data for reporting and analytics.
Working with Others:
Collaborate with bioinformatics analysts, data scientists, business analysts, and other teams to understand their data needs.
Help ensure everyone has access to the right data at the right time.
Keeping Data Secure:
Documenting Work:
Create and update clear documentation for data systems and processes.
Share knowledge to help the team understand how systems work.
Tech Stack:
Hosting: AWS, Kubernetes (EKS), Linux, Docker, AWS ParallelCluster
Monitoring/logging: Datadog
Database: PostgreSQL (RDS)
Data codebase: Python, dbt, SQL, Groovy (Nextflow)
Scheduling: Airflow, Slurm (HPC)
Misc: Jira, Confluence, Google Workspace, Slack
Qualifications:
Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
4+ years of experience as a Data Engineer.
Expertise in big data technologies such as Spark and Hadoop.
Proficient in SQL, data modeling, and data warehousing.
Proficient in programming languages such as Python and Scala.
Experience with orchestration tools such as Airflow and Kubernetes.
Comfortable working in English (written and spoken).
Rigorous, curious, and with a strong sense of service.
Interview Process:
1st interview: video call with Louis-Baptiste, Head of Engineering (1h)
2nd interview: video call with Kevin, Data Engineer (1h)
3rd interview: on site with the COO and two people from other teams (1h)