About the team you will join
As a Machine Learning Research Intern, you’ll join a team of engineers and researchers building algorithms to improve and accelerate our internal drug discovery pipeline. You will work in the series-expansion team, which is made up of three ML engineers. Day to day, you will work with Victor Saillant.
Your role
You will explore the topic of molecular generation in depth and be responsible for the literature review, the implementation, and the training and evaluation of models on public and proprietary data.
The internship should last 4 to 6 months and can start as early as possible in 2024.
Subject of the internship
The objective of the internship is to address the problem of molecule generation conditioned on a protein, and possibly also on a constrained chemical space and on additional physicochemical properties. The proposed approach uses diffusion models on graphs (see references [1][2]). In a later phase, alternative approaches such as autoregressive models may also be explored (see references [3][4]).
You are a Master’s or PhD student in Computer Science, Applied Mathematics, Bioinformatics, or a related field.
You are actively interested in the field of machine learning, and enjoy keeping up to date with current developments.
Your knowledge of mathematics and statistics allows you to understand and critically evaluate research papers in the field.
You are comfortable programming in Python and ideally have hands-on experience implementing (using PyTorch, JAX, or TensorFlow), training, and evaluating deep learning systems.
You are curious and eager to learn new topics from people with diverse backgrounds, and you believe that machine learning can play a pivotal role in biology and chemistry, and in drug discovery in particular.
You have experience in representation learning and generative modeling.
You are interested in complex structured data such as graphs, point clouds, and text.
Knowledge of biology, chemistry, or cheminformatics is a strong plus.
[1] Huang, L., Xu, T., Yu, Y. et al. “A dual diffusion model enables 3D molecule generation and lead optimization based on target pockets.” Nat. Commun. 15, 2657 (2024). https://doi.org/10.1038/s41467-024-46569-1
[2] Schneuing, A., Du, Y., Harris, C., Jamasb, A., Igashov, I., Du, W., Blundell, T. et al. “Structure-based drug design with equivariant diffusion models.” arXiv preprint arXiv:2210.13695 (2022).
[3] Zhung, W., Kim, H. & Kim, W.Y. “3D molecular generative framework for interaction-guided drug design.” Nat. Commun. 15, 2688 (2024). https://doi.org/10.1038/s41467-024-47011-2
[4] Powers, A.S., Yu, H.H., Suriana, P., Koodli, R.V., Lu, T., Paggi, J.M. & Dror, R.O. “Geometric deep learning for structure-based ligand design.” ACS Cent. Sci. 9(12), 2257-2267 (2023). https://doi.org/10.1021/acscentsci.3c00572