Core AI Team Internship - Bayesian Deep Learning for Foundation Models

Internship (4 to 6 months)
Location: Paris
Salary: Not specified
Remote work: a few days at home
Education: Master's Degree

Sigma Nova

Job description

Background

Bayesian deep learning (BDL) has received renewed interest recently in large-scale AI settings [Papamarkou et al., 2024]. In the context of foundation models for scientific domains, BDL brings a number of advantages compared to conventional deep learning approaches for training foundation models:

Uncertainty quantification. Since BDL methods estimate the posterior predictive distribution, they allow for flexible and reliable quantification of the uncertainty of predictions generated by foundation models. Quantifying uncertainty in predictions facilitates risk assessment and improves decision making, and can be especially important in domains where scientific foundation models may be used, such as clinical health settings.
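
For reference, the posterior predictive distribution mentioned above is, in standard LaTeX notation (textbook material, not specific to this project):

    p(y^* \mid x^*, \mathcal{D}) = \int p(y^* \mid x^*, \theta) \, p(\theta \mid \mathcal{D}) \, d\theta
    \approx \frac{1}{S} \sum_{s=1}^{S} p(y^* \mid x^*, \theta_s),
    \qquad \theta_s \sim p(\theta \mid \mathcal{D})

In practice the integral is intractable, so scalable BDL methods approximate it with samples \theta_s drawn from (an approximation to) the parameter posterior.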

Reduction of hallucinations. In large language models (LLMs), prompts that are out of distribution may lead the model to generate wrong answers with high confidence, a failure mode known as hallucination. Hallucinations may also occur in foundation models for other domains, such as text-to-image generation, robotics, and autonomous driving [Papamarkou et al., 2024]. Since BDL enables reliable uncertainty quantification, it may be used to mitigate hallucinations.
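
As a rough illustration of how such uncertainty estimates could be used to flag potentially hallucinated outputs, the Python sketch below computes the predictive entropy of Monte Carlo posterior samples and flags high-entropy inputs. The tensor shapes and the threshold are illustrative assumptions, not part of this posting:

    import torch

    def predictive_entropy(mc_probs: torch.Tensor) -> torch.Tensor:
        # Entropy of the MC-averaged predictive distribution.
        # mc_probs: (S, B, C) class probabilities from S posterior samples.
        # Returns a (B,) tensor of predictive entropies.
        mean_probs = mc_probs.mean(dim=0)  # average over posterior samples
        return -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=-1)

    # Hypothetical usage: in a real pipeline, mc_probs would come from
    # forward passes of the same input under S sampled weight settings.
    mc_probs = torch.softmax(torch.randn(16, 4, 10), dim=-1)  # S=16, B=4, C=10
    flagged = predictive_entropy(mc_probs) > 1.5  # illustrative threshold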

Data efficiency. BDL enables flexible regularization approaches that are important for reducing overfitting and improving generalization from few examples [Sharma et al., 2023]. BDL may thus allow for foundation models that achieve higher performance when pre-trained on small datasets. High data efficiency in BDL may also lead to better results for foundation model fine-tuning, since datasets used for fine-tuning are often relatively small and sparse.
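
The regularization effect mentioned above comes directly from the log posterior that BDL methods target; in LaTeX notation (again standard material, shown only for reference):

    \log p(\theta \mid \mathcal{D}) = \log p(\mathcal{D} \mid \theta) + \log p(\theta) + \text{const},
    \qquad p(\theta) = \mathcal{N}(0, \sigma^2 I) \;\Rightarrow\; \log p(\theta) = -\frac{\lVert \theta \rVert_2^2}{2\sigma^2} + \text{const}

so a zero-mean Gaussian prior recovers the familiar L2 / weight-decay penalty, with the prior scale \sigma controlling the strength of regularization.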

Despite the advantages described above, BDL remains largely unexplored in the context of foundation models. In this project, we propose applying state-of-the-art scalable BDL methods to relatively small foundation models, as a starting point for research on this important topic.

Objective

During this internship, we expect the intern to lead and participate in the development of the following deliverables:

  • Conduct a short literature review on the topic of this internship project.

  • Identify an existing small open-source scientific foundation model (FM) that can be used as a baseline and starting point for this work, and acquire datasets and code for pre-training, fine-tuning, and evaluation. Options for existing scientific FMs that could be used as baselines include:

    a) A small pre-trained Large Brain Model (LaBraM), with 5.8M parameters, is publicly available and can easily be used together with public EEG datasets such as TUAB for running small-scale fine-tuning experiments.

    b) Pre-training a new very small-scale LaBraM model with public EEG datasets from the TUH EEG Corpus would be feasible.

    c) The publicly available FM from the “Decoding speech from non-invasive recordings of brain activity” paper would also be a useful baseline for pre-training and fine-tuning experiments.

  • Implement code for one or more scalable BDL training algorithms, such as a variational approximation method and a stochastic gradient Markov chain Monte Carlo (SG-MCMC) method (a minimal SGLD sketch is included after this list). If feasible, existing BDL libraries may be used. Pre-train the baseline scientific FM and BDL version(s) of this scientific FM.

  • Implement code for evaluating the unique features provided by the BDL version of the selected FM, including uncertainty quantification and data efficiency (an example calibration metric is sketched after this list). All code implemented for this project will be pushed to a Git repository.

  • Prepare a paper submission to a conference or journal presenting the results of this project.
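
As a minimal sketch of the SG-MCMC option mentioned in the deliverables, the snippet below implements a stochastic gradient Langevin dynamics (SGLD) update in PyTorch. All hyperparameters, the model, and the batch are illustrative placeholders, and a real implementation would add a decaying step size and more careful minibatch likelihood rescaling:

    import torch

    def sgld_update(model, loss_fn, batch, lr=1e-6, prior_std=1.0,
                    dataset_size=10_000):
        # One SGLD step: descend the minibatch estimate of the negative
        # log posterior, then inject Gaussian noise with variance lr.
        x, y = batch
        model.zero_grad()
        # Mean minibatch loss rescaled to estimate the full-data NLL.
        nll = loss_fn(model(x), y) * dataset_size
        # Zero-mean Gaussian prior on the weights (an L2 penalty).
        prior = sum((p ** 2).sum() for p in model.parameters()) / (2 * prior_std ** 2)
        (nll + prior).backward()
        with torch.no_grad():
            for p in model.parameters():
                noise = torch.randn_like(p) * lr ** 0.5
                p.add_(-0.5 * lr * p.grad + noise)

    # Hypothetical usage with a toy classifier:
    model = torch.nn.Linear(8, 2)
    batch = (torch.randn(32, 8), torch.randint(0, 2, (32,)))
    sgld_update(model, torch.nn.functional.cross_entropy, batch)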
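
Similarly, for the evaluation deliverable, expected calibration error (ECE) is one widely used uncertainty-quantification metric. The sketch below is one possible plain-PyTorch implementation, included only to illustrate the kind of evaluation code this project would involve:

    import torch

    def expected_calibration_error(probs, labels, n_bins=10):
        # ECE: |accuracy - confidence| averaged over confidence bins,
        # weighted by the fraction of samples falling in each bin.
        # probs: (N, C) predicted probabilities; labels: (N,) class ids.
        conf, pred = probs.max(dim=-1)
        correct = pred.eq(labels).float()
        edges = torch.linspace(0.0, 1.0, n_bins + 1)
        ece = torch.zeros(())
        for lo, hi in zip(edges[:-1], edges[1:]):
            mask = (conf > lo) & (conf <= hi)
            if mask.any():
                gap = (correct[mask].mean() - conf[mask].mean()).abs()
                ece = ece + mask.float().mean() * gap
        return ece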


Preferred experience

We are looking for candidates with the following qualifications:

  • A strong background in machine learning (ML), probability and statistics, and deep learning.

  • Proficiency in Python and PyTorch (or similar Python ML libraries).

  • Highly motivated, independent, and able to efficiently collaborate with other team members on a research project.

Useful but not mandatory qualifications:

  • Experience with Bayesian methods for ML.

  • Experience with ML approaches related to foundation models for vision, language, or other domain-specific settings.


Recruitment process

  • Introduction call with Head of TA (Paul) - 30 min

  • Interview with Project Lead (Mike) - 45 min

  • Technical interview (onsite) - 1h30
