Bayesian deep learning (BDL) has recently received renewed interest for large-scale AI settings [Papamarkou et al., 2024]. In the context of foundation models for scientific domains, BDL brings a number of advantages over conventional deep-learning approaches to training foundation models:
Uncertainty quantification. Because BDL methods estimate the posterior predictive distribution, they allow for flexible and reliable quantification of the uncertainty of predictions generated by foundation models. Quantifying predictive uncertainty facilitates risk assessment and improves decision making, which can be especially important in domains where scientific foundation models may be deployed, such as clinical health settings (a minimal sketch of posterior predictive uncertainty estimation follows this list).
Reduction of hallucinations. In large language models (LLMs), prompts that are out of distribution may lead the model to generate wrong answers with high confidence, a phenomenon known as hallucination. Hallucinations also occur in foundation models for other domains, such as text-to-image generation, robotics, and autonomous driving [Papamarkou et al., 2024]. Since BDL enables reliable uncertainty quantification, it may be used to mitigate hallucinations.
Data efficiency. BDL enables flexible regularization approaches that are important for reducing overfitting and improving generalization from few examples [Sharma et al., 2023]. BDL may thus allow for foundation models that achieve higher performance when pre-trained on small datasets. High data efficiency may also lead to better results for foundation model fine-tuning, since fine-tuning datasets are often relatively small and sparse.
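As an illustration of the uncertainty quantification point above, a BDL model approximates the posterior predictive distribution by averaging predictions over parameter samples. The minimal PyTorch sketch below shows one way this could look; the `model` object and the `posterior_samples` list of state dicts are hypothetical placeholders, assumed to come from whatever approximate posterior the project ends up using.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def posterior_predictive(model, posterior_samples, x):
    """Monte Carlo estimate of the posterior predictive p(y | x, D).

    `posterior_samples` is assumed to be a list of state_dicts drawn from an
    approximate posterior (e.g., via SG-MCMC or a variational method).
    """
    probs = []
    for state in posterior_samples:
        model.load_state_dict(state)
        probs.append(F.softmax(model(x), dim=-1))
    mean_probs = torch.stack(probs).mean(dim=0)  # predictive distribution
    # Predictive entropy as a simple per-example uncertainty score.
    entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=-1)
    return mean_probs, entropy
```

High predictive entropy would then flag inputs where the model is unsure, which is the behavior the hallucination-mitigation argument relies on.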
Despite the advantages described above, BDL remains largely unexplored in the context of foundation models. In this project, we propose applying state-of-the-art scalable BDL methods to relatively small foundation models, as a starting point for research on this important topic.
During this internship, we expect the intern to lead and participate in the development of the following deliverables:
Conduct a short literature review on the topic of this internship project.
Identify an existing small open-source scientific foundation model (FM) that can be used as a baseline and starting point for this work, and acquire datasets and code for pre-training, fine-tuning, and evaluation. Options for existing scientific FMs that could be used as baselines include:
a) A small pre-trained Large Brain Model (LaBraM), with 5.8M parameters, is publicly available and can easily be used together with public EEG datasets such as TUAB for running small-scale fine-tuning experiments.
b) Pre-training a new very small-scale LaBraM model with public EEG datasets from the TUH EEG Corpus would be feasible.
c) The publicly available FM from the “Decoding speech from non-invasive recordings of brain activity” paper would also be a useful baseline for pre-training and fine-tuning experiments.
Implement code for one or more scalable BDL training algorithms, such as a variational approximation method or a stochastic gradient Markov chain Monte Carlo (SG-MCMC) method (see the SGLD sketch after this list). If feasible, existing BDL libraries may be used. Pre-train the baseline scientific FM and BDL version(s) of this scientific FM.
Implement code for evaluating the unique features provided by the BDL version of the selected FM, including uncertainty quantification and data efficiency (see the calibration sketch after this list). All code implemented for this project will be pushed to a Git repository.
A paper submission to a conference or journal presenting the results of this project.
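To make the SG-MCMC deliverable more concrete, below is a minimal sketch of stochastic gradient Langevin dynamics (SGLD), one of the simplest SG-MCMC algorithms. The step size, the weight-decay prior, and the way the loss is computed outside the function are illustrative assumptions, not a prescription for the project.

```python
import torch

def sgld_step(model, loss, lr, weight_decay=1e-4, dataset_size=1.0):
    """One SGLD update: a gradient step on the mini-batch estimate of the
    negative log posterior, plus Gaussian noise with variance 2 * lr
    (Euler-Maruyama discretization of Langevin dynamics)."""
    model.zero_grad()
    # Rescale the mini-batch loss so its gradient approximates the full-data
    # negative log-likelihood; weight_decay acts as a Gaussian prior.
    (dataset_size * loss).backward()
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            grad = p.grad + weight_decay * p
            noise = torch.randn_like(p) * (2.0 * lr) ** 0.5
            p.add_(-lr * grad + noise)
```

Copies of `model.state_dict()` saved periodically after a burn-in phase would then serve as the posterior samples used in the posterior predictive sketch above.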
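For the evaluation deliverable, one standard way to check whether the predicted uncertainties are reliable is expected calibration error (ECE). The sketch below assumes `probs` and `labels` come from the posterior predictive estimate above; binning choices are illustrative.

```python
import torch

def expected_calibration_error(probs, labels, n_bins=15):
    """ECE: average gap between confidence and accuracy across confidence bins.

    probs:  (N, C) posterior predictive probabilities.
    labels: (N,)   integer class labels.
    """
    conf, preds = probs.max(dim=-1)
    correct = preds.eq(labels).float()
    bins = torch.linspace(0.0, 1.0, n_bins + 1)
    ece = torch.zeros(())
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            # Weight each bin's |confidence - accuracy| gap by its frequency.
            ece += mask.float().mean() * (conf[mask].mean() - correct[mask].mean()).abs()
    return ece
```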
We are looking for candidates with the following qualifications:
A strong background in machine learning (ML), probability and statistics, and deep learning.
Proficiency in Python and PyTorch (or similar Python ML libraries).
Highly motivated, independent, and able to efficiently collaborate with other team members on a research project.
Useful but not mandatory qualifications:
Experience with Bayesian methods for ML.
Experience with ML approaches related to foundation models for vision, language, or other domain-specific settings.
Introduction call with Head of TA (Paul) - 30min
Interview with Project lead (Mike) - 45min
Technical interview (onsite) - 1h30