Annabelle Harvey, Clara A Moreau, Kuldeep Kumar, Sebastian Urchs, Hanad Sharmarke, Khadije Jizi, Charles-Olivier Martin, Nadine Younis, Petra Tamer, Jean-Louis Martineau, Pierre Orban, Ana Isabel Silva, Jeremy Hall, Marianne BM van den Bree, Michael J Owen, David EJ Linden, Sarah Lippé, Carrie E Bearden, Guillaume Dumas, Sebastien Jacquemont, Pierre Bellec
Publication year: 2024


There is growing interest in using machine learning (ML) models to perform automatic diagnosis of psychiatric conditions; however, the level of accuracy and robustness needed for a model to be useful in clinical practice remains out of reach. Patients with different psychiatric diagnoses have traditionally been studied independently, yet there is growing recognition of signatures shared across these diagnoses and with rare genetic copy number variants (CNVs). In this work, we assess the potential of multi-task learning (MTL) to improve accuracy by characterising multiple related conditions with a single model, making use of information shared across diagnostic categories and exposing the model to a larger and more diverse dataset. As a proof of concept, we first established the efficacy of MTL in a context where information is clearly shared across tasks: the same target (age or sex) is predicted at different sites of data collection in a large fMRI dataset compiled from multiple studies. MTL generally led to substantial gains relative to independent prediction at each site. Performing scaling experiments on the UK Biobank, we observed that performance was highly dependent on sample size: for large sample sizes (N>6000), sex prediction was better using MTL across three sites (N=K per site) than prediction at a single site (N=3K), but for small samples (N<500), MTL was actually detrimental for age prediction. We then used established machine learning methods to benchmark the diagnostic accuracy of each of the 7 CNVs (N=19–103) and 4 psychiatric conditions (N=44–472) independently, replicating the accuracy previously reported in the literature on psychiatric conditions. We observed that MTL hurt performance when applied across the full set of diagnoses, and complementary analyses failed to identify pairs of conditions that would benefit from MTL.
Taken together, our results show that if a successful multi-task diagnostic model of psychiatric conditions were to be developed with resting-state fMRI, it would likely require datasets with thousands of patients across different diagnoses.
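To make the contrast between independent prediction at each site and a single multi-task model concrete, here is a minimal sketch of one common linear MTL formulation. It is an illustrative assumption, not the authors' models (the abstract does not specify them): each site or diagnosis is a task with weights w_t = w0 + v_t, where w0 is shared and v_t is task-specific, fitted jointly as one ridge regression on an augmented design matrix. All function names (`fit_independent`, `fit_multitask`, `simulate_site`), the regularisation strengths and the simulated data are hypothetical.

```python
import numpy as np


def fit_independent(Xs, ys, lam=1.0):
    """Fit one ridge regression per task (e.g. per site); nothing is shared."""
    weights = []
    for X, y in zip(Xs, ys):
        d = X.shape[1]
        weights.append(np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y))
    return weights


def fit_multitask(Xs, ys, lam_shared=1.0, lam_task=10.0):
    """Jointly fit per-task weights w_t = w0 + v_t as a single ridge problem.

    w0 is shared by all tasks and v_t is specific to task t. A very large
    lam_task forces every v_t towards zero (one pooled model); a very large
    lam_shared forces w0 towards zero (independent models).
    """
    n_tasks = len(Xs)
    d = Xs[0].shape[1]
    blocks = []
    for t, X in enumerate(Xs):
        block = np.zeros((X.shape[0], d * (n_tasks + 1)))
        block[:, :d] = X                        # columns multiplying w0
        block[:, d * (t + 1):d * (t + 2)] = X   # columns multiplying v_t
        blocks.append(block)
    A = np.vstack(blocks)
    b = np.concatenate(ys)
    penalty = np.concatenate([np.full(d, lam_shared), np.full(d * n_tasks, lam_task)])
    theta = np.linalg.solve(A.T @ A + np.diag(penalty), A.T @ b)
    w0 = theta[:d]
    return [w0 + theta[d * (t + 1):d * (t + 2)] for t in range(n_tasks)]


def mean_mse(Xs, ys, weights):
    """Mean squared error, averaged over tasks."""
    return float(np.mean([np.mean((X @ w - y) ** 2) for X, y, w in zip(Xs, ys, weights)]))


def simulate_site(rng, w_true, n, noise=1.0):
    """Hypothetical linear data for one site: y = X @ w_true + Gaussian noise."""
    X = rng.normal(size=(n, w_true.size))
    return X, X @ w_true + rng.normal(scale=noise, size=n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n_sites = 20, 3
    w_common = rng.normal(size=d)  # signal shared by every site
    site_w = [w_common + 0.1 * rng.normal(size=d) for _ in range(n_sites)]
    train = [simulate_site(rng, w, n=30) for w in site_w]    # small N per site
    test = [simulate_site(rng, w, n=1000) for w in site_w]
    Xtr, ytr = zip(*train)
    Xte, yte = zip(*test)
    print("independent MSE:", mean_mse(Xte, yte, fit_independent(Xtr, ytr)))
    print("multi-task MSE: ", mean_mse(Xte, yte, fit_multitask(Xtr, ytr)))
```

The two penalties interpolate between the extremes discussed above: independent per-site models on one side and a single pooled model on the other. Changing the per-site sample size or the size of the site-specific deviations in this toy setup changes whether sharing helps, but these simulated numbers are not meant to reproduce the results reported in the abstract.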

Keywords: Machine learning, multi-task learning, multi-site data, fMRI, CNVs, psychiatric conditions
