New methods for predicting complex traits

Doug Speed, together with colleagues from BiRC and NCRR, has recently developed improved methods for predicting complex traits. Their work was published this week in the journal Nature Communications.

The assumptions of a prediction tool are described by its "heritability model". Most existing tools assume the GCTA Model. This figure shows that for four different tools (from left to right, lasso, ridge regression, Bolt-LMM and BayesR), prediction accuracy always increases when one switches from the GCTA Model to more realistic heritability models (e.g., the LDAK-Thin and BLD-LDAK Models). The top plot shows results for 14 individual phenotypes (including traits such as height, body mass index, neuroticism and hypertension), while the bottom plot shows averages across all phenotypes.

There is currently great interest in being able to use an individual's genetic information to predict their phenotypes. This is especially important for personalized medicine, which aims to accurately predict which individuals will develop particular diseases or will benefit from particular medications.

Doug and colleagues have observed that most existing prediction tools assume that each genetic variant is equally important. This assumption is sub-optimal, because recent work has shown that the importance of a variant depends on factors such as its frequency, local levels of linkage disequilibrium and functional annotations. Therefore, this new paper presents eight new prediction tools that allow for alternative assumptions, and shows that this enables substantially improved prediction across a wide range of traits.
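To make the idea of "alternative assumptions" concrete, here is a minimal sketch of how a heritability model can assign different expected importance to variants depending on their frequency. It is an illustration only, not the code used in the paper. It assumes the functional form commonly used in the heritability-model literature, where a variant's expected heritability is proportional to [f(1 - f)]^(1 + alpha), with f the minor allele frequency; the values alpha = -1 (equal importance, as under the GCTA Model) and alpha = -0.25 are likewise assumptions taken from that literature rather than from this article. The function name and the `kept` argument (a hypothetical 0/1 indicator that could mark variants retained after thinning for linkage disequilibrium) are made up for this example.

```python
def expected_heritability_weights(mafs, alpha, kept=None):
    """Relative expected per-variant heritability, normalised to sum to 1.

    Assumed form (not taken from the article):
        E[h2_j] is proportional to kept_j * [f_j * (1 - f_j)] ** (1 + alpha)
    where f_j is the minor allele frequency of variant j.
    """
    if kept is None:
        # By default every variant contributes.
        kept = [1.0] * len(mafs)
    raw = [k * (f * (1.0 - f)) ** (1.0 + alpha) for f, k in zip(mafs, kept)]
    total = sum(raw)
    return [r / total for r in raw]


# Three hypothetical variants: rare, low-frequency and common.
mafs = [0.01, 0.10, 0.50]

# alpha = -1 makes the exponent zero, so every variant is equally important.
equal = expected_heritability_weights(mafs, alpha=-1.0)

# alpha = -0.25 lets expected importance grow with allele frequency.
freq_dep = expected_heritability_weights(mafs, alpha=-0.25)

print("equal importance:   ", equal)
print("frequency-dependent:", freq_dep)
```

Under the first setting the three variants receive identical weights, while under the second the common variant is expected to contribute more than the rare one. Allowing for linkage disequilibrium and functional annotations, as the more realistic models do, would require further terms that this sketch leaves out.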

Four of the new tools use individual-level data. The paper shows that the best of these, LDAK-Bolt-Predict, outperforms the existing tools lasso, BLUP, Bolt-LMM and BayesR for all 14 phenotypes considered. The remaining four new tools use summary statistics. The paper shows that the best of these, LDAK-BayesR-SS, outperforms the existing tools lassosum, sBLUP, LDpred and SBayesR for 223 of the 225 phenotypes considered. On average, the new tools outperform the existing tools by 14% (sd 1), which is equivalent to increasing the sample size by about a quarter.

You can read more about the new tools in the paper "Improved genetic prediction of complex traits from individual-level data or summary statistics" (https://www.nature.com/articles/s41467-021-24485-y), and try out the new tools in the software packages LDAK (www.ldak.org) and bigstatsr (https://github.com/privefl/bigstatsr).