Also, many thanks to the generous colleagues who provided feedback on the paper!
(27/27)
I would like to thank my advisor @arbelharpak.bsky.social, as well as co-authors Neeka, Michael, Jason, @oliviarxiv.bsky.social and Paul!
We note with sadness that Paul passed away when our paper was accepted. His contribution was one of many random acts of kindness that characterized him.
(26/27)
We discuss future directions stemming from our 3 observations, such as moving beyond the analysis of ancestry groupings alone; considering trait evolution and genetic architecture; and evaluating predictive performance in metrics that correspond to the intended application.
(25/27)
All told, our results suggest there are understudied factors contributing to the portability problem.
(24/27)
We found some interesting examples. For asthma, precision and recall depend on genetic distance in a qualitatively similar way. For T2D, precision appears roughly constant for medium and large genetic distances, while recall generally increases with genetic distance.
(23/27)
For example, when a broadly accessible, safe intervention exists, we might prioritize the identification of all cases over all else and therefore focus on recall. When the intervention includes a risky treatment, we might prioritize avoiding false positives and hence focus on precision.
(22/27)
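As a rough illustration of the two measures contrasted above, here is a minimal sketch of precision and recall for a binary call. It assumes that individuals are flagged as predicted cases by some rule (for example, a PGS cutoff); the function name and the toy data are hypothetical and are not taken from the paper.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for two equal-length lists of booleans."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)      # flagged, and truly a case
    fp = sum(1 for p, a in pairs if p and not a)  # flagged, but not a case
    fn = sum(1 for p, a in pairs if not p and a)  # missed case
    # Precision: of those flagged, what share are true cases?
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    # Recall: of the true cases, what share were flagged?
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall


# Hypothetical toy data: six individuals, three flagged as high risk.
predicted = [True, True, True, False, False, False]
actual = [True, False, True, True, False, False]
precision, recall = precision_recall(predicted, actual)
```

A risky treatment argues for keeping `fp` low (high precision); a safe, broadly accessible intervention argues for keeping `fn` low (high recall).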
But more generally, the fact that conclusions depend substantially on the measure of prediction accuracy suggests to us that more attention should perhaps be given to the measures relevant to the intended application.
(21/27)
So individual-level measures can yield qualitatively different portability trends than the group-level measures that are widely used.
(20/27)
(Observation 3) Qualitative trends of portability can depend on the measure of prediction accuracy used. For some traits, our group-level measure of prediction accuracy often drops where the individual-level measure increases.
(19/27)
Together, the poor portability of effect sizes and the increase in heterozygosity of large effect index SNPs make for a PGS that is highly variable + a poor predictor away from the GWAS sample, likely leading to the near-zero partial correlation between PGS and trait.
(18/27)
As a result of the trends of heterozygosity, the variance in the polygenic score quickly increases with genetic distance for white blood cell count, lymphocyte count, and monocyte count, despite decreasing for the remaining 12 continuous traits we have examined.
(17/27)
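To see why heterozygosity at large-effect SNPs can drive the variance of a PGS, here is a minimal sketch under simplifying assumptions that are ours and not necessarily those of the paper: an additive score, Hardy-Weinberg proportions, and independent SNPs (no linkage disequilibrium). Under those assumptions the variance is the sum over SNPs of beta^2 * 2p(1 - p). All effect sizes and allele frequencies below are made up for illustration.

```python
def pgs_variance(betas, freqs):
    """Variance of an additive PGS, assuming Hardy-Weinberg proportions
    and independent SNPs: sum of beta^2 * 2p(1 - p)."""
    return sum(b * b * 2.0 * p * (1.0 - p) for b, p in zip(betas, freqs))


# Hypothetical score: one large-effect SNP and two small-effect SNPs.
betas = [0.5, 0.05, 0.05]

# Made-up allele frequencies: the large-effect allele is rare in the
# GWAS sample but more common (higher heterozygosity) far from it.
freqs_gwas = [0.01, 0.30, 0.30]
freqs_far = [0.20, 0.30, 0.30]

var_gwas = pgs_variance(betas, freqs_gwas)
var_far = pgs_variance(betas, freqs_far)
```

In this toy setup only the frequency of the large-effect allele changes, yet `var_far` is more than ten times `var_gwas`, because that one SNP dominates the sum.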
Relatedly, some cool recent work from @roshnipatel.bsky.social, Jeffrey Spence, @jkpritch.bsky.social et al. dives deep into expectations, based on models of natural selection, for allele frequency in a group B conditional on allele frequency in group A.
academic.oup.com/genetics/art...
(16/27)
Indeed, for lymphocyte count, the heterozygosity of large-effect variants increases with genetic distance from the GWAS sample.
(15/27)
If causal effects on lymphocyte count change rapidly, then large-effect index SNPs may be under weaker selective constraint in the prediction sample than in the GWAS sample and segregate at high allele frequencies.
(14/27)
Allelic effect estimates for lymphocyte count port poorly, even close to the GWAS sample: on average, they shrink by more than 50%, and with large variance across index SNPs. A good comparison is triglyceride levels, a trait of similar SNP heritability in the GWAS sample.
(13/27)
To test this prediction, we estimated the allelic effects of index SNPs (SNPs included in the PGS, ascertained in the original GWAS sample) in the GWAS set and subsets of the prediction sample, "close", and "far" (in terms of genetic distance from the GWAS sample).
(12/27)
One factor that we considered is the rapid turnover of selective pressures on the immune system across time and geography. We hypothesized that these would lead to less portable genetic associations (across ancestry) in immune-associated loci compared to other traits.
(11/27)
Why would portability trends be trait-specific? There are many possible drivers. In some cases, the specifics of trait evolution and genetic architecture, environmental and social context, and PGS construction seem to matter.
(10/27)
(Observation 2) Trends of portability vary across traits. For example, for height, individual-level prediction error decays nearly monotonically with genetic distance. For some immunity-related traits like white blood cell count, prediction error increases at large genetic distances.
(9/27)
Ding et al. focused on the relationship between genetic distance and the prediction interval, i.e., expected uncertainty in prediction under an assumed theoretical model. Our focus here is instead the relationship with *realized* prediction accuracy.
(8/27)
We would like to point out excellent work by Yi Ding, @bpasaniuc.bsky.social et al. which relatedly analyzes attributes of polygenic scores as a function of genetic distance from the GWAS sample.
rdcu.be/e0QzA
(7/27)
In fact, variance in prediction accuracy is explained comparably well by measures of socioeconomic status (where the average trend is worse prediction for individuals of lower SES).
(6/27)
(Observation 1) Previous work suggested genetic distance from the GWAS sample largely explains variation in PGS performance. However, what we see empirically is that prediction accuracy is extremely noisy at the individual level.
(5/27)
We discuss 3 key observations: (1) Genetic distance explains only a tiny fraction of the variance in prediction accuracy; (2) Portability trends can be trait-specific, due to underappreciated factors; (3) The measure of prediction accuracy can alter qualitative trends of portability.
(4/27)
We analyzed the prediction accuracy of polygenic scores as a continuous function of individuals' dissimilarity to the GWAS sample (genetic distance), and contrasted it with the frequently used group-level accuracy measures.
(3/27)
The broad adoption of PGS is hindered by their limited portability to people that differ, in genetic ancestry or other characteristics, from the GWAS samples used to construct them.
Some relevant studies include:
rdcu.be/e0Qxj
elifesciences.org/articles/48376
rdcu.be/e0Qxz
(2/27)
Our work on the generalizability of polygenic scores (PGS) from the @arbelharpak.bsky.social Lab is now officially out!
We examine the accuracy of PGS predictions at the individual level. We make 3 observations that expose gaps in our understanding of PGS "portability."
rdcu.be/e0LAr
(1/27)