Joyce Wang (@joyce-yiyi-wang)

Also, many thanks to the generous colleagues who provided feedback on the paper!

(27/27)

27.01.2026 00:22 👍 2 🔁 0 💬 0 📌 0

I would like to thank my advisor @arbelharpak.bsky.social, as well as co-authors Neeka, Michael, Jason, @oliviarxiv.bsky.social and Paul!

We note with sadness that Paul passed away when our paper was accepted. His contribution was one of many random acts of kindness that characterized him.

(26/27)

26.01.2026 23:20 👍 5 🔁 0 💬 1 📌 0

We discuss future directions stemming from our 3 observations, such as moving beyond the analysis of ancestry groupings alone; considering trait evolution and genetic architecture; and evaluating predictive performance with metrics that correspond to the intended application.

(25/27)

26.01.2026 23:20 👍 2 🔁 0 💬 1 📌 0

All told, our results suggest there are understudied factors contributing to the portability problem.

(24/27)

26.01.2026 23:20 👍 2 🔁 0 💬 1 📌 0

We found some interesting examples. For asthma, precision and recall depend on genetic distance in a qualitatively similar way. For type 2 diabetes (T2D), precision appears roughly constant at medium and large genetic distances, while recall generally increases with genetic distance.

(23/27)

26.01.2026 23:20 👍 2 🔁 0 💬 1 📌 0

For example, when a broadly accessible, safe intervention exists, we might prioritize the identification of all cases over all else and therefore focus on recall. When the intervention includes a risky treatment, we might prioritize avoiding false positives and hence focus on precision.

(22/27)

26.01.2026 23:20 👍 1 🔁 0 💬 1 📌 0
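
For readers less familiar with the two measures contrasted above, here is a minimal sketch of how precision and recall are computed from binary predictions. The function name and the toy data are illustrative assumptions, not taken from the paper.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for two equal-length sequences of booleans."""
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)        # flagged and truly a case
    fp = sum(p and not a for p, a in pairs)    # flagged but not a case
    fn = sum(a and not p for p, a in pairs)    # a case that was missed
    # Precision: of those flagged, what fraction are cases?
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    # Recall: of all cases, what fraction were flagged?
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall


# Toy data, made up for illustration only.
flagged = [True, True, True, True, False, False]
is_case = [True, True, False, False, True, False]
precision, recall = precision_recall(flagged, is_case)
```

In this toy example half of the flagged individuals are true cases (precision), while two of the three true cases are flagged (recall), so the two measures can move independently.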

But more generally, the fact that conclusions depend substantially on the measure of prediction accuracy suggests to us that more attention should perhaps be given to the measures relevant to the intended application.

(21/27)

26.01.2026 23:20 👍 2 🔁 0 💬 1 📌 0

So individual-level measures can yield qualitatively different portability trends than the group-level measures that are widely used.

(20/27)

26.01.2026 23:20 👍 2 🔁 0 💬 1 📌 0

(Observation 3) Qualitative trends of portability can depend on the measure of prediction accuracy used. For some traits, our measure of group-level prediction accuracy often drops where the individual-level measure increases.

(19/27)

26.01.2026 23:20 👍 5 🔁 1 💬 1 📌 0

Together, the poor portability of effect sizes and the increase in heterozygosity of large-effect index SNPs make for a PGS that is both highly variable and a poor predictor away from the GWAS sample, likely leading to the near-zero partial correlation between PGS and trait.

(18/27)

26.01.2026 23:20 👍 3 🔁 0 💬 1 📌 0

As a result of these trends in heterozygosity, the variance of the polygenic score increases quickly with genetic distance for white blood cell count, lymphocyte count, and monocyte count, even though it decreases for the remaining 12 continuous traits we examined.

(17/27)

26.01.2026 23:20 👍 2 🔁 0 💬 1 📌 0
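
A rough way to see the link between heterozygosity and PGS variance, as a sketch only: assume the score is a weighted sum of allele counts, $\mathrm{PGS} = \sum_j \hat\beta_j g_j$ with $g_j \in \{0, 1, 2\}$, that genotypes follow Hardy-Weinberg proportions, and that index SNPs are independent of one another. These simplifying assumptions and the symbols are ours for illustration, not stated in the thread.

```latex
\mathrm{Var}(\mathrm{PGS}) \;=\; \sum_{j} \hat\beta_j^{2}\,\mathrm{Var}(g_j)
\;=\; \sum_{j} \hat\beta_j^{2}\, H_j,
\qquad H_j = 2\,p_j\,(1 - p_j)
```

Here $p_j$ is the allele frequency and $H_j$ the heterozygosity of index SNP $j$ in the group being scored. Under these assumptions, higher heterozygosity at SNPs with large $\hat\beta_j$ contributes disproportionately to the variance of the score.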

Relatedly, some cool recent work from @roshnipatel.bsky.social, Jeffrey Spence, @jkpritch.bsky.social et al. dives deep into expectations, based on models of natural selection, for allele frequency in a group B conditional on allele frequency in group A.

academic.oup.com/genetics/art...

(16/27)

26.01.2026 23:20 👍 6 🔁 2 💬 1 📌 0

Indeed, for lymphocyte count, the heterozygosity of large-effect variants increases with genetic distance from the GWAS sample.

(15/27)

26.01.2026 23:20 👍 3 🔁 0 💬 1 📌 0

If causal effects on lymphocyte count change rapidly, then large-effect index SNPs may be under weaker selective constraint in the prediction sample than in the GWAS sample and segregate at high allele frequencies.

(14/27)

26.01.2026 23:20 👍 2 🔁 0 💬 1 📌 0

Allelic effect estimates for lymphocyte count port poorly, even close to the GWAS sample: on average, they shrink by more than 50%, with large variance across index SNPs. A good comparison is triglyceride levels, a trait of similar SNP heritability in the GWAS sample.

(13/27)

26.01.2026 23:20 👍 2 🔁 0 💬 1 📌 0

To test this prediction, we estimated the allelic effects of index SNPs (SNPs included in the PGS, ascertained in the original GWAS sample) in the GWAS set and in two subsets of the prediction sample, "close" and "far" (in terms of genetic distance from the GWAS sample).

(12/27)

26.01.2026 23:20 👍 1 🔁 0 💬 1 📌 0

One factor that we considered is the rapid turnover of selective pressures on the immune system across time and geography. We hypothesized that this turnover would make genetic associations less portable (across ancestry) for immune-related traits than for other traits.

(11/27)

26.01.2026 23:20 👍 1 🔁 0 💬 1 📌 0

Why would portability trends be trait-specific? There are many possible drivers. In some cases, the specifics of trait evolution and genetic architecture, environmental and social context, and PGS construction seem to matter.

(10/27)

26.01.2026 23:20 👍 3 🔁 0 💬 1 📌 0

(Observation 2) Trends of portability vary across traits. For example, for height, individual-level prediction error decays nearly monotonically with genetic distance. For some immunity-related traits like white blood cell count, prediction error increases at large genetic distances.

(9/27)

26.01.2026 23:20 👍 3 🔁 0 💬 1 📌 0

Ding et al. focused on the relationship between genetic distance and the prediction interval, i.e., expected uncertainty in prediction under an assumed theoretical model. Our focus here is instead the relationship with *realized* prediction accuracy.

(8/27)

26.01.2026 23:20 👍 2 🔁 0 💬 1 📌 0

Polygenic scoring accuracy varies across the genetic ancestry continuum
Nature - Using two large biobank datasets, a study shows that the accuracy of polygenic scores decreases as a function of relatedness at the individual level when modelling genetic ancestry as a...

We would like to point out excellent work by Yi Ding, @bpasaniuc.bsky.social et al. which relatedly analyzes attributes of polygenic scores as a function of genetic distance from the GWAS sample.

rdcu.be/e0QzA

(7/27)

26.01.2026 23:20 👍 1 🔁 0 💬 1 📌 0

In fact, variance in prediction accuracy is explained comparably well by measures of socioeconomic status (where the average trend is worse prediction for individuals of lower SES).

(6/27)

26.01.2026 23:20 👍 3 🔁 0 💬 1 📌 0

(Observation 1) Previous work suggested genetic distance from the GWAS sample largely explains variation in PGS performance. However, what we see empirically is that prediction accuracy is extremely noisy at the individual level.

(5/27)

26.01.2026 23:20 👍 4 🔁 1 💬 1 📌 0

We discuss 3 key observations: (1) Genetic distance explains only a tiny fraction of the variation in prediction accuracy; (2) Portability trends can be trait-specific, due to underappreciated factors; (3) The measure of prediction accuracy can alter qualitative trends of portability.

(4/27)

26.01.2026 23:20 👍 3 🔁 1 💬 1 📌 0

We analyzed the prediction accuracy of polygenic scores as a continuous function of individuals' dissimilarity to the GWAS sample (genetic distance), and contrasted this with the frequently used group-level accuracy measures.

(3/27)

26.01.2026 23:20 👍 2 🔁 0 💬 1 📌 0

The broad adoption of PGS is hindered by their limited portability to people who differ, in genetic ancestry or other characteristics, from the GWAS samples used to construct them.

Some relevant studies include:

rdcu.be/e0Qxj
elifesciences.org/articles/48376
rdcu.be/e0Qxz

(2/27)

26.01.2026 23:20 👍 2 🔁 0 💬 1 📌 1

Three open questions in polygenic score portability
Nature Communications - Genetic predictors of health outcomes often drop in accuracy when applied to people dissimilar to participants of large genetic studies. Here, the authors investigate the...

Our work on the generalizability of polygenic scores (PGS) from the @arbelharpak.bsky.social Lab is now officially out!

We examine the accuracy of PGS predictions at the individual level. We make 3 observations that expose gaps in our understanding of PGS “portability.”

rdcu.be/e0LAr

(1/27)

26.01.2026 23:20 👍 32 🔁 16 💬 2 📌 1
