Bin Shao

@binshaophy

Broadie. deep learning; synthetic biology; single cell genomics; non-linear dynamics. opinions are my own. Latest work: https://www.biorxiv.org/content/10.1101/2024.12.30.630741v2

391 Followers · 357 Following · 19 Posts · Joined 10.11.2024

Latest posts by Bin Shao @binshaophy

We are excited to share GPN-Star, a cost-effective, biologically grounded genomic language modeling framework that achieves state-of-the-art performance across a wide range of variant effect prediction tasks relevant to human genetics.
www.biorxiv.org/content/10.1...
(1/n)

22.09.2025 05:29 👍 174 🔁 91 💬 4 📌 5

Big congrats, Yunha!

30.04.2025 11:38 👍 1 🔁 0 💬 0 📌 0

Interesting work on plasmid engineering.

11.02.2025 06:34 👍 5 🔁 0 💬 0 📌 0

All NIH study sections canceled indefinitely. This will halt science and devastate research budgets in universities.

22.01.2025 20:46 👍 12266 🔁 4988 💬 586 📌 1166

This gives me such hope for biodiversity conservation, mammals and future mammalogists! Go young people!! 🧪

19.01.2025 06:07 👍 180 🔁 37 💬 3 📌 0
Population-level amplification of gene regulation by programmable gene transfer - Nature Chemical Biology Gene regulation in engineered microbial populations is often tuned at individual cell levels. Now, a population-wide amplification system has been devised that expands the dynamic range of plasmid tra...

A new paper from Lingchong You's group develops a cool amplification circuit that expands the dynamic range of plasmid transfer #ChemBio #synbio #microsky

www.nature.com/articles/s41...

10.01.2025 19:42 👍 16 🔁 6 💬 0 📌 0

Recruiting PhD students: our research covers language model + genomics + systems biology: scholar.google.com/citations?us...
1. Four-year PhD program in Beijing
2. Master's degree required
3. Start date: Sep 2025
Please DM if you are interested.

09.01.2025 03:09 👍 1 🔁 0 💬 0 📌 0
Two decades of bacterial ecology and evolution in a freshwater lake Nature Microbiology - A 471-metagenome time series from Lake Mendota in Wisconsin, USA, reveals seasonal and decadal shifts in bacterial functional and ecological dynamics, especially in response...

After 24 years of work, I’m thrilled to announce the TYMEFLIES dataset, which comprises metagenomes from Lake Mendota (Madison, WI), collected roughly every 10 days (471 samples) for 20 years! @quendi.bsky.social @robinrohwer.bsky.social

rdcu.be/d5put

A thread…

03.01.2025 11:44 👍 245 🔁 101 💬 3 📌 3

We deeply appreciate the experimental studies that made this work possible! Please check our GitHub for more details: github.com/lingxusb/TXp...

05.01.2025 06:46 👍 0 🔁 0 💬 0 📌 0
Light-dependent modulation of protein localization and function in living bacteria cells - Nature Communications Bacterial proteins are often recruited to specific subcellular locations to carry out their functions. Here, the authors use the optogenetic CRY2-CIB1 system to re-direct proteins to different subcell...

www.nature.com/articles/s41...

04.01.2025 14:04 👍 4 🔁 3 💬 0 📌 0
Google Colab

We hope this work will be a useful tool. Feedback is welcome! Please feel free to try our Colab notebook to predict transcriptomes at (almost) zero cost! It takes about 20 minutes for a genome with 4k genes: colab.research.google.com/drive/1Kd-QI...

04.01.2025 23:46 👍 0 🔁 0 💬 1 📌 0

TXpredict captures variations in gene expression both across different protein functional groups and within the same functional group.

04.01.2025 23:46 👍 0 🔁 0 💬 1 📌 0

We further used TXpredict to predict the expression of 3.1M genes across a collection of 900 microbial genomes. Small clusters of ribosomal genes were located at the periphery of the t-SNE plot of all genes and showed high predicted expression.

04.01.2025 23:46 👍 1 🔁 0 💬 1 📌 0

Our model leverages information learned by the ESM2 protein language model, together with basic protein statistics, to predict genome-wide gene expression. It achieves an average Spearman correlation of 0.53 in predicting gene expression for bacterial genomes that are not in the training dataset.

04.01.2025 23:46 👍 1 🔁 0 💬 1 📌 0
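For readers less familiar with the metric in the post above: Spearman correlation is the Pearson correlation computed on ranks, so it rewards getting the ordering of genes by expression right rather than the absolute values. The sketch below is a minimal, dependency-free illustration; it is not TXpredict's evaluation code, and the function names and expression values are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical per-gene values: measured vs. predicted expression.
measured = [12.0, 3.5, 40.1, 7.2, 0.9]
predicted = [10.3, 5.1, 35.0, 4.8, 1.2]
print(round(spearman(measured, predicted), 3))  # prints 0.9
```

In practice `scipy.stats.spearmanr` computes the same quantity; the hand-rolled version only makes the tie handling (average ranks) explicit.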
Predicting microbial transcriptome using genome sequence We present TXpredict, a transformer-based framework for predicting microbial transcriptomes using annotated genome sequences. By leveraging information learned from a large protein language model, TXp...

Is it possible to get the transcriptome of any sequenced microbe without doing the experiments? Happy to introduce TXpredict, a transcriptome prediction tool that generalizes to novel microbial genomes: www.biorxiv.org/content/10.1...

04.01.2025 23:46 👍 7 🔁 3 💬 1 📌 0

Predicting microbial transcriptome using genome sequence https://www.biorxiv.org/content/10.1101/2024.12.30.630741v1

31.12.2024 18:47 👍 3 🔁 1 💬 0 📌 0
GitHub - lingxusb/EcoVAE

9/n We envision that EcoVAE will advance biodiversity investigations, especially in under-sampled regions, and ultimately support global biodiversity monitoring efforts 🙏

💻 Code is publicly available: github.com/lingxusb/Eco...

18.12.2024 01:08 👍 0 🔁 0 💬 0 📌 0

8/n 🧩 EcoVAE can also interpolate missing occurrences. For example: In North America, EcoVAE predictions for Sassafras largely overlapped with iNaturalist records. In South Asia, EcoVAE highlighted a wider distribution of Desmodium, consistent with field surveys.

18.12.2024 01:08 👍 0 🔁 0 💬 1 📌 0

7/n 🌍 Where is biodiversity under-sampled? We found that regions with high prediction error overlap with known "darkspots" of biodiversity collection. For example, the highest prediction errors for plants were observed in South Asia, Southeast Asia, the Middle East, and Central Africa.

18.12.2024 01:08 👍 0 🔁 0 💬 1 📌 0

6/n 🦋 EcoVAE isn’t limited to plants. The model generalizes well to other taxa, including butterflies and mammals, showcasing its versatility across ecosystems.

18.12.2024 01:08 👍 0 🔁 0 💬 1 📌 0

5/n 🖥️ Remarkably, EcoVAE can predict species distributions even with sparse inputs. With just 20% of input data, it achieved an AUROC of 0.78, effectively identifying the locations of missing genera.

18.12.2024 01:08 👍 0 🔁 0 💬 1 📌 0
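The AUROC quoted in the 5/n post can be read as the probability that a randomly chosen held-out occurrence receives a higher score than a randomly chosen absence. A minimal pairwise sketch of that definition follows (ties count as one half; the loop is quadratic in the number of cells, so it is only suitable for illustration). It assumes one binary label and one model score per grid cell, is not EcoVAE's evaluation code, and uses hypothetical labels and scores.

```python
def auroc(labels, scores):
    """AUROC as P(score of a random positive > score of a random negative).

    labels: 1 = recorded occurrence, 0 = no record (assumed binary).
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = genus recorded in a grid cell, 0 = not recorded.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.5]
# 8 of the 9 positive/negative pairs are ordered correctly, so AUROC = 8/9.
print(auroc(labels, scores))
```

`sklearn.metrics.roc_auc_score` returns the same value and scales to large datasets; the pairwise loop just mirrors the probabilistic definition.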

4/n 🌍 We withheld data from three independent regions to test the model's generalization. The model reconstructed species distributions effectively, even for the withheld test regions, and predicted the locations of missing records at the genus and species levels.

18.12.2024 01:08 👍 0 🔁 0 💬 1 📌 0

3/n 🚀 We leverage a VAE architecture that enables fast and scalable modeling of species distribution patterns. During training, we masked 50% of species records and tasked the model with reconstructing the full species distribution, mimicking real-world biodiversity sampling.

18.12.2024 01:08 👍 0 🔁 0 💬 1 📌 0
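The masked-reconstruction setup in the 3/n post can be sketched as a data-preparation step: for each grid cell's binary species-presence vector, hide half of the recorded species and keep the full vector as the reconstruction target. This is a hypothetical illustration that assumes presences are masked uniformly at random within each cell; the post does not say how EcoVAE samples its masks, and the VAE itself is omitted.

```python
import random


def mask_presences(presence, frac=0.5, rng=None):
    """Hide a fraction of recorded species (1s) in one cell's presence vector.

    Hypothetical masking scheme: presences are hidden uniformly at random
    within each cell. Returns (masked_input, target); a model would see
    masked_input and be trained to reconstruct target, the full vector.
    """
    if rng is None:
        rng = random.Random()
    present = [i for i, v in enumerate(presence) if v == 1]
    n_hide = int(len(present) * frac)  # e.g. 50% of recorded species
    hidden = set(rng.sample(present, n_hide))
    masked = [0 if i in hidden else v for i, v in enumerate(presence)]
    return masked, list(presence)


# Hypothetical grid cell: 8 species columns, 4 of them recorded as present.
cell = [1, 0, 1, 1, 0, 0, 1, 0]
x, y = mask_presences(cell, frac=0.5, rng=random.Random(0))
print("input: ", x)
print("target:", y)
```

Hidden entries look identical to true absences in the input, which is what makes the task resemble incomplete field sampling.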

2/n 🌿 Biodiversity is under immense pressure. Predicting global species distributions at scale is critical, but traditional species distribution models struggle with massive datasets and interspecies interactions (e.g., >33M records and >127K species of plants).

18.12.2024 01:08 👍 0 🔁 0 💬 1 📌 0
A generative deep learning approach for global species distribution prediction Anthropogenic pressures on biodiversity necessitate efficient and highly scalable methods to predict global species distributions. Current species distribution models (SDMs) face limitations with larg...

🌏What happens when generative AI meets ecology? How can we use AI to advance biodiversity exploration and monitoring?

Excited to introduce EcoVAE, a generative approach trained on over 100 million high-quality vouchered records to model global biodiversity

www.biorxiv.org/content/10.1...
1/n🧡

18.12.2024 01:08 👍 1 🔁 0 💬 1 📌 0

Preprint alert! A thread is coming soon.

17.12.2024 02:00 👍 0 🔁 0 💬 0 📌 0
[Image: book cover and first page of the preface]

The third edition of my textbook, Nonlinear Dynamics and Chaos, was published today. You can preview the first 68 pages on Google Books, or take a look at the preface below to see what's new. The main new thing is a chapter on the Kuramoto model! Hope you enjoy it.

16.01.2024 15:55 👍 172 🔁 30 💬 6 📌 7

Two BioML starter packs now:

Pack 1: go.bsky.app/2VWBcCd
Pack 2: go.bsky.app/Bw84Hmc

DM if you want to be included (or nominate people who should be!)

18.11.2024 17:09 👍 119 🔁 56 💬 10 📌 11