
Dmitry Penzar

@pensarata

PhD student, regulatory genomics, machine learning in biology, algorithms

1,979 Followers · 604 Following · 22 Posts · Joined 12.11.2024

Latest posts by Dmitry Penzar @pensarata

The key ingredient of our solution was MPRA-LegNet, but we also incorporated a large number of new ideas to master the challenge.

It's inspiring that the second-place team also used LegNet as the basis for their solution.

More details to come

08.12.2025 03:58 👍 0 🔁 0 💬 0 📌 0

Our team achieved first place in the CAGI7 lentiMPRA challenge on predicting the effects of single-nucleotide mutations in regulatory elements, surpassing the nearest competitors by a significant margin.

08.12.2025 03:58 👍 0 🔁 0 💬 1 📌 0
GitHub - autosome-ru/ibis-challenge: Repository with source code and metadata for IBIS challenge

(13/13) In turn, the wider set of data for Final TFs remains suitable for offline benchmarking with the open-source bibis framework (github.com/autosome-ru/...). The whole story can be found on bioRxiv: doi.org/10.1101/2025....

18.11.2025 22:54 👍 2 🔁 0 💬 0 📌 0
IBIS Challenge

(12/13) The online Leaderboard benchmarking platform, including the preprocessed data, benchmarking protocols, and rich documentation, remains fully functional and accessible online (ibis.autosome.org) to facilitate development of future TFBS models.

18.11.2025 22:54 👍 1 🔁 0 💬 1 📌 0

(11/13) However, those changes did not translate into better prediction of SNP effects. Additionally, pre-initialization of the first convolutional layers with the best available PWMs for the corresponding TFs didn't yield any notable performance gain.

18.11.2025 22:54 👍 1 🔁 0 💬 1 📌 0

(10/13) We conducted ablation studies on LegNet. Minor modifications, such as replacing global average pooling with global max pooling in the SE block, led to substantial performance gains, making the resulting model the best in the post-challenge assessment.

18.11.2025 22:54 👍 1 🔁 0 💬 1 📌 0
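
To make the pooling swap in (10/13) concrete, here is a minimal sketch of a generic squeeze-and-excitation (SE) block in plain Python. It is not LegNet's actual code: the function names, the ReLU bottleneck, and the toy weights are assumptions chosen for illustration, and a real model would use a deep learning framework. The only point is that the "squeeze" step can summarize each channel either by its mean or by its maximum.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def squeeze(feature_map, mode="avg"):
    """Collapse each channel (a list of per-position activations) to one number."""
    if mode == "avg":
        return [sum(ch) / len(ch) for ch in feature_map]  # global average pooling
    if mode == "max":
        return [max(ch) for ch in feature_map]            # global max pooling
    raise ValueError(f"unknown pooling mode: {mode}")

def dense(vec, weights, bias):
    """Plain fully connected layer: one weight row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def se_block(feature_map, w1, b1, w2, b2, mode="avg"):
    """Generic SE block (hypothetical layout): squeeze, bottleneck, gate, rescale."""
    pooled = squeeze(feature_map, mode)
    hidden = [max(0.0, h) for h in dense(pooled, w1, b1)]  # ReLU bottleneck
    gates = [sigmoid(g) for g in dense(hidden, w2, b2)]    # one gate per channel
    return [[g * x for x in ch] for g, ch in zip(gates, feature_map)]
```

With a channel such as `[0.0, 0.0, 4.0]`, average pooling dilutes a single strong activation while max pooling keeps it, which is one plausible intuition for why the choice can matter for motif-like signals.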

(9/13) Post-challenge analysis added extra DL models: top models from the DREAM challenge and popular architectures unused in IBIS, including Malinois and DNA language models. Fine-tuned DNA LMs performed far worse than fully supervised approaches.

18.11.2025 22:54 👍 1 🔁 0 💬 1 📌 0

(8/13) TF-binding models can be used to predict the effect of single-nucleotide variants. In A2G, PWMs performed unexpectedly well, e.g. MEX secured 2nd place. In G2A, the original top triple-A models dominated, followed by MEX and RSAT, the strongest PWM-based approach.

18.11.2025 22:54 👍 1 🔁 0 💬 1 📌 0
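
As a rough illustration of how a PWM can score a single-nucleotide variant, as mentioned in (8/13), the sketch below compares the best motif score of the reference and alternative sequences. This is one common scheme, not necessarily the protocol used in IBIS. The matrix `TOY_PWM`, its values, and the function names are hypothetical, and only the forward strand is scanned to keep the example short.

```python
# Hypothetical toy PWM (log-odds-like scores), one dict per motif position.
# The values are illustrative only and do not come from any real TF model.
TOY_PWM = [
    {"A": 1.2, "C": -0.8, "G": -0.5, "T": -1.0},
    {"A": -1.0, "C": 1.5, "G": -0.7, "T": -0.9},
    {"A": -0.6, "C": -0.9, "G": 1.4, "T": -1.1},
]

def best_pwm_score(seq, pwm):
    """Highest PWM score over all windows of seq (forward strand only)."""
    width = len(pwm)
    if len(seq) < width:
        raise ValueError("sequence is shorter than the motif")
    return max(
        sum(pwm[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    )

def snv_effect(ref_seq, pos, alt_base, pwm):
    """Predicted effect of a substitution: best alt score minus best ref score."""
    if ref_seq[pos] == alt_base:
        raise ValueError("alternative base equals the reference base")
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return best_pwm_score(alt_seq, pwm) - best_pwm_score(ref_seq, pwm)
```

For example, `snv_effect("TTACGTT", 3, "T", TOY_PWM)` is negative because the C-to-T change disrupts the best-matching `ACG` site. A fuller version would also scan the reverse complement.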

(7/13) Yet, several deep learning (DL) approaches failed substantially in cross-experiment validation, in some cases performing far worse than PWMs. Unlocking the full potential of DL clearly requires careful architectural and training design.

18.11.2025 22:54 👍 2 🔁 0 💬 1 📌 0
Ilya Vorontsov on X: "Our paper on LARGE-scale benchmarking of motif discovery tools is published! https://t.co/jIvipjvqxq It was a long, 7 years long journey, which coordinated efforts of 50+ researchers, proud to be one of them. More results from Codebook about poorly studied TFs are coming soon." / X

(6/13) Performance of the solutions varied substantially across TFs and experimental platforms. The top-scoring ML models outperformed PWM-based IBIS solutions from the competition and our PWM baseline from Codebook MEX (x.com/VorontsovIE/...).

18.11.2025 22:54 👍 1 🔁 0 💬 1 📌 0

(5/13) Once again, we congratulate the runner-up teams (Medici, Salimov & Frolov lab, callitmagic), and the winners (Bench Pressers, mj, and Biology Impostor) (x.com/halfacrocodi...)

18.11.2025 22:54 👍 1 🔁 0 💬 1 📌 0

(4/13) Participants employed a wide range of methods from classic motif discovery with position-specific weight matrices (PWMs) to arbitrary advanced approaches (triple-As), including CNNs, RNNs, gradient boosting, and even more exotic approaches.

18.11.2025 22:54 👍 1 🔁 0 💬 1 📌 0

(3/13) For the first time, the IBIS Challenge assessed in depth the transferability of DNA motif models from artificial to genomic sequences (A2G), and vice versa (G2A), with rigorous test-train splits, multiple performance metrics, and a transparent ranking system.

18.11.2025 22:54 👍 1 🔁 0 💬 1 📌 0
Vanja (Ivan Kulakovskiy) on X: "Join the IBIS Challenge: an open competition focused on the computational prediction of transcription factor binding motifs. IBIS aims to advance state-of-the-art methods for Inferring Binding Specificities of human transcription factors from diverse experimental data. (1/12) https://t.co/5DUhweEOy9" / X

(2/13) TFs orchestrate transcriptional programs by recognizing short DNA motifs. The long-standing goal is to develop reliable models of TFs' DNA binding specificities and avoid biases of particular experimental assays (x.com/halfacrocodi...).

18.11.2025 22:54 👍 1 🔁 0 💬 1 📌 0

(1/13) Excited to share the outcome of the IBIS Challenge! The IBIS challenge united dozens of teams across the world in tackling the problem of modeling transcription factor (TF) binding specificity using a diverse collection of experimental datasets for understudied human TFs.

18.11.2025 22:54 👍 10 🔁 7 💬 1 📌 1
De-novo promoters emerge more readily from random DNA than from genomic DNA Promoters are DNA sequences that help to initiate transcription. Point mutations can create de-novo promoters, which can consequently transcribe inactive genes or create novel transcripts. We know lit...

Excited / nervous to share the “magnum opus” of my postdoc in Andreas Wagner's lab!

"De-novo promoters emerge more readily from random DNA than from genomic DNA"

This project is the accumulation of 4 years of work, and lays the foundation for my future group. In short, we… (1/4)

28.08.2025 06:37 👍 170 🔁 59 💬 4 📌 1
Design principles of cell-state-specific enhancers in hematopoiesis Screen of minimalistic enhancers in blood progenitor cells demonstrates widespread dual activator-repressor function of transcription factors (TFs) and enables the model-guided design of cell-state-sp...

Out in Cell @cp-cell.bsky.social: Design principles of cell-state-specific enhancers in hematopoiesis
🧬🩸 screen of fully synthetic enhancers in blood progenitors
🤖 AI that creates new cell state specific enhancers
🔍 negative synergies between TFs lead to specificity!
www.cell.com/cell/fulltex...
🧵

08.05.2025 16:06 👍 141 🔁 58 💬 4 📌 9
Large-scale discovery of potent, compact and erythroid specific enhancers for gene therapy vectors - Nature Communications This study presents a large-scale enhancer screening approach to optimize gene therapy vectors. A compact, potent, erythroid-specific enhancer used in a therapeutic vector, improved viral titers, tran...

Finally published! We developed an epigenomics to therapeutics screening approach that identifies naturally occurring elements that can titrate expression of transgenes at various levels including single elements stronger than the B-globin LCR. www.nature.com/articles/s41...

09.05.2025 14:15 👍 15 🔁 3 💬 2 📌 0
Programmatic design and editing of cis-regulatory elements The development of modern genome editing tools has enabled researchers to make such edits with high precision but has left unsolved the problem of designing these edits. As a solution, we propose Ledi...

Our preprint on designing and editing cis-regulatory elements using Ledidi is out! Ledidi turns *any* ML model (or set of models) into a designer of edits to DNA sequences that induce desired characteristics.

Preprint: www.biorxiv.org/content/10.1...
GitHub: github.com/jmschrei/led...

24.04.2025 12:59 👍 115 🔁 37 💬 2 📌 3

We share a lot of our ideas, code, datasets (that we spend years sanitizing) early. Often way before we release preprints. We do this so that others can use, build on, improve & even "beat" our approaches. But I want to say a few things about some simple expectations 1/

17.01.2025 17:16 👍 90 🔁 25 💬 1 📌 5
Modelling and design of transcriptional enhancers - Nature Reviews Bioengineering Enhancers are genomic elements critical for regulating gene expression. In this Review, the authors discuss how sequence-to-function models can be used to unravel the rules underlying enhancer activit...

We wrote a review article on modelling and design of transcriptional enhancers using sequence-to-function models.

From conventional machine learning methods to CNNs and using models as oracles/generative AI for synthetic enhancer design!

@natrevbioeng.bsky.social

www.nature.com/articles/s44...

28.02.2025 14:45 👍 57 🔁 32 💬 1 📌 1
Massively parallel characterization of transcriptional regulatory elements - Nature Lentivirus-based reporter assays for 680,000 regulatory sequences from three cell lines coupled to machine-learning models lead to insights into the grammar of cis-regulatory elements.

Super excited to announce our latest work. On a personal note, it's not an exaggeration to say that blood, sweat, and tears got us to the finish line on this: working w/ an outstanding global team of scientists in Germany, Japan, Russia, and USA responding in >100 pages of complex reviewer comments.

15.01.2025 17:39 👍 36 🔁 10 💬 2 📌 0
EXTRA-seq: a genome-integrated extended massively parallel reporter assay to quantify enhancer-promoter communication Precise control of gene expression is essential for cellular function, but the mechanisms by which enhancers communicate with promoters to coordinate this process are not fully understood. While seque...

Finally out! We present EXTRA-seq, a new EXTended Reporter Assay to quantify endogenous enhancer-promoter communication at kb scale!
www.biorxiv.org/content/10.1...
A 🧵 about what it can do:
#SynBio #DeepLearning #GeneRegulation

16.12.2024 14:39 👍 83 🔁 34 💬 5 📌 6

Wonderful.
Just two weeks ago I was explaining to a junior colleague the problem of exaggerated claims in science. This paragraph is exactly what should be printed in place of a user agreement when anybody submits a paper.

07.12.2024 18:11 👍 3 🔁 0 💬 0 📌 0
autosome.org


Join us for our next Kipoi Seminar with Dmitry Penzar,
@pensarata.bsky.social @ autosome.org!
👉 LegNet: parameter-efficient modeling of gene regulatory regions using modern convolutional neural network
📅 Wed Dec 4, 5:30pm CET
🧬 kipoi.org/seminar/

29.11.2024 12:56 👍 3 🔁 2 💬 0 📌 0

(1/6) 🐦‍🔥 In IBIS #ibischallenge, we challenged teams from all over the world to decipher the DNA recognition code of human transcription factors. The IBIS Final Conference took place on November 27, 2024. Recordings and slides: disk.yandex.ru/d/82FEnwPn15...

28.11.2024 19:59 👍 10 🔁 5 💬 1 📌 1
Single-cell gene expression prediction from DNA sequence at large contexts Human genetic variants impacting traits such as disease susceptibility frequently act through modulation of gene expression in a highly cell-type-specific manner. Computational models capable of predi...

Maybe I've got your idea wrong, but there are plenty of seq2activity models trained or fine-tuned using sc data
www.biorxiv.org/content/10.1...

www.biorxiv.org/content/10.1...

www.biorxiv.org/content/10.1...

28.11.2024 20:05 👍 1 🔁 0 💬 1 📌 0
Mapping enhancer-gene regulatory interactions from single-cell data Mapping enhancers and their target genes in specific cell types is crucial for understanding gene regulation and human disease genetics. However, accurately predicting enhancer-gene regulatory interac...

Excited to share our latest preprint on scE2G – a new model to link enhancers to target genes using single-cell data – with state-of-the-art performance across multiple perturbation benchmarks.

biorxiv.org/cgi/content/...

Read more below!

1/12

25.11.2024 08:26 👍 41 🔁 20 💬 1 📌 4
Pooled CRISPR screens with joint single-nucleus chromatin accessibility and transcriptome profiling - Nature Biotechnology MultiPerturb-seq profiles gene expression and chromatin accessibility in single-cell pooled CRISPR screen.

Pooled CRISPR screens with joint single-nucleus chromatin accessibility and transcriptome profiling go.nature.com/4hXER5O

21.11.2024 14:26 👍 99 🔁 28 💬 0 📌 2