Hamed Shirzad

@hamedshirzad

PhD student in Computer Science at UBC | Exploring Machine Learning on Graphs https://www.hamedshirzad.com/

846 Followers · 432 Following · 40 Posts
Joined 07.07.2023

Latest posts by Hamed Shirzad @hamedshirzad

Post image

Enjoyed giving our tutorial on Geometric & Topological Deep Learning at IEEE MLSP 2025 alongside @semihcanturk.bsky.social. Loving the Istanbul vibes and the amazing food here! ✅

31.08.2025 21:02 👍 2 🔁 0 💬 0 📌 0
TxPert: Predicting Cellular Responses to Unseen Genetic Perturbations - Valence Labs We introduce TxPert: a state-of-the-art model that leverages multiple biological knowledge networks to accurately predict transcriptional responses under OOD scenarios.

1/ Introducing TxPert: a new model that predicts transcriptional responses across diverse biological contexts

It's designed to generalize across unseen single-gene perturbations, novel combinations of gene perturbations, and even new cell types 🧵

www.valencelabs.com/txpert-predi...

22.05.2025 13:51 👍 2 🔁 1 💬 1 📌 1
Post image

1/ At Valence Labs, @recursionpharma.bsky.social's AI research engine, we're focused on advancing drug discovery outcomes through cutting-edge computational methods

Today, we're excited to share our vision for building virtual cells, guided by the predict-explain-discover framework 🧵

20.05.2025 15:53 👍 13 🔁 6 💬 2 📌 2

Loved working on the TxPert project! It's also exciting to see my PhD work on Graph Transformers (the Exphormer model) finding such a meaningful application in a critical real-world task.

20.05.2025 23:39 👍 1 🔁 0 💬 0 📌 0

🔗 Read their paper: lnkd.in/gECWw_tR
🧵 Or check out a great summary thread from Yi (Joshua): bsky.app/profile/josh...
If you're attending ICLR, stop by their poster and talk:
📍 Poster Hall 3+2B #376 on Fri, Apr 25 at 15:00
🎤 Oral in Session 6A on Sat, Apr 26 at 16:30

23.04.2025 12:07 👍 0 🔁 0 💬 0 📌 0
Announcing the Outstanding Paper Awards at ICLR 2025 – ICLR Blog

Huge congrats to my labmate @joshuaren.bsky.social and my supervisor @djsutherland.ml for receiving an Outstanding Paper Award at @iclr-conf.bsky.social for their work: "Learning Dynamics of LLM Finetuning" 🏆

So proud to see their amazing research recognized! 👏🔥
blog.iclr.cc/2025/04/22/a...

23.04.2025 12:04 👍 1 🔁 0 💬 1 📌 0
Sp_Exphormer/Attention Score Analysis.ipynb at main · hamed1375/Sp_Exphormer: Even Sparser Graph Transformers.

There's more in the paper: arxiv.org/abs/2411.16278 (Appendix F)

We'd love to see anyone do more analysis of these things! To get you started, our scores are available from the "Attention Score Analysis" notebook in our repo:
github.com/hamed1375/Sp...

12.12.2024 00:33 👍 4 🔁 0 💬 0 📌 0
Post image

How much do nodes attend to graph edges, versus expander edges or self-loops?

On the Photo dataset (homophilic), attention mainly comes from graph edges. On the Actor dataset (heterophilic), self-loops and expander edges play a major role.
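(If you want to reproduce this kind of breakdown, here's a minimal sketch, assuming a dense row-normalized [N, N] attention matrix and boolean masks for the two edge types; it's not the notebook's actual code:)

```python
import torch

def attention_mass_by_edge_type(attn, graph_mask, expander_mask):
    """Split each node's attention mass into graph-edge, expander-edge,
    and self-loop contributions.

    attn:          [N, N] row-normalized attention scores
    graph_mask:    [N, N] bool, True where a graph edge exists
    expander_mask: [N, N] bool, True where an expander edge exists
    """
    eye = torch.eye(attn.shape[0], dtype=torch.bool, device=attn.device)
    mass = {
        "graph": (attn * (graph_mask & ~eye)).sum(dim=1),
        "expander": (attn * (expander_mask & ~graph_mask & ~eye)).sum(dim=1),
        "self": (attn * eye).sum(dim=1),
    }
    return {k: v.mean().item() for k, v in mass.items()}  # mean over nodes
```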

12.12.2024 00:32 👍 4 🔁 0 💬 1 📌 0
Post image

Q. Is selecting the top few attention scores effective?

A. Top-k scores rarely cover most of a node's total attention mass unless the graph has a very small average degree. Results are consistent for both dim=4 and dim=64.
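(A rough sketch of how you'd measure that coverage yourself, assuming a row-normalized [N, N] attention matrix; illustrative, not the paper's code:)

```python
import torch

def topk_coverage(attn, k):
    """Average fraction of each node's attention mass captured by its
    top-k scores; 1.0 would mean the top-k entries carry all the mass.

    attn: [N, N] attention matrix whose rows sum to 1.
    """
    return attn.topk(k, dim=1).values.sum(dim=1).mean().item()
```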

12.12.2024 00:31 👍 2 🔁 0 💬 1 📌 0
Post image

Q. How similar are attention scores across layers?

A. In all experiments, the first layer's attention scores differed significantly from the rest, while scores were very consistent across all later layers.
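(One simple way to quantify this, sketched under the assumption that you have one [N, N] row-normalized attention matrix per layer; the metric choice here is ours:)

```python
import torch.nn.functional as F

def cross_layer_similarity(attn_layers):
    """Mean cosine similarity between per-node attention rows of
    consecutive layers; attn_layers is a list of [N, N] matrices."""
    return [
        F.cosine_similarity(a, b, dim=1).mean().item()
        for a, b in zip(attn_layers, attn_layers[1:])
    ]
```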

12.12.2024 00:31 👍 1 🔁 0 💬 1 📌 0
Post image

Q. How do attention scores change across layers?

A. The first layer consistently shows much higher entropy (more uniform attention across nodes), while deeper layers have sharper attention scores.
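(For reference, a minimal sketch of the entropy computation, assuming a row-normalized [N, N] attention matrix per layer:)

```python
import torch

def mean_attention_entropy(attn, eps=1e-12):
    """Average Shannon entropy of each node's attention distribution.
    Higher entropy = attention spread more uniformly over neighbors.

    attn: [N, N] row-normalized attention matrix for one layer.
    """
    entropy = -(attn * (attn + eps).log()).sum(dim=1)  # per-node entropy
    return entropy.mean().item()
```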

12.12.2024 00:30 👍 1 🔁 0 💬 1 📌 0
Post image Post image

We trained 100 single-head Transformers (attention masked to graph edges, with and without expander edges and self-loops) on Photo & Actor, with hidden dims from 4 to 64.

Q. Are attention scores consistent across widths?

A. The distributions of where a node attends are pretty consistent.
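(A hypothetical way to check that consistency: compare each node's attention distribution between a narrow and a wide model, e.g. by total variation distance; shapes are assumptions:)

```python
def total_variation(attn_small, attn_large):
    """Per-node total variation distance between the attention
    distributions of two models (e.g., dim=4 vs dim=64), both given
    as [N, N] row-normalized matrices over the same masked edge set.
    0 = identical rows, 1 = disjoint support.
    """
    return 0.5 * (attn_small - attn_large).abs().sum(dim=1)
```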

12.12.2024 00:29 👍 1 🔁 0 💬 1 📌 0

As a reminder, we will have our poster session tomorrow:

πŸ“ East Exhibit Hall, Poster #3010
πŸ“„ arxiv.org/abs/2411.16278
πŸ’» github.com/hamed1375/Sp...
To motivate you further, we have some insights gained from the attention score analysis of this work, which I'll share in this thread:

12.12.2024 00:29 πŸ‘ 6 πŸ” 1 πŸ’¬ 1 πŸ“Œ 1
Structure-based drug design with equivariant diffusion models - Nature Computational Science This work applies diffusion models to conditional molecule generation and shows how they can be used to tackle various structure-based drug design problems

After two years, our paper on generative models for structure-based drug design is finally out in @natcomputsci.bsky.social

www.nature.com/articles/s43...

09.12.2024 14:00 👍 164 🔁 37 💬 2 📌 0
Post image

🚨 Come chat with us at NeurIPS next week! 🚨
🗓️ Thursday, Dec 12
⏰ 11:00 AM–2:00 PM PST
📍 East Exhibit Hall A-C, Poster #3010
📄 Paper: arxiv.org/abs/2411.16278
💻 Code: github.com/hamed1375/Sp...
See you there! 🙌✨
[13/13]

05.12.2024 20:20 👍 5 🔁 0 💬 0 📌 0
A Theory for Compressibility of Graph Transformers for Transductive Learning Transductive tasks on graphs differ fundamentally from typical supervised machine learning tasks, as the independent and identically distributed (i.i.d.) assumption does not hold among samples. Instea...

For more on the compression results, see our workshop paper “A Theory for Compressibility of Graph Transformers for Transductive Learning”; there will be a thread on this too!
Workshop paper link: arxiv.org/abs/2411.13028

05.12.2024 20:20 👍 3 🔁 0 💬 1 📌 0

We have theoretical guarantees, too: compression guarantees (smaller nets can estimate the attention scores), and a guarantee that sparsification works even from an approximate attention matrix (the one from the narrow net).
[11/13]

05.12.2024 20:19 👍 2 🔁 0 💬 1 📌 0
Post image

Downsampling the edges and using regular-degree computations can make the model even faster and more memory-efficient than a GCN!
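(To illustrate why a regular degree helps, here's a toy sketch, with assumed shapes, of how fixed-degree neighbor lists turn aggregation into a dense gather-and-reduce instead of a scatter:)

```python
import torch

def regular_degree_aggregate(x, neighbors):
    """Mean-aggregate neighbor features when every node has the same
    sampled degree d.

    x:         [N, F] node features
    neighbors: [N, d] neighbor indices (same d for all nodes)

    Advanced indexing gives a dense [N, d, F] tensor, so the reduction
    is a plain batched op -- no scatter_add over a ragged edge list.
    """
    return x[neighbors].mean(dim=1)  # [N, F]
```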

05.12.2024 20:19 👍 2 🔁 0 💬 1 📌 0
Post image

But we can scale to graphs Exphormer couldn't even dream of:

05.12.2024 20:19 👍 2 🔁 0 💬 1 📌 0
Post image

How much accuracy do we lose compared to an Exphormer with many more edges (and way more memory usage)? Not much.

05.12.2024 20:18 👍 2 🔁 0 💬 1 📌 0
Post image

Now, with sparse (meaningful) edges, k-hop sampling is feasible again even across several layers. Memory and runtime can be traded off by choosing how many “core nodes” we expand from.
[7/13]
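(Roughly, the batching looks like the sketch below; `neighbors` is a hypothetical [N, d] table of sampled neighbor ids, simplified to one shared table rather than one per layer:)

```python
import torch

def nodes_needed_for_batch(neighbors, core, num_layers):
    """Collect every node id touched when computing `num_layers` of
    sparse attention for a batch of core nodes.

    neighbors: [N, d] sampled neighbor indices per node (long tensor)
    core:      1-D long tensor of core node ids for this batch
    """
    frontier, needed = core, core
    for _ in range(num_layers):
        frontier = neighbors[frontier].flatten().unique()  # expand one hop
        needed = torch.cat([needed, frontier]).unique()
    return needed
```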

05.12.2024 20:18 👍 2 🔁 0 💬 1 📌 0
Post image

By sampling to a regular degree, we make graph computations much more efficient (a simple batched matmul instead of a scatter operation). Naive sampling implementations can also be really slow, but reservoir sampling makes resampling edges every epoch no big deal.
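(The classic Algorithm R makes this concrete; a generic sketch rather than our exact implementation, applied per node with the stream being that node's candidate edges:)

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Uniformly sample k items from a stream of unknown length using
    O(k) memory (Vitter's Algorithm R)."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)          # fill the reservoir first
        else:
            j = rng.randint(0, i)           # inclusive on both ends
            if j < k:
                reservoir[j] = item         # replace with prob k/(i+1)
    return reservoir
```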

05.12.2024 20:17 👍 3 🔁 0 💬 1 📌 0
Post image Post image

Now, we extract the attention scores from the initial network and use them to sample a sparse attention graph for a bigger model. Attention scores vary across layers, but no problem: we sample neighbors per layer. Memory usage plummets!
[5/13]
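(In sketch form, per layer it's weighted sampling without replacement using the proxy scores as weights; names and shapes here are assumptions, and the loop is written for readability, not speed:)

```python
import torch

def sample_neighbors_by_attention(scores, edge_index, degree):
    """For each target node, sample up to `degree` incoming edges with
    probability proportional to the proxy model's attention scores.

    scores:     [E] nonnegative float attention scores (narrow model)
    edge_index: [2, E] (source, target) candidate edges
    """
    num_nodes = int(edge_index.max()) + 1
    sampled = []
    for v in range(num_nodes):
        mask = edge_index[1] == v                    # edges into node v
        cand, w = edge_index[0, mask], scores[mask]
        if len(w) == 0:
            sampled.append(cand)                     # no candidates
            continue
        idx = torch.multinomial(w, min(degree, len(w)), replacement=False)
        sampled.append(cand[idx])
    return sampled  # one tensor of sampled neighbor ids per node
```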

05.12.2024 20:16 👍 2 🔁 0 💬 1 📌 0
Post image Post image

But not all the edges matter – if we know which won't be used, we can just drop them and get a sparser graph/smaller k-hop neighborhoods. It turns out a small network (same arch, tiny hidden dim, minor tweaks) can be a really good proxy for which edges will matter!
[4/13]
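(As a sketch of the recipe, under an assumed interface where the tiny model returns one score per candidate edge; `proxy_model` and `keep_frac` are hypothetical names:)

```python
import torch

def prune_edges_with_proxy(proxy_model, x, edge_index, keep_frac=0.2):
    """Score candidate edges with a narrow proxy network, then keep only
    the highest-scoring fraction for the full-width model.

    Assumes proxy_model(x, edge_index) -> [E] per-edge attention scores
    (a hypothetical interface).
    """
    with torch.no_grad():
        scores = proxy_model(x, edge_index)          # [E]
    k = max(1, int(keep_frac * scores.numel()))
    keep = scores.topk(k).indices                    # indices of kept edges
    return edge_index[:, keep]                       # sparsified edge set
```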

05.12.2024 20:15 👍 2 🔁 0 💬 1 📌 0
Post image

For very large graphs, though, even very simple GNNs need batching. One way is k-hop neighborhood selection, but expander graphs are specifically designed so that k-hop neighborhoods are big. Other batching approaches can drop important edges and kill the advantages of GTs.
[3/13]

05.12.2024 20:15 👍 3 🔁 0 💬 1 📌 0
Video thumbnail

Our previous work, Exphormer, uses expander graphs to avoid the quadratic complexity of full GTs.
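(One standard way to build such a graph, sketched below, is a union of a few random permutation matchings, which is a good expander with high probability; this is illustrative and not necessarily the exact construction in Exphormer:)

```python
import numpy as np

def random_permutation_expander(num_nodes, degree, seed=0):
    """Approximately degree-regular expander: union of `degree` random
    permutation matchings (drops accidental self-loops)."""
    rng = np.random.default_rng(seed)
    src = np.tile(np.arange(num_nodes), degree)
    dst = np.concatenate([rng.permutation(num_nodes) for _ in range(degree)])
    edges = np.stack([src, dst])                     # [2, degree * N]
    return edges[:, edges[0] != edges[1]]            # remove self-loops
```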

05.12.2024 20:13 👍 3 🔁 0 💬 1 📌 0
Post image

🚨 Come chat with us at NeurIPS next week! 🚨
🗓️ Thursday, Dec 12
⏰ 11:00 AM–2:00 PM PST
📍 East Exhibit Hall A-C, Poster #3010
📄 Paper: arxiv.org/abs/2411.16278
💻 Code: github.com/hamed1375/Sp...
See you there! 🙌✨

05.12.2024 20:07 👍 0 🔁 0 💬 0 📌 0