
Yuan Yin

@yuanyinnn

AI Research Scientist at Valeo.ai | prev. Sorbonne U | https://yuan-yin.github.io | posts in en/fr/zh

115 Followers · 134 Following · 10 Posts · Joined 23.11.2024

Latest posts by Yuan Yin @yuanyinnn

Preview: GitHub - valeoai/peft-ipa

8/IPA is just a first step. We believe that a deeper understanding of the feature space is important for unlocking better model adaptation. 💭

To try it out and find more details 👇:

arXiv: arxiv.org/abs/2509.04398
Code: github.com/valeoai/peft...

02.12.2025 11:11 👍 2 🔁 0 💬 0 📌 0

7/Across benchmarks, IPA consistently outperforms standard LoRA and DoRA.
📊 Commonsense Reasoning: +1.5 points avg accuracy.
🖼️ VTAB-1k (Vision): +2.3 points avg accuracy.
It is also robust at very low ranks (e.g., r=8) where standard LoRA fails.

02.12.2025 11:11 👍 1 🔁 0 💬 1 📌 0
Post image

6/Does it work? Yes. But the real win is parameter efficiency.

Because the IPA projection captures the feature space, we can freeze it during finetuning.

On Llama-2/3 and Qwen-2.5, IPA matches or surpasses full LoRA/DoRA tuning at rank=32 with 50% fewer trainable params. 📉⚡️
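
Back-of-the-envelope view of where that saving comes from, as a sketch (our own illustration for a square layer; the function name and sizes are hypothetical, not from the paper):

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int, freeze_down_proj: bool) -> int:
    """Count trainable adapter parameters for one linear layer (illustrative arithmetic)."""
    a_params = rank * d_in      # down-projection A
    b_params = d_out * rank     # up-projection B
    return b_params if freeze_down_proj else a_params + b_params

# Example: a square 4096x4096 projection at rank 32.
print(lora_trainable_params(4096, 4096, 32, freeze_down_proj=False))  # 262144 (A and B both trained)
print(lora_trainable_params(4096, 4096, 32, freeze_down_proj=True))   # 131072 (A frozen: 50% fewer)
```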

02.12.2025 11:11 👍 0 🔁 0 💬 1 📌 0

5/Training autoencoders per layer with backprop is expensive. So we use a classic, efficient method: Incremental PCA.

✅ Forward-only (no backprop)
✅ Streaming (no huge memory overhead)
✅ Fast (approx 10 mins for pretraining)

It creates a robust starting point for the adapter.
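
A minimal sketch of what this forward-only initialization could look like with scikit-learn's IncrementalPCA (the function name, batch layout, and rank below are our assumptions for illustration, not the released peft-ipa code):

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

def fit_input_projection(activation_batches, rank=32):
    """Fit a rank-r basis of a layer's input activations by streaming batches through Incremental PCA."""
    ipca = IncrementalPCA(n_components=rank)
    for acts in activation_batches:   # acts: (num_tokens, hidden_dim) array from a forward pass;
        ipca.partial_fit(acts)        # each batch needs at least `rank` rows; no backprop, no full buffer
    # components_ has shape (rank, hidden_dim); use it to initialize the frozen down-projection.
    return ipca.components_.astype(np.float32)
```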

02.12.2025 11:11 👍 0 🔁 0 💬 1 📌 0
Post image

4/We asked 🔎: What if a random projection isn't the best choice?

We introduce IPA.

The intuition 💡: The input projection should be reconstructive.

We train the projection to preserve the maximum information from the inputs (like an autoencoder) before the adaptation begins.
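
In our own notation (an illustrative formulation, not lifted from the paper): stack a layer's input activations into a matrix X with one row per token, and look for a rank-r down-projection A whose projections can reconstruct the inputs,

```latex
\min_{A \in \mathbb{R}^{r \times d}} \; \left\lVert X - X A^{\top} A \right\rVert_F^2, \qquad r \ll d .
```

With orthonormal rows of A, the optimum is spanned by the top-r principal components of the (centered) activations, which is exactly the quantity the Incremental PCA step in post 5 estimates.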

02.12.2025 11:11 👍 0 🔁 0 💬 1 📌 0
Post image

3/We visualized this by training LoRAs on different tasks with the same init.

The heatmaps tell the story:
1️⃣ Matrix A (left) stays close to its random init
2️⃣ Matrix B (right) adapts to capture task-specific variance

Takeaway: LoRA essentially relies on a fixed, random projection.
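
One way to quantify that drift per layer (a hedged sketch with our own metric choice; the heatmaps in the post may use a different statistic):

```python
import torch
import torch.nn.functional as F

def adapter_drift(A_init, A_trained, B_trained):
    """How far each LoRA matrix moved during finetuning (illustrative diagnostic)."""
    # A starts random: cosine similarity close to 1 means it barely moved.
    cos_A = F.cosine_similarity(A_init.flatten(), A_trained.flatten(), dim=0).item()
    # B starts at zero, so measure how much mass it accumulated instead.
    norm_B = torch.linalg.norm(B_trained).item()
    return cos_A, norm_B
```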

02.12.2025 11:11 👍 0 🔁 0 💬 1 📌 0
Post image

2/Standard LoRA decomposes updates into two matrices: A (down-projection) and B (up-projection).

Typically at init, A is random and B is zero.

We found a major asymmetry: during training, A remains close to init, while B absorbs almost all the task-specific adaptation.
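
For context, a minimal PyTorch-style sketch of that decomposition (class name, rank, scaling, and init scale are illustrative assumptions, not the paper's or the PEFT library's code):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update B @ A (illustrative sketch)."""
    def __init__(self, base: nn.Linear, rank: int = 32, alpha: float = 32.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # pretrained weights stay frozen
        # A: small random init (down-projection); B: zeros (up-projection),
        # so the update B @ A is exactly zero when finetuning starts.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T
```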

02.12.2025 11:11 👍 1 🔁 0 💬 1 📌 0

1/Serve your PEFT with a fresh IPA! 🍺
Finetuning large models is cheaper thanks to LoRA, but is its random init optimal? 🤔
Meet IPA: a feature-aware alternative to random projections
#NeurIPS2025 WS #CCFM Oral+Best Paper
Work w/
S. Venkataramanan @tuanhungvu.bsky.social @abursuc.bsky.social M. Cord
🧵

02.12.2025 11:11 👍 12 🔁 2 💬 1 📌 2

Personally, 2024 was a year marked by hope, perplexity, and, finally, happiness. It is now time to further consolidate my path in France. Cheers for 2025.

11.01.2025 23:13 👍 5 🔁 0 💬 0 📌 0

So my Twitter account is reaching a point where it is no longer pushing research-related content from the people that I follow. Only clickbait content or hype. Time to log out.

10.01.2025 21:29 👍 2 🔁 0 💬 0 📌 0