8/IPA is just a first step. We believe that a deeper understanding of the feature space is key to unlocking better model adaptation.
To try it out and find more details:
arXiv: arxiv.org/abs/2509.04398
Code: github.com/valeoai/peft-ipa
02.12.2025 11:11
7/Across benchmarks, IPA consistently outperforms standard LoRA and DoRA.
- Commonsense Reasoning: +1.5 points average accuracy.
- VTAB-1k (Vision): +2.3 points average accuracy.
It is also robust at very low ranks (e.g., r=8) where standard LoRA fails.
02.12.2025 11:11
6/Does it work? Yes. But the real win is parameter efficiency.
Because the IPA projection captures the feature space, we can freeze it during finetuning.
On Llama-2/3 and Qwen-2.5, IPA matches or surpasses full LoRA/DoRA tuning at rank=32 with 50% fewer trainable params.
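A rough sketch of the freezing step (my own PyTorch illustration, not the released code): copy a pretrained rank-r projection into each adapter's down-matrix, disable its gradient, and count what is left to train. The `lora_A`/`lora_B` naming, the key scheme, and the `pca_projections` dict are assumptions.

```python
import torch

def freeze_ipa_projections(model: torch.nn.Module, pca_projections: dict) -> int:
    """Load pretrained projections into each lora_A and freeze them; only lora_B trains."""
    for name, param in model.named_parameters():
        if "lora_A" in name:
            layer_key = name.split(".lora_A")[0]             # hypothetical key scheme
            param.data.copy_(pca_projections[layer_key])     # feature-aware initialization
            param.requires_grad = False                      # frozen during finetuning
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"trainable parameters: {trainable:,}")
    return trainable
```

With only the up-projections left trainable, the adapter has roughly half the trainable parameters of a standard LoRA at the same rank.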
02.12.2025 11:11
5/Training autoencoders per layer with backprop is expensive. So we use a classic, efficient method: Incremental PCA.
✅ Forward-only (no backprop)
✅ Streaming (no huge memory overhead)
✅ Fast (approx. 10 mins for pretraining)
It creates a robust starting point for the adapter.
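For illustration, a minimal sketch of the streaming step with scikit-learn's IncrementalPCA; the `activation_batches` generator is a stand-in for a layer's input activations captured with forward hooks, and the sizes are made up.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rank, in_features = 32, 4096          # illustrative sizes, not the paper's exact config

def activation_batches(n_batches=20, batch_size=256):
    """Hypothetical stand-in for a layer's input activations gathered forward-only."""
    rng = np.random.default_rng(0)
    for _ in range(n_batches):
        yield rng.standard_normal((batch_size, in_features)).astype(np.float32)

ipca = IncrementalPCA(n_components=rank)   # streaming: one partial_fit per batch
for batch in activation_batches():
    ipca.partial_fit(batch)

A_init = ipca.components_                  # top-r directions, shape [rank, in_features]
print(A_init.shape)
```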
02.12.2025 11:11
4/We asked: What if a random projection isn't the best choice?
We introduce IPA.
The intuition: the input projection should be reconstructive.
We train the projection to preserve as much information from the inputs as possible (like an autoencoder) before adaptation begins.
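To make "reconstructive" concrete, here is a small self-contained check (my own sketch, not from the paper) comparing how well a rank-r random projection and a rank-r PCA projection let a linear decoder reconstruct the inputs:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n = 512, 8, 4096
X = rng.standard_normal((n, 64)) @ rng.standard_normal((64, d))   # toy, roughly low-rank features

def reconstruction_mse(X, A):
    """Encode with A, then fit the best linear decoder back to X (least squares)."""
    Z = X @ A.T                                    # [n, r]
    W, *_ = np.linalg.lstsq(Z, X, rcond=None)      # decoder minimizing ||Z W - X||
    return float(np.mean((X - Z @ W) ** 2))

A_random = rng.standard_normal((r, d)) / np.sqrt(d)   # LoRA-style random down-projection
_, _, Vt = np.linalg.svd(X, full_matrices=False)
A_pca = Vt[:r]                                         # top-r principal directions

print("random:", reconstruction_mse(X, A_random))
print("pca:   ", reconstruction_mse(X, A_pca))
```

At the same rank, the PCA projection gives a lower reconstruction error, which is exactly the property IPA asks of the input projection.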
02.12.2025 11:11
3/We visualized this by training LoRAs on different tasks with the same init.
The heatmaps tell the story:
1. Matrix A (left) stays close to its random initialization.
2. Matrix B (right) adapts to capture task-specific variance.
Takeaway: LoRA essentially relies on a fixed, random projection.
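A sketch of the kind of measurement behind such heatmaps (not the paper's code): snapshot the adapter at init and compare after training. The `init_state` snapshot and the parameter names are assumptions.

```python
import torch

def drift_from_init(current: torch.Tensor, init: torch.Tensor) -> dict:
    """Absolute change plus cosine similarity to the initial weights (flattened)."""
    return {
        "abs_change": (current - init).norm().item(),
        "cos_to_init": torch.nn.functional.cosine_similarity(
            current.flatten(), init.flatten(), dim=0, eps=1e-12
        ).item(),
    }

# Hypothetical usage (B starts at zero, so only its abs_change is informative):
# for name, p in model.named_parameters():
#     if "lora_A" in name or "lora_B" in name:
#         print(name, drift_from_init(p.detach().cpu(), init_state[name]))
```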
02.12.2025 11:11
2/Standard LoRA decomposes updates into two matrices: A (down-projection) and B (up-projection).
Typically at init, A is random and B is zero.
We found a major asymmetry: during training, A remains close to init, while B absorbs almost all the task-specific adaptation.
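For reference, a minimal LoRA linear layer in PyTorch with the initialization described above (an illustrative sketch, not any particular library's implementation):

```python
import math
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update B @ A."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                                   # base stays frozen
        self.A = nn.Parameter(torch.empty(r, base.in_features))       # down-projection
        self.B = nn.Parameter(torch.zeros(base.out_features, r))      # up-projection, zero init
        nn.init.kaiming_uniform_(self.A, a=math.sqrt(5))               # random init for A
        self.scaling = alpha / r

    def forward(self, x):
        # The update starts at zero (B is zero), so training begins exactly at the base model.
        return self.base(x) + (x @ self.A.T) @ self.B.T * self.scaling
```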
02.12.2025 11:11
1/Serve your PEFT with a fresh IPA! 🍺
Finetuning large models is cheaper thanks to LoRA, but is its random init optimal? 🤔
Meet IPA: a feature-aware alternative to random projections.
#NeurIPS2025 WS #CCFM Oral + Best Paper
Work w/ S. Venkataramanan @tuanhungvu.bsky.social @abursuc.bsky.social M. Cord
🧵
02.12.2025 11:11
Personally, 2024 was marked by hope, perplexity and, finally, happiness. It is now time to further consolidate my path in France. Cheers for 2025.
11.01.2025 23:13
So my Twitter account is reaching a point where it no longer surfaces research-related content from the people I follow, only clickbait and hype. Time to log out.
10.01.2025 21:29