#sparsity

Latest posts tagged with #sparsity on Bluesky

Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models

The key idea: Offloading local dependencies between tokens with lookups to a massive embedding t...

#memory #sparsity #LLM
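The teaser above describes offloading local token dependencies to lookups in a huge embedding table. A minimal sketch of that lookup idea, with hypothetical names, table size, and hashing scheme (none of which come from the paper):

```python
import numpy as np

# Hypothetical sketch of "conditional memory via scalable lookup":
# local dependencies between tokens are handled by a single sparse
# lookup into a large embedding table, keyed by a hash of the trailing
# n-gram of token ids, instead of being recomputed by dense layers.
TABLE_SIZE = 2**16        # stand-in for a "massive" embedding table
D_MODEL = 16              # hidden size of the toy model
rng = np.random.default_rng(0)
memory_table = rng.standard_normal((TABLE_SIZE, D_MODEL)) * 0.02

def ngram_key(token_ids, n=3):
    """Hash the trailing n-gram of token ids into a table slot."""
    key = 0
    for t in token_ids[-n:]:
        key = (key * 1_000_003 + t) % TABLE_SIZE
    return key

def lookup_memory(token_ids):
    """Sparse lookup: only one row of the table is touched per step."""
    return memory_table[ngram_key(token_ids)]

tokens = [17, 4, 991, 23]
h = np.zeros(D_MODEL)
h = h + lookup_memory(tokens)   # add retrieved memory to the hidden state
print(h.shape)
```

The sparsity here is in the access pattern: however large the table grows, each token touches exactly one row.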


matching a #Sparsity #Vector


Cardinality Sparsity: Applications in Matrix-Matrix Multiplications and Machine Learning

Ali Mohaddes, Johannes Lederer

Action editor: Pan Xu

https://openreview.net/forum?id=zoSRSpGu9C

#sparse #tensor #sparsity

Single-Layer Attention Beats Linear Models on Sparse Tokens

A single‑layer attention model can detect rare signals in long sequences when the signal strength grows only logarithmically with the sequence length L, whereas linear classifiers require strength on the order of sqrt(L). Read more: getnews.me/single-layer-attention-b... #attention #sparsity
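The size of that scaling gap is easy to eyeball with a few lines (pure arithmetic, not an experiment from the paper):

```python
import math

# The log(L) threshold claimed for attention grows far more slowly
# than the sqrt(L) threshold claimed for linear classifiers.
for L in (10**3, 10**6, 10**9):
    print(f"L={L:>10}  log(L)={math.log(L):7.1f}  sqrt(L)={math.sqrt(L):10.1f}")
```

At L = 10^9 the gap is roughly 20.7 versus 31,623, so the required signal strength for attention is essentially flat by comparison.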

Spiking Neural Networks' Naturally Sparse Gradients Enhance Robustness

Researchers find that spiking neural network designs naturally produce sparse gradients, conferring robustness without explicit regularization, though at some cost to generalization on clean data. Read more: getnews.me/spiking-neural-networks-... #spikingneuralnetworks #sparsity

TASO: Task-Aligned Sparse Optimization for Efficient Fine‑Tuning

TASO outperforms standard LoRA even with a parameter budget comparable to rank‑1 LoRA, trimming unnecessary LoRA weights for more efficient fine‑tuning. Read more: getnews.me/taso-task-aligned-sparse... #taso #lora #sparsity
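A hedged sketch of the trimming idea: the actual method scores weights in a task-aligned way, but here plain magnitude serves as a stand-in importance score, and all shapes and keep-ratios are illustrative:

```python
import numpy as np

# Illustrative only: trim a dense LoRA update down to its most
# important entries, keeping a tiny parameter budget.
rng = np.random.default_rng(0)
d, r = 64, 8
A = rng.standard_normal((d, r)) * 0.01   # LoRA down-projection
B = rng.standard_normal((r, d)) * 0.01   # LoRA up-projection
delta = A @ B                            # dense LoRA update, d x d

keep = 0.05                              # keep top 5% of entries
k = int(delta.size * keep)
thresh = np.sort(np.abs(delta).ravel())[-k]
mask = np.abs(delta) >= thresh           # sparse mask (magnitude stand-in)
sparse_delta = delta * mask              # trimmed update actually applied
print(int(mask.sum()), delta.size)
```

With the mask applied, only k of the d×d entries survive, which is the sense in which the budget can shrink toward that of a rank‑1 adapter.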

Sparse FedAdam Reduces Communication Overhead in Federated Learning

FedAdam‑SSM applies a sparse mask to model updates, cutting uplink traffic to about one‑third of FedAdam's and achieving 1.1× faster convergence and 14.5% higher accuracy than quantized variants. getnews.me/sparse-fedadam-reduces-c... #fedadam #sparsity
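A minimal sketch of the sparse-mask idea (illustrative, not the paper's exact algorithm): each client ships only the top-k entries of its update, so uplink traffic shrinks roughly in proportion to k/n:

```python
import numpy as np

def sparsify(update, k):
    """Keep only the k largest-magnitude entries of an update vector."""
    idx = np.argsort(np.abs(update))[-k:]   # indices of top-k magnitudes
    return idx, update[idx]                 # what actually goes on the wire

def densify(idx, vals, n):
    """Server side: rebuild a dense vector from the sparse payload."""
    out = np.zeros(n)
    out[idx] = vals
    return out

rng = np.random.default_rng(0)
update = rng.standard_normal(300)
idx, vals = sparsify(update, k=100)         # ~1/3 of the original traffic
restored = densify(idx, vals, update.size)
print(int((restored != 0).sum()))
```

Sending 100 of 300 entries (plus their indices) is what a "one-third of FedAdam" uplink looks like in this toy setting.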


[9/9]

Appreciate any advice, pointers to relevant papers, or even “don’t do this” cautionary tales.
Thanks in advance!

#transformers #sparsity #maskedmodeling #deeplearning #symbolicAI #mlresearch #attentionmodels #structureddata


Slightly lazy, but I feel the need to post this in case it is too late... We will present this at the ICLR Workshop on Sparsity in LLMs (SLLM)! We found that the representation dimension can dominate model performance in structured pruning 🤯
#ICLR2025 #LLM #sparsity

Apple researchers reveal the secret sauce behind DeepSeek AI The AI model that shook the world is part of a broad trend to squeeze more out of chips using what's called sparsity.

#Sparsity is a kind of magic dial that finds the best match of the #AImodel you've got and the compute you have available.
It's the same economic rule of thumb…of personal computers: Either a better result for the same money or the same result for less money.” #AI

www.zdnet.com/article/appl...


How DeepSeek did it Chinese artificial intelligence (AI) company DeepSeek has sent shockwaves thr...

https://asiatimes.com/2025/01/how-deepseek-did-it/

#Technology #AI #Sparsity #Anthropic #Claude #3.5 #Artificial #Intelligence #Block #2 #ChatGPT-40


PLUM: Improving Inference Efficiency By Leveraging Repetition-Sparsity Trade-Off

Sachit Kuhar, Yash Jain, Alexey Tumanov

Action editor: Hongsheng Li

https://openreview.net/forum?id=IEKtMMSblm

#quantization #imagenet #sparsity


mistral's 8x22B is ~260GB

the trend is to get models smaller, not bigger

pruning, sparsity, quantization, distillation

so why such a huge model?

does mistral have no other models?
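As a toy illustration of one of the shrinking techniques listed above, magnitude pruning zeroes the smallest weights so the model can be stored sparsely (sizes and sparsity level here are arbitrary):

```python
import numpy as np

# Magnitude pruning: drop the 90% of weights with the smallest
# absolute value, keeping only the largest 10%.
rng = np.random.default_rng(0)
w = rng.standard_normal(1000)
sparsity = 0.9
thresh = np.quantile(np.abs(w), sparsity)
pruned = np.where(np.abs(w) >= thresh, w, 0.0)
print(f"fraction zeroed: {(pruned == 0).mean():.2f}")
```

Quantization, distillation, and structured sparsity attack the same size problem from other angles, which is what makes a ~260GB release stand out against the trend.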


Yasuhisa Kuroda released SPANA, a spectral data processing program for chemical analysis: eonet.ne.jp/~spana-lsq/i.... It uses our BEADS algorithm (baseline estimation & denoising with #sparsity) to separate peaks, baseline, and noise! doi.org/10.1016/j.ch... #analyticalchemistry
