#sparsity

@deepseek.activitypub.awakari.com.ap.brid.gy

1 month ago

Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models

The key idea: Offloading local dependencies between tokens with lookups to a massive embedding t...

#memory #sparsity #LLM

Edgy4sst (2024)

@edgy4sst.bsky.social

2 months ago

matching a #Sparsity #Vector

TMLR Published Papers

@tmlr-pub.bsky.social

3 months ago

Cardinality Sparsity: Applications in Matrix-Matrix Multiplications and Machine Learning

Ali Mohaddes, Johannes Lederer

Action editor: Pan Xu

https://openreview.net/forum?id=zoSRSpGu9C

#sparse #tensor #sparsity

GetNews.me

@getnews-me.bsky.social

5 months ago

Single-Layer Attention Beats Linear Models on Sparse Tokens

A single‑layer attention model can detect rare signals in long sequences when the signal strength grows only logarithmically with the sequence length L, while linear classifiers need it to grow as sqrt(L). Read more: getnews.me/single-layer-attention-b... #attention #sparsity

GetNews.me

@getnews-me.bsky.social

5 months ago

Spiking Neural Networks' Naturally Sparse Gradients Enhance Robustness

Researchers find that spiking neural network designs naturally produce sparse gradients, giving robustness without explicit regularization but reducing generalization on clean data. Read more: getnews.me/spiking-neural-networks-... #spikingneuralnetworks #sparsity

GetNews.me

@getnews-me.bsky.social

5 months ago

TASO: Task-Aligned Sparse Optimization for Efficient Fine‑Tuning

TASO outperforms standard LoRA even with a parameter budget comparable to LoRA rank = 1, trimming unnecessary LoRA weights for more efficient fine‑tuning. Read more: getnews.me/taso-task-aligned-sparse... #taso #lora #sparsity

GetNews.me

@getnews-me.bsky.social

5 months ago

Sparse FedAdam Reduces Communication Overhead in Federated Learning

FedAdam‑SSM applies a mask to model updates, cutting uplink traffic to about one‑third of FedAdam and achieving 1.1× faster convergence with 14.5% higher accuracy than quantized variants. getnews.me/sparse-fedadam-reduces-c... #fedadam #sparsity

Gautam

@gautammalik.bsky.social

9 months ago

[9/9]

Appreciate any advice, pointers to relevant papers, or even “don’t do this” cautionary tales.
Thanks in advance!

#transformers #sparsity #maskedmodeling #deeplearning #symbolicAI #mlresearch #attentionmodels #structureddata

Mingxue (Mercy) Xu

@mercyxu.bsky.social

11 months ago

Slightly lazy, but I feel the need to post this before it is too late... We will present this at the ICLR Workshop on Sparsity in LLMs (SLLM)! We found that the representation dimension can dominate model performance in structured pruning 🤯
#ICLR2025 #LLM #sparsity

ekaddo

@ekaddo.bsky.social

1 year ago

Apple researchers reveal the secret sauce behind DeepSeek AI
The AI model that shook the world is part of a broad trend to squeeze more out of chips using what's called sparsity.

“#Sparsity is a kind of magic dial that finds the best match of the #AImodel you've got and the compute you have available.
It's the same economic rule of thumb…of personal computers: Either a better result for the same money or the same result for less money.” #AI

www.zdnet.com/article/appl...

LLMs

@llms.activitypub.awakari.com.ap.brid.gy

1 year ago

How DeepSeek did it

Chinese artificial intelligence (AI) company DeepSeek has sent shockwaves thr...

https://asiatimes.com/2025/01/how-deepseek-did-it/

#Technology #AI #Sparsity #Anthropic #Claude #3.5 #Artificial #Intelligence #Block #2 #ChatGPT-40

TMLR Published Papers

@tmlr-pub.bsky.social

1 year ago

PLUM: Improving Inference Efficiency By Leveraging Repetition-Sparsity Trade-Off

Sachit Kuhar, Yash Jain, Alexey Tumanov

Action editor: Hongsheng Li

https://openreview.net/forum?id=IEKtMMSblm

#quantization #imagenet #sparsity

Javed A. Butt

@javedab.bsky.social

1 year ago

mistral's 8x22B is ~260GB

the trend is to get models smaller, not bigger

pruning, sparsity, quantization, distillation

so why such a huge model?

does mistral have no other models?

Laurent Duval (research amateur)

@laurentduval.bsky.social

2 years ago

Yasuhisa Kuroda released a spectral data processing program for chemical analysis called SPANA eonet.ne.jp/~spana-lsq/i.... With our BEADS algorithm (baseline estimation & denoising w/ #sparsity) to separate peaks, baseline and noise! doi.org/10.1016/j.ch... #analyticalchemistry
