Latest posts tagged with #softmax on Bluesky
Softmax is $1/2$-Lipschitz: A tight bound across all $\ell_p$ norms
Pravin Nair
Action editor: Murat Erdogdu
https://openreview.net/forum?id=6dowaHsa6D
#softmax #lipschitz #norms
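The claim in the title can be checked numerically. A minimal sketch, assuming the standard softmax map $\sigma(x)_i = e^{x_i}/\sum_j e^{x_j}$: for random input pairs, the ratio $\|\sigma(x)-\sigma(y)\|_p / \|x-y\|_p$ should never exceed the stated constant $1/2$, for $p \in \{1, 2, \infty\}$.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax: shift by the max before exponentiating
    z = np.exp(x - x.max())
    return z / z.sum()

rng = np.random.default_rng(0)
worst = 0.0
for _ in range(10_000):
    x = 3.0 * rng.normal(size=8)
    y = 3.0 * rng.normal(size=8)
    for p in (1, 2, np.inf):
        num = np.linalg.norm(softmax(x) - softmax(y), ord=p)
        den = np.linalg.norm(x - y, ord=p)
        worst = max(worst, num / den)

print(worst)  # empirically stays at or below 0.5
```

The dimension 8, the scale 3, and the sample count are arbitrary choices for illustration; the paper's bound is an analytical result, not something this experiment proves.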
[Figure: The-Transformer-model-architecture.png, the Transformer model architecture: input/output embeddings with positional encoding, masked and unmasked multi-head attention, add & norm and feed-forward blocks, and a final linear layer with softmax producing output probabilities.]
"The transformer approach it describes has become the main architecture of a wide variety of AI, such as #LargeLanguageModels"
Training More Robust Classification Model via Discriminative Loss and Gaussian Noise Injection
Hai-Vy Nguyen, Fabrice Gamboa, Sixin Zhang, Reda CHHAIBI, Serge Gratton, Thierry Giaccone
Action editor: Lei Feng
https://openreview.net/forum?id=RnLfJgvST2
#softmax #robust #classification
Softmax returns with No ID — a dark, fuzzy reflection on disconnection and the music industry’s demands. Chunky guitars and robotic grooves carry a refrain that lingers: “Nothing I do, will ever be good enough for you.”
🎧 Listen & read: www.blackplastic.co.uk/alternative-... #NewMusic #Softmax
Harmonic Loss Trains Interpretable AI Models
David D. Baek, Ziming Liu, Riya Tyagi, Max Tegmark
Action editor: Quanshi Zhang
https://openreview.net/forum?id=ZpSZ7pNoCs
#softmax #representations #trained
New #TMLR-Paper-with-Video:
On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective
Gabriel Mongaras, Eric C. Larson
https://tmlr.infinite-conf.org/paper_pages/PHcITOi3vV
#softmax #rnns #attention
On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective
Gabriel Mongaras, Eric C. Larson
Action editor: Lingpeng Kong
https://openreview.net/forum?id=PHcITOi3vV
#softmax #rnns #attention
Is isotropy a good proxy for generalization in time series forecasting with transformers?
Rashed Shelim, Shengzhe Xu, Walid Saad, Naren Ramakrishnan
Action editor: Jacek Cyranka
https://openreview.net/forum?id=iUtDYVQzFq
#softmax #forecasting #embeddings
High-Dimensional Gaussian Process Regression with Soft Kernel Interpolation
Chris L Camaño, Daniel Huang
Action editor: Geoff Pleiss
https://openreview.net/forum?id=U9b2FIjvWU
#softmax #interpolation #interpolating
Probabilities of Chat LLMs Are Miscalibrated but Still Predict Correctness on Multiple-Choice Q&A
Benjamin Plaut, Khanh Xuan Nguyen, Tu Trinh
Action editor: Kamalika Chaudhuri
https://openreview.net/forum?id=E6LOh5vz5x
#softmax #accuracy #language
Toward Linearly Regularizing the Geometric Bottleneck of Linear Generalized Attention
Jiaxu Liu, Xinping Yi, Xiangyu Yin, Yuhang Song, Gaojie Jin, Xiaowei Huang
Action editor: Shuangfei Zhai
https://openreview.net/forum?id=Vpyg3fqXbl
#attention #softmax #memory
What is #softmax and why is it important for machine learning? Check out my refresher tutorial on multiclass classification in neural networks and how you can build your own from scratch in Snap! (or your favorite programming language):
snap.berkeley.edu/project?user...
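For readers who want the idea without opening the Snap! project, here is a minimal from-scratch sketch in Python (the toy logits and names are illustrative, not taken from the linked tutorial): turn raw class scores into probabilities, then predict the most probable class.

```python
import math

def softmax(scores):
    # subtract the max score for numerical stability, then normalize exponentials
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# toy multiclass step: raw class scores -> probabilities -> predicted label
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
predicted = max(range(len(probs)), key=lambda i: probs[i])
print(probs, predicted)
```

The probabilities sum to one and preserve the ordering of the scores, which is exactly why softmax is the standard output layer for multiclass classification.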
TRA: Better Length Generalisation with Threshold Relative Attention
Mattia Opper, Roland Fernandez, Paul Smolensky, Jianfeng Gao
Action editor: Petar Veličković
https://openreview.net/forum?id=yNiBUc2hMW
#attention #softmax #tasks
Beyond Softmax: New Gradient Bandit Framework Expands Learning
Bandit framework swaps softmax's independence assumption for nested-logit models, enabling correlated actions. The authors prove sublinear regret bounds, with experiments matching softmax-based methods. Read more: getnews.me/beyond-softmax-new-gradi... #gradientbandit #softmax
Catnat Function Introduced as Efficient Alternative to Softmax
Catnat provides a binary‑split alternative to softmax, yielding a diagonal Fisher information matrix; benchmarks in graph learning, VAEs and reinforcement learning report test performance. Read more: getnews.me/catnat-function-introduc... #catnat #softmax
Softmax Attention Outperforms Linear in Single-Location Regression
Softmax‑based attention achieves Bayes‑optimal error in single‑location regression, while linear attention cannot, and softmax consistently outperforms it in finite‑sample tests. getnews.me/softmax-attention-outper... #softmax #attention
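As a reference point for the comparison above, the two attention variants can be sketched in a few lines of NumPy (the shapes and the identity feature map used for the linear variant are illustrative assumptions, not the paper's exact setup):

```python
import numpy as np

def softmax_attention(Q, K, V):
    # scaled dot-product attention with a row-wise softmax over keys
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

def linear_attention(Q, K, V, eps=1e-6):
    # drops the exponential: plain dot-product scores (identity
    # feature map), normalized per query instead of softmaxed
    scores = Q @ K.T
    return (scores @ V) / (scores.sum(axis=-1, keepdims=True) + eps)

rng = np.random.default_rng(1)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(softmax_attention(Q, K, V).shape)  # (4, 8)
```

The softmax version makes every output a convex combination of the value rows, which is the structural property the linear variant gives up in exchange for lower cost.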
#alife2025 #ALIFE #ISAL #AI #artificiallife #STEM #CrossLabs #CrossCompass #XC #AWS #softmax #koin #SakanaAI #Google #MITPress #cafedecolombia #wolfram #insilence #mlst #springer #scicomm #pci #meetkyoto #thedeck #hotelanteroomkyoto
The deadline for submission is September 29th, 2025.
Sign-up form: forms.gle/FDBwX9dcovT1...
New Similarity-Distance-Magnitude Activation Improves Model Robustness
Similarity-Distance-Magnitude (SDM) activation adds similarity and distance awareness to softmax, improving robustness to out-of-distribution inputs; submitted on 16 Sep 2025. getnews.me/new-similarity-distance-... #sdm #softmax #robustness
Reach out to us in terms of any query - 2025.alife.org
Thank you. ☺️