would love your take when you do!
This is one of my all time favorite papers:
openreview.net/forum?id=ByJ...
It shows that, under fair experimental evaluation, LSTMs do just as well as a bunch of "improvements"
I'll find it one sec
Very likely.
Fun fact: I recently encountered (well, saw on the news) the only other person named Finbarr in Canada I've ever seen.
The only issue is, he was an arsonist who set a ton of fires in Edmonton.
Really fun conversation with @natolambert.bsky.social!
This is McKernan! What I thought was a nice neighborhood
Apparently there *is* another finbar(r) in Alberta.
New homeowner fear unlocked; someone hit and ran my neighbor's garage
I thought that was ai gen at first!
there's a type of "not trying" which means not executing at the level of competence of a $XX billion corporation
this is the complaint about e.g. Google products. They're good! Better than most startups! But not "trillion-dollar corporation famed for engineering expertise" good.
would also accept Austria
I watched too many ski movies and now am trying to convince my wife we should move to Alaska
building my own MLP implementation from scratch in NumPy, including backprop, remains one of the most educational exercises I've done
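The exercise above, in minimal form: a two-layer MLP with hand-derived backprop, trained on XOR. Everything here (architecture, task, learning rate, seed) is an illustrative choice, not anything from the original post.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: the classic toy problem a linear model can't solve.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Parameters: 2 inputs -> 8 hidden units -> 1 output (sizes are arbitrary).
W1 = rng.normal(0, 1, (2, 8))
b1 = np.zeros(8)
W2 = rng.normal(0, 1, (8, 1))
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for step in range(10000):
    # Forward pass.
    h = np.tanh(X @ W1 + b1)           # hidden activations, shape (4, 8)
    p = sigmoid(h @ W2 + b2)           # output probabilities, shape (4, 1)
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    # Backward pass: chain rule applied by hand, layer by layer.
    dlogits = (p - y) / len(X)         # dL/dz for sigmoid + cross-entropy
    dW2 = h.T @ dlogits
    db2 = dlogits.sum(axis=0)
    dh = dlogits @ W2.T
    dz1 = dh * (1 - h**2)              # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0)

    # Plain full-batch gradient descent.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

preds = (sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2) > 0.5).astype(int)
print("final loss:", loss)
print("predictions:", preds.ravel())
```

The instructive part is the backward pass: each line is one application of the chain rule, and getting the shapes to line up forces you to understand what the gradient of each layer actually is.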
Welcome!
Ha, it's been on my list of todos for a while! I'm glad someone got to it.
Love this. Very clean implementations of various inference optimizations.
Agreed! Folk knowledge is worth publishing!
I mailed this out like a month ago and just never did the promo
Force of habit!
Ahh you're right!
seems like we're seeing convergence in VLM design. most recent models (Pixtral, PaliGemma, etc) are moving away from complex fusion techniques toward simpler approaches
as usual, the bitter lesson holds: better to learn structure than impose it
incompleteideas.net/IncIdeas/Bit...
open source VLMs use relatively little compute compared to what you might expect:
LLaVA: 768 A100 hours
DeepSeek-VL: 61,440 A100 hours
PaliGemma: ~12k A100 hours
(for reference, Stable Diffusion used 150k A100 hours)
what I found interesting: VLMs are way simpler than they first appear. current SOTA is basically:
1. ViT encoder (init from SigLIP/CLIP)
2. pretrained LLM base
3. concat image features with text
4. finetune
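the four steps above, as a shape-level sketch. all sizes are made up for illustration; in a real model the random tensors are a pretrained SigLIP/CLIP encoder and a pretrained LLM's embedding table, and the small projection is the only new piece.

```python
import numpy as np

rng = np.random.default_rng(0)

d_vit, d_llm = 64, 128        # hypothetical encoder / LM hidden widths
n_patches, n_text = 16, 10    # image patch tokens, text tokens

# 1. ViT encoder output (stand-in for frozen SigLIP/CLIP patch features).
img_feats = rng.normal(size=(n_patches, d_vit))

# 2. Text token embeddings (stand-in for the pretrained LLM's embedding table).
text_tokens = rng.normal(size=(n_text, d_llm))

# 3. Project image features into the LM's embedding space, then
#    concatenate them in front of the text tokens.
W_proj = rng.normal(size=(d_vit, d_llm)) * 0.02   # the small learned connector
img_tokens = img_feats @ W_proj
sequence = np.concatenate([img_tokens, text_tokens], axis=0)

# 4. This combined sequence is what the LM sees during finetuning.
print(sequence.shape)  # (26, 128): 16 image "tokens" followed by 10 text tokens
```

the whole "fusion" step is a single matmul plus a concat; the heavy lifting is done by the two pretrained components, which is presumably why the simpler recipes keep winning.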