Alicia Curth

@aliciacurth

Machine Learner by day, 🦮 Statistician at ❤️ In search of statistical intuition for modern ML & simple explanations for complex things 👀 Interested in the mysteries of modern ML, causality & all of stats. Opinions my own. https://aliciacurth.github.io

2,056 Followers · 275 Following · 63 Posts · Joined 17.11.2024

Latest posts by Alicia Curth @aliciacurth

Honestly hurts my feelings a little that I didn't even make this list 🥲🥲

22.11.2024 21:12 👍 7 🔁 0 💬 2 📌 0

This is what I came to this app for 🦮

21.11.2024 16:56 👍 1 🔁 0 💬 1 📌 0

Thank you for sharing!! Sounds super interesting, so will definitely check it out :)

21.11.2024 15:58 👍 0 🔁 0 💬 0 📌 0

Exactly this!! thank you 🤗

21.11.2024 13:47 👍 1 🔁 0 💬 1 📌 0

Oh exciting! On which one? :)

21.11.2024 10:21 👍 0 🔁 0 💬 1 📌 0

To be fair, it's actually a really really good TLDR!! I'm honestly just a little scared this will end up on the wrong side of twitter now 😳

21.11.2024 10:21 👍 2 🔁 0 💬 1 📌 0
Elements of Statistical Learning: data mining, inference, and prediction. 2nd Edition.

Now might be the worst possible point in time to admit that I don't own a physical copy of the book myself (yet!! I'm actually building up a textbook bookshelf for myself) BUT because Hastie, Tibshirani & Friedman are the GOATs that they are, they made the pdf free: hastie.su.domains/ElemStatLearn/

21.11.2024 09:27 👍 6 🔁 1 💬 0 📌 0

Oh friends who are complaining about not enough Real Math^tm in their feed, I am here to help. Well, Alicia is here to help, at least!

21.11.2024 04:42 👍 5 🔁 1 💬 0 📌 0

To emphasise just how accurately that reflects Alan's approach to research (which I 100% subscribe to btw), I feel compelled to share that this is the actual slide I use whenever I present the U-turn paper in Alan's absence 😂 (not a joke)

20.11.2024 22:01 👍 6 🔁 0 💬 1 📌 0

Now continued below with case study 2: understanding performance differences of neural networks and gradient boosted trees on irregular tabular data!!

20.11.2024 21:12 👍 1 🔁 0 💬 0 📌 0

btw this is why friends don't let friends skip the "boring classical ML" chapters in Elements of Statistical Learning ‼️

(True story: the origin of this case study is that @alanjeffares.bsky.social [big EoSL nerd] looked at the neural net eq & said "kinda looks like GBTs in EoSL Ch10" & we went from there)

20.11.2024 20:47 👍 41 🔁 4 💬 2 📌 1

There's one more case study & thoughts on the effect of design choices on function updates left - I'll cover that in a final thread! (next week, giving us all a break 😅)

Until then, find the paper here arxiv.org/abs/2411.00247

and/or recap part 1 of this thread below! 🤗 14/14

20.11.2024 17:01 👍 4 🔁 1 💬 1 📌 0

In conclusion, this 2nd case study showed that the telescoping approximation of a trained neural network can be a useful lens to investigate performance diffs with other methods!

Here we used it to show how some perf diffs are predicted by specific model diffs (i.e. diffs in implied kernels) 💡 13/n

20.11.2024 17:01 👍 2 🔁 1 💬 1 📌 0
Post image

Importantly, this growth in the performance gap is tracked by the behaviour of the models' kernels:

while there is no difference in kernel weights for GBTs across different input irregularity levels, the neural net's kernel weights for the most irregular examples grow more extreme! 12/n

20.11.2024 17:01 👍 2 🔁 0 💬 1 📌 0
Post image

We test this hypothesis by varying the proportion of irregular inputs in the test set for fixed trained models.

We find that GBTs outperform NNs already in the absence of irregular examples; this speaks to a difference in baseline suitability.

The performance gap then indeed grows as we increase irregularity! 11/n

20.11.2024 17:01 👍 3 🔁 1 💬 1 📌 0
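In case it helps to see the protocol spelled out, here is a minimal sketch of the evaluation logic. The toy data, the models, and simulating "irregularity" as out-of-training-range inputs are all my assumptions for illustration; the numbers will not reproduce the paper's results.

```python
# Sketch of the protocol: train both models once on regular data, then
# score the *fixed* models on test sets containing a growing fraction of
# "irregular" inputs (here: points scaled outside the training range).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
beta = rng.normal(size=5)
X = rng.uniform(-1, 1, size=(500, 5))
y = X @ beta + 0.1 * rng.standard_normal(500)

gbt = GradientBoostingRegressor(random_state=0).fit(X, y)
nn = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                  random_state=0).fit(X, y)

for frac in [0.0, 0.25, 0.5, 1.0]:
    X_test = rng.uniform(-1, 1, size=(200, 5))
    X_test[: int(frac * 200)] *= 3.0   # make a fraction of inputs irregular
    y_test = X_test @ beta             # noiseless targets for evaluation
    for name, model in [("GBT", gbt), ("NN", nn)]:
        mse = np.mean((model.predict(X_test) - y_test) ** 2)
        print(f"irregular frac {frac:.2f} {name}: MSE {mse:.3f}")
```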

This highlights a potential explanation for why GBTs outperform neural nets on tabular data in the presence of input irregularities:

The kernels implied by the neural network might behave much, much more unpredictably for test inputs different from the inputs observed at train time! 💡🤔 10/n

20.11.2024 17:01 👍 3 🔁 0 💬 1 📌 0
Post image

Trees issue preds that are proper averages: all kernel weights are between 0 & 1. That is: trees never "extrapolate" beyond the convex hull of training observations 💡

Neural net tangent kernels OTOH are generally unbounded and could take on very different vals for unseen test inputs! 😰 9/n

20.11.2024 17:01 👍 3 🔁 1 💬 1 📌 0
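A toy contrast of the two behaviours (my own illustration, not the paper's setup): tree kernel weights stay in [0, 1] however far the test input drifts, while a tangent kernel, here the hand-rolled K(x, x') = φ(x)·φ(x') of a model that is linear in fixed features φ, grows without bound.

```python
# Toy contrast: tree kernel weights are bounded in [0, 1] for any input,
# while tangent-kernel values of a model linear in features
# phi(x) = (1, x, x^2) grow without bound outside the training range.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(100, 1))
y = X[:, 0] ** 2

tree = DecisionTreeRegressor(max_depth=3).fit(X, y)
train_leaves = tree.apply(X)               # leaf id per training point

def tree_weights(x):
    w = (train_leaves == tree.apply([[x]])[0]).astype(float)
    return w / w.sum()                     # always a proper average

def tangent_kernel(x):
    phi = lambda z: np.array([1.0, z, z ** 2])
    return np.array([phi(x) @ phi(xi) for xi in X[:, 0]])

for x in [0.5, 10.0]:                      # in-range vs far out-of-range
    print(f"x={x}: max tree weight {tree_weights(x).max():.2f}, "
          f"max |tangent kernel| {np.abs(tangent_kernel(x)).max():.1f}")
# tree weights stay <= 1; the tangent-kernel values explode at x = 10
```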
Post image

One diff is obvious and purely architectural: either kernel might be able to better fit a particular underlying outcome-generating process!

A second diff is a lot more subtle and relates to how regularly (or: predictably) the two will likely behave on new data: … 8/n

20.11.2024 17:01 👍 3 🔁 0 💬 1 📌 0
Post image

but WAIT A MINUTE - isn't that literally the same formula as the kernel representation of the telescoping model of a trained neural network I showed you before?? Just with a different kernel??

Surely this diff in kernel must account for at least some of the observed performance differences… 🤔 7/n

20.11.2024 17:01 👍 6 🔁 0 💬 1 📌 1
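For readers without the post images: a schematic of the shared form in my own notation, simplified from the setup described in the thread. Treat it as a sketch; the exact definitions live in arxiv.org/abs/2411.00247.

```latex
% Schematic of the shared kernel form (illustrative notation, not
% necessarily the paper's exact equations): both methods accumulate
% kernel-weighted averages of training loss gradients and differ only
% in the kernel K_t.
\[
  \hat{f}_T(x) \;=\; f_0(x) \;-\; \eta \sum_{t=1}^{T} \sum_{i=1}^{n}
  K_t(x, x_i)\,
  \frac{\partial \mathcal{L}\big(y_i, \hat{f}_{t-1}(x_i)\big)}
       {\partial \hat{f}_{t-1}(x_i)}
\]
% GBTs: K_t is the leaf co-membership kernel of tree t.
% Telescoping NN model: K_t is the model's tangent kernel at step t.
```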
Post image

Gradient boosted trees (aka OG gradient boosting) simply implement this process using trees!

From our previous work on random forests (arxiv.org/abs/2402.01502) we know we can interpret trees as adaptive kernel smoothers, so we can rewrite the GBT preds as weighted avgs over training loss grads! 6/n

20.11.2024 17:01 👍 1 🔁 0 💬 1 📌 0
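A quick sketch of the "trees are adaptive kernel smoothers" view (my toy illustration of the rewriting, not code from the paper): a regression tree's prediction at x is a weighted average of training targets, with kernel weight 1/|leaf| on the training points sharing x's leaf; in a GBT the same weights apply to the loss gradients each tree was fit on.

```python
# A fitted regression tree acts as an adaptive kernel smoother: its
# prediction at x is a weighted average of training targets, where the
# data-dependent kernel puts weight 1/|leaf| on training points that
# land in the same leaf as x, and 0 on all others.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)

tree = DecisionTreeRegressor(max_depth=3).fit(X, y)

x_test = np.array([[0.5]])
train_leaves = tree.apply(X)                # leaf id per training point
weights = (train_leaves == tree.apply(x_test)[0]).astype(float)
weights /= weights.sum()                    # kernel weights in [0, 1], sum to 1

# the kernel-smoother average reproduces the tree's own prediction
assert np.isclose(weights @ y, tree.predict(x_test)[0])
```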
Post image

Quick refresher: what is gradient boosting?

Not to be confused with other forms of boosting (e.g. AdaBoost), *Gradient* boosting fits a sequence of weak learners that execute steepest descent in function space directly, by learning to predict the loss gradients of training examples! 5/n

20.11.2024 17:01 👍 4 🔁 1 💬 1 📌 0
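To make "learning to predict the loss gradients" concrete, here is a minimal sketch for squared-error loss, where the negative loss gradient at each training point is simply the residual (toy data; sklearn's DecisionTreeRegressor stands in for the weak learner).

```python
# Minimal gradient boosting sketch (squared-error loss): each round fits
# a small tree to the current residuals, i.e. the negative loss
# gradients, and takes a step of size `learning_rate` in function space.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)

learning_rate, n_rounds = 0.1, 100
pred = np.full_like(y, y.mean())           # F_0: constant initial model
trees = []

for _ in range(n_rounds):
    residuals = y - pred                   # = negative gradient of squared error
    tree = DecisionTreeRegressor(max_depth=3).fit(X, residuals)
    pred += learning_rate * tree.predict(X)
    trees.append(tree)

print("train MSE:", np.mean((y - pred) ** 2))
```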

In arxiv.org/abs/2411.00247 we ask: why? What distinguishes gradient boosted trees from deep learning that would explain this?

A first reaction might be "they are SO different idk where to start 😭" - BUT we show that through the telescoping lens (see part 1 of this 🧵⬇️) things become clearer.. 4/n

20.11.2024 17:01 👍 2 🔁 0 💬 1 📌 1
Post image

And you know who continues to rule the tabular benchmarks? Gradient boosted trees (GBTs)!! (or their descendants)

While the severity of the perf gap over neural nets is disputed, arxiv.org/abs/2305.02997 still found as recently as last year that GBTs especially outperform when data is irregular! 3/n

20.11.2024 17:01 👍 1 🔁 0 💬 1 📌 0

First things first, why do we care about tabular?

Deep learning sometimes seems to forget we used to do data formats that weren't text or image (😉) BUT in data science applications - from medicine to marketing and econ - tabular data still rules big parts of the world!!
2/n

20.11.2024 17:01 👍 4 🔁 0 💬 1 📌 0
Post image

Part 2: Why do boosted trees outperform deep learning on tabular data??

@alanjeffares.bsky.social & I suspected that answers to this are obfuscated by the 2 being considered very different algs 🤔

Instead we show they are more similar than you'd think - making their diffs smaller but predictive! 🧵 1/n

20.11.2024 17:01 👍 71 🔁 10 💬 2 📌 3

No need to leap at all, my original description even had the word "delight" in it!!

20.11.2024 16:40 👍 2 🔁 0 💬 1 📌 0

Wow, I love that! 😍

20.11.2024 14:57 👍 0 🔁 0 💬 1 📌 0

Thank you!! I don't think I know the empirical Fisher actually - do you have a ref?

20.11.2024 13:50 👍 0 🔁 0 💬 1 📌 0

It was hard to fit, but I gave it my best shot! ✨ golden retriever energy in stats ✨ might be historically underrepresented but I say that can and should change with us

20.11.2024 13:21 👍 2 🔁 0 💬 0 📌 0