anpaure

@anpau.re

research

489 Followers
1,385 Following
24 Posts
Joined 17.11.2024

Latest posts by anpaure @anpau.re

wahhh wahhh

29.11.2024 09:12 👍 1 🔁 0 💬 0 📌 0

I wanted to change the format of the eval before the next model drop since it's not very rigorous and I came up with many improvements, but Qwen had other plans. As always, link to the repo here.
github.com/anpaure/cp_e...

28.11.2024 13:58 👍 6 🔁 0 💬 0 📌 0
Post image

How does the new Qwen model compare to other LLMs on coding tasks?
It's impressive, but rushed.
I ran it against other SOTA models on 6 competitive programming problems of varying difficulties.
Here are the results!

28.11.2024 13:57 👍 24 🔁 1 💬 3 📌 0

my goat, i'm glad someone made it right 🙏🙏🙏

27.11.2024 21:35 👍 19 🔁 0 💬 0 📌 0

i'm very disappointed that people are reacting this way, especially considering what huggingface stands for
i also believe it's especially hard to reprogram the "ai = bad" messaging that's been floating around for a while now, so stay safe out there

27.11.2024 18:35 👍 16 🔁 0 💬 0 📌 0

thinking this platform isn't gonna be toxic is extremely wishful thinking tbh
i mean you're moving the same people from twitter to here, so yeah nothing is gonna change
i personally maintain that twitter is Not That Bad Actually because there was no defined ingroup and outgroup
here there is

27.11.2024 17:34 👍 2 🔁 0 💬 1 📌 0

honestly it would look bad in all cases because the timing is ruined, but it doesn't even work theoretically with twos and threes

27.11.2024 01:02 👍 1 🔁 0 💬 0 📌 0

clearly you don't watch animation...
not hard to imagine how messed up stuff animated on anything but ones would look

26.11.2024 23:16 👍 2 🔁 0 💬 1 📌 0
Post image

when her replies get shorter and colder

26.11.2024 15:27 👍 9 🔁 0 💬 0 📌 0

very weird why this is not default behavior

25.11.2024 14:57 👍 1 🔁 0 💬 0 📌 0

i think i might cry if i have to level up from lowbie again on twitter
or maybe it will be fun, idk

24.11.2024 22:41 👍 3 🔁 0 💬 0 📌 0

you're definitely not including people who had an account here a while ago

24.11.2024 11:54 👍 5 🔁 0 💬 1 📌 0
Post image

I added a result interpretation section and breakdowns for each problem to guide everyone more clearly through what's going on in each problem.

21.11.2024 18:30 👍 5 🔁 0 💬 0 📌 0

for 2 it's def an echo chamber, on twitter both the left and the right called each other stupid and to me that was beautiful
there's no reason for the right to migrate here and anything that's against the "narrative" gets blocked pretty quickly (which also happened on twitter but it somehow worked)

21.11.2024 16:49 👍 1 🔁 0 💬 0 📌 0

Despite a small sample size I still think it's very helpful to examine closely what the models can do and where they fail.
Here's the link to the GitHub repo, it's recommended to look at it through Colab.
github.com/anpaure/cp_e...

21.11.2024 16:27 👍 8 🔁 0 💬 1 📌 0
Post image

How smart is the new DeepSeek model at coding problems?
Almost o1 level actually.
Today I sat down and ran a couple of competitive programming problems of varying difficulty on leading LLMs, like o1, 4o, Sonnet 3.6 and DeepSeek R1.
These are the preliminary results on 6 problems!

21.11.2024 16:24 👍 21 🔁 2 💬 1 📌 0
Proclamation 10043 - Wikipedia

didn't know about it until yesterday too but apparently it's a presidential proclamation signed by trump that prohibits cn students associated with people's liberation army from getting f and j visas
in the end it mostly affected students of 8 major colleges in china
en.wikipedia.org/wiki/Proclam...

21.11.2024 13:57 👍 1 🔁 0 💬 1 📌 0

years of alignment research down the drain 😭

21.11.2024 13:08 👍 1 🔁 0 💬 0 📌 0

it's an offshoot of a chinese quant company called high-flyer, there's some info you can read online
they've been previously very under the radar but recently even employees started posting about their work on models on twitter
some people say that it's the direct consequence of 10043, quite sad

21.11.2024 12:45 👍 6 🔁 0 💬 1 📌 0

true but like records aren't affecting anything and they're just hyperloglog (so very light too)
follows are more important and affect a bunch of things (couldn't tell you what exactly), there's probably a reason why they were capped on twitter and other social media

20.11.2024 10:02 👍 0 🔁 0 💬 0 📌 0

i've been wondering about that since day 1 here, there's no way this feature doesn't get removed/hardcapped or the website doesn't collapse under its own weight
also no ads is crazy, i'm glad we'll have nice things for a bit but i'm afraid it's not for long

19.11.2024 23:46 👍 3 🔁 0 💬 1 📌 0

linguistics question here: is there a minimal basis of words that is sufficient to define all other words? how many words are enough?

19.11.2024 23:37 👍 4 🔁 0 💬 2 📌 0
Post image
18.11.2024 08:54 👍 10 🔁 0 💬 1 📌 1

i tried to think for a bit about how you would fermi estimate this and then kinda gave up because it's really difficult

17.11.2024 14:36 👍 3 🔁 0 💬 0 📌 0