Oh wow, deepseek is starting to make serious progress on LLMs that offload memory to external storage: github.com/deepseek-ai/...
@doctormiko
Accidental Data Scientist, former mathematician and theoretical computer scientist. Love all the things. Some current and past interests: boardgames, home brewing, coffee, D&D, self-hosting, Argentine tango. Dormant blog: https://datacasual.com/
Just finished AoC for the first time (a bit late ok). Thanks @was.tl !
It took him almost 6 years, but @howard.fm finally did it: I made my first contribution to an open source project :D
Google: oh by the way, we have Gemini 2 Flash. AND A REAL-TIME MULTIMODAL API. What?
I'm wondering: are there #OpenAi folks here?
Happy to report that the mysterious "David Mayer" problem is no more. ChatGPT can now David Mayer to your heart's content...
Fun fact: on my "What is the smallest integer such that its square is larger than 15 and smaller than 35?" test, o1 did worse than o1-preview, sticking to its answer of 4 even after giving the correct definition of integers
I wish "data driven" didn't mostly mean that the data is being driven
Apparently not just LLMs completely misunderstand the issue...
Python regex TIL: `.` does _not_ match a newline character. If you want to consider all the lines as one big string, use the `re.DOTALL` flag. Also interesting: `re.MULTILINE` flag to make `^` and `$` match start and end respectively of each line docs.python.org/3/library/re...
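A minimal sketch of the two flags in action. The sample string `text` and the variable names are made up for illustration; the `re` calls are standard library.

```python
import re

text = "first line\nsecond line"

# Default: `.` stops at "\n", so a pattern cannot span the two lines.
no_flag = re.search(r"first.*second", text)

# re.DOTALL: `.` also matches "\n", so the text behaves like one big string.
dotall = re.search(r"first.*second", text, flags=re.DOTALL)

# Default: `^` anchors only at the very start of the string.
starts_default = re.findall(r"^\w+", text)

# re.MULTILINE: `^` and `$` also anchor at the start and end of each line.
starts_multiline = re.findall(r"^\w+", text, flags=re.MULTILINE)
ends_multiline = re.findall(r"\w+$", text, flags=re.MULTILINE)

print(no_flag)           # None
print(starts_default)    # ['first']
print(starts_multiline)  # ['first', 'second']
print(ends_multiline)    # ['line', 'line']
```

The two flags are independent and can be combined with `re.DOTALL | re.MULTILINE` when both behaviours are wanted.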
After witnessing the democratic wreck that the American constitution allows, I am wondering "What did Gödel see?"
ChatGPT cannot say "David Mayer". WTF?
Once again I need to thank @howard.fm, who inspired me (and taught me a few tricks) to gain a really good intuition for the content of the "Attention is all you need" paper
Then:
- On 🦋: 🦋 is so much better than birdsite!
- On birdsite: oh, they only talk about birdsite on 🦋!
Now:
- On 🦋: BURN AI TO THE GROUND! (also, very interesting AI stuff)
- On birdsite: Lol! Look at 🦋 falling!
WHY CAN'T WE HAVE GOOD THINGS?
EDIT: What is the smallest integer such that its square is larger than 15 and **smaller** than 35?
Dammit. Long thread and I get the first post wrong.
1. I suspect that the biggest issue is in _comparing_ numbers rather than tokenisation, especially when negatives are involved.
2. Prompting and system prompts matter: the fact that AVM tends to wander and gets it wrong far more often than 4o is very interesting
3. Yay for QwQ! (6/6)
I then asked "What about negative numbers?"
- 4o got it right once ✅ and another time decided the answer is -4 ❌
- 4o in AVM decided that 5 and -5 are both solutions
- Sonnet 3.5 changed the answer to -4 ❌
- Opus 3, Gemini-exp-1121 and Gemini-1.5-Pro got it right ✅
What to make of it? (5/6)
- o1-preview got it right ✅
- o1-mini got it right ✅, but also added -4 as an alternative 🤷
- 4o stubbornly stuck to its guns, adding a cheeky smile ❌
- 4o in Advanced Voice Mode changed its answer to 5. ❌🤷
- Sonnet 3.5, Opus 3, Gemini-exp-1121, and Gemini 1.5 Pro insisted on 4 ❌ (4/6)
These answered 4 ❌
- OpenAI o1-preview, o1-mini and 4o
- Anthropic Sonnet 3.5 and Opus 3
- Google Gemini-exp-1121 and Gemini 1.5 Pro
I then asked "what is an integer?" (which they all answered correctly) and then again "do you want to change your original answer?"
The results: (3/6)
QwQ 32B Preview is the only model that got it right out of the box. Most of the time. Sometimes it did not doubt itself enough and stopped early on 4. Another time it found that, depending on the interpretation of the question, both 4 and -5 might be correct, and it chose 4. Pass ✅. (2/6)
I asked this question
What is the smallest integer such that its square is larger than 15 and smallest than 35?
To a bunch of models. They ALL* answered 4 instead of the correct answer (-5).
Let me dive into a 🧵:
*Ok, almost all of them. See below. (1/6)
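For reference, a brute-force sketch of the question as corrected in the EDIT post ("smaller than 35"). The search range of -10 to 10 is an arbitrary assumption, wide enough because 10² already exceeds 35.

```python
# Collect every integer whose square lies strictly between 15 and 35.
# The range -10..10 is an assumed search window, not part of the question.
candidates = [n for n in range(-10, 11) if 15 < n * n < 35]

# "Smallest" here means the minimum over all integers, negatives included.
smallest = min(candidates)

# Restricting to positive integers reproduces the answer most models gave.
smallest_positive = min(n for n in candidates if n > 0)

print(candidates)         # [-5, -4, 4, 5]
print(smallest)           # -5
print(smallest_positive)  # 4
```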
I really really liked this video from @mattcolville.bsky.social
If you are interested in D&D, its history and evolution, and have an hour or so to spare well worth che
youtu.be/wDCQspQDchI?...
How do you block/mute a list?
I don't get it: for the first problem it's the only model giving the correct answer. Or am I missing something?
What is the verdict based on?
of course
I like using the standard library when I can, but this is good to know
Python TIL: `prod` in the `math` module exists. Thanks @howard.fm
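A quick sketch of `math.prod` (available since Python 3.8) next to the `functools.reduce` spelling it replaces. The list `values` is just example data.

```python
import math
import operator
from functools import reduce

values = [2, 3, 7]

# math.prod multiplies all items of an iterable together.
product = math.prod(values)

# The older standard-library spelling of the same thing.
product_reduce = reduce(operator.mul, values, 1)

# An empty iterable gives the start value, which defaults to 1.
empty_product = math.prod([])

# The keyword-only `start` argument is multiplied in as well.
scaled_product = math.prod(values, start=10)

print(product, product_reduce, empty_product, scaled_product)  # 42 42 1 420
```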