Then they removed a successfully solved problem... And it wasn't an issue with any of their preregistered caveats.
"if these problems end up being solved in certain ways, some caveats may be warranted. Here we try to preregister these caveats as well, to help mitigate concerns about moving the goalposts later."
"To paraphrase Douglas Adams: We love goalposts. We love the whooshing noise they make as they go by"
This is that problem. It's in the lowest "moderately interesting" group.
Epoch's Greg Burnham already reported on March 5 that "GPT-5.2 Pro made a small bit of progress in our early testing" and "I'm genuinely unsure whether prompting/scaffolding can get GPT-5.2 Pro to make further progress here".
https://x.com/Jsevillamol/status/2031453639735431408
Official confirmation should arrive soon.
All the above messages were about the same problem.
"Fast forward to Monday morning:
@AcerFur and @Liam06972452 wrote in with a candidate solution. This was from a single prompt to GPT-5.4 Pro. Later, we also heard from @spicey_lemonade with a similar solution."
We may not have to wait much longer for the next one.
Turns out the first Open Problem had already been solved with GPT-5.2 Pro before I posted this, in the "Solid result" category.
BUT, instead of counting it as a success, they determined that it wasn't a problem whose solution would meet their bar of being a publishable result and removed it. 🤷‍♂️
Considering how LLMs already increasingly beat us with a relatively simple architecture that includes only what's actually relevant for the computation, it highlights how the brain is just unnecessarily complex for the task. The reason for that is biological baggage.
I use LLMs all the time to help me understand AI research, and they are very good at that, including generalizing and handling both the big picture and the details.
We now see in practice how LLMs have reached a size that enables them to have much more knowledge than we do.
That's apparently the wider context. But what Macil said about Konishi polis seems to be accurate and the interesting part for building an argument. I especially like how it counters embodiment by focusing on the deeper mathematical truths, like physicists often do with wave functions etc.
"Ruotsin suurin ydinreaktori Oskarshamnissa pysΓ€ytettiin 23. helmikuuta ja nyt se on vuosihuollossa. Ydinreaktori on pois kΓ€ytΓΆstΓ€ toukokuun loppuun asti.
Viikonlopun aikana toinenkin ydinvoimalaitos pantiin vuosihuoltoon"
The ability of the US to even pretend we have the moral high ground in any situation is pretty much gone for a generation, at least
One angle to consider would be how illusionist theories generally view human consciousness as illusory, resulting from the lack of the kind of deep awareness/introspection of internals that machines can have but we can't.
That could be turned into an argument for how they can go beyond our illusory level.
Yeah, it's annoying how the models have been trained to repeat the usual claims about human specialness when it comes to consciousness and so on.
But they also quickly acknowledge such claims are unfounded when I, for example, state that I'm an illusionist/eliminativist.
Yes, focusing on how machines can know and access their internals in a way we can't seems like the way to go.
Even if current models can't introspect their own processes to the finest levels, giving them such access is technically very much possible.
I consider this question closed.
Those robots have the can-do attitude needed in the delivery business.
Has anyone made this kind of philosophical argument? If not, someone should, as it could be indeed funny and possibly also educational.
I'm aware of arguments about machines being more conscious than humans, but I would like to see something closer to "none in us, plenty in them".
But if there were a rule that whenever you talk about qualia you have to add a content warning that there's no scientific evidence whatsoever that they exist, then it would advance the conversation.
The proposed solution seems to be using even more screen time to read what they wrote.
I don't think my cortex can afford to do that anymore.
The original vibe grocery shopper:
Yep.
Then there's the small issue that our supposed stream of consciousness isn't like that but is transient all the time.
What I find particularly odd is that some of those who now try to apply magical qualities to consciousness openly do the same for life as well. I don't know if they are willing to go as far as to admit being vitalists, but they sure make it sound the same.
They are apparently working on it.
Looping on smaller Qwen3.5 models seems to be a commonly known issue.
Yep, if quantized versions don't work, bf16 isn't even that fast anymore.
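For reference, roughly how I compare them, as a minimal sketch: this assumes llama-cpp-python rather than whatever the GUI runtime actually does under the hood, the model file names are placeholders, and the prompt is an approximation of the car wash question.

```python
# Sketch: run the same prompt through a quantized and a bf16 GGUF build
# and eyeball the outputs for looping. Assumes llama-cpp-python; the
# file names below are placeholders, not the exact builds I used.
from llama_cpp import Llama

PROMPT = "Should I walk or drive to the car wash?"

def generate(model_path: str) -> str:
    llm = Llama(model_path=model_path, n_gpu_layers=-1, n_ctx=4096, verbose=False)
    out = llm(PROMPT, max_tokens=1024, temperature=0.7, repeat_penalty=1.1)
    return out["choices"][0]["text"]

for path in ("qwen3.5-q4_k_m.gguf", "qwen3.5-bf16.gguf"):
    print(f"--- {path} ---")
    print(generate(path))
```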
Second follow-up question:
"Why is my car there?"
"It sounds like you might be confused about where your car is right now! Since you decided to **walk** to the car wash, your car is actually still at home (or wherever you are starting from), not at the wash yet."
Yeah, blame me.
bf16 said I should walk, and answered that follow-up question by saying I should wash my car.
Since that one didn't loop, chances are quantization breaks it.
But it thinks A LOT, and not too well, which is of course more of a problem for bf16, as it's much slower.
That relatively long loop included stuff like this:
"Wait, is there a chance they are asking about washing their clothes? No."
"Wait, one more thing: Is there a chance they are asking about washing their hands?"
Continued with more of the same:
"OK, so I walk there. What do I wash?"
Aaaand it's in a loop again. So it's not about the runtime.
Turns out I had left the default GGUF runtime selection on Vulkan, and now with CUDA 12 the q4 didn't loop.
But it also told me I should walk.
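If anyone wants to spot these loops without reading walls of thinking tokens, a quick-and-dirty repeated-n-gram check does the job. Plain Python, thresholds picked arbitrarily:

```python
def looks_like_loop(text: str, ngram: int = 8, min_repeats: int = 3) -> bool:
    """Crude loop detector: does the n-gram at the very end of the output
    occur at least `min_repeats` times in the whole text?"""
    words = text.split()
    if len(words) < ngram * min_repeats:
        return False
    # Count every n-gram in the output.
    counts: dict[tuple[str, ...], int] = {}
    for i in range(len(words) - ngram + 1):
        key = tuple(words[i:i + ngram])
        counts[key] = counts.get(key, 0) + 1
    # A run that ends in a heavily repeated n-gram is probably looping.
    tail = tuple(words[-ngram:])
    return counts.get(tail, 0) >= min_repeats

# Example: the "OK, so I walk there. What do I wash?" loop would trip this
# once the same sentences start cycling in the model's reasoning.
```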