@xowap.dev
CTO/co-founder of WITH, 20+ years of web development, mad genius behind baby-cto.com, pro shitposter. Python, Django, JS/TS, SvelteKit, DevOps, Linux, LLM/GenAI, 3D printing, MR/spatial computing and random takes on society.
There is what you would call a trust crisis
lol I wish _I_ could look into my thought chain, that'd already be progress
That's definitely my thinking. Lots of small agents talking to each other. Even coding should have 5~10 layers of agents, and probably a bunch of different agents for different tasks at each layer IMHO
Great now Gmail is telling me to make my sentences concise. They better use this for the output of Gemini's thinking that'd save us all some time
Amazon: do you want to receive this in 2 days instead of tomorrow so that we make less trips?
Me: sure, sounds responsible
[2 days later]
Amazon: *does 2 distinct deliveries*
---
ARE YOU KIDDING ME
Clockwise from top left: Jupiter, Saturn, Neptune, and Uranus. The worlds are not shown to scale, but all were imaged with JWST's Near-Infrared Camera (NIRCam).
The giant planets of the Solar System, by JWST.
Yeah, and that 9B is better than 35B. I guess with fewer parameters it gets less confused when remembering pre-canned benchmark answers. The whole thing makes no sense. And actually using the model feels like shit...
Gemini 3 Flash had a huge issue (IMHO) with follow-up questions but I see that 3.1 fixed that and I must say it's becoming my favorite model
"I don't know what is wrong with the benchmarks"
Obviously I know what is wrong, and that's the fact that models are fine-tuned on the benchmarks and they don't mean anything anymore
Honestly, I don't know what is wrong with the benchmarks but I'm calling bullshit on this whole game. Running Qwen 3.5 27B (supposedly the smartest of the family?...) costs 400 times more than Ministral 14B for essentially bad results? That's a complete fucking disconnect from reality 🥴
These numbers come from my current app, I've made a bunch of runs of the same stuff with different models to see what the fuck. You can imagine my surprise when I saw that Opus was almost as fast to complete as Gemini 3.1 Flash
Dear LLM Industry,
Please find herein my official letter of hate against reasoning models.
It's fucking ridiculous, despite its prohibitive price it's 15x (literally!) cheaper to run Opus 4.6 than Gemini 3.1 Pro...
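Back-of-the-envelope version of why this happens, with made-up prices and token counts (illustration only, not real pricing): a reasoning model can be cheaper per token and still cost more per run, because every thinking token is billed output.

```python
# Hypothetical per-million-token prices and token counts, for illustration only.
def run_cost(in_tokens, out_tokens, price_in, price_out):
    """Total cost in dollars for one completion."""
    return (in_tokens * price_in + out_tokens * price_out) / 1_000_000

# A pricey non-reasoning model: short, direct answer.
direct = run_cost(in_tokens=2_000, out_tokens=500, price_in=15.0, price_out=75.0)

# A cheaper-per-token reasoning model: the answer arrives wrapped in
# tens of thousands of billed thinking tokens.
reasoning = run_cost(in_tokens=2_000, out_tokens=40_000, price_in=2.0, price_out=12.0)

print(f"direct: ${direct:.4f}, reasoning: ${reasoning:.4f}")
```

With these (invented) numbers the "expensive" model is roughly 7x cheaper per run, which is the shape of the disconnect.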
IMHO Ministral 14B beats everyone to a pulp on all metrics
Currently benching a bunch of models together for a specific task; Qwen is just getting lost in its own thoughts, it's a disaster... For answers that aren't even that good 😅
Comparing Qwen 3.5 27B with GPT OSS 20B, I much prefer the latter (which is also MUCH faster on Ollama)
Nice!
Is anyone actually using Qwen? They score well in benchmarks, but when it's about doing something useful the outcome is never really satisfying (for me)?
That's way too much excitement over an n8n flow which suggests a domain name 🤣
Which also raises the question: if that is true, does that mean that you can reach an absolute truth on any topic, provided the right thought process?
Can you do the same to humans?
If somehow you can get LLMs to work in a way where you systematically eliminate doubt, it means that indeed you could get deterministic outputs despite LLMs being absolutely random at their core
Being sure when nothing is sure: that's what Information Theory manages to do. Transform a messy radio soup into a reliable 1 Gbps wifi link.
Now in terms of LLMs... Does that apply?
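A toy version of the information-theory point, in Python: a repetition code over a binary channel that flips 20% of bits, decoded by majority vote. Reliability manufactured out of an unreliable medium. All numbers are arbitrary.

```python
import random

def transmit(bit, flip_prob, n_copies, rng):
    """Send one bit as n_copies repetitions over a channel that flips
    each copy with probability flip_prob; decode by majority vote."""
    received = [bit ^ (rng.random() < flip_prob) for _ in range(n_copies)]
    return int(sum(received) > n_copies / 2)

rng = random.Random(42)
bits = [rng.randint(0, 1) for _ in range(10_000)]

# Raw channel: roughly 20% of bits arrive flipped.
raw_errors = sum(b != transmit(b, 0.2, 1, rng) for b in bits)
# Repetition code: 11 copies + majority vote squeezes errors way down.
coded_errors = sum(b != transmit(b, 0.2, 11, rng) for b in bits)
print(raw_errors, coded_errors)
```

The open question is whether sampling an LLM several times and "majority voting" its answers buys you the same kind of guarantee.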
Just made an n8n-based algorithm which, starting from an open question, converges run after run. That's super promising!
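I won't paste the actual n8n flow, but the convergence trick can be sketched like this: sample the model several times, keep the majority answer, repeat until it stops changing. `ask_model` here is a stand-in for whatever LLM node you wire in, not a real API.

```python
from collections import Counter
import random

def converge(ask_model, question, samples=5, max_rounds=10):
    """Repeatedly sample the model and majority-vote the answers until
    the winning answer is stable across two consecutive rounds."""
    previous = None
    for _ in range(max_rounds):
        votes = Counter(ask_model(question) for _ in range(samples))
        winner, _ = votes.most_common(1)[0]
        if winner == previous:
            return winner  # two rounds in a row agreed: call it converged
        previous = winner
    return previous  # best effort if it never stabilises

# Stand-in model: noisy but biased toward one answer.
rng = random.Random(0)
fake_model = lambda q: rng.choice(["42", "42", "42", "41", "43"])
print(converge(fake_model, "what is 6*7?"))
```

The randomness never goes away at the token level; the voting layer is what makes the end-to-end output (mostly) deterministic.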
Interesting that energy companies can't cut you off without months of procedures despite extremely high marginal costs while subscription services will block you the second your card gets declined for any reason despite near-zero marginal cost
can't you say that the model has a thinking mode, therefore they am?
This doesn't apply if said API was created after 2010
When a service has a XML API you don't know if you should boo them because who still uses XML or respect them because they had an API before everyone else had an API
3D printing times make no sense. You can print dozens of pieces in 15 minutes but then print one thing in 20h.
It's like the John Wick currency, but reversed
She does swear a lot more IRL, though 🤣
A good reminder that, on top of not letting your AI do whatever the fuck it wants, you should have safeguards against mistakes and recovery plans that you can't fuck up the same way as the rest
Terraform + Infracost + Gemini CLI = ❤️
As Yann LeCun said, if LLMs don't know the consequences of their actions they can't do anything, but in the case of cloud infrastructure there are tools to simulate exactly that. Days of work done in minutes, I'm happy.
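Roughly the loop I mean, as a shell sketch, not a drop-in script: the `infracost` and Gemini CLI flags here are from memory, so check them against your installed versions before relying on this.

```shell
#!/usr/bin/env sh
# Dry-run the change: nothing touches real infrastructure yet.
terraform plan -out=tfplan
terraform show -json tfplan > tfplan.json

# Price the plan before applying it.
infracost breakdown --path tfplan.json --format table > cost.txt

# Let the agent review plan + cost and flag anything scary,
# while a human keeps the only finger on `terraform apply`.
gemini -p "Review this Terraform plan cost estimate and list risky \
changes and surprises: $(cat cost.txt)"
```

The point being: `plan` + cost breakdown give the model a simulated view of the consequences, and `apply` stays out of its hands.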
Ministral 3 is really underrated. GPT-OSS too, though a bit bigger (but very, very fast)
There's generally not much to get out of models this size in a coding assistant, TBH
I'm saying this in the sense that WordPress was an amazing tool on its own when it was released and allowed countless people (including me) to create a website in a flexible way. But it has now been surpassed by other tools on every single metric (except usage, ofc).