Watching the live stream of a long print is dangerously addicting.
(We are building an SO-100 robot w/ @hf.co in matte purple and green) 🦾
Sounds and seems to behave very similarly to Moshi from Kyutai, another duplex audio model that came out last year
Wow, @warp.dev's AI functionality has really improved lately. Now it actually proactively suggests solutions to errors and can even do a full agentic analysis + file update directly on the terminal. Feels good!
This video by @emergentgarden.bsky.social is the most poetic and visceral narrative on current and future AI models used for good vs. bad that I've heard, read, or seen.
All the while showing AI models building stuff in Minecraft.
Well worth a watch!
www.youtube.com/watch?v=FCnQ...
Just wait until you have the choice between 4o and o4
It's live on the App Store today.
apps.apple.com/us/app/tinyc...
I used Cursor and Lovable to write the code, and for the most part, because these were idle games, React worked well, with what ended up being a "computed value" engine for complex relationships between numbers (with upgrades etc.)
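The "computed value" idea above can be sketched as a tiny engine where derived numbers are plain functions over game state, so upgrades automatically propagate. This is a minimal illustration, not the actual code from these games; all names (`Computed`, `gold_per_sec`, the multipliers) are hypothetical.

```python
# Minimal sketch of a "computed value" engine for an idle game.
# Derived values are lazy functions of the state dict, so changing a
# base value or upgrade level automatically changes everything downstream.

class Computed:
    """A value derived on demand from other values via a formula."""
    def __init__(self, fn):
        self.fn = fn

    def get(self, state):
        return self.fn(state)

state = {
    "miners": 3,          # base value bought by the player
    "pickaxe_level": 2,   # upgrade level
}

# Derived values can reference state and each other through plain functions.
gold_per_miner = Computed(lambda s: 1.0 * (1.25 ** s["pickaxe_level"]))
gold_per_sec = Computed(lambda s: s["miners"] * gold_per_miner.get(s))

print(gold_per_sec.get(state))  # 3 * 1.25^2 = 4.6875
```

Buying an upgrade is then just `state["pickaxe_level"] += 1`; every dependent number recomputes on the next read, which keeps the relationships declarative.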
Damn that's rough. Pray Operator doesn't remember.
I did see one team successfully fine-tuning GPT-4o with images + coordinates for significant improvements in performance, which would be an interesting experiment to perform as well!
See case studies:
openai.com/index/introd...
I've also tried strategies like scaffolding, which may help, but it seems models have been pretrained with very specific coordinate systems and you must use them for it to work. E.g. with Gemini 1.5 you must use ymin, xmin, ymax, xmax, which must be ranged 0-1000.
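The Gemini 1.5 coordinate convention mentioned above (ymin, xmin, ymax, xmax, each normalized to 0-1000) can be converted to pixel coordinates with a few lines. A hedged sketch; the helper name `to_pixels` is an illustration, not an official API:

```python
# Convert a Gemini 1.5-style bounding box (ymin, xmin, ymax, xmax,
# each in the range 0-1000) to pixel coordinates (left, top, right, bottom).

def to_pixels(box, width, height):
    ymin, xmin, ymax, xmax = box
    return (
        int(xmin / 1000 * width),   # left
        int(ymin / 1000 * height),  # top
        int(xmax / 1000 * width),   # right
        int(ymax / 1000 * height),  # bottom
    )

# A box covering the center quarter of a 1920x1080 screenshot:
print(to_pixels((250, 250, 750, 750), 1920, 1080))  # (480, 270, 1440, 810)
```

Note the y-before-x ordering in the model output versus the x-before-y ordering most drawing APIs expect; mixing those up is an easy way to get boxes that look random.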
Very interesting, thank you for sharing! I was surprised that Claude with computer use (10/22) was unable to use the picture better since it is able to click screens. Did you try other resolutions / coordinate formats? I'd also be curious to see how Gemini 2.0 does (even 1.5 is good at coords!)
We talked about this back when we implemented AI for @framer.com almost two years ago - a model that can't reason about what it's doing visually can't make good looking websites. Hopefully this will change with this new ability in models.
Models are gaining the ability to not just see but also modify images on a near pixel by pixel level. This is not just great for creative purposes but for reasoning over visual concepts.
arxiv.org/pdf/2501.07542
When you stumble upon a street
It's 2025; seems odd that the knowledge cutoff isn't changing
I got a notification that a question I'd asked on Stack Overflow a while ago was getting a lot of traffic recently, and wondered why. The only hypothesis I have is that there are a lot of people out there who want to run AI-written Python code on their servers.
I've had a recent fascination with idle games, and realized AI code tools are actually excellent assistants for producing this type of game. Quickly whipped up two basic games with actually funny lore in just a few hours!
(100% of code written by AI, though sometimes with a lot of guidance)
The diffusion weather station showing what is and what might be
Slowly realizing that I don't need to check Hacker News or Twitter for the best tech news, @simonwillison.net has written an analysis with great insights every time.
Hello World!