An experiment in “AI dreaming”: starting from an initial image, 5-second segments are generated, each prompted by another AI, and then stitched together!
Generated entirely with open models (including the code to automate the process)
I got one of these for my son 2-3 years ago and he still frequently carries it with him!
We need to start, at the very least, building real, testable hypotheses about the behavior of models.
But honestly, most LLM papers are merely stating an *observation* and dressing it up as a hypothesis.
The current messiness around LLM evaluations is ultimately caught up in the limits of working under conditions of pure empiricism.
We’ll never dig ourselves entirely out of this hole until theory starts to catch up with practice.
Paper after paper overreaches and attempts impossibly general claims.
I should have added “necessary but not sufficient”.
But that leads to the question: “what is the optimal prompt”?
You could jitter that point in latent space until you overfit the task, but I’m not sure that’s super informative either.
Ultimately what we need is deeper theoretical foundations.
We just released a rebuttal to that paper I think you'll enjoy! blog.dottxt.co/say-what-you...
LLM observation of the day: I think that guided/constrained generation gets a bad rap. There was one paper making the rounds about how guided generation harms reasoning ability that everyone took as gospel.
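For anyone unfamiliar with the technique, here's a minimal sketch of what guided/constrained generation means mechanically: at each decoding step, tokens the output format forbids are masked out before picking the next token. This uses a toy vocabulary and placeholder logits (not a real model or any particular library), just to show the core idea.

```python
import numpy as np

# Toy vocabulary and a stand-in for model logits. In a real setup these
# would come from an LLM's tokenizer and forward pass.
VOCAB = ["yes", "no", "maybe", "the", "answer", "is", "<eos>"]

def fake_logits(prefix):
    # Placeholder: random scores standing in for the model's next-token logits.
    rng = np.random.default_rng(len(prefix))
    return rng.normal(size=len(VOCAB))

def allowed_tokens(prefix, choices=("yes", "no")):
    # Constraint: the completion must be exactly one of `choices`,
    # followed by <eos>. Before anything is emitted only the choice
    # tokens are legal; afterwards only <eos> is.
    return list(choices) if not prefix else ["<eos>"]

def constrained_decode(max_steps=4):
    prefix = []
    for _ in range(max_steps):
        logits = fake_logits(prefix)
        # Mask out every token the constraint forbids, then decode greedily.
        mask = np.full(len(VOCAB), -np.inf)
        for tok in allowed_tokens(prefix):
            mask[VOCAB.index(tok)] = 0.0
        token = VOCAB[int(np.argmax(logits + mask))]
        if token == "<eos>":
            break
        prefix.append(token)
    return " ".join(prefix)

print(constrained_decode())  # always "yes" or "no", never free-form text
```

Libraries like Outlines do the same thing, but compile a regex or JSON schema into the mask instead of a hand-written rule.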
A new paper, "Let Me Speak Freely" has been spreading rumors that structured generation hurts LLM evaluation performance.
Well, we've taken a look and found serious issues in this paper, and shown, once again, that structured generation *improves* evaluation performance!
A graph showing that structured generation performs better than unstructured generation.
Our new blog post is out!
@willkurt.bsky.social provides a rebuttal to a reasonably well-known paper that concluded structured generation with LLMs always results in worse performance.
We do not find the same thing.
blog.dottxt.co/say-what-you...
First post! Created this account a while ago, but things seem to be picking up and it has a very nice "old Twitter" feel to it here!