I feel like ARC is Goodhart's law all over again. As soon as people started targeting it, we started beating it.
Very useful tool!
Explain key equations in an intuitive yet rigorous way. More on how results achieved, less on result details. Outro reviews technical details of the method. No small talk. No gushing, or broad implications. Just focused and engaging technical discussion.
Works pretty well!
3/3
Podcast for AI researchers. Experts ask advanced questions. Systematically cover each section. Explain insights, methods and how to implement. Cover the paper in detail, especially the mechanics and components of how the method is implemented.
2/3
I spent a looong time iterating on this prompt for customizing NotebookLM notebooklm.google.com to do high quality summaries of research papers:
1/3
Yes so much of LLM capability still comes down to properly constructed and curated data
So maybe agentic systems need a strong set of subroutines designed to leverage external information sources and apply different reasoning paths - i.e. think more outside the box
I am really curious *why* - like what specifically is it that the humans do with the really long periods that AI doesn't?
I found this post from @hamel.bsky.social helpful: hamel.dev/blog/posts/l...
Playing with silent CoT prompts today. I don't expect much but you never know...
I'd love to see more people explore this! I'm surprised more people aren't doing something like what you mentioned with using it with standard pretrained LLMs. I could imagine some interesting combinations with sampling strategies @xjdr.bsky.social
Sonnet still really impressive on this "benchmark"