Genuinely curious how it works, like ChatGPT responses are not instant (and I assume it takes some time to read the response and try to understand it if the question is complex)
Do people just sit in silence for like a couple of minutes, or how does it work?
For the sake of my mental health I will leave social media until the Danish election season is finished
Had an experience reading news/election campaigns while being an immigrant in Sweden during election season and it was not fun (to put it mildly)
How Europeans view the average American city:
True, I have actually missed the exact point when labs stopped providing logits because of "distillation anxiety", sometime in early 2024?
Oh yeah, somehow forgot about LLM-as-judge even though it was explicitly mentioned
And I agree that you can't really call it distillation in the sense most people imagine it
Also realistically
Chinese labs most likely aren't directly training on Claude's outputs (because neither 150k nor even 13 million examples is enough for something like frontier-level performance lol)
My prediction: most likely they're using Opus outputs as seeds for synthetic data
Ok now I want to distill Opus just out of spite
Ah, didn't know that
Could it be because Z.AI has IPO'd and could take more legal action against accusations?
Nah they should stop pretending this is some "national security" issue
They have also cut off access to xAI and OpenAI for their coding tools. It is pretty obvious that Anthropic is afraid of the competition at this point
techcrunch.com/2025/08/02/a...
And like with computer vision models, you often end up just randomly guessing towards the end just to get a better score
Be honest, you like it because it is happening in Malmö this year
I think your students will definitely appreciate it ;) tbh I see it becoming increasingly important both in research and in industry
Thank you for the link!
Yeah a special course might be interesting, also I still have this annoying idea of adding VLM benchmarks to EuroEval
But I am afraid that has to wait at least till summer because I am totally busy with current studies/research projects
I must admit that at this point I would strongly benefit from a university course that focuses just on model evaluation (both CV and LLMs)
Like I am taking many courses on how to train stuff, but imho it is just as important to be able to evaluate what you have trained
Ahahaha that's a good one
Sweden announced a new "AI strategy"
I think it is quite cool that more and more Nordic countries are interested in training open Language Models
news.cision.com/knut-och-ali...
I actually messed up and downloaded a transparent .png but decided to leave it like that because it looks sick 🔥
I think we are slowly moving towards reinventing backpropagation from first principles
I mean I think it won't be a problem, considering the target model in speculative decoding can usually just reject tokens; the issue is that it might hurt the speedup
Wow, that actually might work pretty well in cases where the language/framework is fixed
arxiv.org/abs/2505.21594
Well quick Googling says that we can
I wonder about doing speculative decoding this way:
Take a smaller/distilled/heavily quantized model and run it locally, then use it to generate "proposal" tokens for a larger target model that runs in the cloud
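A minimal greedy-verification sketch of that idea, just to make the accept/reject mechanics concrete. `draft_propose` and `target_next` are hypothetical stand-ins for the local quantized draft model and the cloud target model; a real implementation would compare token probabilities rather than greedy picks:

```python
def draft_propose(prefix, k, vocab):
    # Toy stand-in for the local draft model: deterministically picks
    # the next k tokens so the example is reproducible.
    return [vocab[(len(prefix) + i) % len(vocab)] for i in range(k)]

def target_verify(prefix, proposed, target_next):
    # Toy stand-in for the cloud target model's verification step:
    # accept each proposed token that matches the target's own greedy
    # choice; on the first mismatch, substitute the target's token
    # and stop. Output quality matches the target; speedup comes from
    # how long the accepted prefix is.
    accepted = []
    for tok in proposed:
        expected = target_next(prefix + accepted)
        if tok == expected:
            accepted.append(tok)       # proposal matches: keep it
        else:
            accepted.append(expected)  # mismatch: take target's token, stop
            break
    return accepted

vocab = ["a", "b", "c"]

def agreeing_target(prefix):
    return vocab[len(prefix) % len(vocab)]  # always agrees with the draft

print(target_verify([], draft_propose([], 3, vocab), agreeing_target))
```

If the draft and target always agree, all 3 proposed tokens are accepted in one verification round; a target that disagrees on the first token would yield just its own token, which is why a weak draft erodes the speedup.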
Thanks for the reminder
Finally joined IDA students because of this
When you have stopped being GPU poor and realized there is a whole other PyTorch to learn
It would be nice to see a "cost" axis, I think this might be the actual moat of open weights/open source models
Also I wanted to thank @dorialexander.bsky.social for the inspiration to work on synthetic data, which will be the focus of my work
I am very grateful to @kennethenevoldsen.bsky.social for the support during the hiring process (and of course mentorship outside of it)
Life update:
In March, I'll be starting as a Student Developer on the Danish Foundation Models project at Aarhus University
I am excited to work with amazing people and contribute to Danish open source 🇩🇰