Awesome to see @nvidia @NVIDIAAI using our research for their open-source long-context models.
Talking about DeepSeek and their connection to Tsinghua. Tsinghua and CMU have an older (2017) but still great series on high-performance parallel computing. The playlist can be found here: https://buff.ly/4gktKSd
Happy to announce that our paper, Bridging the Data Provenance Gap Across Text, Speech, and Video, was accepted to @iclr_conf. #ICLR2025
I have an extremely easy evaluation on which all current top models score 0%. It is the easiest set of evaluations in our entire suite; an AGI would solve the hardest problems effortlessly. Once o3 becomes available in the API, I will put out a public baseline.
Does anyone have any ideas why T5 or CLIP is being used for text encoding in diffusion training instead of a much stronger encoder or embedding model?
There is absolutely no shortage of pre-training data.
AWS ParallelCluster is honestly such an incredibly useful tool for large-scale distributed training:
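For anyone who hasn't tried it, a minimal ParallelCluster v3 cluster configuration is just a short YAML file. This is a hypothetical sketch (the instance types, subnet ID, and key name are placeholders you would replace with your own):

```yaml
# Minimal AWS ParallelCluster v3 config sketch: a Slurm cluster with
# one head node and an autoscaling GPU compute queue.
Region: us-east-1
Image:
  Os: ubuntu2204
HeadNode:
  InstanceType: c5.xlarge          # placeholder head-node instance type
  Networking:
    SubnetId: subnet-0123456789abcdef0   # placeholder subnet
  Ssh:
    KeyName: my-key                # placeholder EC2 key pair
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: gpu
      ComputeResources:
        - Name: p4d
          InstanceType: p4d.24xlarge   # placeholder GPU instance type
          MinCount: 0                  # scale to zero when idle
          MaxCount: 8
      Networking:
        SubnetIds:
          - subnet-0123456789abcdef0
```

You would then create the cluster with `pcluster create-cluster --cluster-name train --cluster-configuration config.yaml`, and Slurm handles scheduling your distributed training jobs across the nodes.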
Paper: compvis.github.io/cleandift/st...
Be sure to check out this awesome work by @stefanabaumann.bsky.social, @rmsnorm.bsky.social, and @koljabauer.bsky.social.