Every year around this time, I wish CS grad school admissions used a matching procedure like the one in medical residency. It would be a mental-health improvement for everyone involved. How are doctors better at this than computer scientists? Where is our occupational pride?
05.03.2026 22:07
👍 7 · 🔁 1 · 💬 0 · 📌 0
Had a great time speaking at Stanford's Center for Image Systems Engineering and vision seminars over the last two weeks. 👨‍🏫 Thank you for hosting me, Gordon Wetzstein and Wenlong Huang.
19.02.2026 16:26
👍 3 · 🔁 0 · 💬 0 · 📌 0
Running PPO for continuous control and frustrated by instability or long hyperparameter tuning? 🤯 Give REPPO a try. More integrations with your favourite RL libraries are coming.
📄 Paper: https://arxiv.org/abs/2507.11019
💻 Code: https://github.com/cvoelcker/reppo
✍️ Blog: https://cvoelcker.de/blog/2025/reppo-intro/
13.02.2026 19:28
👍 2 · 🔁 0 · 💬 0 · 📌 0
Huge kudos to @cvoelcker.bsky.social for bringing this all together and leading a collaboration also involving @axelbrunnbauer.bsky.social, @marcelhussing.bsky.social, Michal Nauman, Pieter Abbeel, Radu Grosu, and @sologen.bsky.social, across UofT, Vector, Poly, Mila, TU Vienna, UPenn, and UC Berkeley.
13.02.2026 19:28
👍 3 · 🔁 0 · 💬 1 · 📌 0
Despite training a critic, REPPO is fast. ⚡️
In JAX, it matches PPO's wall-clock time while delivering ~33% higher returns. The sample efficiency of pathwise gradients offsets the extra per-update computation.
13.02.2026 19:28
👍 3 · 🔁 0 · 💬 1 · 📌 0
Maybe our favorite result: REPPO trains “reliably”. ❤️
Once performance crosses a threshold, it stays there. ~80% of REPPO runs reach high performance without ever dropping back down. PPO? About 40 percentage points fewer.
No more "it was working at 3am but crashed by morning." 😱
13.02.2026 19:28
👍 2 · 🔁 0 · 💬 1 · 📌 0
The results? REPPO significantly outperforms PPO on DMC and ManiSkill benchmarks in both sample efficiency and final performance.
It even rivals off-policy methods (like FastTD3) while using a fraction of the memory (no massive replay buffers needed). 🙌🏾
13.02.2026 19:28
👍 2 · 🔁 0 · 💬 1 · 📌 0
Our ablations show that HL-Gauss and KL regularization are the most critical pieces.
13.02.2026 19:28
👍 2 · 🔁 0 · 💬 1 · 📌 0
To improve surrogate value-function estimation, we incorporated:
✅ HL-Gauss categorical loss for Q-learning (scale-invariant gradients!)
✅ Layer normalization for stable representations
✅ Auxiliary self-prediction tasks for richer features
13.02.2026 19:28
👍 2 · 🔁 0 · 💬 1 · 📌 0
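The HL-Gauss idea from the post above is to turn scalar value regression into classification: a minimal sketch, assuming simple uniform bins (the function name, bin layout, and smoothing width here are illustrative, not the paper's actual implementation):

```python
import numpy as np
from math import erf, sqrt

def hl_gauss_target(y, bin_edges, sigma=0.75):
    # Project a scalar regression target y onto categorical bins by
    # integrating a Gaussian centered at y over each bin (HL-Gauss).
    cdf = np.array([0.5 * (1.0 + erf((e - y) / (sigma * sqrt(2.0))))
                    for e in bin_edges])
    probs = cdf[1:] - cdf[:-1]
    return probs / probs.sum()  # renormalize mass clipped by the support

edges = np.linspace(-5.0, 5.0, 21)   # 20 bins over an illustrative value range
target = hl_gauss_target(1.3, edges)
# The critic is then trained with cross-entropy against `target` instead of MSE,
# which is where the scale-invariant gradients come from.
```

Because the loss is cross-entropy over a fixed categorical support, its gradient magnitude no longer scales with the raw return values.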
Naively using pathwise gradients on-policy leads to collapse. 📉 Two ingredients keep REPPO stable:
1️⃣ Maximum entropy objective → keeps the policy exploring
2️⃣ KL-constrained updates (the "Relative Entropy" in REPPO) → prevents the policy from jumping too far from where the Q-function is accurate
13.02.2026 19:28
👍 2 · 🔁 0 · 💬 1 · 📌 0
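The two stabilizers above can be sketched as terms of an actor loss for a 1-D Gaussian policy. This is a toy sketch, not the paper's objective: the function names, coefficients `alpha`/`beta`, and the penalty (rather than hard-constraint) form are all assumptions for illustration.

```python
import numpy as np

def gaussian_entropy(sigma):
    # Differential entropy of N(mu, sigma^2); the max-entropy bonus.
    return 0.5 * np.log(2.0 * np.pi * np.e * sigma**2)

def gaussian_kl(mu, sigma, mu_old, sigma_old):
    # Closed-form KL(new || old) for 1-D Gaussians; the "Relative Entropy" term.
    return (np.log(sigma_old / sigma)
            + (sigma**2 + (mu - mu_old) ** 2) / (2.0 * sigma_old**2) - 0.5)

def actor_loss(q_mean, mu, sigma, mu_old, sigma_old, alpha=0.2, beta=1.0):
    # Minimize: -(Q + alpha * entropy) + beta * KL(new || old).
    # The KL penalty keeps the new policy close to where Q was fit.
    return -(q_mean + alpha * gaussian_entropy(sigma)) \
        + beta * gaussian_kl(mu, sigma, mu_old, sigma_old)
```

Moving the mean away from the previous policy raises the loss even when the Q-term is unchanged, which is exactly the braking effect described in the post.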
REPPO's key insight: train an accurate Q-function purely from on-policy data, then use pathwise (reparameterized) gradients to update the policy. 💡
We avoid importance-sampling corrections, a major source of variance in PPO. 😵‍💫
13.02.2026 19:28
👍 3 · 🔁 0 · 💬 1 · 📌 0
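The key insight above can be shown in a toy 1-D control problem: differentiate a learned Q through a reparameterized action, with no importance ratio anywhere. Everything here (the quadratic Q, linear policy, step sizes) is a made-up minimal setting, not REPPO itself.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for a critic fit from on-policy data; the optimal
# action at state s is a* = 2.0 * s.
def q_value(s, a):
    return -(a - 2.0 * s) ** 2

# Gaussian policy a = mu_w * s + sigma * eps; the pathwise gradient
# flows from Q through the sampled action into the policy parameter.
mu_w, sigma, lr = 0.0, 0.5, 0.1
for _ in range(200):
    s = rng.uniform(0.5, 1.5)        # on-policy state sample
    eps = rng.normal()
    a = mu_w * s + sigma * eps       # reparameterized action
    dq_da = -2.0 * (a - 2.0 * s)     # dQ/da, known in closed form here
    mu_w += lr * dq_da * s           # chain rule: dQ/da * da/dmu_w
print(mu_w)  # should land near the optimal slope of 2.0
```

Note there is no `pi_new / pi_old` ratio in the update: because the action is a deterministic function of the parameter and noise, the gradient passes through Q directly.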
Standard PPO uses score-based (likelihood-ratio) gradients. While popular, these estimators are notoriously noisy, leading to training instability.
We asked: can we use the reparameterization trick (as in DDPG/SAC) but keep the simplicity of on-policy learning?
13.02.2026 19:28
👍 2 · 🔁 0 · 💬 1 · 📌 0
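The variance gap between the two estimator families is easy to see on a toy objective. Both estimators below target the same gradient of E[f(x)] for x ~ N(mu, sigma²); the specific f and constants are illustrative choices, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 1.0, 1.0, 10_000

def f(x):
    # Toy "return" whose expectation we differentiate w.r.t. mu;
    # analytically, d/dmu E[f(x)] = -2 * (mu - 3) = 4 at mu = 1.
    return -(x - 3.0) ** 2

# Score-function (REINFORCE-style) estimator: f(x) * d log p(x; mu) / d mu.
x = rng.normal(mu, sigma, n)
score_grads = f(x) * (x - mu) / sigma**2

# Pathwise (reparameterization) estimator: write x = mu + sigma * eps
# and differentiate f through the sample.
eps = rng.normal(0.0, 1.0, n)
path_grads = -2.0 * (mu + sigma * eps - 3.0)

# Both are unbiased, but the pathwise estimator is far less noisy.
print(score_grads.mean(), score_grads.var())
print(path_grads.mean(), path_grads.var())
```

Both sample means sit near the true gradient of 4, while the score-function estimator's variance is an order of magnitude larger, which is exactly the noise the post attributes to PPO's estimator.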
🚀 Excited to share REPPO, a new on-policy RL agent!
TL;DR: Replace PPO with REPPO for fewer hyperparameter headaches and more robust training.
REPPO, led by @cvoelcker.bsky.social, will be presented at ICLR 2026. How does it work? 🧵👇
13.02.2026 19:28
👍 25 · 🔁 10 · 💬 1 · 📌 0
Given the diverse work on world models, there is a debate about where to draw the line between world models and other dynamical-systems models. I'd argue that, like LLMs for general-purpose language modelling, world models are not designed for a single specific task or phenomenon.
09.02.2026 21:56
👍 2 · 🔁 0 · 💬 0 · 📌 0
Given how teleop solutions for AVs work, some of the risks outlined in this article seem contrived, even without knowing Waymo's specifics. I assume one reason for having teleop teams in locations such as the Philippines (beyond cost) is to cover different time zones, as the vehicles operate 24/7.
07.02.2026 01:27
👍 0 · 🔁 0 · 💬 0 · 📌 0
2025/2026 may be remembered as the years when many grad students "stopped" and many professors "resumed" programming.
08.01.2026 18:43
👍 1 · 🔁 0 · 💬 0 · 📌 0
I'll be at #NeurIPS2025 in San Diego from Thu to Sat, and I am looking for postdocs in Embodied AI, particularly in world modeling and simulator learning. Please reach out if you are interested.
01.12.2025 17:17
👍 5 · 🔁 1 · 💬 0 · 📌 0
We are affiliated with @vectorinstitute.ai, the Schwartz Reisman Institute, @uoftdsi.bsky.social, the Acceleration Consortium, and the Robotics Institute. There is a vibrant AI community around U of T, and we have a fantastic set of collaborators. 4/n
22.11.2025 20:50
👍 0 · 🔁 0 · 💬 1 · 📌 0
Admittedly, Toronto winters can be cold ❄️, but we have access to multiple GPU clusters that help us stay warm 🔥. And we have a growing set of robots to keep us busy 🦾🦿🐕. 3/n
22.11.2025 20:50
👍 0 · 🔁 0 · 💬 1 · 📌 0
This involves exciting topics such as control techniques for generative models 🎮, editing techniques for scene representations ✂️, physical dynamics learning ⚙️, planning in learned models 🗺️, policy learning 🧠, and real-world robot deployment 🦾. 2/n
22.11.2025 20:50
👍 1 · 🔁 0 · 💬 1 · 📌 0
I'm looking for graduate students to join my group in fall 2026. We work at the intersection of Computer Vision, Deep Learning, and Robotics.
The goal of our work is to create and understand organic 🌍 simulation systems, i.e., controllable data-curation engines built from real-world data. 1/n
22.11.2025 20:50
👍 9 · 🔁 4 · 💬 1 · 📌 0
Together with our brilliant graduate researcher Ziyi Wu and our fantastic undergrad, Brayden Zhang, we had a wonderful time at #ICCV2025!
25.10.2025 02:32
👍 5 · 🔁 0 · 💬 0 · 📌 0
Interesting @anthropic.com work on LLM poisoning. I'd argue that the glass is half full, and this can be interpreted as a positive result: Regardless of model size, the same number of samples seems to be sufficient to learn a new "skill". www.anthropic.com/research/sma...
10.10.2025 03:06
👍 1 · 🔁 0 · 💬 0 · 📌 0
Excited to have been appointed as a faculty member at @vectorinstitute.ai. 🎉 This would not have been possible without the amazing work of my collaborators and the brilliant students in my group within @uoftcompsci.bsky.social. We all look forward to deepening ties with the Vector community. ❤️
01.10.2025 22:23
👍 11 · 🔁 0 · 💬 2 · 📌 0
Want to help train the next generation of leaders, researchers, and innovators in CS? We are looking for teaching stream faculty to join us at @uoftcompsci.bsky.social. Check out our job posting and join a team of brilliant colleagues and amazing students. 🇨🇦❤️💻 academicjobsonline.org/ajo/jobs/30410
01.10.2025 00:19
👍 2 · 🔁 0 · 💬 0 · 📌 0
Proud German advisor moment: Discovered a pair of Birkenstocks under a grad student's desk.
04.09.2025 14:32
👍 2 · 🔁 0 · 💬 0 · 📌 0
Would you be surprised to learn that many empirical implementations of value-aware model learning (VAML) algorithms, including MuZero, lead to incorrect model and value functions when training stochastic models 🤔? In our new @icmlconf.bsky.social 2025 paper, we show why this happens and how to fix it 🦾!
19.06.2025 02:39
👍 8 · 🔁 3 · 💬 1 · 📌 1
Agree with all of the above. There is also a version of this among researchers, i.e., people knowingly defending the continued existence of likely unfruitful research directions to protect their own careers. Ultimately, each insight needs to be evaluated on its own merits.
15.06.2025 23:21
👍 4 · 🔁 0 · 💬 1 · 📌 0
Thrilled to share the work that our lab will present at @cvprconference.bsky.social. Check out the papers and meet @kai-he.bsky.social, @yashkant.bsky.social, @dazitu616.bsky.social, and Toshiya Yura in Nashville at their poster sessions!
10.06.2025 20:04
👍 4 · 🔁 0 · 💬 0 · 📌 0