
Igor Gilitschenski

@igilitschenski

Assistant Professor in Computer Science at UofT.

455
Followers
108
Following
106
Posts
25.11.2024
Joined

Latest posts by Igor Gilitschenski @igilitschenski

Every year around this time, I wish we had a matching procedure for CS grad school like the one used for medical residency. It would improve mental health for everyone involved. How are doctors better at this than computer scientists? Where is our occupational pride?

05.03.2026 22:07 👍 7 🔁 1 💬 0 📌 0

Had a great time speaking at Stanford's Center for Image Systems Engineering and Vision seminars over the last two weeks. 👨‍🏫 Thank you for hosting me, Gordon Wetzstein and Wenlong Huang.

19.02.2026 16:26 👍 3 🔁 0 💬 0 📌 0

Running PPO for continuous control and frustrated by instability or long hyperparameter tuning? 🤯 Give REPPO a try. More integrations with your favourite RL libraries are coming.

📄 https://arxiv.org/abs/2507.11019
💻 https://github.com/cvoelcker/reppo
✍️ https://cvoelcker.de/blog/2025/reppo-intro/

13.02.2026 19:28 ๐Ÿ‘ 2 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

Huge kudos to @cvoelcker.bsky.social for bringing this all together and leading a collaboration also involving @axelbrunnbauer.bsky.social, @marcelhussing.bsky.social, Michal Nauman, Pieter Abbeel, Radu Grosu, and @sologen.bsky.social, spanning UofT, Vector, Poly, Mila, TU Vienna, UPenn, and UC Berkeley.

13.02.2026 19:28 ๐Ÿ‘ 3 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Post image

Despite training a critic, REPPO is fast. ⚡️

In JAX, it matches PPO's wall-clock time while delivering ~33% higher returns. The sample efficiency of pathwise gradients offsets the extra per-update computation.

13.02.2026 19:28 ๐Ÿ‘ 3 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Post image

Maybe our favorite result: REPPO trains "reliably". ❤️

Once performance crosses a threshold, it stays there. ~80% of REPPO runs reach high performance without ever dropping back down. PPO? About 40 percentage points fewer.

No more "it was working at 3am but crashed by morning." 😱

13.02.2026 19:28 ๐Ÿ‘ 2 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Post image

The results? REPPO significantly outperforms PPO on DMC and ManiSkill benchmarks in both sample efficiency and final performance.

It even rivals off-policy methods (like FastTD3) while using a fraction of the memory (no massive replay buffers needed). 📉💾

13.02.2026 19:28 ๐Ÿ‘ 2 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Post image

Our ablations show that HL-Gauss and KL regularization are the most critical pieces.

13.02.2026 19:28 ๐Ÿ‘ 2 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0

To improve surrogate value function estimation, we incorporated
→ HL-Gauss categorical loss for Q-learning (scale-invariant gradients!)
→ Layer normalization for stable representations
→ Auxiliary self-prediction tasks for richer features

13.02.2026 19:28 ๐Ÿ‘ 2 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0

Naively using pathwise gradients on-policy leads to collapse. 📉 Two ingredients keep REPPO stable:

1๏ธโƒฃ Maximum entropy objective โ€” keeps the policy exploring
2๏ธโƒฃ KL-constrained updates (the "Relative Entropy" in REPPO) โ€” prevents the policy from jumping too far from where the Q-function is accurate

13.02.2026 19:28 ๐Ÿ‘ 2 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Post image
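For intuition, the two ingredients above can be sketched for a 1-D Gaussian policy. The entropy and KL expressions are the standard closed forms; the overall loss shape and the alpha/beta weights are simplifying assumptions, not REPPO's exact objective.

```python
import numpy as np

def entropy(log_std):
    # Differential entropy of N(mu, std): 0.5*log(2*pi*e) + log_std.
    return 0.5 * np.log(2 * np.pi * np.e) + log_std

def kl_gauss(mu_new, log_std_new, mu_old, log_std_old):
    # Closed-form KL(N_new || N_old) for 1-D Gaussians.
    var_new, var_old = np.exp(2 * log_std_new), np.exp(2 * log_std_old)
    return (log_std_old - log_std_new
            + (var_new + (mu_new - mu_old) ** 2) / (2 * var_old) - 0.5)

def policy_loss(q_value, mu_new, log_std_new, mu_old, log_std_old,
                alpha=0.1, beta=1.0):
    # Maximize Q plus alpha-weighted entropy, while the KL term (the
    # "relative entropy") penalizes drifting from the data-collecting policy.
    return (-(q_value + alpha * entropy(log_std_new))
            + beta * kl_gauss(mu_new, log_std_new, mu_old, log_std_old))

# A large jump away from the old policy is penalized:
near = policy_loss(1.0, 0.1, 0.0, 0.0, 0.0)
far = policy_loss(1.0, 2.0, 0.0, 0.0, 0.0)
print(near < far)  # True
```

The KL term grows quadratically in the mean shift, so updates that leave the region where the Q-function was fit become expensive.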

REPPO's key insight: train an accurate Q-function purely from on-policy data, then use pathwise (reparameterized) gradients to update the policy. 💡

We avoid importance sampling corrections, a major source of variance in PPO. 😵‍💫

13.02.2026 19:28 ๐Ÿ‘ 3 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Post image
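The pattern in that post can be illustrated with a toy 1-D example: take a stand-in Q-function, sample reparameterized actions, and ascend dQ/da directly, with no likelihood ratios anywhere. All names and constants below are made up for illustration; the actual method refits Q from fresh rollouts each iteration.

```python
import numpy as np

rng = np.random.default_rng(0)

def q(a):
    # Stand-in for a Q-function fitted on fresh on-policy data;
    # it peaks at a = 0.7.
    return -(a - 0.7) ** 2

mu, std, lr = 0.0, 0.2, 0.1
for _ in range(300):
    eps = rng.standard_normal(256)
    a = mu + std * eps          # reparameterized actions: a = mu + std*eps
    dq_da = -2.0 * (a - 0.7)    # pathwise gradient dQ/da (and da/dmu = 1)
    mu += lr * dq_da.mean()     # plain gradient ascent, no importance weights

print(mu)  # converges near the Q maximum at 0.7
```

Because the actions are differentiable functions of the policy parameters, no importance-sampling correction is ever needed, which is exactly the variance source being avoided.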

Standard PPO uses score-function gradients. While popular, these estimators are notoriously noisy, leading to training instability.

We asked: Can we use the reparametrization trick (as in DDPG/SAC) but keep the simplicity of on-policy learning?

13.02.2026 19:28 ๐Ÿ‘ 2 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0

🚀 Excited to share REPPO, a new on-policy RL agent!

TL;DR: Replace PPO with REPPO for fewer hyperparameter headaches and more robust training.

REPPO, led by @cvoelcker.bsky.social, will be presented at ICLR 2026. How does it work? 🧵👇

13.02.2026 19:28 ๐Ÿ‘ 25 ๐Ÿ” 10 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0

Given the diverse work on world models, there is a debate on where to draw the line between world models and other dynamic systems models. I'd argue that, like LLMs for general-purpose language modelling, world models are not designed for a single specific task or phenomenon.

09.02.2026 21:56 👍 2 🔁 0 💬 0 📌 0

Given how teleop solutions for AVs work, some of the risks outlined in this article seem contrived, even without knowing Waymo's specifics. I assume one reason for having teleop teams in locations such as the Philippines (beyond cost) is to account for different time zones, as the vehicles operate 24/7.

07.02.2026 01:27 👍 0 🔁 0 💬 0 📌 0

2025/2026 may be remembered as the years when many grad students "stopped" and many professors "resumed" programming.

08.01.2026 18:43 👍 1 🔁 0 💬 0 📌 0

I'll be at #NeurIPS2025 in San Diego from Thu to Sat, and I am looking for postdocs in Embodied AI, particularly in world modeling and simulator learning. Please reach out if you are interested.

01.12.2025 17:17 👍 5 🔁 1 💬 0 📌 0
Prospective Graduate Students - Department of Computer Science, University of Toronto. Applications for Fall 2026 are open. Apply by December 1, 2025.

Check out the Department of Computer Science website to apply. The application deadline is December 1. web.cs.toronto.edu/graduate/pro...

22.11.2025 20:50 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 0 ๐Ÿ“Œ 0

We are affiliated with @vectorinstitute.ai, the Schwartz Reisman Institute, @uoftdsi.bsky.social, the Acceleration Consortium, and the Robotics Institute. There is a vibrant AI community around U of T, and we have a fantastic set of collaborators. 4/n

22.11.2025 20:50 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Post image Post image

Admittedly, Toronto winters can be cold ❄️, but we have access to multiple GPU clusters that help us stay warm 🔥. And we have a growing set of robots to keep us busy 🦾🦿🚘. 3/n

22.11.2025 20:50 ๐Ÿ‘ 0 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0

This involves exciting topics such as control techniques for generative models 🎮, editing techniques for scene representations ✂️, physical dynamics learning ⚙️, planning in learned models 🗺️, policy learning 🧠, and real-world robot deployment 🦾. 2/n

22.11.2025 20:50 ๐Ÿ‘ 1 ๐Ÿ” 0 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Post image

I'm looking for graduate students to join my group in fall 2026. We work at the intersection of Computer Vision, Deep Learning, and Robotics.

The goal of our work is to create and understand organic 🍀 simulation systems, i.e., controllable data-curation engines built from real-world data. 1/n

22.11.2025 20:50 ๐Ÿ‘ 9 ๐Ÿ” 4 ๐Ÿ’ฌ 1 ๐Ÿ“Œ 0
Post image

Together with our brilliant graduate researcher, Ziyi Wu, and our fantastic undergrad, Brayden Zhang, we had a wonderful time at #ICCV2025!

25.10.2025 02:32 👍 5 🔁 0 💬 0 📌 0
A small number of samples can poison LLMs of any size - Anthropic research on data-poisoning attacks in large language models

Interesting @anthropic.com work on LLM poisoning. I'd argue that the glass is half full, and this can be interpreted as a positive result: Regardless of model size, the same number of samples seems to be sufficient to learn a new "skill". www.anthropic.com/research/sma...

10.10.2025 03:06 👍 1 🔁 0 💬 0 📌 0

Excited to have been appointed as a faculty member at @vectorinstitute.ai. 🎉 This would not have been possible without the amazing work of my collaborators and the brilliant students in my group within @uoftcompsci.bsky.social. We all look forward to deepening ties with the Vector community. ❤️

01.10.2025 22:23 👍 11 🔁 0 💬 2 📌 0
University of Toronto, Department of Computer Science Job #AJO30410, Assistant Professor, Teaching Stream - Computer Science, Department of Computer Science, University of Toronto, Toronto, Ontario, CA

Want to help train the next generation of leaders, researchers, and innovators in CS? We are looking for teaching-stream faculty to join us at @uoftcompsci.bsky.social. Check out our job posting and join a team of brilliant colleagues and amazing students. 🇨🇦❤️💻 academicjobsonline.org/ajo/jobs/30410

01.10.2025 00:19 👍 2 🔁 0 💬 0 📌 0

Proud German advisor moment: Discovered a pair of Birkenstocks under a grad student's desk.

04.09.2025 14:32 👍 2 🔁 0 💬 0 📌 0

Would you be surprised to learn that many empirical implementations of value-aware model learning (VAML) algorithms, including MuZero, lead to incorrect model & value functions when training stochastic models 🤕? In our new @icmlconf.bsky.social 2025 paper, we show why this happens and how to fix it 🦾!

19.06.2025 02:39 👍 8 🔁 3 💬 1 📌 1

Agree with all of the above. There is also a version of this among researchers, i.e., people knowingly defending the continued existence of likely unfruitful research directions to protect their own careers. Ultimately, each insight needs to be evaluated on its own merits.

15.06.2025 23:21 👍 4 🔁 0 💬 1 📌 0

Thrilled to share the work that our lab will present at @cvprconference.bsky.social. Check out the papers and meet @kai-he.bsky.social, @yashkant.bsky.social, @dazitu616.bsky.social, and Toshiya Yura in Nashville at their poster sessions!

10.06.2025 20:04 👍 4 🔁 0 💬 0 📌 0