Sylvain Combettes (@sylvaincom)

Imbalanced classification: pitfalls and solutions — Probabilistic calibration of cost-sensitive learning

Today at #EuroScipy2025, @glemaitre58.bsky.social and I presented a tutorial on pitfalls of machine learning for imbalanced classification problems.

We discussed what (not) to do when fitting a classifier and obtaining degenerate precision or recall values.

probabl-ai.github.io/calibration-...

19.08.2025 11:58 👍 22 🔁 10 💬 1 📌 0
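
As a rough illustration of the "degenerate precision or recall values" mentioned in the post above, here is a minimal pure-Python sketch. It is not taken from the tutorial: the toy labels (2% positives) and the two trivial "classifiers" are assumptions made up for this example. It shows one common way such values arise on imbalanced data, where a model that always predicts the majority class scores high accuracy with zero recall.

```python
# Hypothetical toy data (assumed for illustration): 1,000 samples, 2% positives.
y_true = [1] * 20 + [0] * 980


def precision_recall(y_true, y_pred):
    """Return (precision, recall); a value is None when its denominator is zero."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) > 0 else None
    recall = tp / (tp + fn) if (tp + fn) > 0 else None
    return precision, recall


# Two trivial "classifiers": always predict the majority or the minority class.
always_negative = [0] * len(y_true)
always_positive = [1] * len(y_true)

accuracy_neg = sum(t == p for t, p in zip(y_true, always_negative)) / len(y_true)
prec_neg, rec_neg = precision_recall(y_true, always_negative)
prec_pos, rec_pos = precision_recall(y_true, always_positive)

print("always negative:", accuracy_neg, prec_neg, rec_neg)
print("always positive:", prec_pos, rec_pos)
```

In this sketch the always-negative predictor has an undefined precision (no positive predictions) and a recall of zero despite its high accuracy, while the always-positive predictor has perfect recall and a precision equal to the positive-class prevalence.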

✨️💥skrub: machine learning with dataframes

New release 💫 0.6
A huge one, with the super powerful new "DataOps", and many improvements all over the library.
Exciting!!

24.07.2025 16:16 👍 16 🔁 4 💬 0 📌 0

Python software engineer for tslearn: Inria job offer

Come work with us on tslearn in beautiful Rennes!

(deadline for application is soon!)

jobs.inria.fr/public/class...

20.02.2025 09:59 👍 4 🔁 5 💬 1 📌 0

Open source software: how to live long and go far. An opinionated guide to building open-source software tools with a focus on Python and science. A talk that I gave when I was stepping down as a lead…

Just put online a talk I gave summarizing what I have learned over the years as a maintainer of open source.

These are _opinions_ (been there, done that), but I'm willing to defend them, having stewarded my share of successful open source projects.
speakerdeck.com/gaelvaroquau...

06.02.2025 20:31 👍 53 🔁 12 💬 3 📌 0

Our first flagship feature is the `EstimatorReport`. You feed it your scikit-learn-compatible estimator and your dataset, and it displays a helper with metrics and plots to help you investigate your estimator. Computed for you in one line of code. Blazing fast thanks to caching. Check out our docs!

23.01.2025 15:49 👍 1 🔁 0 💬 0 📌 0

scikit-learn Version 1.6.0 Release Highlights. YouTube video by scikit-learn

❄️ The Christmas release is here! ❄️

Introducing scikit-learn 1.6 with:

🟢 2 major features & 34 improvements
🔵 5 efficiency boosts & 21 enhancements
🟡 14 API changes
🔴 30 fixes
👥 160 amazing contributors

youtu.be/7wiHChpwJe8

20.12.2024 09:44 👍 63 🔁 21 💬 1 📌 1

Gaël Varoquaux, star of artificial intelligence and advocate of free software. The computer scientist and Inria researcher is the most-cited French expert in scientific publications on AI. With Scikit-learn, a machine learning program of which he is the co-cre...

Thank you @lemonde.fr for a lovely summary of my scientific and software adventures 📈📠
www.lemonde.fr/sciences/art...

Many messages that are dear to me: teamwork, free software, scientific rigor

Thank you to the colleagues and friends who shared their accounts, I am moved reading them

15.12.2024 05:35 👍 123 🔁 25 💬 7 📌 2

This year, there are 16 positions at CNRS in computer science (8 in "applied" domains → ask me; 8 in "fundamental" domains → ask the other David).

@mathurinmassias.bsky.social has a good list of advice: mathurinm.github.io/cnrs_inria_a...

Official 🔗 www.ins2i.cnrs.fr/en/cnrsinfo/...

Don't wait!

23.11.2024 19:33 👍 32 🔁 18 💬 2 📌 1

Sometimes you think you are right by doing everything "by the book." But sometimes the book is just a tiny part of the full story. Digging further and writing a new chapter with more insights is actually fun...

05.12.2024 10:15 👍 1 🔁 1 💬 0 📌 0

🎉⚡️Release 0.4:
◼ Easily use deep learning for text entries
◼ TableVectorizer can remove columns with too many missing values
◼ TableReport more robust and prettier
...

1/5

27.11.2024 20:46 👍 11 🔁 4 💬 1 📌 0

A high-level summary diagram taken from the slides linked below. It shows the interplay of two main components: a probabilistic model and a decision maker or planner.

Probabilistic predictions of an underfitting polynomial classifier on a noisy XOR task and the corresponding under-confident calibration curve.

Probabilistic predictions of an overfitting polynomial classifier and the resulting overconfident calibration curve on the same noisy XOR problem.

Simulation study to show the relative lack of stability of hyperparameter tuning when using hard metrics such as Accuracy or soft yet not probabilistic metrics such as ROC AUC compared to a strictly proper scoring rule such as the log-loss.

I recently shared some of my reflections on how to use probabilistic classifiers for optimal decision-making under uncertainty at @pydataparis.bsky.social 2024.

Here is the recording of the presentation:

www.youtube.com/watch?v=-gYn...

27.11.2024 14:17 👍 49 🔁 19 💬 1 📌 1
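
The contrast drawn above between hard metrics such as accuracy and a strictly proper scoring rule such as the log-loss can be illustrated with a small pure-Python sketch. It does not reproduce the simulation study from the talk: the labels and the two sets of predicted probabilities below are hypothetical values chosen for this example. Both models make the same thresholded predictions, so accuracy cannot tell them apart, while the log-loss penalizes the one that is confidently wrong.

```python
import math


def accuracy(y_true, probs, threshold=0.5):
    """Fraction of correct hard predictions after thresholding the probabilities."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(t == y for t, y in zip(y_true, preds)) / len(y_true)


def log_loss(y_true, probs):
    """Mean negative log-likelihood of the true labels under the predicted probabilities."""
    total = sum(math.log(p) if t == 1 else math.log(1 - p) for t, p in zip(y_true, probs))
    return -total / len(y_true)


# Hypothetical labels and predicted probabilities of the positive class.
y_true = [1, 1, 1, 0, 0, 0]
probs_a = [0.90, 0.80, 0.30, 0.20, 0.10, 0.40]  # mildly wrong on the third sample
probs_b = [0.99, 0.99, 0.01, 0.01, 0.01, 0.49]  # confidently wrong on the third sample

acc_a, acc_b = accuracy(y_true, probs_a), accuracy(y_true, probs_b)
loss_a, loss_b = log_loss(y_true, probs_a), log_loss(y_true, probs_b)

print("accuracy:", acc_a, acc_b)
print("log-loss:", loss_a, loss_b)
```

Because accuracy only sees the thresholded labels, any change in the probabilities that does not cross the threshold leaves it unchanged, whereas the log-loss responds to every such change.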
