Louis Ohl
@louisohl.bsky.social
Postdoc @ Linköping University, STIMA division

oshillou.github.io
It is intended for a broad audience: it starts from the basics and ends with an overview of some current deep clustering models. It also features multiple code snippets to get started, and even a package!
If you want a historical perspective on discriminative clustering, I hope you'll enjoy reading it.
September 11, 2025 at 11:44 AM
This paper explores multiple aspects of discriminative clustering: its general framework, the evolution of the genre from the 90s to today, and how it is deeply intertwined with mutual information.
September 11, 2025 at 11:44 AM
In addition to this historical journey, we provide examples of such milestones and code snippets to reproduce them on the fly.
May 12, 2025 at 8:19 AM
So how do we deal with that? Our tutorial covers the history of the genre from the early 90s to modern deep clustering. We show how mutual information played a crucial role in its development and present the historical milestones we deem relevant.
May 12, 2025 at 8:19 AM
However, learning such a model is tricky, because common statistical tools do not apply when we assume nothing about the data distribution.
May 12, 2025 at 8:19 AM
When doing unsupervised learning, we have two different ways to build our model. One is generative: we model the data distribution explicitly. The other is discriminative: we assume nothing about the data distribution and try to infer clusters straight from it. Implicit hypotheses are built into the model.
May 12, 2025 at 8:19 AM
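To make the discriminative view concrete, here is an illustrative sketch (not the tutorial's own code) of the mutual-information objective that the thread says shaped the genre: for soft cluster assignments, the empirical mutual information between inputs and clusters decomposes as I(X; Y) = H(E[p]) − E[H(p)], which discriminative methods maximise.

```python
import numpy as np

def mutual_information_objective(probs):
    """Empirical mutual information between samples and clusters.

    probs: (n_samples, n_clusters) array of soft assignments p(y | x_i).
    Returns I(X; Y) = H(mean of p) - mean of H(p): high when clusters
    are balanced (high marginal entropy) and assignments are confident
    (low conditional entropy). Illustrative sketch only.
    """
    eps = 1e-12  # guard against log(0)
    marginal = probs.mean(axis=0)
    h_marginal = -(marginal * np.log(marginal + eps)).sum()
    h_conditional = -(probs * np.log(probs + eps)).sum(axis=1).mean()
    return h_marginal - h_conditional
```

For example, perfectly confident, balanced assignments over K clusters score log K, while totally uncertain (uniform) assignments score 0.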
This tutorial is intended for both curious readers who are new to the genre and a more informed audience.

We hope this tutorial will provide a comprehensive overview, and help develop future research directions for clustering.

So what is it about?
May 12, 2025 at 8:19 AM
In summary:

DISCOTEC is an easy-to-implement method that shows good ranking performance and is compatible with essentially all clustering models. It does not require any hyperparameters. (5/5)
May 9, 2025 at 6:40 AM
Since DISCOTEC relies on ensembles, its performance is tied to the number of models used to compute the consensus. This effect is even stronger for the binarised variant. (4/5)
May 9, 2025 at 6:40 AM
An interesting advantage is that binarising the consensus matrix drastically improves the ranking of the clustering algorithms. (3/5)
May 9, 2025 at 6:40 AM
We introduce the DISCOTEC score.

It consists of two simple steps: (i) compute the consensus matrix for a set of clustering algorithms; (ii) compute the average distance between the connectivity matrices and the consensus matrix.

Bonus: must-link and cannot-link constraints are gracefully supported. (2/5)
May 9, 2025 at 6:40 AM
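The two-step DISCOTEC recipe described in the thread could be sketched roughly as follows. This is an assumption-laden illustration, not the paper's implementation: the function names, the mean absolute difference as the distance, and the 0.5 binarisation threshold are all placeholders for exposition.

```python
import numpy as np

def connectivity_matrix(labels):
    # Binary matrix: entry (i, j) is 1 if samples i and j share a cluster.
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def discotec_score(partitions, binarise=False):
    """Sketch of the two-step score (hypothetical implementation).

    partitions: list of label vectors, one per clustering algorithm.
    Returns one score per partition; lower means closer to the consensus.
    """
    # (i) consensus matrix: average connectivity over all partitions
    conns = [connectivity_matrix(p) for p in partitions]
    consensus = np.mean(conns, axis=0)
    if binarise:
        # Binarised variant; the 0.5 threshold is an assumption.
        consensus = (consensus >= 0.5).astype(float)
    # (ii) average distance between each connectivity and the consensus
    return [np.abs(c - consensus).mean() for c in conns]
```

On three toy partitions where two agree and one deviates, the deviant partition gets a larger (worse) score than the two agreeing ones, which is the ranking behaviour the thread describes.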