publications | Marc Lafon

2025

NeurIPS
CLIPTTA: Robust Contrastive Vision-Language Test-Time Adaptation

Marc Lafon, Gustavo Adolfo Vargas Hakim, Clément Rambour, and 2 more authors

Conference on Neural Information Processing Systems (NeurIPS 2025), 2025

Vision-language models (VLMs) like CLIP exhibit strong zero-shot capabilities but often fail to generalize under distribution shifts. Test-time adaptation (TTA) allows models to update at inference time without labeled data, typically via entropy minimization. However, this objective is fundamentally misaligned with the contrastive image-text training of VLMs, limiting adaptation performance and introducing failure modes such as pseudo-label drift and class collapse. We propose CLIPTTA, a new gradient-based TTA method for vision-language models that leverages a soft contrastive loss aligned with CLIP’s pre-training objective. We provide a theoretical analysis of CLIPTTA’s gradients, showing how its batch-aware design mitigates the risk of collapse. We further extend CLIPTTA to the open-set setting, where both in-distribution (ID) and out-of-distribution (OOD) samples are encountered, using an Outlier Contrastive Exposure (OCE) loss to improve OOD detection. Evaluated on 75 datasets spanning diverse distribution shifts, CLIPTTA consistently outperforms entropy-based objectives and is highly competitive with state-of-the-art TTA methods, outperforming them on a large number of datasets and exhibiting more stable performance across diverse shifts.
@article{Lafon2025, author = {Lafon, Marc and Vargas Hakim, Gustavo Adolfo and Rambour, Clément and Desrosiers, Christian and Thome, Nicolas}, journal = {Conference on Neural Information Processing Systems (NeurIPS 2025)}, title = {{CLIPTTA: Robust Contrastive Vision-Language Test-Time Adaptation}}, year = {2025}, }
ICCV
ViLU: Learning Vision-Language Uncertainties for Failure Prediction

Marc Lafon, Yannis Karmim, Julio Silva-Rodríguez, and 6 more authors

International Conference on Computer Vision (ICCV 2025), 2025

Reliable Uncertainty Quantification (UQ) and failure prediction remain open challenges for Vision-Language Models (VLMs). We introduce ViLU, a new Vision-Language Uncertainty quantification framework that contextualizes uncertainty estimates by leveraging all task-relevant textual representations. ViLU constructs an uncertainty-aware multi-modal representation by integrating the visual embedding, the predicted textual embedding, and an image-conditioned textual representation via cross-attention. Unlike traditional UQ methods based on loss prediction, ViLU trains an uncertainty predictor as a binary classifier to distinguish correct from incorrect predictions using a weighted binary cross-entropy loss, making it loss-agnostic. In particular, our proposed approach is well-suited for post-hoc settings, where only vision and text embeddings are available without direct access to the model itself. Extensive experiments on diverse datasets show the significant gains of our method compared to state-of-the-art failure prediction methods. We apply our method to standard classification datasets, such as ImageNet-1k, as well as large-scale image-caption datasets like CC12M and LAION-400M. Ablation studies highlight the critical role of our architecture and training in achieving effective uncertainty quantification.
@article{LafonKarmim2025, author = {Lafon, Marc and Karmim, Yannis and Silva-Rodríguez, Julio and Couairon, Paul and Rambour, Clément and Fournier S’niehotta, Raphaël and Ben Ayed, Ismail and Dolz, Jose and Thome, Nicolas}, journal = {International Conference on Computer Vision (ICCV 2025)}, title = {{ViLU: Learning Vision-Language Uncertainties for Failure Prediction}}, year = {2025}, }

2024

NeurIPS
Supra-Laplacian Encoding for Transformer on Dynamic Graphs

Yannis Karmim, Marc Lafon, Raphaël Fournier S’niehotta, and 1 more author

Conference on Neural Information Processing Systems (NeurIPS 2024), 2024

Fully connected Graph Transformers (GTs) have rapidly become prominent in the static graph community as an alternative to Message-Passing models, which suffer from a lack of expressivity, oversquashing, and under-reaching. However, in a dynamic context, by interconnecting all nodes at multiple snapshots with self-attention, GTs lose both structural and temporal information. In this work, we introduce Supra-LAplacian encoding for spatio-temporal TransformErs (SLATE), a new spatio-temporal encoding to leverage the GT architecture while keeping spatio-temporal information. Specifically, we transform Discrete Time Dynamic Graphs into multi-layer graphs and take advantage of the spectral properties of their associated supra-Laplacian matrix. Our second contribution explicitly models nodes’ pairwise relationships with a cross-attention mechanism, providing an accurate edge representation for dynamic link prediction. SLATE outperforms numerous state-of-the-art methods based on Message-Passing Graph Neural Networks combined with recurrent models (e.g., LSTM), and Dynamic Graph Transformers, on 9 datasets. Code and instructions to reproduce our results will be open-sourced.
@article{Karmim2024, author = {Karmim, Yannis and Lafon, Marc and Fournier S’niehotta, Raphaël and Thome, Nicolas}, journal = {Conference on Neural Information Processing Systems (NeurIPS 2024)}, title = {{Supra-Laplacian Encoding for Transformer on Dynamic Graphs}}, year = {2024}, }
ECCV
GalLoP: Learning Global and Local Prompts for Vision-Language Models

Marc Lafon, Elias Ramzi, Clément Rambour, and 2 more authors

Proceedings of the 18th European Conference on Computer Vision, Milan, Italy, 2024

Prompt learning has been widely adopted to efficiently adapt vision-language models (VLMs), e.g. CLIP, for few-shot image classification. Despite their success, most prompt learning methods trade off classification accuracy against robustness, e.g. in domain generalization or out-of-distribution (OOD) detection. In this work, we introduce Global-Local Prompts (GalLoP), a new prompt learning method that learns multiple diverse prompts leveraging both global and local visual features. The training of the local prompts relies on local features with an enhanced vision-text alignment. To focus only on pertinent features, this local alignment is coupled with a sparsity strategy in the selection of the local features. We enforce diversity on the set of prompts using a new “prompt dropout” technique and a multiscale strategy on the local prompts. GalLoP outperforms previous prompt learning methods in accuracy on eleven datasets in different few-shot settings and with various backbones. Furthermore, GalLoP shows strong robustness in both domain generalization and OOD detection, even outperforming dedicated OOD detection methods. Code and instructions to reproduce our results will be open-sourced.
@article{LafonRamzi2024, author = {Lafon, Marc and Ramzi, Elias and Rambour, Clément and Audebert, Nicolas and Thome, Nicolas}, journal = {Proceedings of the 18th European Conference on Computer Vision, Milan, Italy, 2024}, title = {{GalLoP: Learning Global and Local Prompts for Vision-Language Models}}, year = {2024}, }

2023

ICML
Hybrid Energy Based Model in the Feature Space for Out-of-Distribution Detection

Marc Lafon, Elias Ramzi, Clément Rambour, and 1 more author

Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023

Out-of-distribution (OOD) detection is a critical requirement for the deployment of deep neural networks. This paper introduces the HEAT model, a new post-hoc OOD detection method estimating the density of in-distribution (ID) samples using hybrid energy-based models (EBM) in the feature space of a pre-trained backbone. HEAT complements prior density estimators of the ID density, e.g. parametric models like the Gaussian Mixture Model (GMM), to provide an accurate yet robust density estimation. A second contribution is to leverage the EBM framework to provide a unified density estimation and to compose several energy terms. Extensive experiments demonstrate the significance of the two contributions. HEAT sets new state-of-the-art OOD detection results on the CIFAR-10/CIFAR-100 benchmark as well as on the large-scale ImageNet benchmark. The code is available at: github.com/MarcLafon/heatood.
@article{Lafon2023, author = {Lafon, Marc and Ramzi, Elias and Rambour, Clément and Thome, Nicolas}, journal = {Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023}, title = {{Hybrid Energy Based Model in the Feature Space for Out-of-Distribution Detection}}, year = {2023}, }

2022

NeurIPS
Energy Correction Model in the Feature Space for Out-of-Distribution Detection

Marc Lafon, Clément Rambour, and Nicolas Thome

NeurIPS Workshop on Machine Learning Safety, 2022

In this work, we study the out-of-distribution (OOD) detection problem through the use of the feature space of a pre-trained deep classifier. We show that learning the density of in-distribution (ID) features with an energy-based model (EBM) leads to competitive detection results. However, we found that the non-mixing of MCMC sampling during the EBM’s training undermines its detection performance. To overcome this, we propose an energy-based correction of a mixture of class-conditional Gaussian distributions. We obtain favorable results when compared to a strong baseline like the KNN detector on the CIFAR-10/CIFAR-100 OOD detection benchmarks.
@article{Lafon2022, author = {Lafon, Marc and Rambour, Clément and Thome, Nicolas}, journal = {NeurIPS Workshop on Machine Learning Safety}, title = {{Energy Correction Model in the Feature Space for Out-of-Distribution Detection}}, year = {2022}, }

2021

ICML
Beyond First-Order Uncertainty Estimation with Evidential Models for Open-World Recognition

Charles Corbière, Marc Lafon, Nicolas Thome, and 2 more authors

ICML Workshop on Uncertainty and Robustness in Deep Learning, 2021

In this paper, we tackle the challenge of jointly quantifying in-distribution and out-of-distribution (OOD) uncertainties. We introduce KLoS, a KL-divergence measure defined on the class-probability simplex. By leveraging the second-order uncertainty representation provided by evidential models, KLoS captures more than existing first-order uncertainty measures such as predictive entropy. We design an auxiliary neural network, KLoSNet, to learn a refined measure directly aligned with the evidential training objective. Experiments show that KLoSNet acts as a class-wise density estimator and outperforms current uncertainty measures in the realistic context where no OOD data is available during training. We also report comparisons in the presence of OOD training samples, which shed new light on the impact of the vicinity of this data with OOD test data.
@article{Corbiere2021, author = {Corbière, Charles and Lafon, Marc and Thome, Nicolas and Cord, Matthieu and Pérez, Patrick}, journal = {ICML Workshop on Uncertainty and Robustness in Deep Learning}, title = {{Beyond First-Order Uncertainty Estimation with Evidential Models for Open-World Recognition}}, year = {2021}, }
arXiv
Understanding the Double Descent Phenomenon in Deep Learning

Marc Lafon and Alexandre Thomas

2021

Combining empirical risk minimization with capacity control is a classical strategy in machine learning for controlling the generalization gap and avoiding overfitting as the model class capacity gets larger. Yet, in modern deep learning practice, very large over-parameterized models (e.g. neural networks) are optimized to fit the training data perfectly and still obtain great generalization performance. Past the interpolation point, increasing model complexity seems to actually lower the test error. In this tutorial, we explain the concept of double descent introduced by Belkin et al. (2019), and its mechanisms. Section 1 sets the classical statistical learning framework and introduces the double descent phenomenon. By looking at a number of examples, Section 2 introduces inductive biases that appear to have a key role in double descent by selecting, among the multiple interpolating solutions, a smooth empirical risk minimizer. Finally, Section 3 explores double descent with two linear models, and gives other points of view from recent related works.
@article{Lafon2021, author = {Lafon, Marc and Thomas, Alexandre}, journal = {}, title = {{Understanding the Double Descent Phenomenon in Deep Learning}}, year = {2021}, }