CLOVER: A framework for benchmarking synthetic data generation methods balancing utility and privacy in healthcare

Yue Qi, Lorrie Herbault, Hadrien Lautraite, Michael, Katleen Blanchet, Christian Vincelette, Louis Mullie, Guillaume Dumas, Jean-François Rajotte, Kamran Afzali, Sébastien Gambs, Michaël Chassé

Artificial Intelligence in the Life Sciences

Publication year: 2026


Abstract

Background

Synthetic data enables open and efficient medical research by enhancing real-world data. We examined the utility-privacy trade-off of synthetic data generated with and without differential privacy (DP) using CLOVER, a novel open-source Python library that we have developed.

Methods

We generated synthetic datasets based on data from MIMIC-III (24 variables, n = 15,118) and eICU (23 variables, n = 3,726). The generative approaches used were SMOTE, DataSynthesizer, Synthpop, MST, CTGAN, TVAE, CTABGAN+, and FinDiff, with and without DP. We evaluated the utility and privacy of the generated datasets based on univariate, bivariate, and population fidelity; analysis-specific and distance-based metrics; and membership inference attacks (MIAs). We benchmarked the synthetic datasets using rank-derived scores for utility and privacy. We examined the impact of DP on machine learning (ML) performance and MIAs and analyzed the achievable utility-privacy trade-off by generating synthetic data across a range of privacy regimes. We compared computational resource usage across generators.
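The abstract names univariate and bivariate fidelity and distance-based privacy metrics but does not give their formulas. The sketch below uses common textbook definitions, which are assumptions and not necessarily how CLOVER computes them. It covers the Hellinger distance between binned marginals (univariate fidelity) and the Frobenius norm of the difference between Pearson correlation matrices as a pairwise correlation difference (bivariate fidelity). It also computes the distance to closest record (DCR) and the nearest-neighbour distance ratio (NNDR) for each synthetic row. The function names are hypothetical, and only NumPy is used.

```python
import numpy as np


def hellinger_distance(real_col, synth_col, bins=20):
    """Hellinger distance between the binned marginals of one numeric variable.

    Assumption: equal-width bins on a grid shared by both samples.
    Returns a value in [0, 1]; 0 means identical histograms.
    """
    real_col = np.asarray(real_col, dtype=float)
    synth_col = np.asarray(synth_col, dtype=float)
    lo = min(real_col.min(), synth_col.min())
    hi = max(real_col.max(), synth_col.max())
    edges = np.linspace(lo, hi, bins + 1)
    p, _ = np.histogram(real_col, bins=edges)
    q, _ = np.histogram(synth_col, bins=edges)
    p = p / p.sum()  # empirical probabilities per bin (real)
    q = q / q.sum()  # empirical probabilities per bin (synthetic)
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))


def pairwise_correlation_difference(real, synth):
    """Frobenius norm of the difference between Pearson correlation matrices."""
    c_real = np.corrcoef(np.asarray(real, dtype=float), rowvar=False)
    c_synth = np.corrcoef(np.asarray(synth, dtype=float), rowvar=False)
    return float(np.linalg.norm(c_real - c_synth, ord="fro"))


def dcr_nndr(real, synth):
    """Per synthetic row: Euclidean distance to the closest real row (DCR) and
    the ratio of nearest to second-nearest real distance (NNDR).

    Brute force for illustration; needs at least two real rows.
    """
    real = np.asarray(real, dtype=float)
    synth = np.asarray(synth, dtype=float)
    # Distance matrix of shape (n_synth, n_real).
    d = np.sqrt(((synth[:, None, :] - real[None, :, :]) ** 2).sum(axis=-1))
    d_sorted = np.sort(d, axis=1)
    dcr = d_sorted[:, 0]
    # Guard against division by zero when two real rows coincide.
    nndr = d_sorted[:, 0] / np.maximum(d_sorted[:, 1], 1e-12)
    return dcr, nndr


# Usage on random toy data (not study data).
rng = np.random.default_rng(0)
real = rng.normal(size=(300, 4))
synth = rng.normal(size=(200, 4))
print(hellinger_distance(real[:, 0], synth[:, 0]))
print(pairwise_correlation_difference(real, synth))
dcr, nndr = dcr_nndr(real, synth)
print(dcr.mean(), nndr.mean())
```

The brute-force distance matrix grows with the product of the two row counts. For tables the size of MIMIC-III, a tree-based nearest-neighbour search would be preferable. A DCR/NNDR baseline is often obtained by computing the same quantities for a real holdout set against the training data. The abstract does not state whether CLOVER uses this exact baseline.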

Findings

When DP constraints were fully relaxed, MST (ε = 10⁵ and δ = 0·9999) ranked as the most private generator on MIMIC-III and the second most private on eICU (DCR and NNDR well above baseline for both datasets; top-1% precision of MIAs below 0·53 for MIMIC-III and 0·62 for eICU), but placed seventh of eight in utility on both datasets. Conversely, Synthpop ranked first in utility on both datasets. For MIMIC-III and eICU, respectively, it achieved a Hellinger distance of 0·88 × 10⁻² and 1·41 × 10⁻²; a pairwise correlation difference of 0·31 and 0·68; a distinguishability of 0·02 × 10⁻¹ on both; an AUC difference of 0·20 × 10⁻¹ and 0·10 × 10⁻¹ on the classification task; and an RMSE difference of 2·53 × 10⁻² and 13·64 × 10⁻² on the regression tasks. However, it ranked seventh in privacy on both datasets. Based on the rank-derived scores, SMOTE and TVAE were each outperformed by at least one other generator in both utility and privacy on both datasets. Once DP was introduced, utility decreased for all algorithms, and no method consistently outperformed the others across all privacy regimes.
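The Findings compare generators by rank-derived utility and privacy scores. They also single out generators that some other generator beats on both scores. The abstract does not specify how per-metric ranks are aggregated. The sketch below assumes a plain mean of per-metric ranks, where lower is better, and then flags generators that are strictly dominated on both aggregate scores. The function names and the toy numbers in the usage example are hypothetical and are not results from the study.

```python
import numpy as np


def rank_scores(metric_table, higher_is_better):
    """Mean per-metric rank of each generator (1 = best on that metric).

    metric_table: dict mapping generator name -> list of metric values,
        all lists in the same metric order.
    higher_is_better: one bool per metric giving its direction.
    Assumption: ties are broken by input order rather than averaged.
    """
    names = list(metric_table)
    values = np.array([metric_table[n] for n in names], dtype=float)
    ranks = np.empty_like(values)
    for j, hib in enumerate(higher_is_better):
        # Negate "higher is better" metrics so ascending order always means better.
        col = -values[:, j] if hib else values[:, j]
        order = np.argsort(col, kind="stable")
        ranks[order, j] = np.arange(1, len(names) + 1)
    return {n: float(r) for n, r in zip(names, ranks.mean(axis=1))}


def dominated(utility_score, privacy_score):
    """Generators beaten on both scores by at least one other (lower = better)."""
    out = []
    for g in utility_score:
        if any(
            h != g
            and utility_score[h] < utility_score[g]
            and privacy_score[h] < privacy_score[g]
            for h in utility_score
        ):
            out.append(g)
    return out


# Hypothetical toy values, for illustration only.
utility = rank_scores(
    {"gen_a": [0.01, 0.30], "gen_b": [0.05, 0.20], "gen_c": [0.08, 0.50]},
    higher_is_better=[False, False],  # e.g. Hellinger distance, correlation difference
)
privacy = rank_scores(
    {"gen_a": [0.2, 0.60], "gen_b": [0.9, 0.51], "gen_c": [0.4, 0.55]},
    higher_is_better=[True, False],  # e.g. mean DCR, MIA precision
)
print(dominated(utility, privacy))  # ['gen_c']
```

Ranking by position, rather than by raw metric values, lets metrics on very different scales contribute equally. It also means a large gap and a negligible gap between two generators count the same.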

Interpretation

In the non-DP setting, there is a trade-off between utility and privacy. DP reduced utility but ensured a consistent level of privacy, allowing a fair comparison of utility across generators. Selecting an appropriate generator depends on the privacy needs, the intended use case, and the user’s available resources.

Keywords

Synthetic data, Healthcare data, Differential privacy.
