A method of classifying imbalanced credit data based on the AC-CTGAN hybrid sampling algorithm

Tinggui Chen; Hailian Gu; Zhiyu Yang


Need to know

The proposed method can address the between-class imbalance and alleviate the within-class imbalance of credit data.
The oversampling weight of each minority-class subcluster is determined by its local density.
The TomekLink undersampling step can overcome potential drawbacks of the original conditional tabular generative adversarial network (CTGAN) by clarifying the decision boundary.
Risk identification models trained on data processed by adaptive cluster mixed sampling based on conditional tabular generative adversarial networks (AC-CTGAN) have a stronger generalization ability.
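The points above state that each minority-class subcluster's oversampling weight is set by its local density, but the paper's exact formula is not given in this summary. The sketch below is therefore only an illustration under stated assumptions: local density is taken as subcluster size divided by the mean Euclidean distance to the subcluster centroid (a hypothetical definition), and weights are proportional to the inverse of that density, so sparser subclusters receive more synthetic samples, in the spirit of cluster-based oversampling such as K-means SMOTE. The names `local_density` and `oversampling_allocation` are hypothetical.

```python
import math


def local_density(points, eps=1e-9):
    """Hypothetical local-density measure for one subcluster (an assumption,
    not the paper's formula): number of points divided by their mean
    Euclidean distance to the subcluster centroid."""
    n = len(points)
    dim = len(points[0])
    centroid = [sum(p[d] for p in points) / n for d in range(dim)]
    mean_dist = sum(math.dist(p, centroid) for p in points) / n
    return n / (mean_dist + eps)  # eps guards against all-identical points


def oversampling_allocation(subclusters, n_synthetic):
    """Split n_synthetic new minority samples across subclusters.

    Assumption: weight is proportional to 1 / local density, so sparser
    subclusters receive more synthetic samples."""
    inverse = [1.0 / local_density(c) for c in subclusters]
    total = sum(inverse)
    weights = [v / total for v in inverse]
    raw = [w * n_synthetic for w in weights]
    counts = [int(r) for r in raw]
    # Largest-remainder rounding so the counts sum exactly to n_synthetic.
    leftover = n_synthetic - sum(counts)
    by_remainder = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return weights, counts
```

For example, with two four-point subclusters where one is four times as spread out as the other, the sparser subcluster receives 80 of 100 synthetic samples under this assumed weighting. The per-subcluster counts would then be passed to whatever generator produces the new minority samples (CTGAN, in the paper's method).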

Abstract

The rapid growth of consumer credit services has heightened financial institutions’ need for stronger risk management capabilities as they strive to satisfy individuals’ varied consumption preferences. Identifying personal credit risk is crucial in financial risk management, so financial institutions need a systematic and effective credit risk identification framework to reduce the likelihood of credit defaults. To address the class imbalance of credit data, this paper works at the data level and proposes a method of adaptive cluster mixed sampling based on conditional tabular generative adversarial networks (AC-CTGAN). The method first uses the edited nearest neighbors (ENN) algorithm for preliminary denoising of the original credit data, then employs an improved K-means algorithm to partition the minority samples into multiple subclusters. The local density of each subcluster is calculated, and the oversampling weight of each subcluster is adaptively determined from its local density. Finally, minority samples are generated via CTGAN, and the decision boundary is clarified via the TomekLink algorithm. Comparative experimental results show that the minority class samples generated by the AC-CTGAN algorithm can realistically reflect the distribution of the original data, minimize class overlap and limit the introduction of new noisy data, while increasing sample diversity. The potential within-class imbalance of credit data is also partly alleviated. Risk identification models trained on credit data processed by the AC-CTGAN algorithm generalize better than those trained on data processed by the synthetic minority oversampling technique (SMOTE), SMOTE variants or the original CTGAN.
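The pipeline described in the abstract begins with ENN denoising and ends with TomekLink cleaning around the CTGAN-generated samples. The improved K-means and CTGAN steps need a trained generative model and are not reproduced here; the sketch below shows only the two nearest-neighbour cleaning rules in plain Python, under simplifying assumptions: Euclidean distance, brute-force neighbour search, Wilson's original ENN majority-vote rule with k = 3 applied to every class, and removal of the majority-class member of each Tomek link in a binary problem. All function names are hypothetical and the paper's exact settings may differ.

```python
import math
from collections import Counter


def nearest_neighbors(X, i, k):
    """Indices of the k nearest neighbours of X[i] (brute force, Euclidean).
    Ties are broken by lower index because list.sort() is stable."""
    others = [j for j in range(len(X)) if j != i]
    others.sort(key=lambda j: math.dist(X[i], X[j]))
    return others[:k]


def enn_filter(X, y, k=3):
    """Wilson's edited nearest neighbours: keep a sample only if its label
    matches the majority label among its k nearest neighbours."""
    keep = []
    for i in range(len(X)):
        votes = Counter(y[j] for j in nearest_neighbors(X, i, k))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def tomek_links(X, y):
    """Pairs (i, j) of opposite-class samples that are each other's nearest neighbour."""
    nn = [nearest_neighbors(X, i, 1)[0] for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn) if i < j and nn[j] == i and y[i] != y[j]]


def remove_tomek_majority(X, y, majority_label):
    """Undersample by dropping the majority-class member of every Tomek link."""
    drop = {i if y[i] == majority_label else j for i, j in tomek_links(X, y)}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]
```

Brute-force neighbour search costs O(n²) distance computations and is only meant for illustration; for real credit datasets, the `EditedNearestNeighbours` and `TomekLinks` classes in imbalanced-learn (both exposing `fit_resample`) are the practical, configurable choice.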
