Journal of Credit Risk

Risk.net

A method of classifying imbalanced credit data based on the AC-CTGAN hybrid sampling algorithm

Tinggui Chen, Hailian Gu, Zhiyu Yang, Jianjun Yang and Bing Wang

  • The proposed method can address the between-class imbalance and alleviate the within-class imbalance of credit data.
  • The oversampling weight of each minority-class subcluster is determined by its local density.
  • The undersampling method can overcome the potential drawbacks of the original conditional tabular generative adversarial network (CTGAN).
  • Risk identification models trained on credit data processed by adaptive cluster mixed sampling based on conditional tabular generative adversarial networks (AC-CTGAN) have a stronger generalization ability than those trained on data processed by SMOTE, SMOTE variants or the original CTGAN.
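The density-based allocation of oversampling weights described in the highlights can be sketched as follows. The paper's exact local-density formula is not stated here, so this minimal sketch rests on two loudly labeled assumptions: a hypothetical density measure (subcluster size divided by the mean distance of its points to the centroid), and weights proportional to the inverse of that density, so that sparser subclusters receive more synthetic samples. The function names `local_density` and `allocate_synthetic_counts` are illustrative, not taken from the paper.

```python
import math


def mean_distance_to_centroid(points):
    """Average Euclidean distance from a subcluster's points to its centroid."""
    dim = len(points[0])
    centroid = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    return sum(math.dist(p, centroid) for p in points) / len(points)


def local_density(points, eps=1e-9):
    """Hypothetical density measure: number of points divided by their spread.

    eps avoids division by zero when all points of a subcluster coincide.
    """
    return len(points) / (mean_distance_to_centroid(points) + eps)


def allocate_synthetic_counts(subclusters, n_to_generate):
    """Split n_to_generate synthetic samples across minority subclusters.

    Assumption (not from the paper): weights are proportional to
    1 / local density, so sparse subclusters get more synthetic samples.
    """
    inverse = [1.0 / local_density(c) for c in subclusters]
    total = sum(inverse)
    weights = [w / total for w in inverse]
    counts = [math.floor(w * n_to_generate) for w in weights]
    # Give the samples lost to flooring to the largest fractional remainders.
    leftover = n_to_generate - sum(counts)
    by_remainder = sorted(
        range(len(weights)),
        key=lambda i: weights[i] * n_to_generate - counts[i],
        reverse=True,
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return weights, counts


if __name__ == "__main__":
    dense = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
    sparse = [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
    weights, counts = allocate_synthetic_counts([dense, sparse], 10)
    # The sparse subcluster receives most of the 10 synthetic samples.
    print(weights, counts)
```

Rounding with largest remainders guarantees that the per-subcluster counts sum exactly to the requested total; in the full method these counts would then drive how many samples CTGAN generates for each subcluster.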

The rapid growth of consumer credit services has heightened financial institutions' need for stronger risk management capabilities as they strive to satisfy individuals' diverse consumption preferences. Identifying personal credit risk is crucial in financial risk management, so financial institutions need a systematic and effective credit risk identification framework to reduce the likelihood of credit defaults. To address the class imbalance of credit data, this paper works at the data level and proposes adaptive cluster mixed sampling based on conditional tabular generative adversarial networks (AC-CTGAN). The method first applies the edited nearest neighbors (ENN) algorithm for preliminary denoising of the original credit data, then uses an improved K-means algorithm to partition the minority samples into multiple subclusters. The local density of each subcluster is calculated, and the oversampling weight of each subcluster is determined adaptively on the basis of its local density. Finally, minority samples are generated with CTGAN, and the decision boundary is clarified with the TomekLink algorithm. Comparative experimental results show that the minority class samples generated by AC-CTGAN realistically reflect the distribution of the original data, minimize class overlap and limit the introduction of new noise, while increasing sample diversity. The potential within-class imbalance of credit data is also partially alleviated. Risk identification models trained on credit data processed by AC-CTGAN generalize better than those trained on data processed by the synthetic minority oversampling technique (SMOTE), SMOTE variants or the original CTGAN.
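The two cleaning steps that bracket the pipeline above, ENN denoising before clustering and Tomek-link removal after generation, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes Euclidean distance on numeric features, Wilson's majority-vote form of ENN with k = 3, and that only the majority-class member of each Tomek link is removed. The helper names (`k_nearest`, `enn_filter`, `tomek_links`, `remove_tomek_majority`) are hypothetical, the brute-force neighbour search is only suitable for small examples, and the K-means, density-weighting and CTGAN steps are omitted.

```python
import math
from collections import Counter


def k_nearest(X, i, k):
    """Indices of the k nearest neighbours of X[i], excluding i itself."""
    others = [j for j in range(len(X)) if j != i]
    others.sort(key=lambda j: math.dist(X[i], X[j]))
    return others[:k]


def enn_filter(X, y, k=3):
    """Wilson's ENN: drop samples whose label differs from the majority
    label among their k nearest neighbours."""
    keep = []
    for i in range(len(X)):
        votes = Counter(y[j] for j in k_nearest(X, i, k))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def tomek_links(X, y):
    """Pairs (i, j), i < j, of opposite-class points that are each
    other's nearest neighbour."""
    nn = [k_nearest(X, i, 1)[0] for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


def remove_tomek_majority(X, y, majority_label):
    """Assumption: remove only the majority-class member of every Tomek link."""
    drop = {i if y[i] == majority_label else j for i, j in tomek_links(X, y)}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


if __name__ == "__main__":
    # Majority class 0 at the unit-square corners, one noisy label-1 point
    # at its centre, and a separate minority cluster of label 1.
    X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5),
         (5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (6.0, 6.0)]
    y = [0, 0, 0, 0, 1, 1, 1, 1, 1]
    X_clean, y_clean = enn_filter(X, y)
    print(len(X_clean))  # 8: the noisy point (0.5, 0.5) is removed
```

Removing only the majority-class member of each link keeps every minority sample while pushing the class boundary away from the minority region; libraries such as imbalanced-learn offer tested, faster versions of both cleaning steps.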
