Journal of Risk Model Validation

Risk.net

Analyzing credit risk model problems through natural language processing-based clustering and machine learning: insights from validation reports

Szymon Lis, Mariusz Kubkowski, Olimpia Borkowska, Dobromił Serwa and Jarosław Kuparnik

  • We apply clustering and machine learning techniques to analyze validation reports.
  • The XGBoost model outperforms logistic regression and clustering methods in predicting the dimensions of findings from the reports.
  • Adding sentence embeddings, which represent the textual context of findings, to the predictive factors significantly improved overall predictive power.
  • The final model's precision differs across dimensions, which suggests that certain deficiencies are hard to distinguish in the findings of the analyzed reports.

This paper employs clustering and machine learning techniques to analyze validation reports, providing insights into issues related to credit risk model development, implementation and maintenance. Natural language processing is used to classify issues based on the findings raised in validation reports. A total of 657 findings, raised for selected credit risk models in a large banking institution between 2019 and 2022, are grouped into nine categories representing different validation dimensions. Sentence embeddings generated from the titles and descriptions of the findings then serve as predictors in classification models of the validation dimensions. Several clustering methods are compared for grouping similar findings; they identify the common issues in each category with an accuracy of more than 60%. Further, machine learning algorithms, namely logistic regression and extreme gradient boosting (XGBoost), are employed to predict each finding’s category, with XGBoost achieving 80% accuracy. The top 10 predictive words for each category are also determined.
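
The abstract reports an accuracy figure for the clustering methods without saying how cluster assignments were compared with the nine validation dimensions. One common convention, assumed here purely for illustration and not confirmed as the authors' method, is to map each cluster to its most frequent true category and count the findings that agree with that mapping. The function name `cluster_accuracy` and the toy labels below are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_accuracy(cluster_ids, true_labels):
    """Score a clustering by mapping each cluster to its majority true label.

    This is an assumed scoring convention, not necessarily the one
    used in the paper.
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("inputs must have the same length")
    if not true_labels:
        raise ValueError("inputs must not be empty")

    # Collect the true labels of the findings assigned to each cluster.
    members = defaultdict(list)
    for cluster_id, label in zip(cluster_ids, true_labels):
        members[cluster_id].append(label)

    # A finding counts as correct when it carries its cluster's majority label.
    correct = sum(
        Counter(labels).most_common(1)[0][1] for labels in members.values()
    )
    return correct / len(true_labels)


# Toy data invented for illustration; not taken from the validation reports.
clusters = [0, 0, 0, 1, 1, 2, 2, 2]
dimensions = [
    "data", "data", "calibration",
    "calibration", "calibration",
    "documentation", "data", "documentation",
]
score = cluster_accuracy(clusters, dimensions)
```

Under this convention several clusters may map to the same category, so the score tends to rise as the number of clusters grows; it is only comparable across methods that use the same number of clusters.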
