Journal of Risk Model Validation
ISSN 1753-9579 (print) | 1753-9587 (online)
Editor-in-chief: Steve Satchell
Smoothing algorithms by constrained maximum likelihood: methodologies and implementations for Comprehensive Capital Analysis and Review stress testing and International Financial Reporting Standard 9 expected credit loss estimation
Need to know
Smoothing algorithms for monotonic rating-level PD and rating migration probability are proposed. The approaches can be characterized as follows:
- They are based on constrained maximum likelihood, with a fair risk scale for the estimates determined fully by constrained maximum likelihood, leading to fair and more robust credit loss estimation.
- Default correlation is accounted for by using the asset correlation under the Merton model.
- The quality of the smoothed estimates is assessed by the likelihood ratio test and by the impact on credit loss of the change in risk scale for the estimates.
- These approaches generally outperform the interpolation method and regression models, and are easy to implement using, for example, SAS PROC NLMIXED.
Abstract
In the process of loan pricing, stress testing, capital allocation, modeling of probability of default (PD) term structure and International Financial Reporting Standard 9 expected credit loss estimation, it is widely expected that higher risk grades carry higher default risks, and that an entity is more likely to migrate to a closer nondefault rating than a more distant nondefault rating. In practice, sample estimates for the rating-level default rate or rating migration probability do not always respect this monotonicity rule, and hence the need for smoothing approaches arises. Regression and interpolation techniques are widely used for this purpose. A common issue with these, however, is that the risk scale for the estimates is not fully justified, leading to a possible bias in credit loss estimates. In this paper, we propose smoothing algorithms for rating-level PD and rating migration probability. The smoothed estimates obtained by these approaches are optimal in the sense of constrained maximum likelihood, with a fair risk scale determined by constrained maximum likelihood, leading to more robust credit loss estimation. The proposed algorithms can be easily implemented by a modeler using, for example, the SAS procedure PROC NLMIXED. The approaches proposed in this paper will provide an effective and useful smoothing tool for practitioners in the field of risk modeling.
1 Introduction
Given a risk-rated portfolio with ratings $R_1, R_2, \ldots, R_k$, we assume that rating $R_1$ is the best quality rating and $R_k$ is the worst, ie, the default rating. It is widely expected that higher risk ratings carry higher default risk, and that an entity is more likely to be downgraded or upgraded to a closer nondefault rating than a more distant nondefault rating. The following constraints are therefore required:

$$p_1 \le p_2 \le \cdots \le p_{k-1}, \tag{1.1}$$
$$p_{ij} \le p_{i,j+1}, \quad 1 \le j < i, \tag{1.2}$$
$$p_{ij} \ge p_{i,j+1}, \quad i \le j \le k-2, \tag{1.3}$$

where $p_i$, $1 \le i \le k-1$, denotes the probability of default (PD) for rating $R_i$, and $p_{ij}$, $1 \le i, j \le k-1$, is the migration probability from a nondefault initial rating $R_i$ to a nondefault rating $R_j$.
Estimates that satisfy the above monotonicity constraints are called smoothed estimates. Smoothed estimates are widely expected for rating-level PD and rating migration probability in the process of loan pricing, capital allocation, Comprehensive Capital Analysis and Review (CCAR) stress testing (Board of Governors of the Federal Reserve System 2016), modeling of PD term structure and International Financial Reporting Standard 9 expected credit loss (ECL) estimation (Ankarath et al 2010).
In practice, sample estimates for rating-level PD and rating migration probability do not always respect these monotonicity rules. This calls for smoothing approaches. Regression and interpolation methods have been widely used for this purpose. A common issue with these approaches is that the risk scale for the estimates is not fully justified, leading to possibly biased credit loss estimates.
In this paper, we propose smoothing algorithms based on constrained maximum likelihood (CML). These CML-smoothed estimates are optimal in the sense of constrained maximum likelihood, with a fair risk scale determined by constrained maximum likelihood, leading to a fair and more justified loss estimation. As shown by the empirical examples for rating-level PD in Section 2.3, the CML approach is more robust than the logistic and log-linear models, with quality being measured based on the resulting likelihood ratio, the predicted portfolio level PD and the impacted ECL.
This paper is organized as follows. In Section 2, we propose smoothing algorithms for smoothed rating-level PD, for the cases with and without default correlation. A smoothing algorithm for multinomial probability is proposed in Section 3. Empirical examples are given accordingly in Sections 2 and 3, and in Section 2 we benchmark the CML approach for rating-level PD with a logistic model proposed by Tasche (2013) and a log-linear model proposed by van der Burgt (2008). Section 4 concludes.
2 Smoothing rating-level probability of default
2.1 The proposed smoothing algorithm for rating-level PD assuming no default correlation
Cross-section or within-section default correlation may arise from commonly shared risk factors. In this case, we assume that the sample is observed at a point in time given the commonly shared risk factors, and that defaults occur independently conditional on these factors.
Let $d_i$ and $n_i$ be the observed default and nondefault frequencies, respectively, for a nondefault risk rating $R_i$. Let $p_i$ denote the PD for an entity with a nondefault initial rating $R_i$. With no default correlation, we can assume that the default frequency follows a binomial distribution. Then the sample loglikelihood is given by

$$LL = \sum_{i=1}^{k-1} \bigl[\, d_i \log p_i + n_i \log (1 - p_i) \,\bigr] \tag{2.1}$$

up to a summand given by the logarithms of the related binomial coefficients, which are independent of $p_1, \ldots, p_{k-1}$. By taking the derivative of (2.1) with respect to $p_i$ and setting it to zero, we have

$$\frac{d_i}{p_i} - \frac{n_i}{1 - p_i} = 0 \quad \Longrightarrow \quad p_i = \frac{d_i}{d_i + n_i}.$$

Therefore, the unconstrained maximum likelihood estimate for $p_i$ is just the sample default rate $d_i/(d_i + n_i)$.
We propose the following smoothing algorithm for the case when no default correlation is assumed.
Algorithm 2.1 (Smoothing rating-level PD assuming no default correlation).

- (a) For a given constant $c \ge 0$, maximize the loglikelihood (2.1) subject to the constraints
  $$\log p_{i+1} - \log p_i \ge c, \quad 1 \le i \le k-2,$$
  together with $0 < p_1$ and $p_{k-1} \le 1$. With $c = 0$ this reduces to the monotonicity constraint (1.1); a larger $c$ enforces a minimum log-scale separation between consecutive rating-level PDs.
- (b) The constrained optimization can be implemented by using, for example, SAS PROC NLMIXED (SAS Institute 2009).
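The optimization in Algorithm 2.1 is low dimensional and easy to reproduce outside SAS. Below is a minimal Python sketch (the paper itself uses SAS PROC NLMIXED; the function name and the use of scipy here are illustrative assumptions). The constraints $\log p_{i+1} - \log p_i \ge c$ are enforced by reparameterization, so an unconstrained optimizer suffices.

```python
import numpy as np
from scipy.optimize import minimize

def smooth_pd_cml(d, m, c=0.0):
    # d[i]: default count for rating i; m[i]: total record count for rating i.
    # Maximizes the binomial loglikelihood (2.1), where the nondefault
    # frequency is m[i] - d[i], subject to log p[i+1] - log p[i] >= c,
    # enforced by the reparameterization
    #   log p[0] = t[0],  log p[i] = log p[i-1] + c + exp(t[i]).
    d, m = np.asarray(d, float), np.asarray(m, float)

    def pd_from(t):
        return np.exp(np.cumsum(np.concatenate(([t[0]], c + np.exp(t[1:])))))

    def negloglik(t):
        p = np.clip(pd_from(t), 1e-12, 1.0 - 1e-12)
        return -np.sum(d * np.log(p) + (m - d) * np.log(1.0 - p))

    # start from the (floored) sample default rates, spaced feasibly
    r = np.maximum(d, 0.5) / m
    t0 = np.concatenate(([np.log(r[0])],
                         np.log(np.maximum(np.diff(np.log(r)) - c, 0.1))))
    res = minimize(negloglik, t0, method="Nelder-Mead",
                   options={"maxiter": 50000, "xatol": 1e-10, "fatol": 1e-10})
    return pd_from(res.x)

# Table 1 sample with c = 0: ratings 2 and 3, nonmonotonic in the sample,
# are pooled to a common smoothed PD (cf the CML row of Table 2).
smoothed = smooth_pd_cml([1, 11, 22, 124, 62, 170],
                         [5529, 11566, 29765, 52875, 4846, 4318])
```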
2.2 The proposed smoothing algorithms for rating-level PD assuming default correlation
Default correlation can be modeled by the asymptotic single risk factor (ASRF) model using asset correlation. Under the ASRF model framework, the risk for an entity is governed by a latent random variable $z$, called the firm's normalized asset value, which splits into the following two parts (Miu and Ozdemir 2009):

$$z = \sqrt{\rho}\, s + \sqrt{1 - \rho}\, \varepsilon, \tag{2.4}$$

where $s$ denotes the common systematic risk and $\varepsilon$ is the idiosyncratic risk independent of $s$. The quantity $\rho$ is called the asset correlation. It is assumed that there exist threshold values $b_1 \le b_2 \le \cdots \le b_{k-1}$ (ie, the default points) such that an entity with an initial risk rating $R_i$ will default when $z$ falls below the threshold value $b_i$. The long-run PD for rating $R_i$ is then given by $p_i = \Phi(b_i)$, where $\Phi$ denotes the standard normal cumulative distribution function (CDF).
Let $p_i(s)$ denote the PD for an entity with an initial risk rating $R_i$ given the systematic risk $s$. It is shown in Yang (2017) that

$$p_i(s) = \Phi\!\left(\frac{b_i - \sqrt{\rho}\, s}{\sqrt{1 - \rho}}\right), \tag{2.5}$$

where the formula follows because, conditional on $s$, the asset value $z$ in (2.4) is normal with mean $\sqrt{\rho}\, s$ and variance $1 - \rho$.
Let $n_{it}$ and $d_{it}$ denote, respectively, the number of entities and the number of defaults at time $t$ for rating $R_i$, $1 \le t \le T$. Given the latent factor $s$, we propose the following smoothing algorithm for correlated rating-level long-run PDs by using (2.5).
Algorithm 2.2 (Smoothing correlated rating-level long-run PDs given the latent systematic risk factor).

- (a) Parameterize $p_i(s)$ for a nondefault rating $R_i$ by (2.5), with the default points treated as the free parameters:
  $$b_i = \Phi^{-1}(p_i), \quad 1 \le i \le k-1, \tag{2.6}$$
  where, for a given constant $c \ge 0$, the following constraints are satisfied:
  $$b_i + c \le b_{i+1}, \quad 1 \le i \le k-2. \tag{2.7}$$
- (b) Maximize the loglikelihood of observing the default frequencies $\{d_{it}\}$ over the default points subject to (2.7), treating the systematic factor $s_t$ at each time $t$ as an independent standard normal random effect. Optimization with a random effect can be implemented by using, for example, SAS PROC NLMIXED (SAS Institute 2009).
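For step (b), the marginal likelihood can also be written down directly: given $s_t$, defaults across ratings are independent binomials, and $s_t$ is integrated out against the standard normal density. A minimal Python sketch using Gauss–Hermite quadrature follows (an illustrative alternative to PROC NLMIXED's adaptive quadrature; the asset correlation `rho` is taken as given, and the ordering constraints (2.7) are assumed to be handled by reparameterization, as in the previous sketch).

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss  # probabilists' Hermite
from scipy.stats import binom, norm

def negloglik_correlated(b, d, m, rho, n_quad=32):
    # b: default points b_1..b_{k-1}; d, m: T x (k-1) arrays of quarterly
    # default and record counts; rho: asset correlation.
    d, m = np.asarray(d, float), np.asarray(m, float)
    x, w = hermegauss(n_quad)        # nodes/weights for f(x) exp(-x^2/2) dx
    w = w / np.sqrt(2.0 * np.pi)     # rescale to an expectation under N(0,1)
    # p_i(s) per (2.5): one row per quadrature node, one column per rating
    p = norm.cdf((np.asarray(b, float)[None, :] - np.sqrt(rho) * x[:, None])
                 / np.sqrt(1.0 - rho))
    ll = 0.0
    for t in range(d.shape[0]):
        # conditional independence across ratings given s_t
        log_f = binom.logpmf(d[t][None, :], m[t][None, :], p).sum(axis=1)
        shift = log_f.max()          # log-sum-exp for numerical stability
        ll += shift + np.log(np.dot(w, np.exp(log_f - shift)))
    return -ll
```

Minimizing this over the default points (and, if desired, jointly over `rho`) gives the CML-smoothed long-run PDs $\hat{p}_i = \Phi(b_i)$.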
When some key risk factors $x_1, x_2, \ldots, x_m$, common to all ratings, are observed, we assume the following decomposition for the systematic risk factor $s$:

$$s = \lambda\, \tilde{x} + \sqrt{1 - \lambda^2}\, e, \qquad \tilde{x} = \frac{x - \mu}{\sigma},$$

where the common index $x = a_1 x_1 + a_2 x_2 + \cdots + a_m x_m$ is a linear combination of the variables $x_1, x_2, \ldots, x_m$, with $\mu$ and $\sigma$ being the mean and standard deviation of $x$.

Let $p_i(x)$ denote the PD given a scenario $x$. Assume that $e$ is standard normal and independent of $x$. Then we have (Yang 2017, Theorem 2.2)

$$p_i(x) = \Phi\!\left(\frac{b_i - \lambda \sqrt{\rho}\, \tilde{x}}{\sqrt{1 - \lambda^2 \rho}}\right) \tag{2.9}$$

for some $0 \le \lambda \le 1$. (This follows because, conditional on $\tilde{x}$, the normalized asset value $z$ in (2.4) is normal with mean $\sqrt{\rho}\, \lambda \tilde{x}$ and variance $1 - \lambda^2 \rho$.)
Let $x_t$ denote the value of $x$ at time $t$ for $1 \le t \le T$. Given $\{x_t\}$, we propose the following smoothing algorithm for correlated rating-level long-run PDs and rating-level point-in-time PDs by using (2.9).
Algorithm 2.3 (Smoothing correlated rating-level PDs given the common index $x$).

- (a) Parameterize $p_i(x)$ for a nondefault rating $R_i$ by (2.9), with the default points treated as the free parameters as in (2.6):
  $$b_i = \Phi^{-1}(p_i), \quad 1 \le i \le k-1, \tag{2.10}$$
  where, for a given constant $c \ge 0$, the following constraints are satisfied:
  $$b_i + c \le b_{i+1}, \quad 1 \le i \le k-2. \tag{2.11}$$
- (b) Maximize the loglikelihood of observing the default frequencies $\{d_{it}\}$ given the observed index values $\{x_t\}$, over the default points and $\lambda$ subject to (2.11). Since $x$ is observed, no random effect is required, and the optimization can again be implemented by using, for example, SAS PROC NLMIXED.
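Once the default points and $\lambda$ have been estimated, (2.9) produces a point-in-time PD for any scenario value of the common index, which is what the PD term structure and stress testing applications consume. A small sketch under the reconstruction of (2.9) above (`mu`, `sigma`, `rho` and `lam` as defined in the text; the function name is illustrative):

```python
import numpy as np
from scipy.stats import norm

def pit_pd(p_long_run, x, mu, sigma, rho, lam):
    # point-in-time PDs per (2.9) for a scenario value x of the common index;
    # lam is the correlation between s and the standardized index
    b = norm.ppf(np.asarray(p_long_run))   # default points b_i = Phi^{-1}(p_i)
    x_std = (x - mu) / sigma               # standardized index
    return norm.cdf((b - lam * np.sqrt(rho) * x_std)
                    / np.sqrt(1.0 - rho * lam ** 2))
```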
2.3 Empirical examples: smoothing of rating-level PDs
Example 1: smoothing rating-level long-run PDs assuming no default correlation
Table 1 shows the record count and default rate (DF rate) for a synthetically created sample with six nondefault risk ratings.
Algorithm 2.1 will be benchmarked by the following methods.
Table 1: Sample default counts (DF), record counts and default rates by risk rating.

| | 1 | 2 | 3 | 4 | 5 | 6 | Portfolio level |
|---|---|---|---|---|---|---|---|
| DF | 1 | 11 | 22 | 124 | 62 | 170 | 391 |
| Count | 5 529 | 11 566 | 29 765 | 52 875 | 4 846 | 4 318 | 108 899 |
| DF rate (%) | 0.0173 | 0.0993 | 0.0739 | 0.2352 | 1.2833 | 3.9442 | 0.3594 |
- LGL1: with this approach, the PD for rating $R_i$ is estimated by $\hat{p}_i = \exp(a + bi)$, where $i$ denotes the index for rating $R_i$, ie, $i = 1$ for rating $R_1$. Parameters $a$ and $b$ are estimated by a linear regression of the form below, using the logarithm of the sample default rate $r_i$ for each rating:
  $$\log r_i = a + bi + \varepsilon_i.$$
  A common issue with this approach is the unjustified uniform risk scale (in the log space) for all ratings. In addition, this approach generally causes the portfolio-level PD to be underestimated, due to the convexity of the exponential function (its second derivative is positive): the regression fits the average of the log default rates, and, by Jensen's inequality, the exponential of an average is no greater than the average of the exponentials.
- LGL2: like method LGL1, the rating-level PD is estimated by $\hat{p}_i = \exp(a + bi)$. However, parameters $a$ and $b$ are estimated by maximizing the loglikelihood given in (2.1). With this approach, the bias for the portfolio PD can generally be avoided, though the issue with the unjustified uniform risk scale remains.
- EXP-CDF: this method was proposed by van der Burgt (2008). With this approach, the rating-level PD is estimated by $\hat{p}_i = \exp(a + b \hat{F}_i)$, where $\hat{F}_i$ denotes, for rating $R_i$, the adjusted sample cumulative distribution,
  $$\hat{F}_i = \frac{n_1 + n_2 + \cdots + n_{i-1} + \tfrac{1}{2} n_i}{n_1 + n_2 + \cdots + n_{k-1}}, \tag{2.13}$$
  where $n_j$ denotes the record count for rating $R_j$. Instead of estimating the parameters via a cap ratio (van der Burgt 2008), we estimate them by maximizing the loglikelihood given in (2.1).
- LGST-INVCDF: this method follows the logistic model of Tasche (2013). The rating-level PD is estimated by $\hat{p}_i = 1/(1 + \exp(-a - b\,\Phi^{-1}(\hat{F}_i)))$, ie, the logistic function applied to the normal inverse of the adjusted sample cumulative distribution in (2.13), with parameters $a$ and $b$ again estimated by maximizing the loglikelihood given in (2.1). A sketch of this shared maximum likelihood fitting step is given after this list.
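The three likelihood-fitted benchmarks (LGL2, EXP-CDF and LGST-INVCDF) share the same fitting step: choose a per-rating covariate $u_i$, then maximize (2.1) over the two parameters. A minimal Python sketch, assuming the midpoint form of the adjusted CDF in (2.13) (function names are illustrative):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit
from scipy.stats import norm

def adjusted_cdf(m):
    # adjusted sample cumulative distribution per (2.13), midpoint convention
    m = np.asarray(m, float)
    return (np.cumsum(m) - 0.5 * m) / m.sum()

def fit_two_param(d, m, u, link="log"):
    # maximize the binomial loglikelihood (2.1) for p_i = g(a + b * u_i):
    #   LGL2:         u = rating index 1..k-1,       g = exp
    #   EXP-CDF:      u = adjusted_cdf(m),           g = exp
    #   LGST-INVCDF:  u = norm.ppf(adjusted_cdf(m)), g = logistic
    d, m, u = (np.asarray(v, float) for v in (d, m, u))
    g = np.exp if link == "log" else expit

    def negloglik(ab):
        p = np.clip(g(ab[0] + ab[1] * u), 1e-12, 1.0 - 1e-12)
        return -np.sum(d * np.log(p) + (m - d) * np.log(1.0 - p))

    return minimize(negloglik, x0=np.array([-6.0, 1.0]),
                    method="Nelder-Mead").x
```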
Estimation quality is measured by the following.
- $p$-value: this is the $p$-value calculated from the likelihood ratio chi-squared test, with degrees of freedom equal to the number of restrictions. A higher $p$-value indicates a better model (see the sketch after this list).
- ECL ratio: this is the ratio of the expected credit loss based on the smoothed rating-level PDs to that based on the realized rating-level PDs, given the exposure at default (EAD) and loss given default (LGD) parameters for each rating. A significantly lower ECL ratio indicates a possible underestimation of the credit loss.
- PD ratio: the ratio of the portfolio-level PD aggregated from the smoothed rating-level PDs to the portfolio-level PD aggregated from the realized rating-level PDs. A value significantly below 100% indicates a possible underestimation of the PD at portfolio level.
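A sketch of how the first two measures can be computed (helper names are illustrative; `ead` and `lgd` are the per-rating exposure at default and loss given default vectors):

```python
import numpy as np
from scipy.stats import chi2

def lr_pvalue(ll_unconstrained, ll_constrained, n_restrictions):
    # likelihood ratio chi-squared test of the smoothed fit against the
    # unconstrained MLE, with df equal to the number of restrictions
    stat = 2.0 * (ll_unconstrained - ll_constrained)
    return chi2.sf(stat, df=n_restrictions)

def ecl_ratio(pd_smoothed, pd_realized, ead, lgd):
    # ECL under smoothed PDs relative to ECL under realized PDs,
    # holding the per-rating EAD and LGD parameters fixed
    ecl = lambda p: float(np.sum(np.asarray(ead) * np.asarray(lgd)
                                 * np.asarray(p)))
    return ecl(pd_smoothed) / ecl(pd_realized)
```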
Table 2 shows the results for Algorithm 2.1 (labeled "CML") when $c = 0$, along with the benchmarks, where the smoothed rating-level PDs are listed in columns P1–P6. Table 3 shows the sensitivity of the CML estimates to the choice of $c$.
Table 2: Smoothed rating-level PDs (%) and portfolio-level quality measures, Algorithm 2.1 ($c = 0$) versus benchmarks.

| Method | P1 | P2 | P3 | P4 | P5 | P6 | $p$-value (%) | ECL ratio (%) | PD ratio (%) |
|---|---|---|---|---|---|---|---|---|---|
| CML | 0.0173 | 0.0810 | 0.0810 | 0.2352 | 1.2833 | 3.9442 | 95.92 | 99.91 | 100.00 |
| LGL1 | 0.0165 | 0.0416 | 0.1053 | 0.2663 | 0.6732 | 1.7022 | 0.00 | 46.09 | 72.57 |
| LGL2 | 0.0032 | 0.1468 | 0.2901 | 0.4333 | 0.5763 | 0.7191 | 0.00 | 27.58 | 100.07 |
| EXP-CDF | 0.0061 | 0.0086 | 0.0294 | 0.3431 | 1.9081 | 2.5057 | 0.00 | 72.92 | 100.21 |
| LGST-INVCDF | 0.0104 | 0.0188 | 0.0585 | 0.2795 | 1.5457 | 3.4388 | 0.00 | 90.46 | 100.00 |
Table 3: CML-smoothed rating-level PDs (%) by Algorithm 2.1 for different values of $c$.

| $c$ | P1 | P2 | P3 | P4 | P5 | P6 | $p$-value (%) | ECL ratio (%) | PD ratio (%) |
|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 0.0173 | 0.0810 | 0.0810 | 0.2352 | 1.2833 | 3.9442 | 95.92 | 99.91 | 100.00 |
| 0.1 | 0.0173 | 0.0753 | 0.0832 | 0.2352 | 1.2833 | 3.9442 | 89.06 | 99.88 | 100.00 |
| 0.5 | 0.0173 | 0.0552 | 0.0910 | 0.2352 | 1.2833 | 3.9442 | 36.63 | 99.79 | 100.00 |
| 1.0 | 0.0120 | 0.0327 | 0.0890 | 0.2419 | 1.2833 | 3.9442 | 2.54 | 99.63 | 100.00 |
These results show that Algorithm 2.1 significantly outperforms the benchmarks by $p$-value, impacted ECL and aggregated portfolio-level PD. The first log-linear model (LGL1) significantly underestimates the portfolio-level PD. All of the log-linear-type models (LGL1, LGL2 and EXP-CDF) underestimate the ECL significantly.
Example 2: smoothing rating-level long-run PDs in the presence of default correlation
Table 4: Sample statistics: long-run average realized PD and overall distribution by risk rating.

| | 1 | 2 | 3 | 4 | 5 | 6 | Portfolio level |
|---|---|---|---|---|---|---|---|
| Long-run AVG PD (%) | 0.0215 | 0.1027 | 0.0764 | 0.2731 | 1.1986 | 3.8563 | 0.3818 |
| Overall distribution (%) | 5.07 | 10.61 | 27.47 | 48.32 | 4.52 | 4.01 | 100.00 |
The synthetically created sample contains the quarterly default count by rating for a portfolio with six nondefault ratings between 2005 Q1 and 2014 Q4. The (rating-level or portfolio-level) point-in-time default rate is calculated for each quarter and then averaged over the quarters in the sample window to obtain the estimate of the long-run average realized PD (labeled "AVG PD"). The sample distribution by rating (labeled "overall distribution") is calculated by pooling all quarters. Table 4 displays the sample statistics; note the heavy size concentration at rating $R_4$.
Table 5: Smoothed correlated rating-level long-run PDs (%) by Algorithm 2.2.

| $c$ | P1 | P2 | P3 | P4 | P5 | P6 | AIC | Portfolio long-run AVG PD (%) | PD ratio (%) |
|---|---|---|---|---|---|---|---|---|---|
| 0.0 (no correl) | 0.0179 | 0.0836 | 0.0836 | 0.2371 | 1.3076 | 4.0372 | 694.02 | 0.3710 | 97.17 |
| 0.0 (correl) | 0.0183 | 0.0828 | 0.0828 | 0.2545 | 1.1951 | 3.9340 | 594.62 | 0.3843 | 100.66 |
| 0.1 (correl) | 0.0183 | 0.0483 | 0.0966 | 0.2541 | 1.1942 | 3.9318 | 600.79 | 0.3842 | 100.64 |
| 0.2 (correl) | 0.0035 | 0.0176 | 0.0754 | 0.2775 | 1.1859 | 3.9237 | 617.96 | 0.3842 | 100.64 |
| 0.3 (correl) | 0.0010 | 0.0086 | 0.0560 | 0.2905 | 1.1961 | 3.9342 | 637.25 | 0.3845 | 100.71 |
Table 5 shows the smoothed correlated rating-level long-run PD for all six nondefault ratings obtained by using Algorithm 2.2.
Estimation quality is measured by the following.
- AIC: the Akaike information criterion. A lower AIC indicates a better model.
- PD ratio: the ratio of the long-run average predicted portfolio-level PD (labeled "AVG PD") to the long-run average realized portfolio-level PD. A value significantly below 100% indicates a possible underestimation of the PD at portfolio level.
The first row in Table 5 shows results for the case when no default correlation is assumed (labeled "no correl") and $c$ is chosen to be 0, while the second row shows those for the case when default correlation is assumed (labeled "correl"), again with $c = 0$.
The results show that the estimated long-run portfolio-level PD in the first row, where no default correlation is assumed, is lower than that in the second row, where default correlation is assumed. This suggests that the long-run rating-level PD may be underestimated when default correlation is ignored. The higher AIC value in the first row likewise implies that the assumption of no default correlation may not be appropriate.
Note that, when applying Algorithm 2.2 to the sample used in Example 1 while assuming no default correlation, we obtained exactly the same estimates as in Example 1.
3 Smoothing algorithms for multinomial probability
3.1 Unconstrained maximum likelihood estimates for multinomial probability
For $n$ independent trials, where each trial results in exactly one of $k$ fixed outcomes, the probability of observing frequencies $n_1, n_2, \ldots, n_k$, with frequency $n_i$ for the $i$th ordinal outcome, is

$$f(n_1, n_2, \ldots, n_k) = \frac{n!}{n_1!\, n_2! \cdots n_k!}\, p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}, \tag{3.1}$$

where $p_i$ is the probability of observing the $i$th ordinal outcome in a single trial, and

$$n_1 + n_2 + \cdots + n_k = n, \qquad p_1 + p_2 + \cdots + p_k = 1.$$

The loglikelihood is

$$LL = n_1 \log p_1 + n_2 \log p_2 + \cdots + n_k \log p_k \tag{3.2}$$

up to a constant given by the logarithm of the multinomial coefficient, which is independent of the parameters $p_1, \ldots, p_k$. By using the relation $p_k = 1 - p_1 - \cdots - p_{k-1}$ and setting to zero the derivative of (3.2) with respect to $p_i$, $1 \le i \le k-1$, we have

$$\frac{n_i}{p_i} = \frac{n_k}{p_k}.$$

Since this holds for each $i$ and for the fixed $p_k$, we conclude that the vector $(p_1, p_2, \ldots, p_k)$ is in proportion to $(n_1, n_2, \ldots, n_k)$. Thus, the maximum likelihood estimate for $p_i$ is the sample estimate

$$\hat{p}_i = \frac{n_i}{n}. \tag{3.3}$$
3.2 The proposed smoothing algorithm for multinomial probability
We next propose a smoothing algorithm for multinomial probability under the following constraint: for given constants $c_i \ge 0$,

$$(1 + c_i)\, p_i \le p_{i+1}, \quad 1 \le i \le k-1. \tag{3.4}$$

Algorithm 3.1 (Smoothing multinomial probability). Maximize the loglikelihood (3.2) subject to the constraint (3.4) and $p_1 + p_2 + \cdots + p_k = 1$. The optimization can be implemented by using, for example, SAS PROC NLMIXED; a sketch implementation is given below.

In the case when $c_1 = c_2 = \cdots = c_{k-1} = c$, let $1 + c = \min_{1 \le i \le k-1} \{p_{i+1}/p_i\}$. Then $1 + c$ is the maximum lower bound for all the ratios $p_{i+1}/p_i$.
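The following is a minimal Python sketch of Algorithm 3.1 under the constraint form (3.4) reconstructed above (an off-the-shelf constrained optimizer stands in for PROC NLMIXED; with all $c_i = 0$ it reduces to plain monotonic smoothing):

```python
import numpy as np
from scipy.optimize import minimize

def smooth_multinomial(counts, c=None):
    # maximize the multinomial loglikelihood (3.2) subject to
    # (1 + c_i) p_i <= p_{i+1} and sum(p) = 1
    counts = np.asarray(counts, float)
    k = len(counts)
    c = np.zeros(k - 1) if c is None else np.asarray(c, float)

    def negloglik(p):
        return -np.sum(counts * np.log(np.clip(p, 1e-12, 1.0)))

    cons = [{"type": "eq", "fun": lambda p: p.sum() - 1.0}]
    cons += [{"type": "ineq",
              "fun": lambda p, i=i: p[i + 1] - (1.0 + c[i]) * p[i]}
             for i in range(k - 1)]
    res = minimize(negloglik, np.full(k, 1.0 / k), method="SLSQP",
                   bounds=[(1e-12, 1.0)] * k, constraints=cons)
    return res.x
```

For a migration row, the algorithm is applied with the ordering running toward the diagonal on each side, so that both (3.8) and (3.9) of Section 3.3 are enforced.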
3.3 An empirical example: smoothing transition probability matrix
Table 6: Long-run transition probability matrix (a) before and (b) after smoothing. Bold entries in part (a) violate the monotonicity constraints.

(a) Transition probability before smoothing

| Initial rating | p1 | p2 | p3 | p4 | p5 | p6 | p7 |
|---|---|---|---|---|---|---|---|
| 1 | 0.97162 | 0.01835 | **0.00312** | **0.00554** | 0.00104 | 0.00017 | 0.00017 |
| 2 | 0.00621 | 0.94528 | 0.03071 | 0.01284 | **0.00215** | **0.00257** | 0.00025 |
| 3 | 0.00071 | 0.01028 | 0.93803 | 0.04089 | 0.00659 | 0.00277 | 0.00074 |
| 4 | 0.00024 | 0.00069 | 0.01260 | 0.96726 | 0.01261 | 0.00543 | 0.00118 |
| 5 | 0.00039 | 0.00118 | 0.00790 | 0.07996 | 0.82725 | 0.07048 | 0.01283 |
| 6 | 0.00022 | 0.00133 | 0.00266 | **0.04498** | **0.01197** | 0.89940 | 0.03944 |

(b) Transition probability after smoothing

| Initial rating | p1 | p2 | p3 | p4 | p5 | p6 | p7 |
|---|---|---|---|---|---|---|---|
| 1 | 0.97162 | 0.01835 | 0.00433 | 0.00433 | 0.00104 | 0.00017 | 0.00017 |
| 2 | 0.00621 | 0.94528 | 0.03071 | 0.01284 | 0.00236 | 0.00236 | 0.00025 |
| 3 | 0.00071 | 0.01028 | 0.93803 | 0.04089 | 0.00659 | 0.00277 | 0.00074 |
| 4 | 0.00024 | 0.00069 | 0.01260 | 0.96726 | 0.01261 | 0.00543 | 0.00118 |
| 5 | 0.00039 | 0.00118 | 0.00790 | 0.07996 | 0.82725 | 0.07048 | 0.01283 |
| 6 | 0.00022 | 0.00133 | 0.00266 | 0.02847 | 0.02847 | 0.89940 | 0.03944 |
Rating migration matrix models (Miu and Ozdemir 2009; Yang and Du 2016) are widely used for International Financial Reporting Standard 9 ECL estimation and CCAR stress testing. Given a nondefault risk rating $R_i$, let $n_{ij}$ be the observed long-run transition frequency from $R_i$ to $R_j$ at the end of the horizon, and let $n_i = n_{i1} + n_{i2} + \cdots + n_{ik}$. Let $p_{ij}$ be the long-run transition probability from $R_i$ to $R_j$. By (3.3), the maximum likelihood estimate for observing the long-run transition frequencies for a fixed $R_i$ is

$$\hat{p}_{ij} = \frac{n_{ij}}{n_i}. \tag{3.7}$$

It is widely expected that higher risk grades carry greater default risk, and that an entity is more likely to be downgraded or upgraded to a closer nondefault rating than a more distant nondefault rating. The following constraints are thus required:

$$p_{ij} \le p_{i,j+1}, \quad 1 \le j < i, \tag{3.8}$$
$$p_{ij} \ge p_{i,j+1}, \quad i \le j \le k-2, \tag{3.9}$$
$$p_{1k} \le p_{2k} \le \cdots \le p_{k-1,k}. \tag{3.10}$$
The constraint (3.10) is for rating-level PD, which was discussed in Section 2.
Smoothing the long-run migration matrix involves the following steps.

- (a) For each nondefault initial rating $R_i$, apply Algorithm 3.1 to the nondefault migration probabilities $(p_{i1}, \ldots, p_{i,k-1})$, smoothing the upgrade side and the downgrade side of the diagonal so that (3.8) and (3.9) are satisfied.
- (b) Find the CML-smoothed estimates by using Algorithm 2.1 for the rating-level default rate. Keep these CML default rate estimates unchanged and rescale, for each nondefault rating $R_i$, the nondefault migration probabilities so that the entire row sums to 1 (see the sketch after this list).
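A sketch of step (b), where the CML default rate `pd_cml` comes from Algorithm 2.1 and `p_nondefault` holds the smoothed nondefault migration probabilities for the row (function name illustrative):

```python
import numpy as np

def rescale_row(p_nondefault, pd_cml):
    # keep the CML default rate fixed and rescale the smoothed nondefault
    # migration probabilities so that the full row sums to 1
    p = np.asarray(p_nondefault, float)
    return np.append(p * (1.0 - pd_cml) / p.sum(), pd_cml)
```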
Table 6 shows empirical results from using Algorithms 2.1 and 3.1 to smooth the long-run migration matrix, where for Algorithm 3.1 all the constants $c_i$ are set to zero.
The sample used here is also synthetically created. It consists of the historical quarterly rating transition frequencies for a commercial portfolio from 2005 Q1 to 2015 Q4. There are seven risk ratings, with $R_1$ being the best quality rating and $R_7$ being the default rating.
Part (a) of Table 6 shows the sample estimates for the long-run transition probabilities before smoothing, while part (b) shows the CML-smoothed estimates. There are three rows, highlighted in bold in part (a), where the sample estimates violate (3.8) or (3.9) (while (3.10) is satisfied). The rating-level sample default rates (the column labeled "p7") do not require smoothing.
As shown in the table, the CML-smoothed estimates are the simple average of the relevant nonmonotonic sample estimates. (For the structure of CML-smoothed estimates for multinomial probabilities, we show theoretically in a separate paper that the CML-smoothed estimate for an ordinal class is either the sample estimate or the simple average of the sample estimates for some consecutive ordinal classes including the named class.)
4 Conclusions
Regression and interpolation approaches are widely used for smoothing rating transition probability and rating-level probability of default. A common issue with these methods is that the risk scale for the estimates does not have a strong mathematical basis, leading to possible bias in credit loss estimation. In this paper, we propose smoothing algorithms based on constrained maximum likelihood for rating-level PD and for rating migration probability. These smoothed estimates are optimal in the sense of constrained maximum likelihood, with a fair risk scale determined by constrained maximum likelihood, leading to fair and more justified credit loss estimation. The algorithms can be implemented by a modeler using, for example, the SAS procedure PROC NLMIXED.
Declaration of interest
The author reports no conflicts of interest. The author alone is responsible for the content and writing of the paper. The views expressed in this paper are not necessarily those of the Royal Bank of Canada or any of its affiliates.
Acknowledgements
The author thanks both referees for suggesting extended discussion to cover both the case when default correlation is assumed and the likelihood ratio test for the constrained maximum likelihood estimates. Special thanks to Carlos Lopez for his consistent input, insights and support for this research. Thanks also go to Clovis Sukam and Biao Wu for their critical reading of this manuscript, and Zunwei Du, Wallace Law, Glenn Fei, Kaijie Cui, Jacky Bai and Guangzhi Zhao for many valuable conversations.
References
- Ankarath, N., Ghosh, T. P., Mehta, K. J., and Alkafaji, Y. A. (2010). Understanding IFRS Fundamentals. Wiley.
- Board of Governors of the Federal Reserve System (2016). Comprehensive Capital Analysis and Review 2016: summary instructions. Report, January, Board of Governors of the Federal Reserve System, Washington, DC.
- Miu, P., and Ozdemir, B. (2009). Stress testing probability of default and rating migration rate with respect to Basel II requirements. The Journal of Risk Model Validation 3(4), 3–38 (https://doi.org/10.21314/JRMV.2009.048).
- SAS Institute (2009). SAS 9.2 user’s guide: the NLMIXED procedure. SAS Institute Inc., Cary, NC.
- Tasche, D. (2013). The art of probability-of-default curve calibration. The Journal of Credit Risk 9(4), 63–103 (https://doi.org/10.21314/JCR.2013.169).
- van der Burgt, M. J. (2008). Calibrating low-default portfolios, using the cumulative accuracy profile. The Journal of Risk Model Validation 1(4), 17–33 (https://doi.org/10.21314/JRMV.2008.016).
- Yang, B. H. (2017). Point-in-time probability of default term structure models for multiperiod scenario loss projection. The Journal of Risk Model Validation 11(1), 73–94 (https://doi.org/10.21314/JRMV.2017.164).
- Yang, B. H., and Du, Z. (2016). Rating-transition-probability models and Comprehensive Capital Analysis and Review stress testing. The Journal of Risk Model Validation 10(3), 1–19 (https://doi.org/10.21314/JRMV.2016.155).