Estimating Probability of Default & Expected Loss with Few Default Events in Your Data
Last week, I was speaking to a group of mid-sized Farm Credit banks about their struggles modeling default (e.g., PD models, scorecards, CECL, ALLL, etc.) while having very few loss events in their historical data. After hearing their experiences, I became concerned that most vendor product offerings are intended for commercial banking applications, where far more loss history is available. I advocated for the use of Bayesian and probabilistic modeling approaches for predicting default in situations where loss data is sparse. If either of those two terms sounds scary, I promise to de-mystify them in just a minute... so hang on!
[Quick note: I'm going to use the terms "Bayesian" & "probabilistic" interchangeably throughout the rest of this post to keep things at a high level.]
Defining Risk & Probability Distributions
Most of the audience I was speaking to last week work in their organization's Risk Management department. By definition, the word "risk" means that there are multiple possible outcomes, and each outcome has some probability of occurring. A simple example of risk is a jar filled with 40 white marbles and 60 black marbles. If we pick a marble out of the jar without looking, we know that there are two possible outcomes: (1) we pick a white marble, or (2) we pick a black marble. Furthermore, the probability of picking a white marble is 40% (40 out of 100) and the probability of picking a black marble is 60% (60 out of 100). In other words, we have a distribution of (two) possible outcomes, where each outcome has its own probability of occurring.
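If it helps to see that in code, here is a minimal sketch (in Python with NumPy, purely for illustration) that simulates drawing from the jar many times and confirms the empirical frequencies land near 40% and 60%:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# The jar: 40 white marbles and 60 black marbles.
jar = ["white"] * 40 + ["black"] * 60

# Draw one marble at a time (with replacement) and tally the outcomes.
draws = rng.choice(jar, size=10_000)
print(f"P(white) ~ {np.mean(draws == 'white'):.3f}")  # approximately 0.40
print(f"P(black) ~ {np.mean(draws == 'black'):.3f}")  # approximately 0.60
```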
We can extend this concept to the amount of an individual loan that we expect to get charged off. Instead of knowing that the outcome is either a white or a black marble, we know that the outcome (amount charged off) for a $50,000 loan will be a dollar amount between $0 and $50,000. In other words, we have a distribution of possible outcomes between $0 and $50,000, where each outcome has its own probability of occurring. We would hope that $0 would have the highest probability, and the probability would decrease towards $50,000; perhaps like this:
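To make that picture a bit more concrete, here is a hedged sketch of one way such a distribution could look, using a Beta distribution scaled to the loan amount. The parameters are invented for illustration and are not calibrated to any real portfolio:

```python
import numpy as np
from scipy import stats

loan_amount = 50_000

# Illustrative assumption: a Beta(0.5, 8) distribution scaled to the loan amount
# piles most of the probability near $0 and tapers off toward $50,000.
charge_off_dist = stats.beta(a=0.5, b=8, scale=loan_amount)

for amount in np.linspace(10_000, loan_amount, 5):
    # Probability that the charge-off is less than or equal to this amount.
    print(f"P(charge-off <= ${amount:>9,.0f}) = {charge_off_dist.cdf(amount):.3f}")
```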
The Limitations of Small Data
In traditional frequentist statistics, we tend to treat our data as if it represents the entirety of how the world behaves. This works great when we have a ton of data and can reasonably say that the data we have represents all possible future outcomes. In Bayesian statistics, we treat our data as if it is just a sample of how the world works.
To build a Bayesian model, we first have to specify a probability distribution of how we believe the system works, then we add in the data we have collected in our database. The resulting output from the model is an "updated" probability distribution.
Prior Beliefs + Evidence (i.e., Data) = Posterior Beliefs
This is really a beautiful methodology, because it lines up with how we as humans build our intuition: we start out with some prior beliefs, and over time we update our understanding based upon our actual experiences.
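Here is a minimal sketch of that updating step for a probability of default, using a conjugate Beta-Binomial model. The prior parameters and the loan counts are made-up assumptions for the example:

```python
from scipy import stats

# Prior belief: defaults are rare, but we are not very certain.
# A Beta(1, 50) prior implies an expected PD of roughly 2% with wide uncertainty.
prior_alpha, prior_beta = 1.0, 50.0

# Evidence from a small dataset: 2 defaults observed out of 300 loans.
defaults, loans = 2, 300

# Conjugate Bayesian update for the Beta-Binomial model:
# posterior alpha = prior alpha + defaults; posterior beta = prior beta + non-defaults.
posterior = stats.beta(prior_alpha + defaults, prior_beta + (loans - defaults))

print(f"Posterior mean PD: {posterior.mean():.2%}")
print(f"95% credible interval: ({posterior.ppf(0.025):.2%}, {posterior.ppf(0.975):.2%})")
```

Even with only two observed defaults, the posterior is a full distribution for PD rather than a single point estimate, which is exactly what we want when the data is sparse.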
Let's take the simple model X + Y = Z. When we think of traditional frequentist statistics, X, Y, and Z are single values. These are the models that output a single number at the end.
In Bayesian and probabilistic modeling, X, Y, and the output Z are all probability distributions.
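One way to see the difference is to add two distributions by simulation. In the hedged sketch below, X and Y are arbitrary normal distributions chosen just to illustrate the mechanics:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 100_000

# Frequentist-style single values: one number in, one number out.
x, y = 10.0, 5.0
print(f"Point estimate of Z: {x + y}")

# Probabilistic version: X and Y are distributions, so Z is a distribution too.
X = rng.normal(loc=10.0, scale=2.0, size=n)
Y = rng.normal(loc=5.0, scale=1.0, size=n)
Z = X + Y

print(f"Z as a distribution: mean={Z.mean():.2f}, "
      f"5th pct={np.percentile(Z, 5):.2f}, 95th pct={np.percentile(Z, 95):.2f}")
```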
Why You Haven't Heard of Bayes
While probabilistic modeling isn't a new concept, the ability to actually compute these calculations is fairly recent. Equations containing numbers are quick for a computer to solve; equations containing entire probability distributions are more intensive. A decade ago, we simply didn't have the horsepower on a laptop to build probabilistic models. However, thanks to a lot of work that has been put into this field in the last decade, we now have the tools to build these models quickly and efficiently.
The insurance world has been a leader in using Bayesian techniques since the computational ability has existed -- especially in catastrophe or terrorism insurance, where the loss events are rare and our models need to incorporate the possibility for loss events worse than what we have in our data. For banks with little credit loss history, this is a perfect opportunity to borrow some of these modeling techniques from our friends in the insurance world.
A Use Case: CECL
The vendor models for CECL that we have seen output a single number for total Expected Credit Loss, which is equivalent to saying, "Here is the one and only outcome we expect to happen." This doesn't make sense, given that CECL is intended to help banks manage risk by ensuring they have the right amount of capital to cover expected future losses. We know that there are many possible outcomes (dollar amounts) of how much credit loss we could incur, and each dollar amount has its own probability of occurring. The distribution might look something like this:
The peak of the distribution (where the most likely outcomes reside) is closer to $0, but there is a long tail that could extend to our entire portfolio of assets.
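As an illustration of how a loss distribution like that could be produced, here is a hedged Monte Carlo sketch. The portfolio size, probabilities of default, and loss-given-default assumptions are all invented for the example, not taken from any vendor model:

```python
import numpy as np

rng = np.random.default_rng(seed=7)
n_sims, n_loans = 10_000, 500
exposure, prob_default = 1_000_000, 0.01  # illustrative: 500 loans of $1M, 1% PD each

# Simulate whether each loan defaults in each scenario.
defaults = rng.binomial(n=1, p=prob_default, size=(n_sims, n_loans))

# Loss given default: an illustrative Beta distribution averaging about 40%.
lgd = rng.beta(a=2, b=3, size=(n_sims, n_loans))

# Total portfolio loss in each simulated scenario.
losses = (defaults * lgd * exposure).sum(axis=1)

print(f"Expected (mean) loss: ${losses.mean():,.0f}")
print(f"95th percentile loss: ${np.percentile(losses, 95):,.0f}")
```

The 10,000 values in `losses` are the distribution: most scenarios sit near the low end, and the long tail captures the rare but painful outcomes.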
Why Probability Distributions are Awesome
Taking a probabilistic modeling approach allows us much greater flexibility downstream. When our models output a distribution like the one above, we can then calculate the following (a quick code sketch follows the list):
Given a probability (confidence threshold), what is the maximum loss amount we can expect to see?
E.g., we are 95% confident that the loss will be less than or equal to $427.6 million.
Given a loss amount, what is the probability that the loss will be less than or equal to that amount?
E.g., we are 82.6% confident that the loss will be less than or equal to $250 million.
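Both questions are one-liners once you have samples from the loss distribution. The sketch below stands in its own simulated samples (an arbitrary lognormal, so the printed numbers won't match the examples above, which came from a real model output):

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Stand-in for a model's output: simulated total-loss samples.
# (An arbitrary lognormal, used only to show the mechanics.)
loss_samples = rng.lognormal(mean=18.5, sigma=0.6, size=100_000)

# Given a confidence level, what is the maximum loss we expect to see?
loss_at_95 = np.percentile(loss_samples, 95)
print(f"We are 95% confident the loss will be <= ${loss_at_95:,.0f}")

# Given a loss amount, how confident are we that the loss stays at or below it?
threshold = 250_000_000
confidence = np.mean(loss_samples <= threshold)
print(f"We are {confidence:.1%} confident the loss will be <= ${threshold:,.0f}")
```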
Not only does a probabilistic modeling approach truly help you quantify risk, but it also allows you to justify previous "qualitative adjustments" in a much more straightforward and mathematically backed way.
Interested in Learning More?
If any of the content in this post speaks to you and your organization, please reach out to us! We would love to hear from you, and help work through any similar challenges your team is facing.