Naive modeling of credit defaults using a Markov Random Field

Creative Commons License

aGrUM

interactive online version

This notebook is the adaptation to pyAgrum of the model proposed by Gautier Marti which is itself inspired by the paper Graphical Models for Correlated Defaults (adaptation by Marvin Lasserre and Pierre-Henri Wuillemin).

In [1]:
import numpy as np
import matplotlib.pyplot as plt

import pyAgrum as gum
import pyAgrum.lib.notebook as gnb
import pyAgrum.lib.mrf2graph as m2g

Constructing the model

Three sectors co-dependent in their stress state are modelled: (Finance, Energy, Retail). For each of these sectors, we consider a universe of three issuers.

Within Finance

Within Energy

Within Retail

Deutsche Bank

EDF

New Look

Metro Bank

Petrobras

Matalan

Barclays

EnQuest

Marks & Spencer

The probability of default of these issuers is partly idiosyncratic and partly depending on the stress within their respective sectors. Some distressed names such as New Look can have a high marginal probability of default, so high that the state of their sector (normal or stressed) does not matter much. Some other high-yield risky names such as Petrobras may strongly depend on how the whole energy sector is doing. On the other hand, a company such as EDF should be quite robust, and would require a very acute and persistent sector stress to move its default probability significantly higher.

We can encode this basic knowledge in terms of potentials:

image-2.png

Conventions:

Sector variables can be either in the state no stress or in the state under stress while issuer variables can be either in the state no default or in the state default.

The choice of these potentials is not an easy task; It can be driven by historical fundamental or statistical relationships or forward-looking views.

Given a network structure, and a set of potentials, one can mechanically do inference using the pyAgrum library.

We will show how to obtain a distribution of the number of defaults (and the expected number of defaults), a distribution of the losses (and the expected loss). From the joint probability table, we could also extract the default correlation

tl;dr We use here the library pyAgrum to illustrate, on a toy example, how one can use a Markov Random Field (MRF) for modelling the distribution of defaults and losses in a credit portfolio.

Building accurate PGM/MRF models require expert knowledge for considering a relevant graph structure, and attributing useful potentials. For this toy model, we use the following structure:

image.png

The first step consists in creating the model. For that, we use the pyAgrum class pyAgrum.MarkovRandomField.

In [2]:
# building nodes (with types)
mn = gum.MarkovRandomField('Credit default modeling')

# Adding sector variables
sectors = ['Finance', 'Energy', 'Retail']
finance, energy, retail = [mn.add(gum.LabelizedVariable(name, '', ['no stress', 'under stress']))
                           for name in sectors]

# Adding issuer variables
edf, petro, eq = [mn.add(gum.LabelizedVariable(name, '', ['no default', 'default']))
              for name in ['EDF','Petrobras','EnQuest']]

mb, db, ba = [mn.add(gum.LabelizedVariable(name, '', ['no default', 'default']))
              for name in ['Metro Bank', 'Deutsche Bank', 'Barclays']]

nl, matalan, ms = [mn.add(gum.LabelizedVariable(name, '', ['no default', 'default']))
              for name in ['New Look', 'Matalan', 'Marks & Spencer']]
In [3]:
# Adding and filling factors between sectors
mn.addFactor([finance, energy])[:] = [[90, 70],
                                      [60, 10]]
mn.addFactor([finance, retail])[:] = [[80, 10],
                                      [30, 80]]
mn.addFactor([energy, retail])[:]  = [[60,  5],
                                      [70, 95]]

# Adding factors between sector and issuer
mn.addFactor([db, finance])[:] = [[90, 30],
                                  [10, 60]]
mn.addFactor([mb, finance])[:] = [[80, 40],
                                  [ 5, 60]]
mn.addFactor([ba, finance])[:] = [[90, 20],
                                  [20, 50]]

mn.addFactor([edf, energy])[:]   = [[90, 5],
                                    [80, 40]]
mn.addFactor([petro, energy])[:] = [[60, 50],
                                    [ 5, 60]]
mn.addFactor([eq, energy])[:]    = [[80, 20],
                                    [10, 50]]


mn.addFactor([nl, retail])[:]      = [[ 5, 60],
                                      [ 2, 90]]
mn.addFactor([matalan, retail])[:] = [[40, 30],
                                      [20, 70]]
mn.addFactor([ms, retail])[:]       = [[80, 10],
                                      [30, 50]]
In [4]:
nodetypes={n:0.1 if n in sectors else
             0.2 for n in mn.names()}

gnb.sideBySide(gnb.getMRF(mn,view='graph',nodeColor=nodetypes),
               gnb.getMRF(mn,view='factorgraph',nodeColor=nodetypes),
              captions=['The model as a Markov random field','The model as a factor graph'])

gnb.sideBySide(mn.factor({'Energy', 'Finance'}),
               mn.factor({'Finance', 'Retail'}),
               mn.factor({'Energy', 'Retail'}),
               mn.factor({'Deutsche Bank', 'Finance'}),
               mn.factor({'Metro Bank', 'Finance'}),
               mn.factor({'Barclays', 'Finance'}),
               mn.factor({'EDF', 'Energy'}),
               mn.factor({'Petrobras', 'Energy'}),
               mn.factor({'EnQuest', 'Energy'}),
               mn.factor({'New Look', 'Retail'}),
               mn.factor({'Matalan', 'Retail'}),
               mn.factor({'Marks & Spencer', 'Retail'}),
              ncols=3)
G Barclays Barclays Retail Retail Matalan Matalan Retail--Matalan New Look New Look Retail--New Look Marks & Spencer Marks & Spencer Retail--Marks & Spencer Petrobras Petrobras Finance Finance Finance--Barclays Finance--Retail Energy Energy Finance--Energy Deutsche Bank Deutsche Bank Finance--Deutsche Bank Metro Bank Metro Bank Finance--Metro Bank EDF EDF Energy--Retail Energy--Petrobras Energy--EDF EnQuest EnQuest Energy--EnQuest
The model as a Markov random field
G Barclays Barclays Retail Retail Petrobras Petrobras Matalan Matalan Finance Finance EDF EDF Energy Energy New Look New Look Marks & Spencer Marks & Spencer Deutsche Bank Deutsche Bank Metro Bank Metro Bank EnQuest EnQuest f0#1 f0#1--Finance f0#1--Energy f1#2 f1#2--Retail f1#2--Energy f0#7 f0#7--Finance f0#7--Deutsche Bank f1#4 f1#4--Petrobras f1#4--Energy f2#9 f2#9--Retail f2#9--New Look f2#11 f2#11--Retail f2#11--Marks & Spencer f0#2 f0#2--Retail f0#2--Finance f0#6 f0#6--Finance f0#6--Metro Bank f0#8 f0#8--Barclays f0#8--Finance f1#3 f1#3--EDF f1#3--Energy f1#5 f1#5--Energy f1#5--EnQuest f2#10 f2#10--Retail f2#10--Matalan
The model as a factor graph
Finance
Energy
no stress
under stress
no stress
90.000070.0000
under stress
60.000010.0000
Finance
Retail
no stress
under stress
no stress
80.000010.0000
under stress
30.000080.0000
Energy
Retail
no stress
under stress
no stress
60.00005.0000
under stress
70.000095.0000
Deutsche Bank
Finance
no default
default
no stress
90.000030.0000
under stress
10.000060.0000
Metro Bank
Finance
no default
default
no stress
80.000040.0000
under stress
5.000060.0000
Barclays
Finance
no default
default
no stress
90.000020.0000
under stress
20.000050.0000
EDF
Energy
no default
default
no stress
90.00005.0000
under stress
80.000040.0000
Petrobras
Energy
no default
default
no stress
60.000050.0000
under stress
5.000060.0000
EnQuest
Energy
no default
default
no stress
80.000020.0000
under stress
10.000050.0000
New Look
Retail
no default
default
no stress
5.000060.0000
under stress
2.000090.0000
Matalan
Retail
no default
default
no stress
40.000030.0000
under stress
20.000070.0000
Marks & Spencer
Retail
no default
default
no stress
80.000010.0000
under stress
30.000050.0000

Making inferences

Example 1 & 2

The first two examples consists in finding the chance of EDF to default with no evidence and knowing that the energy sector is not under stress. For that, we can use the getPosterior method:

In [5]:
gnb.sideBySide(gum.getPosterior(mn, target='EDF', evs={}),
               gum.getPosterior(mn, target='EDF', evs={'Energy':'no stress'}),
               captions=['$P(EDF)$','$P(EDF|Energy=$under stress$)$'])
EDF
no default
default
0.90720.0928

$P(EDF)$
EDF
no default
default
0.94740.0526

$P(EDF|Energy=$under stress$)$

Hence, if we cannot observe the state of the energy sector, then EDF has a higher chance of defaulting (9%) as it is possible that the energy sector is under stress.

Example 3 & 4

Given that Marks & Spencer (a much safer name) has defaulted, New Look has now an increased chance of defaulting: 97%.

Even if the retail sector is doing ok, New Look is a distressed name with a high probability of default (92%).

In [6]:
gnb.sideBySide(gum.getPosterior(mn, target='New Look', evs={'Marks & Spencer': 'default'}),
               gum.getPosterior(mn, target='New Look', evs={'Retail':'no stress'}),
               captions=['P(New Look|Marks & Spencer=default)',
                         'P(New Look | Retail = no stress)'])
New Look
no default
default
0.02860.9714

P(New Look|Marks & Spencer=default)
New Look
no default
default
0.07690.9231

P(New Look | Retail = no stress)

pyAgrum offers the option to vizualise the marginal probability of each variable using showInference.

In [7]:
#gum.config['factorgraph','edge_length_inference']=1.
gnb.showInference(mn, size='7!',nodeColor=nodetypes)
../_images/notebooks_13-Examples_NaiveCreditDefaultModeling_28_0.svg

Impact of defaults on a credit portfolio and the distribution of losses

Amongst the 9 available issuers, let’s consider the following portfolio containing bonds from these 6 issuers:

In [8]:
portfolio_names = ['EDF', 'Petrobras', 'EnQuest', 'Matalan', 'Barclays']
portfolio_exposures = [2e6, 400e3, 300e3, 600e3, 3e6]
LGD = 0.9
losses = {portfolio_names[i]:(portfolio_exposures[i]*LGD) for i in range(len(portfolio_names))}

print('** Total notional: USD {} MM'.format(sum(portfolio_exposures)/1e6))
** Total notional: USD 6.3 MM

We have a USD 6.3MM total exposure to these names (notional).

We assume that in case of default, we recover only 10% (LGD = 90%).

Let’s start using the model:

In [9]:
ie = gum.ShaferShenoyMRFInference(mn)
ie.makeInference()
p=ie.jointPosterior(set(sectors))

print(f'** There is {100*p[{"Retail":0,"Energy":0,"Finance":0}]:.2f}% chance that no sector in under stress.')
print()
p
** There is 42.38% chance that no sector in under stress.

Out[9]:
Retail
Energy
Finance
no stress
under stress
no stress
no stress
0.42380.2999
under stress
0.00830.1251
under stress
no stress
0.01050.1215
under stress
0.00000.0109

If we observe that the finance sector is doing ok, and that Marks & Spencer is still doing business as usual, then this is the current joint probabilities associated to the potential defaults in the portfolio:

In [10]:
ie = gum.ShaferShenoyMRFInference(mn)
ie.addJointTarget(set(portfolio_names))
ie.setEvidence({'Finance': 'no stress', 'Marks & Spencer': 'no default'})
ie.makeInference()
portfolio_posterior = ie.jointPosterior(set(portfolio_names))

Note that the line ie.addJointTarget(set(portfolio_names)) is important here because adding hard evidences removes the corresponding nodes in the graph and then no factor containing the variables in portfolio_names can be found. The method addJointTarget ensure that such a factor exists.

In [11]:
portfolio_posterior
Out[11]:
Barclays
Matalan
EDF
EnQuest
Petrobras
no default
default
no default
no default
no default
no default
0.14950.0332
default
0.12680.0282
default
no default
0.03830.0085
default
0.04310.0096
default
no default
no default
0.00840.0019
default
0.00810.0018
default
no default
0.00260.0006
default
0.00770.0017
default
no default
no default
no default
0.15520.0345
default
0.13500.0300
default
no default
0.04120.0092
default
0.06270.0139
default
no default
no default
0.00880.0020
default
0.01020.0023
default
no default
0.00340.0008
default
0.01700.0038

We now want to compute the distribution of the number of defaults. For that, we create a new Markov random field containing an additional node Number of defaults connected to each issuers:

In [12]:
mn2 = gum.MarkovRandomField(mn)

mn2.add(gum.RangeVariable('Number of defaults', '', 0, len(portfolio_names)))
factor = mn2.addFactor(['Number of defaults',*portfolio_names]).fillWithFunction('+'.join(portfolio_names))
In [13]:
gum.config['factorgraph','edge_length']=1.1
nodetypes={n:0.1 if n in sectors else
             0.3 if "Number of defaults"==n else
             0.2 for n in mn2.names()}
gnb.flow.add(gnb.getMRF(mn2, view="graph",size='7!',nodeColor=nodetypes))
gnb.flow.add(gnb.getMRF(mn2, view="factorgraph",size='7!',nodeColor=nodetypes))
gnb.flow.display()
G Barclays Barclays Number of defaults Number of defaults Barclays--Number of defaults Matalan Matalan Barclays--Matalan Retail Retail Retail--Matalan New Look New Look Retail--New Look Marks & Spencer Marks & Spencer Retail--Marks & Spencer Petrobras Petrobras Petrobras--Barclays Petrobras--Number of defaults Petrobras--Matalan EnQuest EnQuest Petrobras--EnQuest Matalan--Number of defaults Finance Finance Finance--Barclays Finance--Retail Energy Energy Finance--Energy Deutsche Bank Deutsche Bank Finance--Deutsche Bank Metro Bank Metro Bank Finance--Metro Bank EDF EDF EDF--Barclays EDF--Petrobras EDF--Number of defaults EDF--Matalan EDF--EnQuest Energy--Retail Energy--Petrobras Energy--EDF Energy--EnQuest EnQuest--Barclays EnQuest--Number of defaults EnQuest--Matalan
G Barclays Barclays Retail Retail Petrobras Petrobras Number of defaults Number of defaults Matalan Matalan Finance Finance EDF EDF Energy Energy New Look New Look Marks & Spencer Marks & Spencer Deutsche Bank Deutsche Bank Metro Bank Metro Bank EnQuest EnQuest f1#5 f1#5--Energy f1#5--EnQuest f1#3 f1#3--EDF f1#3--Energy f3#4#5#8#10#12 f3#4#5#8#10#12--Barclays f3#4#5#8#10#12--Petrobras f3#4#5#8#10#12--Number of defaults f3#4#5#8#10#12--Matalan f3#4#5#8#10#12--EDF f3#4#5#8#10#12--EnQuest f2#11 f2#11--Retail f2#11--Marks & Spencer f2#9 f2#9--Retail f2#9--New Look f1#4 f1#4--Petrobras f1#4--Energy f0#7 f0#7--Finance f0#7--Deutsche Bank f1#2 f1#2--Retail f1#2--Energy f0#1 f0#1--Finance f0#1--Energy f2#10 f2#10--Retail f2#10--Matalan f0#8 f0#8--Barclays f0#8--Finance f0#6 f0#6--Finance f0#6--Metro Bank f0#2 f0#2--Retail f0#2--Finance

Once this new network is created, we can obtain the distribution of Number of defaults using getPosterior :

In [14]:
ndefault_posterior = gum.getPosterior(mn2, target='Number of defaults',
                                           evs={'Finance': 'no stress', 'Marks & Spencer': 'no default'})
ndefault_posterior
Out[14]:
Number of defaults
0
1
2
3
4
5
0.14950.36200.31190.13710.03570.0038
In [15]:
fig, ax = plt.subplots()
l = ndefault_posterior.tolist()
ax.bar(range(len(l)), l)
ax.set_xlabel('Number of defaults')
ax.set_title('Distribution of the number of defaults in the portfolio')
plt.show()
../_images/notebooks_13-Examples_NaiveCreditDefaultModeling_45_0.svg
In [16]:
q=gum.Potential(ndefault_posterior).fillWith(range(ndefault_posterior.domainSize()))
print(f'Expected number of defaults in the portfolio: {(q*ndefault_posterior).sum():.1f}')
Expected number of defaults in the portfolio: 1.6

Besides the expected number of defaults, the distribution of losses is even more informative:

In [17]:
pot_losses = gum.Potential(portfolio_posterior)
for i in pot_losses.loopIn():
    pot_losses[i] = sum([losses[key]*i.val(key) for key in losses])
print("Table of losses")
pot_losses
Table of losses
Out[17]:
Barclays
Matalan
EDF
EnQuest
Petrobras
no default
default
no default
no default
no default
no default
0.00002700000.0000
default
360000.00003060000.0000
default
no default
270000.00002970000.0000
default
630000.00003330000.0000
default
no default
no default
1800000.00004500000.0000
default
2160000.00004860000.0000
default
no default
2070000.00004770000.0000
default
2430000.00005130000.0000
default
no default
no default
no default
540000.00003240000.0000
default
900000.00003600000.0000
default
no default
810000.00003510000.0000
default
1170000.00003870000.0000
default
no default
no default
2340000.00005040000.0000
default
2700000.00005400000.0000
default
no default
2610000.00005310000.0000
default
2970000.00005670000.0000
In [18]:
expected_loss = (portfolio_posterior * pot_losses ).sum()

print(f'Expected loss with respect to a par value of USD {sum(portfolio_exposures)/1e6:.1f} MM is USD {expected_loss / 1e6:.1f} MM.')
print(f'Expected loss: {100 * expected_loss / sum(portfolio_exposures):2.1f}% of the notional.')
Expected loss with respect to a par value of USD 6.3 MM is USD 1.2 MM.
Expected loss: 18.6% of the notional.
In [19]:
cuts=list(np.percentile([pot_losses[i] for i in pot_losses.loopIn()],range(0,101,10)))
cuts[0] = cuts[0] - 0.001

loss_bucket_distribution = []
d = gum.Potential(pot_losses)
for j in range(len(cuts) - 1):
    for i in pot_losses.loopIn():
        d[i] = 1 if cuts[j]<pot_losses[i]<=cuts[j+1] else 0
    loss_bucket_distribution.append(((portfolio_posterior*d).sum()))
In [20]:
buckets = [f'({cuts[i]:10.0f}, {cuts[i+1]:10.0f}]' for i in range(len(cuts)-1)]

fig, ax = plt.subplots()
ax.bar(range(10), height=loss_bucket_distribution)
ax.set_xticks(range(10), labels=buckets, rotation=90)
ax.set_title('Distribution of losses with respect to par value')
ax.set_xlabel('Loss buckets')
plt.show()
../_images/notebooks_13-Examples_NaiveCreditDefaultModeling_51_0.svg
In [21]:
cumsum = 0.
for i in range(len(loss_bucket_distribution)):
    cumsum += loss_bucket_distribution[i]
    print('{:24} | {:0.6f}'.format(buckets[i], cumsum))
(        -0,     549000] | 0.469960
(    549000,     954000] | 0.689263
(    954000,    2097000] | 0.762897
(   2097000,    2502000] | 0.787553
(   2502000,    2835000] | 0.834408
(   2835000,    3168000] | 0.888119
(   3168000,    3573000] | 0.941345
(   3573000,    4716000] | 0.987143
(   4716000,    5121000] | 0.991483
(   5121000,    5670000] | 1.000000

Impact of a belief of stress on a sector for the number of defaults in the portfolio

Belief from \([1,0]\) (no stress) to \([1,1]\) (no specific belief) to \([0,1]\) (stress in a sector).

In [22]:
def show_expected_number_of_defaults(sector,q,enod_nobelief):
  v1=[]
  for i in range(101):
     posterior=gum.getPosterior(mn2, target='Number of defaults',evs={sector:[100-i,i]})
     v1.append((posterior*q).sum())

  fig, ax = plt.subplots()
  ax.set_title(f"Expected number of defaults w.r.t stress in {sector}")
  ax.set_xlabel(sector, fontweight='bold')
  ax.set_ylabel('Number of defaults', style='italic')
  ax.plot(np.arange(0,1.01,0.01),v1)
  ax.plot(0.5,enod_nobelief,'x',color="red")
  ax.text(0.5, enod_nobelief-0.05, f' No specific evidence on {sector}')
  return fig

enod_nobelief=(gum.getPosterior(mn2, target='Number of defaults',evs={})*q).sum()

gnb.flow.add(show_expected_number_of_defaults('Retail',q,enod_nobelief))
gnb.flow.add(show_expected_number_of_defaults('Finance',q,enod_nobelief))
gnb.flow.add(show_expected_number_of_defaults('Energy',q,enod_nobelief))
gnb.flow.display()
In [ ]: