Naive modeling of credit defaults using a Markov Random Field

Creative Commons License

aGrUM

interactive online version

This notebook is the adaptation to pyAgrum of the model proposed by Gautier Marti which is itself inspired by the paper Graphical Models for Correlated Defaults (adaptation by Marvin Lasserre and Pierre-Henri Wuillemin).

In [1]:
import numpy as np
import matplotlib.pyplot as plt

import pyAgrum as gum
import pyAgrum.lib.notebook as gnb
import pyAgrum.lib.mrf2graph as m2g

Constructing the model

Three sectors co-dependent in their stress state are modelled: (Finance, Energy, Retail). For each of these sectors, we consider a universe of three issuers.

Within Finance

Within Energy

Within Retail

Deutsche Bank

EDF

New Look

Metro Bank

Petrobras

Matalan

Barclays

EnQuest

Marks & Spencer

The probability of default of these issuers is partly idiosyncratic and partly depending on the stress within their respective sectors. Some distressed names such as New Look can have a high marginal probability of default, so high that the state of their sector (normal or stressed) does not matter much. Some other high-yield risky names such as Petrobras may strongly depend on how the whole energy sector is doing. On the other hand, a company such as EDF should be quite robust, and would require a very acute and persistent sector stress to move its default probability significantly higher.

We can encode this basic knowledge in terms of potentials:

image-2.png

Conventions:

Sector variables can be either in the state no stress or in the state under stress while issuer variables can be either in the state no default or in the state default.

The choice of these potentials is not an easy task; It can be driven by historical fundamental or statistical relationships or forward-looking views.

Given a network structure, and a set of potentials, one can mechanically do inference using the pyAgrum library.

We will show how to obtain a distribution of the number of defaults (and the expected number of defaults), a distribution of the losses (and the expected loss). From the joint probability table, we could also extract the default correlation

tl;dr We use here the library pyAgrum to illustrate, on a toy example, how one can use a Markov Random Field (MRF) for modelling the distribution of defaults and losses in a credit portfolio.

Building accurate PGM/MRF models require expert knowledge for considering a relevant graph structure, and attributing useful potentials. For this toy model, we use the following structure:

image.png

The first step consists in creating the model. For that, we use the pyAgrum class pyAgrum.MarkovRandomField.

In [2]:
# building nodes (with types)
mn = gum.MarkovRandomField('Credit default modeling')

# Adding sector variables
sectors = ['Finance', 'Energy', 'Retail']
finance, energy, retail = [mn.add(gum.LabelizedVariable(name, '', ['no stress', 'under stress']))
                           for name in sectors]

# Adding issuer variables
edf, petro, eq = [mn.add(gum.LabelizedVariable(name, '', ['no default', 'default']))
              for name in ['EDF','Petrobras','EnQuest']]

mb, db, ba = [mn.add(gum.LabelizedVariable(name, '', ['no default', 'default']))
              for name in ['Metro Bank', 'Deutsche Bank', 'Barclays']]

nl, matalan, ms = [mn.add(gum.LabelizedVariable(name, '', ['no default', 'default']))
              for name in ['New Look', 'Matalan', 'Marks & Spencer']]
In [3]:
# Adding and filling factors between sectors
mn.addFactor([finance, energy])[:] = [[90, 70],
                                      [60, 10]]
mn.addFactor([finance, retail])[:] = [[80, 10],
                                      [30, 80]]
mn.addFactor([energy, retail])[:]  = [[60,  5],
                                      [70, 95]]

# Adding factors between sector and issuer
mn.addFactor([db, finance])[:] = [[90, 30],
                                  [10, 60]]
mn.addFactor([mb, finance])[:] = [[80, 40],
                                  [ 5, 60]]
mn.addFactor([ba, finance])[:] = [[90, 20],
                                  [20, 50]]

mn.addFactor([edf, energy])[:]   = [[90, 5],
                                    [80, 40]]
mn.addFactor([petro, energy])[:] = [[60, 50],
                                    [ 5, 60]]
mn.addFactor([eq, energy])[:]    = [[80, 20],
                                    [10, 50]]


mn.addFactor([nl, retail])[:]      = [[ 5, 60],
                                      [ 2, 90]]
mn.addFactor([matalan, retail])[:] = [[40, 30],
                                      [20, 70]]
mn.addFactor([ms, retail])[:]       = [[80, 10],
                                      [30, 50]]
In [4]:
nodetypes={n:0.1 if n in sectors else
             0.2 for n in mn.names()}

gnb.sideBySide(gnb.getMRF(mn,view='graph',nodeColor=nodetypes),
               gnb.getMRF(mn,view='factorgraph',nodeColor=nodetypes),
              captions=['The model as a Markov random field','The model as a factor graph'])

gnb.sideBySide(mn.factor({'Energy', 'Finance'}),
               mn.factor({'Finance', 'Retail'}),
               mn.factor({'Energy', 'Retail'}),
               mn.factor({'Deutsche Bank', 'Finance'}),
               mn.factor({'Metro Bank', 'Finance'}),
               mn.factor({'Barclays', 'Finance'}),
               mn.factor({'EDF', 'Energy'}),
               mn.factor({'Petrobras', 'Energy'}),
               mn.factor({'EnQuest', 'Energy'}),
               mn.factor({'New Look', 'Retail'}),
               mn.factor({'Matalan', 'Retail'}),
               mn.factor({'Marks & Spencer', 'Retail'}),
              ncols=3)
G EnQuest EnQuest Petrobras Petrobras New Look New Look Marks & Spencer Marks & Spencer Matalan Matalan Energy Energy Energy--EnQuest Energy--Petrobras Retail Retail Energy--Retail EDF EDF Energy--EDF Barclays Barclays Retail--New Look Retail--Marks & Spencer Retail--Matalan Finance Finance Finance--Energy Finance--Barclays Finance--Retail Metro Bank Metro Bank Finance--Metro Bank Deutsche Bank Deutsche Bank Finance--Deutsche Bank
The model as a Markov random field
G EnQuest EnQuest Petrobras Petrobras New Look New Look Marks & Spencer Marks & Spencer Matalan Matalan Energy Energy Barclays Barclays Retail Retail Finance Finance Metro Bank Metro Bank EDF EDF Deutsche Bank Deutsche Bank f0#1 f0#1--Energy f0#1--Finance f1#2 f1#2--Energy f1#2--Retail f0#7 f0#7--Finance f0#7--Deutsche Bank f1#4 f1#4--Petrobras f1#4--Energy f2#9 f2#9--New Look f2#9--Retail f2#11 f2#11--Marks & Spencer f2#11--Retail f0#2 f0#2--Retail f0#2--Finance f0#6 f0#6--Finance f0#6--Metro Bank f0#8 f0#8--Barclays f0#8--Finance f1#3 f1#3--Energy f1#3--EDF f1#5 f1#5--EnQuest f1#5--Energy f2#10 f2#10--Matalan f2#10--Retail
The model as a factor graph
Finance
Energy
no stress
under stress
no stress
90.000070.0000
under stress
60.000010.0000
Finance
Retail
no stress
under stress
no stress
80.000010.0000
under stress
30.000080.0000
Energy
Retail
no stress
under stress
no stress
60.00005.0000
under stress
70.000095.0000
Deutsche Bank
Finance
no default
default
no stress
90.000030.0000
under stress
10.000060.0000
Metro Bank
Finance
no default
default
no stress
80.000040.0000
under stress
5.000060.0000
Barclays
Finance
no default
default
no stress
90.000020.0000
under stress
20.000050.0000
EDF
Energy
no default
default
no stress
90.00005.0000
under stress
80.000040.0000
Petrobras
Energy
no default
default
no stress
60.000050.0000
under stress
5.000060.0000
EnQuest
Energy
no default
default
no stress
80.000020.0000
under stress
10.000050.0000
New Look
Retail
no default
default
no stress
5.000060.0000
under stress
2.000090.0000
Matalan
Retail
no default
default
no stress
40.000030.0000
under stress
20.000070.0000
Marks & Spencer
Retail
no default
default
no stress
80.000010.0000
under stress
30.000050.0000

Making inferences

Example 1 & 2

The first two examples consists in finding the chance of EDF to default with no evidence and knowing that the energy sector is not under stress. For that, we can use the getPosterior method:

In [5]:
gnb.sideBySide(gum.getPosterior(mn, target='EDF', evs={}),
               gum.getPosterior(mn, target='EDF', evs={'Energy':'no stress'}),
               captions=['$P(EDF)$','$P(EDF|Energy=$under stress$)$'])
EDF
no default
default
0.90720.0928

$P(EDF)$
EDF
no default
default
0.94740.0526

$P(EDF|Energy=$under stress$)$

Hence, if we cannot observe the state of the energy sector, then EDF has a higher chance of defaulting (9%) as it is possible that the energy sector is under stress.

Example 3 & 4

Given that Marks & Spencer (a much safer name) has defaulted, New Look has now an increased chance of defaulting: 97%.

Even if the retail sector is doing ok, New Look is a distressed name with a high probability of default (92%).

In [6]:
gnb.sideBySide(gum.getPosterior(mn, target='New Look', evs={'Marks & Spencer': 'default'}),
               gum.getPosterior(mn, target='New Look', evs={'Retail':'no stress'}),
               captions=['P(New Look|Marks & Spencer=default)',
                         'P(New Look | Retail = no stress)'])
New Look
no default
default
0.02860.9714

P(New Look|Marks & Spencer=default)
New Look
no default
default
0.07690.9231

P(New Look | Retail = no stress)

pyAgrum offers the option to vizualise the marginal probability of each variable using showInference.

In [7]:
#gum.config['factorgraph','edge_length_inference']=1.
gnb.showInference(mn, size='7!',nodeColor=nodetypes)
../_images/notebooks_13-Examples_NaiveCreditDefaultModeling_28_0.svg

Impact of defaults on a credit portfolio and the distribution of losses

Amongst the 9 available issuers, let’s consider the following portfolio containing bonds from these 6 issuers:

In [8]:
portfolio_names = ['EDF', 'Petrobras', 'EnQuest', 'Matalan', 'Barclays']
portfolio_exposures = [2e6, 400e3, 300e3, 600e3, 3e6]
LGD = 0.9
losses = {portfolio_names[i]:(portfolio_exposures[i]*LGD) for i in range(len(portfolio_names))}

print('** Total notional: USD {} MM'.format(sum(portfolio_exposures)/1e6))
** Total notional: USD 6.3 MM

We have a USD 6.3MM total exposure to these names (notional).

We assume that in case of default, we recover only 10% (LGD = 90%).

Let’s start using the model:

In [9]:
ie = gum.ShaferShenoyMRFInference(mn)
ie.makeInference()
p=ie.jointPosterior(set(sectors))

print(f'** There is {100*p[{"Retail":0,"Energy":0,"Finance":0}]:.2f}% chance that no sector in under stress.')
print()
p
** There is 42.38% chance that no sector in under stress.

Out[9]:
Retail
Finance
Energy
no stress
under stress
no stress
no stress
0.42380.2999
under stress
0.01050.1215
under stress
no stress
0.00830.1251
under stress
0.00000.0109

If we observe that the finance sector is doing ok, and that Marks & Spencer is still doing business as usual, then this is the current joint probabilities associated to the potential defaults in the portfolio:

In [10]:
ie = gum.ShaferShenoyMRFInference(mn)
ie.addJointTarget(set(portfolio_names))
ie.setEvidence({'Finance': 'no stress', 'Marks & Spencer': 'no default'})
ie.makeInference()
portfolio_posterior = ie.jointPosterior(set(portfolio_names))

Note that the line ie.addJointTarget(set(portfolio_names)) is important here because adding hard evidences removes the corresponding nodes in the graph and then no factor containing the variables in portfolio_names can be found. The method addJointTarget ensure that such a factor exists.

In [11]:
portfolio_posterior
Out[11]:
Matalan
Barclays
EDF
Petrobras
EnQuest
no default
default
no default
no default
no default
no default
0.14950.1552
default
0.03830.0412
default
no default
0.12680.1350
default
0.04310.0627
default
no default
no default
0.00840.0088
default
0.00260.0034
default
no default
0.00810.0102
default
0.00770.0170
default
no default
no default
no default
0.03320.0345
default
0.00850.0092
default
no default
0.02820.0300
default
0.00960.0139
default
no default
no default
0.00190.0020
default
0.00060.0008
default
no default
0.00180.0023
default
0.00170.0038

We now want to compute the distribution of the number of defaults. For that, we create a new Markov random field containing an additional node Number of defaults connected to each issuers:

In [12]:
mn2 = gum.MarkovRandomField(mn)

mn2.add(gum.RangeVariable('Number of defaults', '', 0, len(portfolio_names)))
factor = mn2.addFactor(['Number of defaults',*portfolio_names]).fillFromFunction('+'.join(portfolio_names))
In [13]:
gum.config['factorgraph','edge_length']=1.1
nodetypes={n:0.1 if n in sectors else
             0.3 if "Number of defaults"==n else
             0.2 for n in mn2.names()}
gnb.flow.add(gnb.getMRF(mn2, view="graph",size='7!',nodeColor=nodetypes))
gnb.flow.add(gnb.getMRF(mn2, view="factorgraph",size='7!',nodeColor=nodetypes))
gnb.flow.display()
G EnQuest EnQuest Matalan Matalan EnQuest--Matalan Barclays Barclays EnQuest--Barclays Number of defaults Number of defaults EnQuest--Number of defaults Petrobras Petrobras Petrobras--EnQuest Petrobras--Matalan Petrobras--Barclays Petrobras--Number of defaults New Look New Look Marks & Spencer Marks & Spencer Matalan--Number of defaults Energy Energy Energy--EnQuest Energy--Petrobras Retail Retail Energy--Retail EDF EDF Energy--EDF Barclays--Matalan Barclays--Number of defaults Retail--New Look Retail--Marks & Spencer Retail--Matalan Finance Finance Finance--Energy Finance--Barclays Finance--Retail Metro Bank Metro Bank Finance--Metro Bank Deutsche Bank Deutsche Bank Finance--Deutsche Bank EDF--EnQuest EDF--Petrobras EDF--Matalan EDF--Barclays EDF--Number of defaults
G EnQuest EnQuest Petrobras Petrobras New Look New Look Marks & Spencer Marks & Spencer Matalan Matalan Energy Energy Barclays Barclays Retail Retail Finance Finance Number of defaults Number of defaults Metro Bank Metro Bank EDF EDF Deutsche Bank Deutsche Bank f1#5 f1#5--EnQuest f1#5--Energy f1#3 f1#3--Energy f1#3--EDF f3#4#5#8#10#12 f3#4#5#8#10#12--EnQuest f3#4#5#8#10#12--Petrobras f3#4#5#8#10#12--Matalan f3#4#5#8#10#12--Barclays f3#4#5#8#10#12--Number of defaults f3#4#5#8#10#12--EDF f2#11 f2#11--Marks & Spencer f2#11--Retail f2#9 f2#9--New Look f2#9--Retail f1#4 f1#4--Petrobras f1#4--Energy f0#7 f0#7--Finance f0#7--Deutsche Bank f1#2 f1#2--Energy f1#2--Retail f0#1 f0#1--Energy f0#1--Finance f2#10 f2#10--Matalan f2#10--Retail f0#8 f0#8--Barclays f0#8--Finance f0#6 f0#6--Finance f0#6--Metro Bank f0#2 f0#2--Retail f0#2--Finance

Once this new network is created, we can obtain the distribution of Number of defaults using getPosterior :

In [14]:
ndefault_posterior = gum.getPosterior(mn2, target='Number of defaults',
                                           evs={'Finance': 'no stress', 'Marks & Spencer': 'no default'})
ndefault_posterior
Out[14]:
Number of defaults
0
1
2
3
4
5
0.14950.36200.31190.13710.03570.0038
In [15]:
fig, ax = plt.subplots()
l = ndefault_posterior.tolist()
ax.bar(range(len(l)), l)
ax.set_xlabel('Number of defaults')
ax.set_title('Distribution of the number of defaults in the portfolio')
plt.show()
../_images/notebooks_13-Examples_NaiveCreditDefaultModeling_45_0.svg
In [16]:
q=gum.Potential(ndefault_posterior).fillWith(range(ndefault_posterior.domainSize()))
print(f'Expected number of defaults in the portfolio: {(q*ndefault_posterior).sum():.1f}')
Expected number of defaults in the portfolio: 1.6

Besides the expected number of defaults, the distribution of losses is even more informative:

In [17]:
pot_losses = gum.Potential(portfolio_posterior)
for i in pot_losses.loopIn():
    pot_losses[i] = sum([losses[key]*i.val(key) for key in losses])
print("Table of losses")
pot_losses
Table of losses
Out[17]:
Matalan
Barclays
EDF
Petrobras
EnQuest
no default
default
no default
no default
no default
no default
0.0000540000.0000
default
270000.0000810000.0000
default
no default
360000.0000900000.0000
default
630000.00001170000.0000
default
no default
no default
1800000.00002340000.0000
default
2070000.00002610000.0000
default
no default
2160000.00002700000.0000
default
2430000.00002970000.0000
default
no default
no default
no default
2700000.00003240000.0000
default
2970000.00003510000.0000
default
no default
3060000.00003600000.0000
default
3330000.00003870000.0000
default
no default
no default
4500000.00005040000.0000
default
4770000.00005310000.0000
default
no default
4860000.00005400000.0000
default
5130000.00005670000.0000
In [18]:
expected_loss = (portfolio_posterior * pot_losses ).sum()

print(f'Expected loss with respect to a par value of USD {sum(portfolio_exposures)/1e6:.1f} MM is USD {expected_loss / 1e6:.1f} MM.')
print(f'Expected loss: {100 * expected_loss / sum(portfolio_exposures):2.1f}% of the notional.')
Expected loss with respect to a par value of USD 6.3 MM is USD 1.2 MM.
Expected loss: 18.6% of the notional.
In [19]:
cuts=list(np.percentile([pot_losses[i] for i in pot_losses.loopIn()],range(0,101,10)))
cuts[0] = cuts[0] - 0.001

loss_bucket_distribution = []
d = gum.Potential(pot_losses)
for j in range(len(cuts) - 1):
    for i in pot_losses.loopIn():
        d[i] = 1 if cuts[j]<pot_losses[i]<=cuts[j+1] else 0
    loss_bucket_distribution.append(((portfolio_posterior*d).sum()))
In [20]:
buckets = [f'({cuts[i]:>10_.0f}, {cuts[i+1]:>10_.0f}]' for i in range(len(cuts)-1)]

fig, ax = plt.subplots()
ax.bar(range(10), height=loss_bucket_distribution)
ax.set_xticks(range(10), labels=buckets, rotation=90)
ax.set_title('Distribution of losses with respect to par value')
ax.set_xlabel('Loss buckets')
plt.show()
../_images/notebooks_13-Examples_NaiveCreditDefaultModeling_51_0.svg
In [21]:
cumsum = 0.
for i in range(len(loss_bucket_distribution)):
    cumsum += loss_bucket_distribution[i]
    print('{:24} | {:0.6f}'.format(buckets[i], cumsum))
(        -0,    549_000] | 0.469960
(   549_000,    954_000] | 0.689263
(   954_000,  2_097_000] | 0.762897
( 2_097_000,  2_502_000] | 0.787553
( 2_502_000,  2_835_000] | 0.834408
( 2_835_000,  3_168_000] | 0.888119
( 3_168_000,  3_573_000] | 0.941345
( 3_573_000,  4_716_000] | 0.987143
( 4_716_000,  5_121_000] | 0.991483
( 5_121_000,  5_670_000] | 1.000000

Impact of a belief of stress on a sector for the number of defaults in the portfolio

Belief from \([1,0]\) (no stress) to \([1,1]\) (no specific belief) to \([0,1]\) (stress in a sector).

In [22]:
def show_expected_number_of_defaults(sector,q,enod_nobelief):
  v1=[]
  for i in range(101):
     posterior=gum.getPosterior(mn2, target='Number of defaults',evs={sector:[100-i,i]})
     v1.append((posterior*q).sum())

  fig, ax = plt.subplots()
  ax.set_title(f"Expected number of defaults w.r.t stress in {sector}")
  ax.set_xlabel(sector, fontweight='bold')
  ax.set_ylabel('Number of defaults', style='italic')
  ax.plot(np.arange(0,1.01,0.01),v1)
  ax.plot(0.5,enod_nobelief,'x',color="red")
  ax.text(0.5, enod_nobelief-0.05, f' No specific evidence on {sector}')
  return fig

enod_nobelief=(gum.getPosterior(mn2, target='Number of defaults',evs={})*q).sum()

gnb.flow.add(show_expected_number_of_defaults('Retail',q,enod_nobelief))
gnb.flow.add(show_expected_number_of_defaults('Finance',q,enod_nobelief))
gnb.flow.add(show_expected_number_of_defaults('Energy',q,enod_nobelief))
gnb.flow.display()