# Naive modeling of credit defaults using a Markov Random Field

This notebook is the adaptation to pyAgrum of the model proposed by Gautier Marti which is itself inspired by the paper Graphical Models for Correlated Defaults (adaptation by Marvin Lasserre and Pierre-Henri Wuillemin).

In [1]:

import numpy as np
import matplotlib.pyplot as plt

import pyAgrum as gum
import pyAgrum.lib.notebook as gnb
import pyAgrum.lib.mrf2graph as m2g


## Constructing the model

Three sectors co-dependent in their stress state are modelled: (Finance, Energy, Retail). For each of these sectors, we consider a universe of three issuers.

Within Finance

Within Energy

Within Retail

Deutsche Bank

EDF

New Look

Metro Bank

Petrobras

Matalan

Barclays

EnQuest

Marks & Spencer

The probability of default of these issuers is partly idiosyncratic and partly depending on the stress within their respective sectors. Some distressed names such as New Look can have a high marginal probability of default, so high that the state of their sector (normal or stressed) does not matter much. Some other high-yield risky names such as Petrobras may strongly depend on how the whole energy sector is doing. On the other hand, a company such as EDF should be quite robust, and would require a very acute and persistent sector stress to move its default probability significantly higher.

We can encode this basic knowledge in terms of potentials:

Conventions:

Sector variables can be either in the state no stress or in the state under stress while issuer variables can be either in the state no default or in the state default.

The choice of these potentials is not an easy task; It can be driven by historical fundamental or statistical relationships or forward-looking views.

Given a network structure, and a set of potentials, one can mechanically do inference using the pyAgrum library.

We will show how to obtain a distribution of the number of defaults (and the expected number of defaults), a distribution of the losses (and the expected loss). From the joint probability table, we could also extract the default correlation

tl;dr We use here the library pyAgrum to illustrate, on a toy example, how one can use a Markov Random Field (MRF) for modelling the distribution of defaults and losses in a credit portfolio.

Building accurate PGM/MRF models require expert knowledge for considering a relevant graph structure, and attributing useful potentials. For this toy model, we use the following structure:

The first step consists in creating the model. For that, we use the pyAgrum class pyAgrum.MarkovRandomField.

In [2]:

# building nodes (with types)
mn = gum.MarkovRandomField('Credit default modeling')

# Adding sector variables
sectors = ['Finance', 'Energy', 'Retail']
finance, energy, retail = [mn.add(gum.LabelizedVariable(name, '', ['no stress', 'under stress']))
for name in sectors]

# Adding issuer variables
edf, petro, eq = [mn.add(gum.LabelizedVariable(name, '', ['no default', 'default']))
for name in ['EDF','Petrobras','EnQuest']]

mb, db, ba = [mn.add(gum.LabelizedVariable(name, '', ['no default', 'default']))
for name in ['Metro Bank', 'Deutsche Bank', 'Barclays']]

nl, matalan, ms = [mn.add(gum.LabelizedVariable(name, '', ['no default', 'default']))
for name in ['New Look', 'Matalan', 'Marks & Spencer']]

In [3]:

# Adding and filling factors between sectors
mn.addFactor([finance, energy])[:] = [[90, 70],
[60, 10]]
mn.addFactor([finance, retail])[:] = [[80, 10],
[30, 80]]
mn.addFactor([energy, retail])[:]  = [[60,  5],
[70, 95]]

# Adding factors between sector and issuer
mn.addFactor([db, finance])[:] = [[90, 30],
[10, 60]]
mn.addFactor([mb, finance])[:] = [[80, 40],
[ 5, 60]]
mn.addFactor([ba, finance])[:] = [[90, 20],
[20, 50]]

mn.addFactor([edf, energy])[:]   = [[90, 5],
[80, 40]]
mn.addFactor([petro, energy])[:] = [[60, 50],
[ 5, 60]]
mn.addFactor([eq, energy])[:]    = [[80, 20],
[10, 50]]

mn.addFactor([nl, retail])[:]      = [[ 5, 60],
[ 2, 90]]
mn.addFactor([matalan, retail])[:] = [[40, 30],
[20, 70]]
mn.addFactor([ms, retail])[:]       = [[80, 10],
[30, 50]]

In [4]:

nodetypes={n:0.1 if n in sectors else
0.2 for n in mn.names()}

gnb.sideBySide(gnb.getMRF(mn,view='graph',nodeColor=nodetypes),
gnb.getMRF(mn,view='factorgraph',nodeColor=nodetypes),
captions=['The model as a Markov random field','The model as a factor graph'])

gnb.sideBySide(mn.factor({'Energy', 'Finance'}),
mn.factor({'Finance', 'Retail'}),
mn.factor({'Energy', 'Retail'}),
mn.factor({'Deutsche Bank', 'Finance'}),
mn.factor({'Metro Bank', 'Finance'}),
mn.factor({'Barclays', 'Finance'}),
mn.factor({'EDF', 'Energy'}),
mn.factor({'Petrobras', 'Energy'}),
mn.factor({'EnQuest', 'Energy'}),
mn.factor({'New Look', 'Retail'}),
mn.factor({'Matalan', 'Retail'}),
mn.factor({'Marks & Spencer', 'Retail'}),
ncols=3)

 G EnQuest EnQuest Barclays Barclays New Look New Look Metro Bank Metro Bank Deutsche Bank Deutsche Bank Energy Energy Energy--EnQuest Petrobras Petrobras Energy--Petrobras Retail Retail Energy--Retail EDF EDF Energy--EDF Retail--New Look Marks & Spencer Marks & Spencer Retail--Marks & Spencer Matalan Matalan Retail--Matalan Finance Finance Finance--Barclays Finance--Metro Bank Finance--Deutsche Bank Finance--Energy Finance--Retail The model as a Markov random field G EnQuest EnQuest Barclays Barclays New Look New Look Metro Bank Metro Bank Deutsche Bank Deutsche Bank Energy Energy Petrobras Petrobras Retail Retail Marks & Spencer Marks & Spencer Finance Finance Matalan Matalan EDF EDF f0#1 f0#1--Energy f0#1--Finance f1#2 f1#2--Energy f1#2--Retail f0#7 f0#7--Deutsche Bank f0#7--Finance f1#4 f1#4--Energy f1#4--Petrobras f2#9 f2#9--New Look f2#9--Retail f2#11 f2#11--Retail f2#11--Marks & Spencer f0#2 f0#2--Retail f0#2--Finance f0#6 f0#6--Metro Bank f0#6--Finance f0#8 f0#8--Barclays f0#8--Finance f1#3 f1#3--Energy f1#3--EDF f1#5 f1#5--EnQuest f1#5--Energy f2#10 f2#10--Retail f2#10--Matalan The model as a factor graph
Finance
Energy
no stress
under stress
no stress
90.000070.0000
under stress
60.000010.0000
Finance
Retail
no stress
under stress
no stress
80.000010.0000
under stress
30.000080.0000
Energy
Retail
no stress
under stress
no stress
60.00005.0000
under stress
70.000095.0000
Deutsche Bank
Finance
no default
default
no stress
90.000030.0000
under stress
10.000060.0000
Metro Bank
Finance
no default
default
no stress
80.000040.0000
under stress
5.000060.0000
Barclays
Finance
no default
default
no stress
90.000020.0000
under stress
20.000050.0000
EDF
Energy
no default
default
no stress
90.00005.0000
under stress
80.000040.0000
Petrobras
Energy
no default
default
no stress
60.000050.0000
under stress
5.000060.0000
EnQuest
Energy
no default
default
no stress
80.000020.0000
under stress
10.000050.0000
New Look
Retail
no default
default
no stress
5.000060.0000
under stress
2.000090.0000
Matalan
Retail
no default
default
no stress
40.000030.0000
under stress
20.000070.0000
Marks & Spencer
Retail
no default
default
no stress
80.000010.0000
under stress
30.000050.0000

## Example 1 & 2

The first two examples consists in finding the chance of EDF to default with no evidence and knowing that the energy sector is not under stress. For that, we can use the getPosterior method:

In [5]:

gnb.sideBySide(gum.getPosterior(mn, target='EDF', evs={}),
gum.getPosterior(mn, target='EDF', evs={'Energy':'no stress'}),
captions=['$P(EDF)$','$P(EDF|Energy=$under stress$)$'])

EDF
no default
default
0.90720.0928

$P(EDF)$
EDF
no default
default
0.94740.0526

$P(EDF|Energy=$under stress$)$

Hence, if we cannot observe the state of the energy sector, then EDF has a higher chance of defaulting (9%) as it is possible that the energy sector is under stress.

## Example 3 & 4

Given that Marks & Spencer (a much safer name) has defaulted, New Look has now an increased chance of defaulting: 97%.

Even if the retail sector is doing ok, New Look is a distressed name with a high probability of default (92%).

In [6]:

gnb.sideBySide(gum.getPosterior(mn, target='New Look', evs={'Marks & Spencer': 'default'}),
gum.getPosterior(mn, target='New Look', evs={'Retail':'no stress'}),
captions=['P(New Look|Marks & Spencer=default)',
'P(New Look | Retail = no stress)'])

New Look
no default
default
0.02860.9714

P(New Look|Marks & Spencer=default)
New Look
no default
default
0.07690.9231

P(New Look | Retail = no stress)

pyAgrum offers the option to vizualise the marginal probability of each variable using showInference.

In [7]:

#gum.config['factorgraph','edge_length_inference']=1.
gnb.showInference(mn, size='7!',nodeColor=nodetypes)


## Impact of defaults on a credit portfolio and the distribution of losses

Amongst the 9 available issuers, let’s consider the following portfolio containing bonds from these 6 issuers:

In [8]:

portfolio_names = ['EDF', 'Petrobras', 'EnQuest', 'Matalan', 'Barclays']
portfolio_exposures = [2e6, 400e3, 300e3, 600e3, 3e6]
LGD = 0.9
losses = {portfolio_names[i]:(portfolio_exposures[i]*LGD) for i in range(len(portfolio_names))}

print('** Total notional: USD {} MM'.format(sum(portfolio_exposures)/1e6))

** Total notional: USD 6.3 MM


We have a USD 6.3MM total exposure to these names (notional).

We assume that in case of default, we recover only 10% (LGD = 90%).

Let’s start using the model:

In [9]:

ie = gum.ShaferShenoyMRFInference(mn)
ie.makeInference()
p=ie.jointPosterior(set(sectors))

print(f'** There is {100*p[{"Retail":0,"Energy":0,"Finance":0}]:.2f}% chance that no sector in under stress.')
print()
p

** There is 42.38% chance that no sector in under stress.


Out[9]:

Retail
Finance
Energy
no stress
under stress
no stress
no stress
0.42380.2999
under stress
0.01050.1215
under stress
no stress
0.00830.1251
under stress
0.00000.0109

If we observe that the finance sector is doing ok, and that Marks & Spencer is still doing business as usual, then this is the current joint probabilities associated to the potential defaults in the portfolio:

In [10]:

ie = gum.ShaferShenoyMRFInference(mn)
ie.addJointTarget(set(portfolio_names))
ie.setEvidence({'Finance': 'no stress', 'Marks & Spencer': 'no default'})
ie.makeInference()
portfolio_posterior = ie.jointPosterior(set(portfolio_names))


Note that the line ie.addJointTarget(set(portfolio_names)) is important here because adding hard evidences removes the corresponding nodes in the graph and then no factor containing the variables in portfolio_names can be found. The method addJointTarget ensure that such a factor exists.

In [11]:

portfolio_posterior

Out[11]:

Barclays
EnQuest
EDF
Petrobras
Matalan
no default
default
no default
no default
no default
no default
0.14950.0332
default
0.15520.0345
default
no default
0.12680.0282
default
0.13500.0300
default
no default
no default
0.00840.0019
default
0.00880.0020
default
no default
0.00810.0018
default
0.01020.0023
default
no default
no default
no default
0.03830.0085
default
0.04120.0092
default
no default
0.04310.0096
default
0.06270.0139
default
no default
no default
0.00260.0006
default
0.00340.0008
default
no default
0.00770.0017
default
0.01700.0038

We now want to compute the distribution of the number of defaults. For that, we create a new Markov random field containing an additional node Number of defaults connected to each issuers:

In [12]:

mn2 = gum.MarkovRandomField(mn)

mn2.add(gum.RangeVariable('Number of defaults', '', 0, len(portfolio_names)))
factor = mn2.addFactor(['Number of defaults',*portfolio_names]).fillFromFunction('+'.join(portfolio_names))

In [13]:

gum.config['factorgraph','edge_length']=1.1
nodetypes={n:0.1 if n in sectors else
0.3 if "Number of defaults"==n else
0.2 for n in mn2.names()}
gnb.flow.add(gnb.getMRF(mn2, view="graph",size='7!',nodeColor=nodetypes))
gnb.flow.add(gnb.getMRF(mn2, view="factorgraph",size='7!',nodeColor=nodetypes))
gnb.flow.display()


Once this new network is created, we can obtain the distribution of Number of defaults using getPosterior :

In [14]:

ndefault_posterior = gum.getPosterior(mn2, target='Number of defaults',
evs={'Finance': 'no stress', 'Marks & Spencer': 'no default'})
ndefault_posterior

Out[14]:

Number of defaults
0
1
2
3
4
5
0.14950.36200.31190.13710.03570.0038
In [15]:

fig, ax = plt.subplots()
l = ndefault_posterior.tolist()
ax.bar(range(len(l)), l)
ax.set_xlabel('Number of defaults')
ax.set_title('Distribution of the number of defaults in the portfolio')
plt.show()

In [16]:

q=gum.Potential(ndefault_posterior).fillWith(range(ndefault_posterior.domainSize()))
print(f'Expected number of defaults in the portfolio: {(q*ndefault_posterior).sum():.1f}')

Expected number of defaults in the portfolio: 1.6


Besides the expected number of defaults, the distribution of losses is even more informative:

In [17]:

pot_losses = gum.Potential(portfolio_posterior)
for i in pot_losses.loopIn():
pot_losses[i] = sum([losses[key]*i.val(key) for key in losses])
print("Table of losses")
pot_losses

Table of losses

Out[17]:

Barclays
EnQuest
EDF
Petrobras
Matalan
no default
default
no default
no default
no default
no default
0.00002700000.0000
default
540000.00003240000.0000
default
no default
360000.00003060000.0000
default
900000.00003600000.0000
default
no default
no default
1800000.00004500000.0000
default
2340000.00005040000.0000
default
no default
2160000.00004860000.0000
default
2700000.00005400000.0000
default
no default
no default
no default
270000.00002970000.0000
default
810000.00003510000.0000
default
no default
630000.00003330000.0000
default
1170000.00003870000.0000
default
no default
no default
2070000.00004770000.0000
default
2610000.00005310000.0000
default
no default
2430000.00005130000.0000
default
2970000.00005670000.0000
In [18]:

expected_loss = (portfolio_posterior * pot_losses ).sum()

print(f'Expected loss with respect to a par value of USD {sum(portfolio_exposures)/1e6:.1f} MM is USD {expected_loss / 1e6:.1f} MM.')
print(f'Expected loss: {100 * expected_loss / sum(portfolio_exposures):2.1f}% of the notional.')

Expected loss with respect to a par value of USD 6.3 MM is USD 1.2 MM.
Expected loss: 18.6% of the notional.

In [19]:

cuts=list(np.percentile([pot_losses[i] for i in pot_losses.loopIn()],range(0,101,10)))
cuts[0] = cuts[0] - 0.001

loss_bucket_distribution = []
d = gum.Potential(pot_losses)
for j in range(len(cuts) - 1):
for i in pot_losses.loopIn():
d[i] = 1 if cuts[j]<pot_losses[i]<=cuts[j+1] else 0
loss_bucket_distribution.append(((portfolio_posterior*d).sum()))

In [20]:

buckets = [f'({cuts[i]:>10_.0f}, {cuts[i+1]:>10_.0f}]' for i in range(len(cuts)-1)]

fig, ax = plt.subplots()
ax.bar(range(10), height=loss_bucket_distribution)
ax.set_xticks(range(10), labels=buckets, rotation=90)
ax.set_title('Distribution of losses with respect to par value')
ax.set_xlabel('Loss buckets')
plt.show()

In [21]:

cumsum = 0.
for i in range(len(loss_bucket_distribution)):
cumsum += loss_bucket_distribution[i]
print('{:24} | {:0.6f}'.format(buckets[i], cumsum))

(        -0,    549_000] | 0.469960
(   549_000,    954_000] | 0.689263
(   954_000,  2_097_000] | 0.762897
( 2_097_000,  2_502_000] | 0.787553
( 2_502_000,  2_835_000] | 0.834408
( 2_835_000,  3_168_000] | 0.888119
( 3_168_000,  3_573_000] | 0.941345
( 3_573_000,  4_716_000] | 0.987143
( 4_716_000,  5_121_000] | 0.991483
( 5_121_000,  5_670_000] | 1.000000


### Impact of a belief of stress on a sector for the number of defaults in the portfolio

Belief from $$[1,0]$$ (no stress) to $$[1,1]$$ (no specific belief) to $$[0,1]$$ (stress in a sector).

In [22]:

def show_expected_number_of_defaults(sector,q,enod_nobelief):
v1=[]
for i in range(101):
posterior=gum.getPosterior(mn2, target='Number of defaults',evs={sector:[100-i,i]})
v1.append((posterior*q).sum())

fig, ax = plt.subplots()
ax.set_title(f"Expected number of defaults w.r.t stress in {sector}")
ax.set_xlabel(sector, fontweight='bold')
ax.set_ylabel('Number of defaults', style='italic')
ax.plot(np.arange(0,1.01,0.01),v1)
ax.plot(0.5,enod_nobelief,'x',color="red")
ax.text(0.5, enod_nobelief-0.05, f' No specific evidence on {sector}')
return fig

enod_nobelief=(gum.getPosterior(mn2, target='Number of defaults',evs={})*q).sum()

gnb.flow.add(show_expected_number_of_defaults('Retail',q,enod_nobelief))
gnb.flow.add(show_expected_number_of_defaults('Finance',q,enod_nobelief))
gnb.flow.add(show_expected_number_of_defaults('Energy',q,enod_nobelief))
gnb.flow.display()

In [ ]:



In [ ]: