Machine learning in risk management

Machine learning (ML) models have been around for decades. The exponential growth in computing power and data availability, however, has created many new opportunities for ML models. One possible application is financial institutions’ risk management. This article gives a brief introduction to ML models, followed by the most promising opportunities for using them in financial risk management.

The current trend to operate a ‘data-driven business’ and the fact that regulators are increasingly focused on data quality and data availability could give an extra impulse to the use of ML models.

ML models

ML models study a dataset and use the knowledge gained to make predictions for other data points. An ML model consists of an ML algorithm and one or more hyperparameters. The algorithm learns from the dataset, a process known as training, while the hyperparameters determine the settings of the algorithm. Most ML algorithms have hyperparameters that the user must set before training. The trained algorithm, together with the calibrated set of hyperparameters, forms the ML model.
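The distinction between hyperparameters and trained parameters can be illustrated with a minimal sketch. It assumes a one-feature ridge regression, chosen here purely for illustration: the penalty `alpha` is a hyperparameter set by the user before training, while the slope and intercept are learned from the data. The function names and the data are hypothetical.

```python
from statistics import mean

def train_ridge(xs, ys, alpha):
    """Fit y = intercept + slope * x with an L2 penalty `alpha` on the slope.

    `alpha` is the hyperparameter (chosen before training);
    slope and intercept are the parameters learned from the data.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)
    intercept = y_bar - slope * x_bar
    return slope, intercept

def predict(model, x):
    slope, intercept = model
    return intercept + slope * x

# Hypothetical training data
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

# Same algorithm, same data, different hyperparameter: two different models.
model_unpenalised = train_ridge(xs, ys, alpha=0.0)
model_penalised = train_ridge(xs, ys, alpha=10.0)
```

With the larger penalty the learned slope shrinks towards zero, which shows that the hyperparameter setting is as much a part of the resulting model as the training data.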

ML models come in many shapes and forms, and serve even more purposes. Selecting an appropriate ML model requires a deeper understanding of the various types of ML that are available and how they work. Three types of ML can be distinguished:

  • Supervised learning.
  • Unsupervised learning.
  • Semi-supervised learning.

The main differences between these types are the data required and the purpose of the model. The data fed into an ML model is split into two categories: the features (independent variables) and the labels or targets (dependent variables). For example, to predict a person’s height (the label or target), it could be useful to look at the features age, sex and weight. Some types of ML models need both as input, while others only require features. Each of the three types is briefly introduced below.

Supervised learning

Supervised learning is the training of an ML algorithm on a dataset where both the features and the labels are available. The ML algorithm uses the features and the labels as an input to map the connection between features and labels. When the model is trained, labels can be generated by the model by only providing the features. A mapping function is used to provide the label belonging to the features. The performance of the model is assessed by comparing the label that the model provides with the actual label.
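As a minimal sketch of this workflow, the example below assumes a k-nearest-neighbours classifier, one of many possible supervised algorithms. It is trained on features and labels, generates labels from features alone, and is then assessed by comparing predicted with actual labels. The data and the function name `knn_predict` are hypothetical.

```python
from collections import Counter
import math

def knn_predict(train_features, train_labels, query, k=3):
    """Return the majority label among the k training points closest to `query`."""
    order = sorted(range(len(train_features)),
                   key=lambda i: math.dist(train_features[i], query))
    nearest = [train_labels[i] for i in order[:k]]
    return Counter(nearest).most_common(1)[0][0]

# Hypothetical labeled training data: (age in years, weight in kg) -> label
train_features = [(6, 22), (8, 27), (10, 33), (35, 80), (42, 70), (29, 64)]
train_labels = ["child", "child", "child", "adult", "adult", "adult"]

# Held-out observations whose actual labels are known but hidden from the model
test_features = [(7, 25), (38, 75), (9, 30), (31, 68)]
test_labels = ["child", "adult", "child", "adult"]

# The model generates labels from the features only ...
predicted = [knn_predict(train_features, train_labels, x) for x in test_features]

# ... and performance is the share of predictions that match the actual labels.
accuracy = sum(p == a for p, a in zip(predicted, test_labels)) / len(test_labels)
```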

Unsupervised learning

In unsupervised learning there is no dependent variable (or label) in the dataset. Unsupervised ML algorithms search for patterns within a dataset. The algorithm links certain observations to others by looking at similar features. This makes an unsupervised learning algorithm suitable for, among other tasks, clustering (i.e. the task of dividing a dataset into subsets). The subsets are formed such that an observation is more similar to the other observations in its own subset than to observations in other subsets. A disadvantage of unsupervised learning is that the model is (often) a black box.
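Clustering can be sketched with k-means, a common clustering algorithm that is used here as an illustrative assumption rather than a recommendation. The implementation below is deliberately naive (it starts from the first k points) and the data is hypothetical; note that no labels are involved.

```python
import math

def k_means(points, k, n_iter=20):
    """Plain k-means: returns (centroids, cluster index per point)."""
    centroids = list(points[:k])  # naive deterministic start: the first k points
    assignment = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, assignment

# Hypothetical observations with two features each; no labels are provided.
points = [(1.0, 1.2), (8.0, 8.5), (1.5, 0.8), (0.9, 1.1), (8.4, 7.9), (7.8, 8.2)]
centroids, assignment = k_means(points, k=2)
```

The number of clusters `k` is a hyperparameter: the algorithm finds the groups, but the user decides how many to look for.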

Semi-supervised learning

Semi-supervised learning uses a combination of labeled and unlabeled data. The dataset used for semi-supervised learning commonly consists of mostly unlabeled data. Manually labeling all the data within a dataset can be very time consuming, and semi-supervised learning offers a solution to this problem. With semi-supervised learning, a small labeled subset is used to make better predictions for the complete dataset.

The training of a semi-supervised learning algorithm consists of two steps. First, to label the unlabeled observations from the original dataset, the complete set is clustered using unsupervised learning, and the resulting clusters are labeled by the algorithm based on their originally labeled members. Second, the resulting fully labeled dataset is used to train a supervised ML algorithm. The downside of semi-supervised learning is that the labels assigned in the first step are not guaranteed to be correct.
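The two steps can be sketched as follows. This is a minimal illustration under stated assumptions: a one-dimensional feature, a two-cluster k-means for the clustering step, a majority vote to label each cluster, and a nearest-class-mean classifier as the supervised model. All names and data are hypothetical.

```python
from collections import Counter

def two_means_1d(values, n_iter=20):
    """Cluster 1-D values into two groups (k-means with k=2)."""
    centres = [min(values), max(values)]
    assignment = []
    for _ in range(n_iter):
        assignment = [0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1
                      for v in values]
        for c in (0, 1):
            members = [v for v, a in zip(values, assignment) if a == c]
            if members:
                centres[c] = sum(members) / len(members)
    return assignment

# Hypothetical feature values; None marks an unlabeled observation.
values = [0.9, 1.1, 1.3, 1.0, 4.8, 5.2, 5.0, 5.1]
labels = ["low", None, None, None, None, "high", None, None]

# Step 1: cluster the complete set, then give every cluster the majority
# label of its originally labeled members. This sketch assumes each cluster
# contains at least one labeled observation.
clusters = two_means_1d(values)
cluster_label = {}
for c in set(clusters):
    known = [l for l, a in zip(labels, clusters) if a == c and l is not None]
    cluster_label[c] = Counter(known).most_common(1)[0][0]
full_labels = [cluster_label[a] for a in clusters]

# Step 2: train a supervised model on the fully labeled set
# (here: classify by the nearest class mean).
class_means = {}
for label in set(full_labels):
    members = [v for v, l in zip(values, full_labels) if l == label]
    class_means[label] = sum(members) / len(members)

def predict(value):
    return min(class_means, key=lambda label: abs(value - class_means[label]))
```

Only two of the eight observations were labeled by hand; the remaining labels are inferred, which is exactly where the risk of incorrect labels comes from.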

Setting up the model

In most ML implementations, data gathering, integration and pre-processing take more time than the actual training of the algorithm. Setting up a model is an iterative process of training, evaluating the results, modifying hyperparameters and repeating, rather than a single pass of data preparation and training. After the training is performed and the hyperparameters have been calibrated, the ML model is ready to make predictions.
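The iterative loop of training, evaluating and adjusting hyperparameters can be sketched as a simple search over candidate values. The example assumes a k-nearest-neighbours regression with the number of neighbours `k` as the only hyperparameter and a separate validation set for the evaluation; the data and function names are hypothetical.

```python
def knn_regress(train, x, k):
    """Predict y at x as the mean target of the k nearest training points."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def validation_error(train, validation, k):
    """Mean absolute error of the k-NN model on the validation set."""
    errors = [abs(knn_regress(train, x, k) - y) for x, y in validation]
    return sum(errors) / len(errors)

# Hypothetical (feature, target) pairs
train = [(1, 1.2), (2, 1.9), (3, 3.2), (4, 3.8),
         (5, 5.1), (6, 6.0), (7, 6.8), (8, 8.1)]
validation = [(2.5, 2.5), (4.5, 4.4), (6.5, 6.6)]

# Train and evaluate once per candidate hyperparameter value, then keep the best.
results = {}
for k in (1, 2, 4, 8):
    results[k] = validation_error(train, validation, k)
best_k = min(results, key=results.get)
```

In this toy setup both very small and very large values of `k` perform worse than an intermediate one, which is why the hyperparameter is calibrated on data that was not used for training.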

Machine learning in financial risk management

ML can add value to financial risk management applications, but the type of model should suit the problem and the available data. For some applications, like challenger models, it is not required to completely explain the model you are using. This makes, for example, an unsupervised black box model suitable as a challenger model. In other cases, explainability of model results is a critical condition when choosing an ML model, and a black box model might not be suitable.

The following sections present some examples where ML models can add value in financial risk management.

Data quality analysis

All modeling challenges start with data. In line with the ‘garbage in, garbage out’ maxim, if the quality of a dataset is insufficient then an ML model will also not perform well. It is quite common that during the development of an ML model, a lot of time is spent on improving the data quality. As ML algorithms learn directly from the data, the performance of the resulting model will increase if the data quality increases. ML can be used to improve data quality before this data is used for modeling. For example, the data quality can be improved by removing/replacing outliers and replacing missing values with likely alternatives.

An example of insufficient data quality is the presence of large or numerous outliers. An outlier is an observation that significantly deviates from the other observations in the data, which might indicate it is incorrect. A data scientist can easily detect univariate outliers, but multivariate outliers are a lot harder to identify. When outliers have been detected, or if there are missing values in a dataset, it might be useful to substitute some of these outliers or impute the missing values. Popular imputation methods use the mean, the median or the most frequent value. Another option is to look for more suitable values, and this is where ML techniques could help to improve the data quality.

Multiple ML models can be combined to improve data quality. First, an ML model can be used to detect outliers; then another model can be used to impute missing data or substitute outliers with a more likely value. The outlier detection can be done either with clustering algorithms or with specialized outlier detection techniques.
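The detect-then-substitute pipeline can be sketched as below. Note the simplification: instead of ML models, this sketch uses a robust statistical rule (the modified z-score based on the median absolute deviation, with the commonly used cut-off of 3.5) for detection and the median for imputation. In practice each step could be replaced by a clustering-based or specialized model. The data and function names are hypothetical.

```python
from statistics import median

def flag_outliers(values, threshold=3.5):
    """Flag values whose modified z-score (based on the MAD) exceeds `threshold`."""
    observed = [v for v in values if v is not None]
    med = median(observed)
    mad = median(abs(v - med) for v in observed)
    return [v is not None and mad > 0 and 0.6745 * abs(v - med) / mad > threshold
            for v in values]

def clean(values):
    """Replace flagged outliers and missing values with the median of the rest."""
    flags = flag_outliers(values)
    trusted = [v for v, f in zip(values, flags) if v is not None and not f]
    fill = median(trusted)
    return [fill if (v is None or f) else v for v, f in zip(values, flags)]

# Hypothetical loan amounts (in thousands); None is a missing value.
amounts = [120, 135, None, 128, 5000, 131, 125]
cleaned = clean(amounts)
```

The fill value is computed only from observations that were neither missing nor flagged, so a single extreme outlier does not distort the imputed values. This rule is univariate; it does not address the multivariate outliers mentioned above.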

Loan approval

A bank’s core business is lending money to consumers and companies. The biggest risk for a bank is credit risk: the risk that a borrower will not be able to fully repay the borrowed amount. Adequate loan approval can minimize this credit risk. To determine whether a bank should provide a loan, it is important to estimate the probability of default for that new loan application.

Established banks already have an extensive record of loans and defaults at their disposal. Together with contract details, this can form a valuable basis for an ML-based loan approval model. Here, the contract characteristics are the features, and the label is the variable indicating if the consumer/company defaulted or not. The features could be extended with other sources of information regarding the borrower.

Supervised learning algorithms can be used to classify the application of the potential borrower as either approved or rejected, based on the probability of a future default on the loan. One suitable type of ML model is a classification algorithm, which assigns each application to either the ‘default’ or the ‘non-default’ category, based on its features.
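A minimal sketch of such a classifier is given below. It assumes logistic regression trained by batch gradient descent, two hypothetical features (loan-to-income ratio and an arrears indicator), an invented loan history and an arbitrary cut-off of 0.5 on the estimated probability of default. None of these choices comes from an actual loan book.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(features, labels, learning_rate=0.1, n_epochs=2000):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n_features = len(features[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(n_epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(features, labels):
            error = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            grad_b += error
            for j, v in enumerate(x):
                grad_w[j] += error * v
        bias -= learning_rate * grad_b / len(features)
        weights = [w - learning_rate * g / len(features)
                   for w, g in zip(weights, grad_w)]
    return weights, bias

def probability_of_default(model, x):
    weights, bias = model
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))

# Hypothetical history: (loan-to-income ratio, prior arrears 0/1) and
# the observed label (1 = defaulted, 0 = repaid).
history = [(0.2, 0), (0.3, 0), (0.4, 0), (0.35, 0),
           (0.5, 1), (0.8, 1), (0.9, 0), (1.0, 1)]
defaulted = [0, 0, 0, 0, 1, 1, 1, 1]
model = train_logistic(history, defaulted)

# New applications are approved when the estimated PD is below the cut-off.
CUT_OFF = 0.5  # illustrative assumption, not a recommended threshold
applicants = [(0.25, 0), (0.95, 1)]
decisions = ["approve" if probability_of_default(model, a) < CUT_OFF else "reject"
             for a in applicants]
```

Logistic regression is used here because its coefficients are easy to interpret, which matters when loan decisions have to be explained.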

Challenger models

When there is already a model in place, it can be helpful to challenge this model. The model in use can be compared to a challenger model to evaluate differences in performance. Furthermore, the challenger model can identify possible effects in the data that are not captured yet in the model in use. Such analysis can be performed as a review of the model in use or before taking the model into production as a part of a model validation.

The aim of a challenger model is to challenge the model in use. As it is usually not feasible to design another sophisticated model, simpler models are mostly selected as challenger models. ML models can be useful to create more advanced challenger models within a relatively limited amount of time.

Challenger models do not necessarily have to be explainable, as they will not be used in practice, but only as a comparison for the model in use. This makes all ML models suitable as challenger models, even black box models such as neural networks.
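A comparison between a model in use and a challenger can be sketched as follows. Both models are hypothetical stand-ins: the model in use is a fixed linear rule and the challenger is a one-nearest-neighbour lookup. The sketch compares their errors on the same holdout data and locates the observation where the two models disagree most.

```python
def mean_absolute_error(model, observations):
    """Average absolute difference between model output and observed value."""
    return sum(abs(model(x) - y) for x, y in observations) / len(observations)

# Hypothetical model in use: a fixed linear rule.
def model_in_use(x):
    return 0.02 + 0.10 * x

# Hypothetical challenger: one-nearest-neighbour lookup on development data.
development = [(0.0, 0.02), (1.0, 0.10), (2.0, 0.25), (3.0, 0.45)]

def challenger(x):
    return min(development, key=lambda pair: abs(pair[0] - x))[1]

# Both models are evaluated on the same holdout observations.
holdout = [(0.1, 0.02), (1.1, 0.11), (2.1, 0.27), (2.9, 0.44)]
comparison = {
    "model in use": mean_absolute_error(model_in_use, holdout),
    "challenger": mean_absolute_error(challenger, holdout),
}

# The observation where the two models disagree most hints at an effect
# that the model in use may not capture.
largest_gap = max(holdout,
                  key=lambda obs: abs(model_in_use(obs[0]) - challenger(obs[0])))
```

In this toy data the disagreement is largest at the high end of the feature range, which suggests a non-linear effect that the linear rule misses.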


Segmentation

Segmentation concerns dividing a full dataset into subsets based on certain characteristics. These subsets are also referred to as segments. Segmentation is often performed to create a model per segment, to better capture each segment’s specific behavior. Creating a model per segment can lower the estimation error and increase the overall model accuracy, compared to a single model for all segments combined.

Segmentation can, among other uses, be applied in credit rating models, prepayment models and marketing. For these purposes, segmentation is sometimes based on expert judgement and not on a data-driven model. ML models could help to change this and provide quantitative evidence for a segmentation.

ML models can be used in two ways to create a data-driven segmentation. In the first approach, observations are placed into a segment with similar observations based on their features, for example by applying a clustering or classification algorithm. In the second approach, observations are segmented by evaluating the output of a target variable or label. This approach assumes that observations in the same segment show the same kind of behavior regarding this target variable or label.

In the latter approach, creating a segment itself is not the goal, but optimizing the estimation of the target variable or classifying the right label is. For example, all clients in segment A could be modeled by function a, whereas clients in segment B would be modeled by function b. Functions a and b could be regression models based on the features of the individual clients and/or macro variables that give a prediction for the actual target variable.
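The idea of one function per segment can be sketched with two least-squares regression lines. The segments, data and function names below are hypothetical, and the segment membership is assumed to be known; the sketch only compares the error of per-segment models with that of a single pooled model.

```python
def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in points)
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

def mean_squared_error(line, points):
    slope, intercept = line
    return sum((intercept + slope * x - y) ** 2 for x, y in points) / len(points)

# Hypothetical clients per segment: (feature, target variable)
clients = {
    "A": [(1, 3.0), (2, 5.0), (3, 7.0), (4, 9.0)],   # target rises with the feature
    "B": [(1, 9.0), (2, 8.0), (3, 7.0), (4, 6.0)],   # target falls with the feature
}

# One regression function per segment (function a and function b) ...
segment_models = {name: fit_line(points) for name, points in clients.items()}

# ... versus a single model for all segments combined.
all_points = [p for points in clients.values() for p in points]
pooled_model = fit_line(all_points)

pooled_mse = mean_squared_error(pooled_model, all_points)
segmented_mse = sum(mean_squared_error(segment_models[name], points) * len(points)
                    for name, points in clients.items()) / len(all_points)
```

Because the two segments behave in opposite ways, the pooled line fits neither of them, while the per-segment lines fit this toy data almost exactly.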

Credit scoring

Companies and/or debt instruments can receive a credit rating from a credit rating agency. A few well-known rating agencies provide these credit ratings, which reflect their assessment of the probability of default of the company or debt instrument. Besides these rating agencies, financial institutions also use internal credit scoring models to determine a credit score. Credit scores likewise provide an expectation of the creditworthiness of a company, debt instrument or individual.

Supervised ML models are suitable for credit scoring, as the ML model can be trained on historical data. For historical data, the label (‘defaulted’ or ‘not defaulted’) can be observed and extensive financial data (the features) is mostly available. Supervised ML models can be used to determine reliable credit scores in a transparent way as an alternative to traditional credit scoring models. Alternatively, credit scoring models based on ML can also act as challenger models for traditional credit scoring models. In this case, explainability is not a key requirement for the selected ML model.


Conclusion

ML can add value to, or replace, models applied in financial risk management. It can be used in many different model types and in many different ways. A few examples have been provided in this article, but there are many more.

ML models learn directly from the data, but there are still some choices to be made by the model user. The user can select the model type and must determine how to calibrate the hyperparameters. There is no ‘one size fits all’ solution for calibrating an ML model. Therefore, ML is sometimes referred to as an art, rather than a science.

When applying ML models, one should always be careful and understand what is happening under the hood. As with all modeling activities, every method has its pitfalls. Most ML models will come up with a solution, even if it is suboptimal. Common sense is always required when modeling. In the right hands though, ML can be a powerful tool to improve modeling in financial risk management.

Working with ML models has given us valuable insights (see the box below). Every application of ML led to valuable lessons on what to expect from ML models, when to use them and what the pitfalls are.

Machine learning and Zanders

Zanders has already encountered several projects and research questions where ML could be applied. In some cases, the use of ML was indeed beneficial; in other cases, traditional models turned out to be the better solution.

During these projects, most time was spent on data collection and data pre-processing. Based on these experiences, an ML-based dataset validation tool was developed. In another case, a model was adapted to handle missing data by using an alternative available feature of the observation.

ML was also used to challenge a Zanders internal credit rating model. This resulted in useful insights on potential model improvements. For example, the ML model provided more insight into variable importance and segmentation. These insights are useful for the further development of Zanders’ credit rating models. Besides showing what could be improved, the ML model also emphasized the advantages of classical models over the ML-based versions: it was not able to provide more sensible ratings than the traditional credit rating model.

In another case, we investigated whether it would be sensible and feasible to use ML for transaction screening and anomaly detection. The outcome of this project once more highlighted that data is key for ML models. The available data was plentiful, but of low quality. Therefore, the ML models used were not able to provide helpful insight into the payments, or to consistently detect divergent payment behavior on a large scale.

Besides the projects where ML was used to deliver a solution, we investigated the explainability of several ML models. During this process we gained knowledge on techniques to provide more insights into otherwise hardly understandable (black box) models.