Abstract
The main aim of this thesis is to develop suitable, high-performance Credit Scoring Models (CSMs) that use data mining techniques to assess the credit risk of personal loans in Sudanese commercial banks. Two Sudanese credit datasets were constructed from data provided by the Agricultural Bank of Sudan and Al Salam Commercial Bank. In addition, a German credit dataset was used as a benchmark dataset. Three data mining classification techniques were employed: Artificial Neural Network (ANN), Support Vector Machine (SVM) and Decision Tree (DT). A Genetic Algorithm (GA) was also applied as a feature selection technique. The proposed credit scoring models were validated with two methods: split validation with two ratios (70:30 and 60:40) and 10-fold cross-validation. Combining GA with the classification techniques produced tables of attributes and their weights. These tables were used to identify a reduced set of features for each dataset (i.e., reduced datasets were derived from the original datasets). The experiments were conducted in three stages. In stage 1, the classification techniques were applied individually to each dataset. In stage 2, they were combined with GA. In stage 3, they were applied to the reduced datasets. In each stage, nine credit scoring models were developed for each dataset. The models for each dataset were compared using five evaluation measures: Accuracy, Precision (Defaulter), Precision (Nondefaulter), Type I error and Type II error. Based on these comparisons, the best models for each dataset were recommended. The experiments carried out in this research show that:
• For all datasets, combining GA as a wrapper feature selection technique with the ANN, SVM and DT classification techniques is more beneficial than applying these techniques individually.
• Applying the classification techniques to the reduced datasets does not significantly improve most of the models, in terms of the five evaluation measures, compared with the models obtained by applying the same techniques to the original datasets.
In addition, as is well known, the performance of each technique depends heavily on the nature of the dataset.
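The five evaluation measures used to compare the models can be illustrated with a minimal Python sketch. It treats "defaulter" as the positive class and assumes a common credit-scoring convention in which a Type I error is a defaulter misclassified as a non-defaulter and a Type II error is a non-defaulter misclassified as a defaulter. The exact definitions used in the thesis are not stated here, so this convention is an assumption. The labels and the names `ConfusionCounts`, `count_outcomes` and `evaluation_measures` are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass
from typing import Dict, Sequence

DEFAULTER = "defaulter"        # hypothetical label for a bad (defaulting) loan
NONDEFAULTER = "nondefaulter"  # hypothetical label for a good (repaid) loan


@dataclass
class ConfusionCounts:
    tp: int  # defaulters predicted as defaulters
    fn: int  # defaulters predicted as non-defaulters
    tn: int  # non-defaulters predicted as non-defaulters
    fp: int  # non-defaulters predicted as defaulters


def count_outcomes(y_true: Sequence[str], y_pred: Sequence[str]) -> ConfusionCounts:
    """Tally a binary confusion matrix with DEFAULTER as the positive class.

    Any label other than DEFAULTER is counted as a non-defaulter.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    c = ConfusionCounts(0, 0, 0, 0)
    for actual, predicted in zip(y_true, y_pred):
        if actual == DEFAULTER:
            if predicted == DEFAULTER:
                c.tp += 1
            else:
                c.fn += 1
        elif predicted == DEFAULTER:
            c.fp += 1
        else:
            c.tn += 1
    return c


def _ratio(num: int, den: int) -> float:
    # Return 0.0 instead of raising when a class is never predicted or never present.
    return num / den if den else 0.0


def evaluation_measures(c: ConfusionCounts) -> Dict[str, float]:
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fn + c.tn + c.fp),
        # Share of applicants predicted to default who actually defaulted.
        "precision_defaulter": _ratio(c.tp, c.tp + c.fp),
        # Share of applicants predicted not to default who actually did not default.
        "precision_nondefaulter": _ratio(c.tn, c.tn + c.fn),
        # Assumed convention: Type I error = share of defaulters accepted as non-defaulters.
        "type_i_error": _ratio(c.fn, c.tp + c.fn),
        # Assumed convention: Type II error = share of non-defaulters rejected as defaulters.
        "type_ii_error": _ratio(c.fp, c.tn + c.fp),
    }


if __name__ == "__main__":
    # Toy example: three defaulters and five non-defaulters, one error in each class.
    y_true = [DEFAULTER] * 3 + [NONDEFAULTER] * 5
    y_pred = [DEFAULTER, DEFAULTER, NONDEFAULTER,
              NONDEFAULTER, NONDEFAULTER, NONDEFAULTER, NONDEFAULTER, DEFAULTER]
    for name, value in evaluation_measures(count_outcomes(y_true, y_pred)).items():
        print(f"{name}: {value:.3f}")
```

With the toy labels in the `__main__` block, the script prints accuracy 0.750, precision_defaulter 0.667, precision_nondefaulter 0.800, type_i_error 0.333 and type_ii_error 0.200. Credit datasets are often imbalanced, and accuracy alone can hide poor performance on the defaulter class. Reporting both precisions and both error types alongside accuracy therefore gives a fuller picture of each model.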
Table of Contents
Abstract
Dedications
Acknowledgements
List of Publications
Table of Contents
List of Figures
List of Tables
List of Abbreviations
CHAPTER ONE
1. Introduction
1.1 Overview
1.2 Motivation of the Research
1.3 Problem Statement
1.4 Research Objectives
1.5 Research Scope
1.6 Contributions
1.7 Organization of the Thesis
CHAPTER TWO
2. Background
2.1 Overview
2.2 General Concepts of Banking System
2.3 Challenges Facing Banks
2.4 Risk Management in Banks
2.4.1 Credit Risk
2.4.2 Credit-Risk Evaluation Systems
2.4.3 Similarities and Differences between Judgmental and Credit Scoring Systems
2.4.4 Historical Background for Credit Scoring Method
2.4.5 Credit Scoring Approach Definitions
2.4.6 Benefits of Credit Scoring
2.4.7 Weaknesses of Credit Scoring
2.5 Data Mining Concepts
2.5.1 Data Mining Functionalities
2.5.2 Modeling Credit Scoring as a Classification Problem
2.6 Data Mining Classification and Prediction Techniques
2.6.1 Classification and Numeric Prediction
2.6.2 Definition of Classification
2.6.3 Data Mining Classification and Prediction Techniques
2.7 Summary
CHAPTER THREE
3. Literature Review
3.1 Overview
3.2 Statistical Approach
3.2.1 Linear Discriminant Analysis
3.2.2 Logistic Regression
3.2.3 Decision Tree
3.3 Artificial Intelligence Approach
3.3.1 Artificial Neural Network
3.3.1.1 Limitations of Artificial Neural Networks
3.3.1.2 Efforts to Overcome Limitations
3.3.2 Support Vector Machine
3.3.2.1 SVM Parameters Optimization and Feature Selection
3.3.2.2 Support Vector Machine Main Drawbacks
3.3.3 Evolutionary Computational Techniques
3.3.4 Case-Based Reasoning
3.3.5 Rough Set
3.4 Hybrid Approach in Credit Scoring Models
3.4.1 Hybrid Systems in Credit Scoring
3.4.2 Ensemble Systems in Credit Scoring
3.5 Summary
CHAPTER FOUR
4. Research Methodology
4.1 Overview
4.2 Phase 1: Problem Domain Identification
4.2.1 Sudan’s Banking Sector
4.2.1.1 Islamization of the Sudanese Banks and Islamic Financial Modes
4.2.2 Surveys and Interviews
4.3 Phase Two: Literature Survey
4.4 Phase Three: Credit Datasets Construction
4.4.1 Creation of Datasets
4.4.2 Preprocessing of Datasets
4.5 Phase Four: Design of the Proposed Credit Scoring Models
4.5.1 Building Credit Scoring Models Using Single Techniques
4.5.2 Building Credit Scoring Models Using Hybrid Techniques
4.5.3 Datasets Reduction
4.5.4 Building Credit Scoring Models Using Reduced Datasets
4.6 Phase 5: Implementation
4.7 Phase 6: Evaluation
4.7.1 Identification of Measures Criteria
4.7.2 Validation Results
4.8 Summary
CHAPTER FIVE
5. Data Collection, Datasets and Models Construction
5.1 Overview
5.2 Data Collection
5.2.1 Surveys and Interviews’ Outcomes
5.2.2 Structured Interviews’ Findings
5.2.3 Loan Granting Process Shortcomings in Sudanese Banks
5.2.4 Readiness Factors for Credit Scoring
5.3 Datasets Construction and Description
5.3.1 Datasets Construction
5.3.1.1 Sudanese Credit Dataset 1
5.3.1.2 Sudanese Credit Dataset 2
5.4 Datasets Construction
5.4.1 Identification of Data
5.4.2 Data Integration
5.4.3 Missing Values Manipulation
5.4.4 Numerical Attributes Normalization
5.4.5 Outlier Removal
5.4.6 Transformation
5.4.7 Instance Labeling
5.4.8 Age Attribute Creation
5.5 Datasets Description
5.5.1 Description of the Sudanese Credit Dataset 1
5.5.2 Description of the Sudanese Credit Dataset 2
5.5.3 Description of the German Credit Dataset [14]
5.6 Credit Scoring Models Construction
5.6.1 Software Package
5.6.2 Datasets
5.6.3 Validation Methods
5.6.4 Sampling Type
5.6.5 Data Mining Classification Techniques
5.6.5.1 Artificial Neural Network Parameters
5.6.5.2 Support Vector Machine Parameters
5.6.5.3 Decision Tree Parameters
5.6.6 Feature Selection Techniques
5.6.7 Experiments Stages
5.6.7.1 Stage 1 Experiments
5.6.7.2 Stage 2 Experiments
5.6.7.3 Stage 3 Experiments
5.7 Summary
CHAPTER SIX
6. Results and Discussion
6.1 Overview
6.2 Evaluation Measures
6.3 General Characteristics of the Datasets
6.4 Comparisons and Discussion of Results for Proposed CSMs
6.4.1 Comparisons and Discussion of Stage 1 Resulting Models
6.4.1.1 Comparisons and Discussion of the SCD1 Stage 1 Resulting Models
6.4.1.2 Comparisons and Discussion of the SCD2 Stage 1 Resulting Models
6.4.1.3 Comparisons and Discussion of the German Stage 1 Resulting Models
6.4.2 Comparisons and Discussion of the Stage 2 Resulting Models
6.4.2.1 Comparisons and Discussion of the SCD1 Stage 2 Resulting Models
6.4.2.2 Comparisons and Discussion of the SCD2 Stage 2 Resulting Models
6.4.2.3 Comparisons and Discussion of the German Dataset Stage 2 Resulting Models
6.4.3 Results of Comparisons for Stage 3 Models
6.4.3.1 Results of Comparisons for SCD1 Stage 3 Models
6.4.3.2 Results of Comparisons for SCD2 Stage 3 Models
6.4.3.3 Results of Comparisons for the German Dataset Stage 3 Models
6.4.4 Comparisons between Stages 1, 2 and 3 Experiments Resulting Models and Discussion
6.4.4.1 Comparisons between Stages 1, 2 and 3 Experiments Resulting Models and Discussion for SCD1
6.4.4.2 Comparisons between Stages 1, 2 and 3 Experiments Resulting Models and Discussion for SCD2
6.4.4.3 Comparisons between Stages 1, 2 and 3 Experiments Resulting Models and Discussion for the German Dataset
6.5 Summary
CHAPTER SEVEN
7. Conclusions and Recommendations
7.1 Conclusions
7.1.1 Summary of the Thesis
7.1.2 Findings of the Thesis
7.2 Recommendations for Future Research
References
APPENDIX A
Islamic Financing Modes
APPENDIX B
Parts of Sudanese Credit Datasets
Consults, E. & El, E (2022). Credit Scoring Using Data Mining Classification Application on Sudanese Banks. Afribary. Retrieved from https://tracking.afribary.com/works/credit-scoring-using-data-mining-classification-application-on-sudanese-banks