Extracting Credit Rules from Imbalanced Data: The Case of an Iranian Export Development Bank

Sadatrasoul, Seyed Mahdi; Gholamian, Mohammadreza; Shahanaghi, Kamran

doi:10.7508/jist.2015.01.004

Manuscript ID: 139308061126222815 Pages: 1-10

Article Type: Original Research

Subject Areas : Data Mining

Seyed Mahdi Sadatrasoul ¹*, Mohammadreza Gholamian ², Kamran Shahanaghi ³

1 - Department of Industrial Engineering, Iran University of Science and Technology, Tehran, Iran
2 - Department of Industrial Engineering, Iran University of Science and Technology, Tehran, Iran
3 - Department of Industrial Engineering, Iran University of Science and Technology, Tehran, Iran

Received: 2014-10-28 Accepted : 2014-10-28 Published : 2015-03-24

Keywords: Credit Scoring, Banking Industry, Rule Extraction, Imbalanced Data, Sampling

Abstract :

Credit scoring is an important topic, and banks collect a variety of data from their loan applicants in order to make sound lending decisions. Rule bases are receiving growing attention in credit decision making because they distinguish explicitly between good and bad applicants. Credit scoring datasets are usually imbalanced, mainly because the number of good applicants in a loan portfolio is usually much higher than the number of loans that default. This paper uses rule bases previously applied in credit scoring, including RIPPER, OneR, Decision Table, PART and C4.5, to study the reliability and results of sampling on its own dataset. A real database from an Iranian export development bank is used, and imbalanced-data issues are investigated by randomly oversampling the minority class of defaulters and by undersampling the majority class of non-defaulters three times. The performance criteria chosen to measure the reliability of the rule extractors are the area under the receiver operating characteristic curve (AUC), accuracy, and the number of rules. Friedman’s statistic is used to test for significant differences between techniques and datasets. The results show that PART performs better than the other techniques and that the proportion of good and bad samples in the data affects its results less.
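
The resampling and evaluation ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`random_oversample`, `random_undersample`, `auc`), the toy labels, and the 3:1 undersampling ratio are all assumptions made for illustration, and none of the numbers come from the bank's data. Oversampling duplicates randomly chosen defaulters until the classes are balanced, undersampling keeps a random subset of non-defaulters, and AUC is computed as the probability that a randomly chosen defaulter receives a higher score than a randomly chosen non-defaulter.

```python
import random
from collections import Counter


def _split_indices(labels, minority):
    """Return (minority indices, majority indices) for a label list."""
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    return minority_idx, majority_idx


def random_oversample(rows, labels, minority, seed=0):
    """Duplicate random minority rows until both classes have equal size."""
    rng = random.Random(seed)
    minority_idx, majority_idx = _split_indices(labels, minority)
    n_extra = len(majority_idx) - len(minority_idx)
    extra = [rng.choice(minority_idx) for _ in range(n_extra)]
    keep = majority_idx + minority_idx + extra
    return [rows[i] for i in keep], [labels[i] for i in keep]


def random_undersample(rows, labels, minority, ratio=1.0, seed=0):
    """Keep every minority row and about ratio * n_minority majority rows."""
    rng = random.Random(seed)
    minority_idx, majority_idx = _split_indices(labels, minority)
    n_keep = min(len(majority_idx), int(ratio * len(minority_idx)))
    keep = rng.sample(majority_idx, n_keep) + minority_idx
    return [rows[i] for i in keep], [labels[i] for i in keep]


def auc(scores, labels, positive):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy portfolio: 90 non-defaulters ("good"), 10 defaulters ("bad").
rows = list(range(100))
labels = ["good"] * 90 + ["bad"] * 10

over_rows, over_labels = random_oversample(rows, labels, minority="bad")
under_rows, under_labels = random_undersample(rows, labels, minority="bad", ratio=3.0)

print(Counter(over_labels))   # class counts after oversampling
print(Counter(under_labels))  # class counts after undersampling
```

In a study of this kind, resampling would normally be applied only to the training portion of the data, so that AUC and accuracy are measured on a test set with the original class distribution.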
