Credit Risk Prediction: An Application of Federated Learning
Subject Areas: Machine learning
Sara Houshmand¹*, Amir Albadvi²
1 - Department of Industrial Engineering, Faculty of Engineering, Tarbiat Modares University, Tehran, Iran
2 - Department of Industrial Engineering, Faculty of Engineering, Tarbiat Modares University, Tehran, Iran
Keywords:
Abstract:
Credit risk is one of the major challenges faced by all financial institutions. Different institutions apply various techniques and models to reduce the risks associated with lending and other financial activities. However, because financial data are sensitive and modeling approaches differ, sharing data among institutions is extremely difficult and often impossible. As a result, credit risk prediction models are typically improved in isolation, which hinders collective progress toward higher accuracy and broader effectiveness. Federated learning (FL) offers a promising solution by allowing institutions to train models collaboratively by exchanging model updates instead of exposing or transferring sensitive data. In this research, we present an FL architecture for credit risk prediction that preserves data privacy throughout the entire training process. Our results indicate that this approach not only protects data confidentiality but also maintains high predictive accuracy over many training rounds, offering a reliable and efficient framework for institutional adoption. The core contribution of this work is a decentralized FL architecture tailored to heterogeneous, non-IID financial data, which enhances privacy, scalability, and regulatory compliance. Using five real-world credit risk datasets, we show that the decentralized FL architecture significantly outperforms traditional machine learning methods, with accuracies ranging from 71% to 99%, especially in scenarios where privacy and communication efficiency are essential. While centralized FL achieves the highest average accuracy (up to 83%), the decentralized model offers a strong trade-off between predictive performance and privacy-aware collaboration.
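The abstract contrasts centralized FL, where a server aggregates client models, with a decentralized architecture, where there is no server. The following minimal sketch illustrates that distinction. It is not the authors' implementation, and the abstract does not specify their model, aggregation rule, or network topology. The sketch assumes a logistic regression model, FedAvg-style sample-weighted averaging for the centralized case, and ring-topology gossip averaging for the decentralized case, all run on synthetic non-IID data rather than the paper's datasets. All function names (`local_sgd`, `fedavg_round`, `gossip_round`, `make_clients`, `accuracy`) are hypothetical.

```python
import numpy as np

def local_sgd(w, X, y, lr=0.1, epochs=5):
    """Full-batch gradient descent on the logistic loss; raw data never leaves the client."""
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w = w - lr * X.T @ (p - y) / len(y)
    return w

def fedavg_round(w_global, clients):
    """Centralized FL (assumed FedAvg-style): the server broadcasts w_global,
    each client trains locally, and the server takes a sample-weighted average."""
    updates = [local_sgd(w_global, X, y) for X, y in clients]
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    return np.average(updates, axis=0, weights=sizes)

def gossip_round(weights, clients):
    """Decentralized FL (assumed ring gossip): no server; each client trains
    locally, then averages its model with its two ring neighbours."""
    local = [local_sgd(w, X, y) for w, (X, y) in zip(weights, clients)]
    k = len(local)
    return [(local[i - 1] + local[i] + local[(i + 1) % k]) / 3.0 for i in range(k)]

def make_clients(k=5, n=200, d=4, seed=0):
    """Synthetic non-IID clients: each institution's features are shifted differently."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=d)
    clients = []
    for i in range(k):
        X = rng.normal(loc=0.5 * i, scale=1.0, size=(n, d))
        y = (X @ w_true + rng.normal(scale=0.5, size=n) > 0).astype(float)
        clients.append((X, y))
    return clients

def accuracy(w, clients):
    """Accuracy on the pooled data (computed here only for evaluation of the sketch)."""
    X = np.vstack([c[0] for c in clients])
    y = np.concatenate([c[1] for c in clients])
    return float(np.mean(((X @ w) > 0) == y))

clients = make_clients()
d = clients[0][0].shape[1]
w_central = np.zeros(d)
w_peers = [np.zeros(d) for _ in clients]
for _ in range(30):
    w_central = fedavg_round(w_central, clients)
    w_peers = gossip_round(w_peers, clients)

print("centralized FedAvg accuracy:", accuracy(w_central, clients))
print("mean decentralized accuracy:", np.mean([accuracy(w, clients) for w in w_peers]))
```

In the centralized case, a single coordinator sees every model update. In the ring variant, each institution communicates only with its neighbours, and the models drift toward consensus over the rounds. This is one simple way to trade some accuracy for the removal of a central aggregator.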