    • List of Articles: Feature Selection

      • Open Access Article

        1 - Handwritten Digits Recognition Using an Ensemble Technique Based on the Firefly Algorithm
        Azar Mahmoodzadeh, Hamed Agahi, Marzieh Salehi
        This paper develops a multi-step procedure for classifying Farsi handwritten digits using a combination of classifiers. The technique extracts a set of characteristics from handwritten samples, trains multiple classifiers to discriminate between digits, and finally combines the classifiers to improve overall system performance. First, the images are pre-processed to prepare them for the main steps. Then three types of structural and statistical characteristics, comprising many features, are extracted, and a multi-objective genetic algorithm selects the most effective of these features to reduce the computational complexity of the classification step. For base classification, decision tree (DT), artificial neural network (ANN), and k-nearest neighbor (KNN) models are employed. Finally, the outputs of these classifiers are fed into a classifier ensemble that makes the final decision. This hybrid system assigns a separate voting weight to each class chosen by each classifier, and these weights are tuned by the metaheuristic firefly algorithm to maximize the accuracy of the overall system. The performance of the approach on the standard HODA dataset is compared with that of the base classifiers and several state-of-the-art methods. The evaluation shows that the proposed hybrid system achieves an accuracy of 98.88% using only eleven features.
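        The ensemble step above can be illustrated with a minimal Python sketch. It assumes each base classifier has already predicted a label for every validation sample. The sketch shows class-specific vote weights (`weights[c][k]` is the weight of classifier `c` when it predicts class `k`) and a basic firefly algorithm that searches for weights that maximize ensemble accuracy. The abstract does not give the authors' settings. The function names, the [0, 1] weight range, the tie-breaking rule, and the firefly parameters (`beta0`, `gamma`, `alpha`, swarm size, iteration count) are therefore hypothetical choices, not the published configuration.

```python
import math
import random


def weighted_vote(predictions, weights, n_classes):
    """Combine one predicted label per classifier using class-specific weights."""
    scores = [0.0] * n_classes
    for c, label in enumerate(predictions):
        scores[label] += weights[c][label]
    # max() returns the first maximum, so ties go to the lowest class index.
    return max(range(n_classes), key=lambda k: scores[k])


def ensemble_accuracy(flat_w, all_preds, y_true, n_clf, n_classes):
    """Accuracy of the weighted vote; flat_w holds n_clf * n_classes weights."""
    weights = [flat_w[c * n_classes:(c + 1) * n_classes] for c in range(n_clf)]
    hits = sum(weighted_vote(p, weights, n_classes) == y
               for p, y in zip(all_preds, y_true))
    return hits / len(y_true)


def firefly_optimize(fitness, dim, n_fireflies=15, n_iter=50,
                     beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    """Basic firefly algorithm maximising `fitness` over vectors in [0, 1]^dim."""
    rng = random.Random(seed)
    swarm = [[rng.random() for _ in range(dim)] for _ in range(n_fireflies)]
    light = [fitness(x) for x in swarm]
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if light[j] > light[i]:
                    # Attractiveness decays with squared distance between fireflies.
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)
                    # Move i toward j plus a small random step, clamped to [0, 1].
                    swarm[i] = [
                        min(1.0, max(0.0, a + beta * (b - a) + alpha * (rng.random() - 0.5)))
                        for a, b in zip(swarm[i], swarm[j])
                    ]
                    light[i] = fitness(swarm[i])
        alpha *= 0.97  # shrink the random step over time
    best = max(range(n_fireflies), key=lambda k: light[k])
    return swarm[best], light[best]


# Toy example: classifier 0 is always right, classifiers 1 and 2 always wrong,
# so equal weights give 0% accuracy and better weights must favour classifier 0.
preds = [[0, 1, 1], [1, 0, 0], [0, 1, 1], [1, 0, 0]]
y_true = [0, 1, 0, 1]
best_w, best_acc = firefly_optimize(
    lambda w: ensemble_accuracy(w, preds, y_true, n_clf=3, n_classes=2), dim=6)
print(best_w, best_acc)
```

        A firefly moves only toward a strictly brighter one. In this simplified variant, the brightest firefly therefore never moves, so the best accuracy found never decreases. Many variants also let the brightest firefly take a small random step.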
      • Open Access Article

        2 - Graph Based Feature Selection Using Symmetrical Uncertainty in Microarray Dataset
        Soodeh Bakhshandeh, azmi azmi, Mohammad Teshnehlab
        Microarray data, with few samples and thousands of genes, pose a difficult challenge for researchers. Gene selection identifies the most relevant genes in the original dataset, reducing the dimensionality of the microarray data while increasing prediction performance. This paper proposes a new gene selection method based on community detection and gene ranking. In the first step, symmetric uncertainty is used to measure the similarity between each pair of genes and between each gene and the class label, which allows the search space to be represented as a graph. The graph is then divided into several clusters by a community detection algorithm, and finally the genes are ranked and the highest-ranked genes are selected as the best genes. The approach is a supervised/unsupervised filter-based gene selection method that minimizes redundancy between genes and maximizes relevance between genes and the class label. The proposed method is compared with thirteen well-known unsupervised/supervised gene selection approaches on six microarray datasets using four classifiers: SVM, DT, NB, and k-NN. The results show the advantages of the proposed approach.
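        Symmetric uncertainty normalizes mutual information by the two entropies, SU(X, Y) = 2·I(X; Y) / (H(X) + H(Y)), so it lies in [0, 1]. The minimal sketch below assumes gene expression values have already been discretized. It computes SU and builds the gene graph described above: node relevance is SU with the class label, and edges carry gene-to-gene SU. The edge `threshold`, the function names, and the convention of returning 0 when both variables are constant are assumptions. Community detection and ranking are not shown.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # assumed convention: both variables are constant
    h_xy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy    # I(X; Y) = H(X) + H(Y) - H(X, Y)
    return 2.0 * mutual_info / (h_x + h_y)


def build_gene_graph(genes, labels, threshold=0.1):
    """Hypothetical graph: node relevance = SU(gene, class), edge weight = SU(gene, gene)."""
    relevance = {g: symmetric_uncertainty(v, labels) for g, v in genes.items()}
    names = list(genes)
    edges = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            su = symmetric_uncertainty(genes[a], genes[b])
            if su >= threshold:  # keep only sufficiently similar gene pairs
                edges[(a, b)] = su
    return relevance, edges


# Toy discretized data: g1 and g2 are identical, g3 is independent of both.
genes = {"g1": [0, 0, 1, 1], "g2": [0, 0, 1, 1], "g3": [0, 1, 0, 1]}
labels = [0, 0, 1, 1]
relevance, edges = build_gene_graph(genes, labels)
print(relevance, edges)
```

        In this toy data, `g1` and `g2` are fully redundant (SU = 1), so only the edge between them survives. A community detection step would group them together, and ranking would then keep one representative.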
      • Open Access Article

        3 - An Effective Method of Feature Selection in Persian Text for Improving the Accuracy of Detecting Request in Persian Messages on Telegram
        Zahra Khalifeh Zadeh, Mohammad Ali Zare Chahooki
        In recent years, the volume of data from social media has grown exponentially, and these platforms have become valuable sources of information for analysts and businesses looking to expand. Automatic document classification is an essential step in extracting knowledge from these sources. In automatic text classification, words are treated as features. Selecting useful features from each text reduces the size of the feature vector and improves classification performance. Many algorithms have been applied to automatic text classification. Although the methods proposed for other languages are applicable to Persian and comparable, classification and feature selection for Persian text have not been studied sufficiently. The present research focuses on Persian, and introducing a Persian dataset is one of its contributions. This article presents a new approach to improving the performance of Persian text classification. The authors extracted 85,000 Persian messages from the Idekav-system, a Telegram search engine. The key idea for processing and classifying this textual data is to expand the feature vector with additional features selected by widely used feature selection methods based on local and global filters. The expanded feature vector is then filtered by a secondary feature selection step. This step chooses the more appropriate features among those added in the first step in order to enhance the effect of wrapper methods on classification performance. In the third step, combined filter-based methods and the combined results of different learning algorithms are used to achieve higher accuracy. After the three selection stages, the proposed method increased accuracy to 0.945 and reduced training time and computation on the Persian dataset.
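        The abstract does not name the local and global filters used, so the following is only a minimal sketch under stated assumptions. It uses the chi-square statistic as the filter score. The top `k_local` terms of each class form the local selection, and the top `k_global` terms by their maximum score over classes form the global selection. The sketch returns their union as the expanded feature set. The function names and `k` values are hypothetical, and the secondary selection and wrapper steps are not shown.

```python
from collections import Counter


def chi2_scores(docs, labels):
    """Chi-square score for every (term, class) pair, from document counts."""
    n = len(docs)
    classes = sorted(set(labels))
    doc_sets = [set(d) for d in docs]
    vocab = sorted(set().union(*doc_sets))
    class_size = Counter(labels)
    scores = {}
    for t in vocab:
        df_t = sum(t in s for s in doc_sets)  # documents containing t
        for c in classes:
            a = sum(1 for s, lab in zip(doc_sets, labels) if t in s and lab == c)
            b = df_t - a               # term present, other classes
            cc = class_size[c] - a     # class c, term absent
            d = n - a - b - cc         # neither term nor class c
            denom = (a + cc) * (b + d) * (a + b) * (cc + d)
            scores[(t, c)] = n * (a * d - cc * b) ** 2 / denom if denom else 0.0
    return scores, vocab, classes


def local_global_selection(docs, labels, k_local=2, k_global=3):
    """Union of per-class (local) and max-over-classes (global) chi-square picks."""
    scores, vocab, classes = chi2_scores(docs, labels)
    local = set()
    for c in classes:
        # Local filter: best k_local terms for this class; ties broken alphabetically.
        ranked = sorted(vocab, key=lambda t: (-scores[(t, c)], t))
        local.update(ranked[:k_local])
    # Global filter: rank terms by their best score over all classes.
    global_rank = sorted(vocab, key=lambda t: (-max(scores[(t, c)] for c in classes), t))
    return sorted(local | set(global_rank[:k_global]))


# Hypothetical tokenized messages: "req" = request, "chat" = ordinary message.
docs = [["buy", "price"], ["buy", "sell"], ["hello", "friend"], ["hello", "meet"]]
labels = ["req", "req", "chat", "chat"]
print(local_global_selection(docs, labels))
```

        On these toy messages the defaults select `["buy", "friend", "hello"]`. "friend" enters only through the alphabetical tie-break among the remaining equally scored terms.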
      • Open Access Article

        4 - Application of Machine Learning in the Telecommunications Industry: Partial Churn Prediction by using a Hybrid Feature Selection Approach
        Fatemeh Mozaffari, Iman Raeesi Vanani, Payam Mahmoudian, Babak Sohrabi
        The telecommunications industry is one of the most competitive industries in the world. Because of the high cost of customer acquisition and the adverse effects of customer churn on company performance, customer retention has become an integral part of strategic decision-making and one of the main objectives of customer relationship management. Although customer churn prediction models are widely studied in various domains, several challenges remain in designing and implementing an effective model. This paper addresses the customer churn prediction problem with a practical approach. The experimental analysis was conducted on customer data gathered from available sources at a telecom company in Iran. First, partial churn was defined in a new way that uses customer status based on criteria that can be measured easily in the telecommunications industry. This definition also relies on data mining techniques that measure how similar each customer is to active customers or to churners. Moreover, a hybrid feature selection approach was proposed in which various feature selection methods were applied together with the wisdom of the crowd. The results show that the wisdom of the crowd can serve as a useful feature selection method. Finally, a predictive model was developed using advanced machine learning algorithms such as bagging, boosting, stacking, and deep learning. Using 5-fold cross-validation, the Gradient Boosting Machine algorithm predicted partial churn with more than 88% accuracy. Comparative results indicate that the proposed model performs efficiently compared with the models applied in previous studies.
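        The abstract does not say how the wisdom of the crowd combines the individual feature selection methods. One common way to pool several rankings is the Borda count, sketched below as an assumption rather than the authors' procedure. Each method ranks the features, best first. A feature earns `m - position` points from a ranking of length `m`, and features are ordered by total points. The function name and the feature names in the example are hypothetical.

```python
def borda_aggregate(rankings, top_k=None):
    """Pool several feature rankings (best first) with a Borda count."""
    points = {}
    for ranking in rankings:
        m = len(ranking)
        for pos, feat in enumerate(ranking):
            # The top feature of a length-m ranking earns m points, the last earns 1.
            points[feat] = points.get(feat, 0) + (m - pos)
    # Higher total first; ties broken alphabetically for reproducibility.
    ordered = sorted(points, key=lambda f: (-points[f], f))
    return ordered if top_k is None else ordered[:top_k]


# Hypothetical rankings produced by three different feature selection methods.
rankings = [
    ["tenure", "calls", "bill"],
    ["calls", "tenure", "bill"],
    ["calls", "bill", "tenure"],
]
print(borda_aggregate(rankings, top_k=2))
```

        Here "calls" collects 8 points, "tenure" 6, and "bill" 4, so `top_k=2` keeps `["calls", "tenure"]`. A feature that only some methods rank simply receives no points from the others.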