List of subject articles Data Mining


    • Open Access Article

      1 - Representing a Content-based link Prediction Algorithm in Scientific Social Networks
      Hosna Solaimannezhad Omid Fatemi
      Predicting collaboration between two authors from their research interests is an important problem whose solution could improve group research. The co-authorship network is one type of social network and one of the most widely used data sets for study. As part of recent advances in research, much attention has been devoted to the computational analysis of these social networks. The dynamics of these networks make them challenging to study. Link prediction is one of the main problems in social network analysis. If a social network is represented as a graph, link prediction means predicting the edges that will be created between nodes in the future. The output of link prediction algorithms is used in various areas such as recommender systems. Few studies on link prediction use the content published by nodes to predict collaboration between them. In this study, a new link prediction algorithm is developed based on people's interests. By analyzing the papers that authors have published to extract the fields they have worked on, the algorithm predicts their future collaboration. Tests on the SID dataset, used as a co-authorship dataset, show that the developed algorithm outperforms all the structure-based link prediction algorithms. Finally, the reasons for the algorithm's efficiency are analyzed and presented.
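The idea of predicting links from content rather than graph structure can be illustrated with a minimal sketch. It is not the algorithm from the paper, whose details the abstract does not give. It assumes each author is described by a hypothetical dictionary of research fields with paper counts, and it ranks author pairs by the cosine similarity of those dictionaries. The names `cosine_similarity`, `rank_candidate_links`, and the sample data are illustrative assumptions.

```python
from itertools import combinations
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidate_links(interests):
    """Return (score, author_u, author_v) tuples, most similar pair first."""
    scored = [
        (cosine_similarity(interests[u], interests[v]), u, v)
        for u, v in combinations(sorted(interests), 2)
    ]
    return sorted(scored, reverse=True)


# Hypothetical interest profiles: field -> number of papers in that field.
interests = {
    "A": {"data mining": 3, "nlp": 1},
    "B": {"data mining": 2, "nlp": 1},
    "C": {"networks": 4},
}
ranked = rank_candidate_links(interests)
```

In this toy data, authors A and B share both fields and therefore rank first, while author C shares no field with either and scores zero.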
    • Open Access Article

      2 - Handwritten Digits Recognition Using an Ensemble Technique Based on the Firefly Algorithm
      Azar Mahmoodzadeh Hamed Agahi Marzieh Salehi
      This paper develops a multi-step procedure for classifying Farsi handwritten digits using a combination of classifiers. In general, the technique relies on extracting a set of characteristics from handwritten samples, training multiple classifiers to discriminate between digits, and finally combining the classifiers to enhance the overall system performance. First, a pre-processing stage prepares the images for the main steps. Then three structural and statistical characteristics, each comprising several features, are extracted, and a multi-objective genetic algorithm selects the most effective features in order to reduce the computational complexity of the classification step. For the base classification, a decision tree (DT), an artificial neural network (ANN) and a k-nearest neighbor (KNN) model are employed. Finally, the outcomes of the classifiers are fed into a classifier ensemble system to make the final decision. This hybrid system assigns a different weight to each class selected by each classifier. These voting weights are adjusted by a metaheuristic firefly algorithm which optimizes the accuracy of the overall system. The performance of the implemented approach on the standard HODA dataset is compared with the base classifiers and some state-of-the-art methods. The evaluation demonstrates that the proposed hybrid system attains high performance indices, including an accuracy of 98.88% with only eleven features.
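The voting step described above, in which each classifier has its own weight for each class it may select, can be sketched as follows. This is only an illustration of weighted voting: the weights here are made-up numbers, whereas the paper tunes them with a firefly algorithm that is not shown. The function name `weighted_vote` and its parameters are assumptions.

```python
def weighted_vote(predictions, weights, n_classes):
    """Combine class predictions from several classifiers.

    predictions[i] is the class chosen by classifier i.
    weights[i][c] is the voting weight of classifier i when it chooses class c.
    Returns the class with the largest accumulated weight.
    """
    scores = [0.0] * n_classes
    for clf_index, predicted_class in enumerate(predictions):
        scores[predicted_class] += weights[clf_index][predicted_class]
    return max(range(n_classes), key=lambda c: scores[c])


# Three hypothetical classifiers (for example DT, ANN, KNN) and three classes.
predictions = [0, 1, 1]
tuned_weights = [
    [0.9, 0.5, 0.5],  # classifier 0 is trusted strongly when it says class 0
    [0.3, 0.4, 0.5],
    [0.3, 0.4, 0.5],
]
uniform_weights = [[1.0] * 3 for _ in range(3)]

decision_tuned = weighted_vote(predictions, tuned_weights, 3)
decision_uniform = weighted_vote(predictions, uniform_weights, 3)
```

With uniform weights the result is a plain majority vote (class 1). With the illustrative tuned weights, the single confident vote for class 0 outweighs the two weaker votes for class 1.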
    • Open Access Article

      3 - DBCACF: A Multidimensional Method for Tourist Recommendation Based on Users’ Demographic, Context and Feedback
      Maral Kolahkaj Ali Harounabadi Alireza Nikravan Shalmani Rahim Chinipardaz
      With the advent of Web 2.0 applications such as social networks, which allow users to share media, tourists have many opportunities to discover and visit attractive and unfamiliar Areas-of-Interest (AOIs). However, finding appropriate areas based on a user's preferences is very difficult because of issues such as the huge number of tourist areas and limited visiting time. In addition, the available methods have so far failed to provide accurate tourist recommendations based on geo-tagged media because of problems such as data sparsity, the cold start problem, treating two users with different habits as the same (symmetric similarity), and ignoring the user's personal and context information. Therefore, in this paper, a method called “Demographic-Based Context-Aware Collaborative Filtering” (DBCACF) is proposed to address these problems and to extend the Collaborative Filtering (CF) method so that it provides personalized tourist recommendations without explicit user requests. DBCACF considers demographic and contextual information in combination with the users' historical visits to overcome the limitations of CF methods in dealing with multi-dimensional data. In addition, a new asymmetric similarity measure is proposed in order to overcome the limitations of symmetric similarity methods. The experimental results on a Flickr dataset indicate that using demographic and contextual information and adding the proposed asymmetric scheme to the similarity measure significantly improves the results compared to other methods that use only user-item ratings and symmetric measures.
    • Open Access Article

      4 - Graph Based Feature Selection Using Symmetrical Uncertainty in Microarray Dataset
      Soodeh Bakhshandeh azmi azmi Mohammad Teshnehlab
      Microarray data, with few samples and thousands of genes, pose a difficult challenge for researchers. Gene selection helps to select the most relevant genes from the original dataset, with the purpose of reducing the dimensionality of the microarray data as well as increasing prediction performance. In this paper, a new gene selection method is proposed based on a community detection technique and ranking of the best genes. In the first step, Symmetric Uncertainty is used to calculate the similarity between each pair of genes and between each gene and the class label, which leads to a representation of the search space as a graph. Afterwards, the graph is divided into several clusters using a community detection algorithm, and finally, after the genes are ranked, the genes with the maximum ranks are selected as the best genes. This approach is a supervised/unsupervised filter-based gene selection method that minimizes the redundancy between genes and maximizes the relevance between genes and the class label. The performance of the proposed method is compared with thirteen well-known unsupervised/supervised gene selection approaches over six microarray datasets using four classifiers: SVM, DT, NB and k-NN. The results show the advantages of the proposed approach.
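Symmetric Uncertainty, the similarity measure named above, is commonly defined as SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), where H is Shannon entropy and I is mutual information. The sketch below computes it for discrete sequences. It assumes the gene expression values have already been discretized, which the abstract does not discuss, and the function names are illustrative.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    hxy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    mutual_info = hx + hy - hxy
    return 2.0 * mutual_info / (hx + hy)


# Hypothetical discretized expression levels for three genes.
gene_a = [0, 0, 1, 1]
gene_b = [0, 0, 1, 1]  # identical to gene_a: fully redundant
gene_c = [0, 1, 0, 1]  # independent of gene_a

su_redundant = symmetric_uncertainty(gene_a, gene_b)
su_independent = symmetric_uncertainty(gene_a, gene_c)
```

A value near 1 between two genes indicates redundancy, while a value near 1 between a gene and the class label indicates relevance. In a graph-based method such as the one described, these values could serve as edge weights.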
    • Open Access Article

      5 - Density Measure in Context Clustering for Distributional Semantics of Word Sense Induction
      Masood Ghayoomi
      Word Sense Induction (WSI) aims at inducing word senses from data without using prior knowledge. The absence of labeled data has motivated researchers to use clustering techniques for this task. There are two types of clustering algorithm: parametric and non-parametric. Although non-parametric clustering algorithms are more suitable for inducing word senses, their shortcomings make them impractical. Meanwhile, parametric clustering algorithms show competitive results, but they suffer from a major problem: the number of clusters must be fixed in advance. The main contribution of this paper is to show that using the silhouette score, normally an internal evaluation metric, to measure cluster density in a parametric clustering algorithm such as K-means captures word senses in the WSI task better than the state-of-the-art models. To this end, a word embedding approach is used to represent words' contextual information as vectors. To capture the context in the vectors, we propose two modes of experiments: building the vectors from either the whole sentence or a limited number of surrounding words in the local context of the target word. The experimental results based on the V-measure evaluation metric show that the two modes of our proposed model beat the state-of-the-art models by 4.48% and 5.39%. Moreover, the average and maximum numbers of clusters in the outputs of our proposed models are close to those of the gold data.
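The silhouette score mentioned above can be computed directly from points and cluster labels. The sketch below is a plain-Python illustration under assumptions: it uses Euclidean distance on toy 2-D points rather than word embeddings, and it compares two hand-made labelings instead of running K-means. The abstract does not spell out how the paper integrates the score into clustering, so this only shows how the score separates a good clustering from an over-split one.

```python
from math import dist


def silhouette(points, labels):
    """Mean silhouette coefficient; requires at least two clusters."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Toy data: two tight groups far apart.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
score_two_clusters = silhouette(points, [0, 0, 1, 1])
score_three_clusters = silhouette(points, [0, 0, 1, 2])
```

Here the two-cluster labeling scores higher than the three-cluster one, which splits a dense group. Picking the labeling (or the value of k) with the highest score is one common way to avoid fixing the number of clusters by hand.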
    • Open Access Article

      6 - An Experimental Study on Performance of Text Representation Models for Sentiment Analysis
      Sajjad Jahanbakhsh Gudakahriz Amir Masoud Eftekhari Moghaddam Fariborz Mahmoudi
      Sentiment analysis in social networks has been an active research field since 2000, and it is highly useful in the decision-making processes of various domains and applications. In sentiment analysis, the goal is to analyze the opinion texts posted in social networks and other web-based resources in order to extract the necessary information from them. The data collected from various social networks and web sites do not have a structured format, and this lack of structure is the main challenge in handling such data. To overcome this challenge, the texts must first be represented in the form of a text representation model so that their content can be analyzed. Afterward, the required analysis can be done. Research on text modeling started a few decades ago, and various models have been proposed so far. The main purpose of this paper is to evaluate the efficiency and effectiveness of a number of common and well-known text representation models for sentiment analysis. The evaluation is carried out by using these models for sentiment classification with ensemble methods: after preprocessing, the texts are represented by the selected models and classified by an ensemble classifier. The models selected for this study are TF-IDF, LSA, Word2Vec, and Doc2Vec, and the evaluation measures used are Accuracy, Precision, Recall, and F-Measure. The results show that, in general, the Doc2Vec model performs better than the other models in sentiment analysis, with a best accuracy of 0.72.
    • Open Access Article

      7 - Evaluation of Pattern Recognition Techniques in Response to Cardiac Resynchronization Therapy (CRT)
      Mohammad Nejadeh Peyman Bayat Jalal Kheirkhah Hassan Moladoust
      Cardiac resynchronization therapy (CRT) improves cardiac function in patients with heart failure (HF), and the treatment results in a lower death rate and improved quality of life for patients. This research is aimed at predicting CRT response for the prognosis of patients with heart failure under CRT. According to international guidelines, a patient with confirmed QRS prolongation and decreased ejection fraction (EF) is recognized as a candidate for implanting a recognition device. However, given the many intervening and influential factors, the decision can be based on more variables. Computer-based decision-making systems, especially machine learning (ML), are considered promising because of their significant track record in medical prediction. Collective intelligence approaches such as the particle swarm optimization (PSO) algorithm are used to determine the priorities of medical decision-making variables. This investigation was carried out on 209 patients, and the data were collected over 12 months. In the HESHMAT CRT center, 17.7% of patients did not respond to treatment. The most important achievements of this research are recognizing the dominant parameters by combining machine recognition with the physician's viewpoint, and introducing an error back-propagation neural network algorithm in order to decrease classification error. In this research, an analytical set of individual, clinical, and laboratory variables, together with echocardiography and electrocardiography (ECG) variables, is proposed in relation to patients' response to CRT. Prediction of the response after CRT becomes possible with the support of a set of tools, algorithms, and variables.