List of Articles by Subject: Data Mining


    • Open Access Article

      1 - Prediction of Deadlocks in Concurrent Programs Using Neural Network
      Elmira Hasanzad babamir babamir
      The dependability of concurrent programs is usually limited by concurrency errors such as deadlocks and data races in the allocation of resources. Deadlocks are difficult to find during program testing because they happen only under very specific thread or process scheduling and environmental conditions. In this study, we extend our previous approach for online potential-deadlock detection in resources allocated by multithreaded programs. Our approach is based on reasoning about deadlock possibility by predicting the future behavior of threads. Due to their nondeterministic nature, the future behavior of multithreaded programs cannot, in most cases, be easily specified. Before prediction, the behavior of threads must be translated into a predictable format. We chose time series for this conversion because many statistical and artificial intelligence techniques can be applied to predict the future members of a time series. Among the prediction techniques, artificial neural networks showed suitable performance and flexibility in predicting complex behavioral patterns, which are the most common cases in real-world applications. Our model focuses on multithreaded programs that use locks to allocate resources. The proposed model was applied to deadlock prediction in resources allocated by multithreaded Java programs, and the results were evaluated.
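The potential-deadlock reasoning the abstract alludes to can be illustrated without the neural-network predictor: given per-thread lock-acquisition sequences (observed, or predicted by a model as in the paper), build a lock-order graph and flag a cycle as a potential deadlock. The sketch below is a minimal stand-in, not the paper's implementation; the trace format and function names are assumptions.

```python
def lock_order_edges(thread_traces):
    """Build lock-order edges: a -> b if some thread acquires lock b
    while it may still be holding lock a (a appears earlier in its trace)."""
    edges = set()
    for trace in thread_traces:
        for i, a in enumerate(trace):
            for b in trace[i + 1:]:
                if a != b:
                    edges.add((a, b))
    return edges

def has_cycle(edges):
    """Detect a cycle in the lock-order graph with iterative DFS;
    a cycle means a circular lock order, i.e. a potential deadlock."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in {x for e in edges for x in e}}
    for start in list(color):
        if color[start] != WHITE:
            continue
        stack = [(start, iter(graph.get(start, ())))]
        color[start] = GRAY
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if color[nxt] == GRAY:
                    return True  # back edge: circular lock order
                if color[nxt] == WHITE:
                    color[nxt] = GRAY
                    stack.append((nxt, iter(graph.get(nxt, ()))))
                    break
            else:
                color[node] = BLACK
                stack.pop()
    return False
```

Two threads taking locks A and B in opposite orders produce the classic cycle; taking them in the same order does not.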
    • Open Access Article

      2 - A Robust Data Envelopment Analysis Method for Business and IT Alignment of Enterprise Architecture Scenarios
      Mehdi Fasanghari Mohsen  Sadegh Amalnick Reza Taghipour Anvari Jafar Razmi
      Information Technology is recognized as a competitive enabler in today’s dynamic business environment. The alignment of business and Information Technology processes is therefore critical, and is strongly emphasized in Information Technology governance frameworks. On the other hand, Enterprise Architectures are deployed to steer organizations toward their objectives while remaining responsive to change. Thus, it is proposed to align business and Information Technology by investigating the suitability of Enterprise Architecture scenarios. In view of this, a flexible decision-making method for business and Information Technology alignment analysis is necessary, but not sufficient, since subjective analysis is always perturbed by some degree of uncertainty. We have therefore developed a new robust Data Envelopment Analysis technique designed for Enterprise Architecture scenario analysis. Several numerical experiments and a sensitivity analysis demonstrate the performance, significance, and flexibility of the proposed method in a real case.
    • Open Access Article

      3 - Extracting Credit Rules from Imbalanced Data: The Case of an Iranian Export Development Bank
      Seyed Mahdi  Sadatrasoul mohammadreza gholamian Kamran shahanaghi
      Credit scoring is an important topic; banks collect different data from their loan applicants to make appropriate and correct decisions. Rule bases attract particular attention in credit decision making because of their ability to explicitly distinguish between good and bad applicants. Credit scoring datasets are usually imbalanced, mainly because the number of good applicants in a loan portfolio is usually much higher than the number of loans that default. This paper applies rule-base classifiers previously used in credit scoring, namely RIPPER, OneR, Decision Table, PART, and C4.5, to study their reliability and how sampling affects their results. A real database of an Iranian export development bank is used, and the imbalanced-data issue is investigated by randomly oversampling the minority class of defaulters and undersampling the majority class of non-defaulters by a factor of three. The performance criteria chosen to measure the reliability of the rule extractors are the area under the receiver operating characteristic curve (AUC), accuracy, and the number of rules. Friedman’s statistic is used to test for significant differences between techniques and datasets. The results show that PART performs best and that the sampling of good and bad applicants affects its results the least.
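The sampling and evaluation steps described above can be sketched with the standard library. The oversampling routine and the pairwise-comparison AUC below are generic textbook versions, not the paper's code; function names are illustrative.

```python
import random

def oversample_minority(samples, labels, minority=1, seed=0):
    """Randomly duplicate minority-class samples until the classes balance."""
    rng = random.Random(seed)
    maj = [(s, l) for s, l in zip(samples, labels) if l != minority]
    mino = [(s, l) for s, l in zip(samples, labels) if l == minority]
    while len(mino) < len(maj):
        mino.append(rng.choice(mino))
    combined = maj + mino
    return [s for s, _ in combined], [l for _, l in combined]

def auc(scores, labels, positive=1):
    """Area under the ROC curve via pairwise comparison: the probability
    that a positive sample outranks a negative one (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == positive]
    neg = [s for s, l in zip(scores, labels) if l != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A perfectly separating scorer gets AUC 1.0; a scorer no better than chance hovers around 0.5, which is why AUC suits imbalanced credit data better than plain accuracy.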
    • Open Access Article

      4 - Privacy Preserving Big Data Mining: Association Rule Hiding
      Golnar Assadat  Afzali shahriyar mohammadi
      Data repositories contain sensitive information that must be protected from unauthorized access. Existing data mining techniques can be considered a privacy threat to sensitive data. Association rule mining is one of the most widely used data mining techniques; it tries to uncover relationships between seemingly unrelated data in a database. Association rule hiding is a research area in privacy preserving data mining (PPDM) that addresses the problem of hiding sensitive rules within the data. Much research has been done in this area, but most of it focuses on reducing the undesired side effects of deleting sensitive association rules in static databases. In the age of big data, however, we confront dynamic databases into which new data may enter at any time. Most existing techniques are therefore impractical and must be updated to suit these huge databases. In this paper, a data anonymization technique is used for association rule hiding, while parallelization and scalability features are embedded in the proposed model to speed up the big data mining process. Instead of removing some instances of an existing important association rule, generalization is used to anonymize items at an appropriate level, so that important association rules can be updated as new data arrive. We conducted experiments on three datasets to evaluate the performance of the proposed model in comparison with Max-Min2 and HSCRIL. Experimental results show that the information loss of the proposed model is lower than that of existing approaches in this area and that the model can be executed in parallel to reduce execution time.
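The generalization idea, hiding a sensitive item by lifting it to its parent category rather than deleting transactions, can be shown on a toy taxonomy. This is a minimal sketch under assumed data structures, not the paper's model.

```python
def generalize(transactions, taxonomy, sensitive_items):
    """Replace sensitive items with their parent category from the
    taxonomy instead of deleting whole transactions."""
    out = []
    for t in transactions:
        out.append({taxonomy.get(i, i) if i in sensitive_items else i
                    for i in t})
    return out

def support(transactions, itemset):
    """Fraction of transactions containing every item of the itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)
```

After generalization the item-level rule disappears (its support drops to zero), while the category-level rule keeps the original support, so it can still be updated as new data arrive.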
    • Open Access Article

      5 - COGNISON: A Novel Dynamic Community Detection Algorithm in Social Network
      Hamideh Sadat Cheraghchi Ali Zakerolhossieni
      The problem of community detection has a long tradition in the data mining area and has many challenging facets, especially when it comes to community detection in a time-varying context. While recent studies argue for the usability of social science disciplines in modern social network analysis, we present a novel dynamic community detection algorithm called COGNISON, inspired mainly by social theories. Specifically, we take inspiration from prototype theory and cognitive consistency theory to recognize the best community for each member, formulating the community detection algorithm in analogy with human cognition. COGNISON belongs to the category of representative-based algorithms and aims to fortify the purely mathematical approach to community detection with established social science disciplines. The proposed model is able to determine the proper number of communities with high accuracy in both weighted and binary networks. Comparison with state-of-the-art algorithms for dynamic community discovery on real datasets shows that this method performs better in terms of accuracy, NMI, and entropy for detecting communities over time. Finally, our approach motivates the application of human-inspired models in the dynamic community detection context and suggests the fruitfulness of connecting the community detection field with social science theories.
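COGNISON itself is not reproduced here; as a generic illustration of the underlying idea of nodes iteratively agreeing with their neighbours on a community, the following is a plain synchronous label-propagation sketch (deterministic tie-breaking added so the toy run is reproducible; this is a stand-in, not the paper's algorithm).

```python
from collections import Counter

def label_propagation(adj, iters=10):
    """Synchronous label propagation: each node adopts the most common
    label among its neighbours, ties broken by the smallest label."""
    labels = {n: n for n in adj}
    for _ in range(iters):
        new = {}
        for n, nbrs in adj.items():
            counts = Counter(labels[m] for m in nbrs)
            top = max(counts.values())
            new[n] = min(l for l, c in counts.items() if c == top)
        if new == labels:  # converged
            break
        labels = new
    return labels
```

On two triangles joined by a single bridge edge, the procedure settles on one label per triangle.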
    • Open Access Article

      6 - Node Classification in Social Network by Distributed Learning Automata
      Ahmad Rahnama Zadeh meybodi meybodi Masoud Taheri Kadkhoda
      The aim of this article is to improve the accuracy of node classification in social networks using Distributed Learning Automata (DLA). In the proposed algorithm, new relations between nodes are created using a local similarity measure; the graph is then partitioned according to the labeled nodes, and a network of distributed learning automata is assigned to each partition. In each partition, the maximal spanning tree is determined using the DLA. Finally, nodes are labeled according to the rewards of the DLA. We tested this algorithm on three real social network datasets, and the results show that the expected accuracy of the presented algorithm is achieved.
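The maximal (maximum-weight) spanning tree step can be illustrated classically; the paper determines it with learning automata, but Kruskal's algorithm on descending weights gives the same object and makes the structure concrete. A minimal sketch with an assumed edge-list format:

```python
def max_spanning_tree(n, edges):
    """Kruskal's algorithm on edges sorted by descending weight,
    yielding a maximum-weight spanning tree.
    edges: list of (weight, u, v) tuples over nodes 0..n-1."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges, reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:           # keep the edge only if it joins two components
            parent[ru] = rv
            tree.append((w, u, v))
            if len(tree) == n - 1:
                break
    return tree
```

In the article's setting the edge weights would come from the local similarity measure, and labels would then spread along the tree.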
    • Open Access Article

      7 - Preserving Data Clustering with Expectation Maximization Algorithm
      Leila Jafar Tafreshi Farzin Yaghmaee
      Data mining and knowledge discovery are important technologies for business and research. Despite their benefits in areas such as marketing, business, and medical analysis, data mining techniques can also create new threats to privacy and information security. Therefore, a new class of data mining methods called privacy preserving data mining (PPDM) has been developed. Research in this field aims to develop techniques that can be applied to databases without violating the privacy of individuals. In this work, we introduce a new approach to preserving sensitive information in databases with both numerical and categorical attributes using fuzzy logic. We map a database into a new one that conceals private information while preserving mining benefits. In the proposed method, we use fuzzy membership functions (MFs) such as Gaussian, P-shaped, Sigmoid, S-shaped, and Z-shaped for private data. We then cluster the modified datasets with the Expectation Maximization (EM) algorithm. Our experimental results show that using fuzzy logic to preserve data privacy guarantees valid clustering results while protecting sensitive information. The clustering accuracy on the fuzzified data is approximately equivalent to that on the original data and is better than that of state-of-the-art methods in this field.
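The fuzzification step, replacing a raw sensitive value with its membership degree so that magnitudes are concealed while relative structure survives for clustering, can be sketched for the Gaussian MF. The choice of the attribute mean as the MF center here is an illustrative assumption, not necessarily the paper's parameterization.

```python
import math

def gaussian_mf(x, center, sigma):
    """Gaussian fuzzy membership: 1.0 at the center, decaying with
    distance, so the original magnitude is no longer recoverable."""
    return math.exp(-((x - center) ** 2) / (2 * sigma ** 2))

def fuzzify(values, sigma=1.0):
    """Map a numerical attribute to membership degrees around its mean."""
    center = sum(values) / len(values)
    return [gaussian_mf(v, center, sigma) for v in values]
```

The transformed column would then be fed to EM clustering in place of the original data.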
    • Open Access Article

      8 - Investigating the Effect of Functional and Flexible Information Systems on Supply Chain Operation: Iran Automotive Industry
      Abbas Zareian Iraj Mahdavi Hamed Fazlollahtabar
      This research studies the relationship between supply chain and information system strategies and their effects on supply chain operation and enterprise performance. It extends previous work by using a harmonized structure between information system and supply chain strategies to improve supply chain functionality. Previous research focused on how information systems moderate the relationship between supply chain strategies and supply chain function; we instead evaluate the direct effects of information systems on supply chain strategies. We show that an information systems strategy improves the relationship between supply chain strategies and supply chain function. It can therefore be said that creating alignment between information system strategy and supply chain strategies ultimately improves supply chain functionality and company operations.
    • Open Access Article

      9 - Improved Generic Object Retrieval In Large Scale Databases By SURF Descriptor
      Hassan Farsi Reza Nasiripour Sajad Mohammadzadeh
      Normally, state-of-the-art methods in the field of object retrieval for large databases rely on a training process. We propose a novel large-scale generic object retrieval method that uses only a single query image and is training-free. Current object retrieval methods require part of the image database for training in order to construct a classifier; this training can be supervised, unsupervised, or semi-supervised. In the proposed method, the query image can be a typical real image of the object. The object model is constructed from Speeded Up Robust Features (SURF) points acquired from the image. Information on the relative positions, scales, and orientations between SURF points is calculated and built into the object model. Dynamic programming is used to try all possible combinations of SURF points for the query and dataset images. The ability to match partially affine-transformed object images comes from the robustness of SURF points and the flexibility of the model. Occlusion is handled by specifying the probability of a missing SURF point in the model. Experimental results show that this matching technique is robust under partial occlusion and rotation. The properties and performance of the proposed method are demonstrated on large databases. The results illustrate that the proposed method improves efficiency, speeds up retrieval, and reduces storage space.
    • Open Access Article

      10 - A RFMV Model and Customer Segmentation Based on Variety of Products
      Saman  Qadaki Moghaddam Neda Abdolvand Saeedeh Rajaee Harandi
      Today, increased competition between organizations has led them to seek a better understanding of customer behavior through innovative ways of storing and analyzing customer information. Moreover, the emergence of new computing technologies has brought about major changes in the ability of organizations to collect, store, and analyze large-scale data; thousands of data points can be stored for each customer. Customer satisfaction is one of the most important organizational goals, but since not all customers are equally profitable to an organization, understanding and identifying the valuable customers has become a key organizational challenge. Understanding customers’ behavioral variables and categorizing customers based on these characteristics can provide the insight that helps business owners and industries adopt appropriate marketing strategies such as up-selling and cross-selling. The use of these strategies rests on a fundamental variable: the variety of products. Diversity in individual consumption may lead to increased demand for a variety of products; therefore, variety of products can be used, along with other behavioral variables, to better understand and categorize customer behavior. Given the importance of the variety of products as one of the main parameters for assessing customer behavior, studying this factor in the field of business-to-business (B2B) communication represents a vital new approach. Hence, this study clusters customers based on a developed RFM model, namely RFMV, obtained by adding a variable for the variety of products (V). The CRISP-DM methodology and the K-means algorithm were used for clustering. The results of the study indicate that the variable V, variety of products, is effective in calculating customer value, and that the RFMV model yields better customer clustering and valuation. Overall, the modeling results indicate that the variety of products, along with other behavioral variables, provides more accurate clustering than the RFM model.
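Computing the four RFMV variables from a transaction log is straightforward; a minimal sketch with an assumed transaction tuple layout (the paper's feature scaling and the K-means step are omitted):

```python
from datetime import date

def rfmv(transactions, today):
    """Compute Recency, Frequency, Monetary and Variety per customer.
    transactions: iterable of (customer_id, purchase_date, amount, product)."""
    per = {}
    for cust, d, amount, product in transactions:
        rec = per.setdefault(cust, {"last": d, "freq": 0,
                                    "monetary": 0.0, "products": set()})
        rec["last"] = max(rec["last"], d)   # most recent purchase
        rec["freq"] += 1                    # number of purchases
        rec["monetary"] += amount           # total spend
        rec["products"].add(product)        # distinct products bought
    return {cust: {"R": (today - rec["last"]).days,
                   "F": rec["freq"],
                   "M": rec["monetary"],
                   "V": len(rec["products"])}
            for cust, rec in per.items()}
```

The resulting vectors (after normalization) are what a clustering algorithm such as K-means would consume.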
    • Open Access Article

      11 - ANFIS Modeling to Forecast Maintenance Cost of Associative Information Technology Services
      Reza Ehtesham Rasi Leila  Moradi
      An Adaptive Neuro-Fuzzy Inference System (ANFIS) was developed for quantifying Information Technology (IT) generated services perceptible to business users. Forecasting the IT cost related to system maintenance can help managers make constructive decisions about the future. The model was built, tuned, and trained on a large volume of historical data on IT cost factors, generated services, and associated costs. First, the model was fully developed, stabilized, and passed through intensive training with a large volume of data collected in an organization. Data from a specific time period can then be fed into the model to determine the quantity of services and their related maintenance cost. The ANFIS model forecasts the maintenance cost of services by first quantifying the services delivered in a specific time period. An operational mechanism for measuring and quantifying IT services tangible to users, and for estimating their costs, contributes to accurate, practical investment. Several components in the field of system maintenance were considered and measured. The main objective of this study was to determine the amount of investment required for the maintenance of all generated services, considering their relation to tangible cost factors as well as the intangible cost of service loss.
    • Open Access Article

      12 - Representing a Content-based link Prediction Algorithm in Scientific Social Networks
      Hosna Solaimannezhad omid fatemi
      Predicting collaboration between two authors, using their research interests, is one of the important issues that can improve group research. The co-authorship network is one of the most widely studied types of social network, and much attention has recently been devoted to the computational analysis of such networks. The dynamics of these networks makes them challenging to study. Link prediction is one of the main problems in social network analysis: if we represent a social network as a graph, link prediction means predicting the edges that will be created between nodes in the future. The output of link prediction algorithms is used in various areas such as recommender systems. There are few studies on link prediction that use the content published by nodes to predict collaboration between them. In this study, a new link prediction algorithm is developed based on people’s interests. By extracting the fields authors have worked on from an analysis of their published papers, the algorithm predicts their future collaboration. The results of tests on the SID co-authorship dataset show that the developed algorithm outperforms all the structure-based link prediction algorithms. Finally, the reasons for the algorithm’s efficiency are analyzed and presented.
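The content-based idea, scoring candidate author pairs by the overlap of their extracted interest sets, can be sketched with a simple Jaccard measure. This is an illustrative stand-in; the paper's actual similarity function and topic-extraction pipeline may differ.

```python
def jaccard(a, b):
    """Overlap of two interest sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def predict_links(interests, top_k=1):
    """Rank author pairs by interest similarity; the highest-scoring
    pairs are the predicted future collaborations.
    interests: {author: set_of_topics}."""
    authors = sorted(interests)
    scored = []
    for i, u in enumerate(authors):
        for v in authors[i + 1:]:
            scored.append((jaccard(interests[u], interests[v]), u, v))
    scored.sort(reverse=True)
    return scored[:top_k]
```

Unlike structure-based predictors (common neighbours, Adamic-Adar), this score is defined even for authors with no graph path between them, which is the advantage the abstract highlights.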
    • Open Access Article

      13 - Analysis of Business Customers’ Value Network Using Data Mining Techniques
      Forough Farazzmanesh (Isvand) Monireh Hosseini
      In today's competitive environment, customers are the most important asset of any company. Companies should therefore understand what the retention and value drivers are for each customer. An approach that helps consider customers' different value dimensions is the value network. This paper introduces a new approach that uses data mining techniques for mapping and analyzing customers' value networks, and applies the approach in a real case study. The research contributes a methodology to identify and define the network entities of a value network in the context of B2B relationships. To conduct this work, we use a combination of methods and techniques designed to analyze customer datasets (e.g. RFM and customer migration) and to analyze the value network. As a result, this paper develops a new strategic network view of customers and discusses how a company can add value to its customers. The proposed approach provides an opportunity for marketing managers to gain a deep understanding of their business customers and of the characteristics and structure of their customers' value network. This paper is the first contribution of its kind to focus exclusively on large-dataset analytics for value network analysis, and it indicates that future research on value networks can further benefit from data mining tools. In the case study, we identify the value entities of the network and its value flows in a telecommunications organization using the available data, showing that continuous monitoring can improve the value in the network.
    • Open Access Article

      14 - Using Discrete Hidden Markov Model for Modelling and Forecasting the Tourism Demand in Isfahan
      Khatereh Ghasvarian Jahromi Vida Ghasvarian Jahromi
      Tourism has been increasingly gaining acceptance as a driving force of economic growth because it raises per capita income, employment, and foreign currency earnings. Since tourism affects other industries, many countries consider it in their economic outlook. The perishable nature of most sectors dependent on tourism has made the prediction of tourism demand an important issue for future success. The present study, for the first time, uses the Discrete Hidden Markov Model (DHMM) to predict tourism demand. DHMM is the discrete form of the well-known HMM approach, with the capability of parametrically modeling random processes. MATLAB is used to simulate and implement the proposed method. Statistical reports of Iranian and foreign tourists visiting Isfahan, obtained from the Iran Cultural Heritage, Handicrafts, and Tourism Organization (ICHHTO)-Isfahan Tourism, were used for simulation of the model. To evaluate the proposed method, the prediction results are compared with those of an artificial neural network, a Grey model, and the persistence method on the same data. Three error indexes, MAPE (%), RMSE, and MAE, are applied for a better comparison. The results reveal that, compared with the three other methods, DHMM performs better in predicting tourism demand for the next year, for both Iranian and foreign tourists.
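The discretization of a demand series into states, which is the "D" in DHMM, can be illustrated together with a first-order Markov-chain forecast. This is a deliberately simplified stand-in (an observable Markov chain, not a hidden one) just to show the state/transition machinery; the paper's DHMM additionally learns hidden states and emission probabilities.

```python
def discretize(series, bins):
    """Map each value to a state index given ascending bin edges."""
    return [sum(1 for edge in bins if v >= edge) for v in series]

def transition_matrix(states, n_states):
    """Row-normalized first-order transition counts."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])
    return probs

def forecast_next(states, n_states):
    """Most probable next state given the last observed state."""
    P = transition_matrix(states, n_states)
    row = P[states[-1]]
    return row.index(max(row))
```

On an alternating low/high demand series the chain learns the alternation and forecasts the opposite state next.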
    • Open Access Article

      15 - The Influence of ERP Usage on Organizational Learning: An Empirical Investigation
      Faisal Aburub
      A number of hotels have directed significant investment toward Enterprise Resource Planning (ERP) systems with the aim of securing sound levels of organizational learning. As a strategic instrument, organizational learning has been recommended in the modern management arena as potentially able to achieve a competitive edge and stabilize business success. Learning, as an aim, not only improves the skills and knowledge of employees, but also achieves organizational growth and development, while helping to build a dynamic learning organization. Organizational learning is especially important in modern firms, where staff might choose to leave or change roles owing to the view that knowledge-sharing could be detrimental to their own success. The present work examines the impact of ERP usage on organizational learning. A new research model is presented and empirically investigated in the Jordanian hotel industry: 350 questionnaires were distributed across a total of 350 hotels, of which 317 were returned. Structural equation modeling (AMOS 18) was used to analyze the data. The empirical findings emphasize that ERP usage has a significant impact on organizational learning. In line with the study findings, various aspects of organizational learning, such as continuous learning, system perspective, openness and experimentation, and transfer and integration, are recognized as best encouraging the use of ERP. Suggestions for future work and the research limitations are also discussed.
    • Open Access Article

      16 - Handwritten Digits Recognition Using an Ensemble Technique Based on the Firefly Algorithm
      Azar Mahmoodzadeh Hamed Agahi Marzieh  Salehi
      This paper develops a multi-step procedure for classifying Farsi handwritten digits using a combination of classifiers. Generally, the technique relies on extracting a set of characteristics from handwritten samples, training multiple classifiers to discriminate between digits, and finally combining the classifiers to enhance overall system performance. First, a pre-processing stage prepares the images for the main steps. Then three structural and statistical characteristics comprising several features are extracted, from which a multi-objective genetic algorithm selects the most effective ones in order to reduce the computational complexity of the classification step. For the base classification, a decision tree (DT), an artificial neural network (ANN), and a k-nearest neighbor (KNN) model are employed. Finally, the outcomes of the classifiers are fed into a classifier ensemble system to make the final decision. This hybrid system assigns different weights to each class selected by each classifier. These voting weights are adjusted by a metaheuristic firefly algorithm that optimizes the accuracy of the overall system. The performance of the implemented approach on the standard HODA dataset is compared with the base classifiers and some state-of-the-art methods. The evaluation demonstrates that the proposed hybrid system attains high performance indices, including an accuracy of 98.88% with only eleven features.
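The combination step, per-classifier, per-class voting weights, can be sketched as below. The weights are fixed here for illustration; in the paper they would be tuned by the firefly algorithm, which is omitted. Names and the weight layout are assumptions.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Combine classifier outputs with per-classifier, per-class weights.
    predictions: {classifier_name: predicted_class}
    weights: {(classifier_name, class): weight}, default weight 1.0."""
    scores = defaultdict(float)
    for clf, cls in predictions.items():
        scores[cls] += weights.get((clf, cls), 1.0)
    # highest total weight wins; ties broken by the smallest class label
    return max(sorted(scores), key=lambda c: scores[c])
```

Note how a single well-weighted classifier can outvote two weakly-weighted ones, which is exactly the behavior the metaheuristic exploits when one base classifier is more trustworthy for a particular digit class.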
    • Open Access Article

      17 - DBCACF: A Multidimensional Method for Tourist Recommendation Based on Users’ Demographic, Context and Feedback
      Maral Kolahkaj Ali Harounabadi Alireza Nikravan shalmani Rahim Chinipardaz
      With the advent of Web 2.0 applications such as social networks that allow users to share media, many opportunities have arisen for tourists to recognize and visit attractive and unfamiliar Areas-of-Interest (AOIs). However, finding appropriate areas based on a user’s preferences is very difficult due to issues such as the huge number of tourist areas and limited visiting time. In addition, the available methods have so far failed to provide accurate tourist recommendations based on geo-tagged media because of problems such as data sparsity, the cold start problem, treating two users with different habits as the same (symmetric similarity), and ignoring users’ personal and context information. In this paper, a method called “Demographic-Based Context-Aware Collaborative Filtering” (DBCACF) is therefore proposed to address these problems and to extend the Collaborative Filtering (CF) method to provide personalized tourist recommendations without users’ explicit requests. DBCACF considers demographic and contextual information in combination with users' historical visits to overcome the limitations of CF methods in dealing with multi-dimensional data. In addition, a new asymmetric similarity measure is proposed to overcome the limitations of symmetric similarity methods. Experimental results on a Flickr dataset indicate that using demographic and contextual information, together with the proposed asymmetric similarity scheme, significantly improves the results compared with methods that use only user-item ratings and symmetric measures.
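The asymmetry the abstract argues for can be shown with a very small example: normalizing the overlap by one user's profile size makes sim(u, v) differ from sim(v, u). This specific formula is an illustrative assumption, not necessarily the measure proposed in the paper.

```python
def asymmetric_sim(visits_u, visits_v):
    """Asymmetric overlap: the fraction of u's visited areas that v has
    also visited -- in general sim(u, v) != sim(v, u)."""
    u, v = set(visits_u), set(visits_v)
    return len(u & v) / len(u) if u else 0.0
```

A casual tourist whose two visits are both shared with a prolific traveler looks maximally similar to them, while the reverse similarity is much lower, exactly the distinction a symmetric measure erases.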
    • Open Access Article

      18 - Graph Based Feature Selection Using Symmetrical Uncertainty in Microarray Dataset
      Soodeh Bakhshandeh azmi azmi Mohammad Teshnehlab
      Microarray data, with small sample sizes and thousands of genes, pose a difficult challenge for researchers. Gene selection in microarray data helps select the most relevant genes from the original dataset, with the purpose of reducing the dimensionality of the data as well as increasing prediction performance. In this paper, a new gene selection method is proposed based on a community detection technique and ranking of the best genes. In the first step, symmetrical uncertainty is used to select the best genes by calculating the similarity between pairs of genes and between each gene and the class label, which leads to a representation of the search space as a graph. Afterwards, the graph is divided into several clusters using a community detection algorithm and finally, after ranking the genes, those with the maximum ranks are selected as the best genes. This approach is a supervised/unsupervised filter-based gene selection method that minimizes the redundancy between genes and maximizes the relevance of genes to the class label. Performance of the proposed method is compared with thirteen well-known unsupervised and supervised gene selection approaches over six microarray datasets using four classifiers: SVM, DT, NB, and k-NN. The results show the advantages of the proposed approach.
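      The symmetrical uncertainty measure mentioned above has a standard definition, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), which normalizes mutual information into [0, 1]. A minimal sketch for discrete variables (a generic implementation, not the paper's code):

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy H(X) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), bounded in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    hxy = entropy(list(zip(x, y)))      # joint entropy H(X, Y)
    mi = hx + hy - hxy                  # mutual information I(X; Y)
    return 2 * mi / (hx + hy) if hx + hy > 0 else 0.0
```

      SU equals 1 for identical variables and 0 for independent ones, so it can score both gene-gene redundancy and gene-class relevance on the same scale, which is what makes the graph construction described in the abstract possible.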
    • Open Access Article

      19 - Density Measure in Context Clustering for Distributional Semantics of Word Sense Induction
      Masood Ghayoomi
      Word Sense Induction (WSI) aims at inducing word senses from data without using prior knowledge. The absence of labeled data has motivated researchers to use clustering techniques for this task. There are two types of clustering algorithms: parametric and non-parametric. Although non-parametric clustering algorithms are more suitable for inducing word senses, their shortcomings make them impractical. Meanwhile, parametric clustering algorithms show competitive results, but they suffer from a major problem: they require a predefined, fixed number of clusters to be set in advance. The main contribution of this paper is to show that utilizing the silhouette score, normally used as an internal evaluation metric to measure cluster density, in a parametric clustering algorithm such as K-means captures word senses in the WSI task better than state-of-the-art models. To this end, a word embedding approach is utilized to represent words' contextual information as vectors. To capture the context in the vectors, we propose two modes of experiment: using either the whole sentence or a limited number of surrounding words in the local context of the target word to build the vectors. Experimental results based on the V-measure evaluation metric show that the two modes of the proposed model beat state-of-the-art models by 4.48% and 5.39%, respectively. Moreover, the average and maximum numbers of clusters in the outputs of the proposed models are close to those of the gold data.
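      The role of the silhouette score here can be sketched with a toy example: compute the score for candidate clusterings and keep the one with the highest value. This is a pure-Python illustration on 1-D points with absolute-difference distance; the paper's actual setup runs K-means over word embedding vectors.

```python
def silhouette_score(points, labels):
    """Mean silhouette for 1-D points: s(i) = (b - a) / max(a, b), where
    a = mean distance to the point's own cluster and b = mean distance
    to the nearest other cluster. Assumes at least two clusters."""
    n = len(points)
    scores = []
    for i in range(n):
        same = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not same:                      # singleton cluster: s(i) = 0
            scores.append(0.0)
            continue
        a = sum(abs(points[i] - points[j]) for j in same) / len(same)
        other_means = []
        for lab in set(labels):
            if lab == labels[i]:
                continue
            members = [j for j in range(n) if labels[j] == lab]
            other_means.append(
                sum(abs(points[i] - points[j]) for j in members) / len(members))
        b = min(other_means)
        scores.append((b - a) / max(a, b))
    return sum(scores) / n
```

      A clustering that matches the data's natural grouping scores near 1, while a clustering that mixes the groups scores much lower, which is why sweeping the number of clusters and maximizing this score can stand in for knowing the sense count in advance.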
    • Open Access Article

      20 - An Experimental Study on Performance of Text Representation Models for Sentiment Analysis
      Sajjad Jahanbakhsh Gudakahriz Amir Masoud Eftekhari Moghaddam Fariborz Mahmoudi
      Sentiment analysis in social networks has been an active research field since 2000 and is highly useful in decision-making processes across various domains and applications. In sentiment analysis, the goal is to analyze the opinion texts posted in social networks and other web-based resources and extract the necessary information from them. The data collected from various social networks and websites do not have a structured format, and this unstructured format is the main challenge in handling such data. To overcome this challenge, the texts must be represented with a text representation model before the content can be analyzed. Research on text modeling started a few decades ago, and so far various models have been proposed for this purpose. The main purpose of this paper is to evaluate the efficiency and effectiveness of a number of common and well-known text representation models for sentiment analysis. The evaluation is carried out by using these models for sentiment classification with ensemble methods: after preprocessing, the texts are represented by the selected models and classified by an ensemble classifier. The selected models are TF-IDF, LSA, Word2Vec, and Doc2Vec, and the evaluation measures are accuracy, precision, recall, and F-measure. The results show that, in general, the Doc2Vec model performs better than the other models in sentiment analysis, with an accuracy of 0.72 at best.
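      The simplest of the compared representations, TF-IDF, can be sketched in a few lines. This is a toy implementation (raw term frequency times log inverse document frequency, sparse dicts instead of vectors), not the exact weighting variant used in the paper:

```python
from collections import Counter
from math import log

def tf_idf(docs):
    """Toy TF-IDF: docs are lists of tokens; returns one sparse
    {term: weight} dict per document."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))   # document frequency
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append({t: (tf[t] / len(d)) * log(n / df[t]) for t in tf})
    return vectors
```

      Terms that appear in every document (like "movie" in a corpus of movie reviews) get weight zero, while discriminative terms keep positive weight, which is what makes the representation useful as classifier input.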
    • Open Access Article

      21 - Evaluation of Pattern Recognition Techniques in Response to Cardiac Resynchronization Therapy (CRT)
      Mohammad Nejadeh Peyman Bayat Jalal Kheirkhah Hassan Moladoust
      Cardiac resynchronization therapy (CRT) improves cardiac function in patients with heart failure (HF); the results of this treatment are a decrease in mortality and an improved quality of life for patients. This research aims at predicting CRT response for the prognosis of patients with heart failure under CRT. According to international guidelines, once QRS prolongation and a decrease in ejection fraction (EF) are confirmed, the patient is recognized as a candidate for device implantation. However, given the many intervening and effective factors, decision making can be based on more variables. Computer-based decision-making systems, especially machine learning (ML), are considered a promising approach given their significant background in medical prediction. Collective intelligence approaches such as the particle swarm optimization (PSO) algorithm are used to determine the priorities of the medical decision-making variables. This investigation was conducted on 209 patients, and the data were collected over 12 months. In the HESHMAT CRT center, 17.7% of patients did not respond to treatment. Recognizing the dominant parameters by combining machine recognition with the physician's viewpoint, and introducing a back-propagation neural network algorithm to decrease classification error, are the most important achievements of this research. In this research, an analytical set of individual, clinical, and laboratory variables, echocardiography, and electrocardiography (ECG) is related to patients' response to CRT. Prediction of the response after CRT becomes possible with the support of this set of tools, algorithms, and variables.
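      The PSO algorithm mentioned for prioritizing variables follows a standard velocity-update scheme. A minimal generic sketch (toy parameters and a sphere test function, not the paper's configuration or fitness):

```python
import random

def pso(fitness, dim, n_particles=20, n_iters=60, seed=0):
    """Minimal particle swarm optimization minimizing `fitness` over
    [-5, 5]^dim, with inertia 0.7 and cognitive/social weights 1.5."""
    random.seed(seed)
    pos = [[random.uniform(-5, 5) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # per-particle best
    gbest = min(pbest, key=fitness)[:]          # swarm best
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (pbest[i][d] - pos[i][d])
                             + 1.5 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if fitness(pos[i]) < fitness(pbest[i]):
                pbest[i] = pos[i][:]
                if fitness(pbest[i]) < fitness(gbest):
                    gbest = pbest[i][:]
    return gbest
```

      In a variable-prioritization setting, the fitness would score a candidate weighting of clinical variables (for example by the resulting classifier error), and the swarm converges toward weightings that minimize it.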
    • Open Access Article

      22 - The Development of a Hybrid Error Feedback Model for Sales Forecasting
      Mehdi Farrokhbakht Foumani Sajad Moazami Goudarzi
      Sales forecasting is one of the significant issues in the industrial and service sectors; handled properly, it can facilitate management decisions and reduce lost value. Sales forecasting is also one of the complicated problems in time series analysis and data mining due to the number of intervening parameters. Various models have been presented on this issue, and each has found acceptable results; however, further development of these methods is still of interest to researchers. In this regard, the present study provides a hybrid model with error feedback for sales forecasting. First, forecasting is conducted using a supervised learning method. Then, the residual values (the model error) are computed, and these error values are forecasted using another learning method. Finally, the two trained models are combined and used consecutively for sales forecasting: the first model produces a forecast, the second model estimates its error, and the sum of the forecast and the estimated error gives the final forecast. Computational results from numerical experiments indicate the superiority of the proposed hybrid method over common models in the available literature and a reduction in the indicators related to forecasting error.
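      The error-feedback scheme can be shown end-to-end with deliberately simple stand-in models. In this sketch the base model is a naive last-value predictor and the error model is a running mean of past residuals; these are hypothetical choices to illustrate the mechanism, not the supervised learners used in the paper.

```python
def naive_forecast(history):
    """Base model: predict the last observed value."""
    return history[-1]

def error_correction(residuals):
    """Error model: predict the next residual as the mean of past ones."""
    return sum(residuals) / len(residuals) if residuals else 0.0

def evaluate(series):
    """Walk forward through the series, comparing the base model alone
    against base + predicted error (the hybrid). Returns the total
    absolute error of each."""
    residuals = []
    base_err = hybrid_err = 0.0
    for t in range(1, len(series)):
        base = naive_forecast(series[:t])
        hybrid = base + error_correction(residuals)
        base_err += abs(series[t] - base)
        hybrid_err += abs(series[t] - hybrid)
        residuals.append(series[t] - base)   # feed the error back
    return base_err, hybrid_err
```

      On a series with a steady trend, the base model's residuals are systematic, so the error model learns them and the hybrid's total error drops well below the base model's.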
    • Open Access Article

      23 - Developing A Contextual Combinational Approach for Predictive Analysis of Users Mobile Phone Trajectory Data in LBSNs
      Fatemeh  Ghanaati Gholamhossein Ekbatanifard Kamrad Khoshhal Roudposhti
      Today, smartphones, due to their ubiquity, have become indispensable in daily human life. Progress in mobile phone technology has recently resulted in the emergence of several popular services, such as location-based social networks (LBSNs), in which predicting the next Point of Interest (POI) is an important task. The trajectory data gathered in LBSNs include various contextual information, such as geographical and temporal contextual information (GTCI), that plays a crucial role in next-POI recommendations. Various methods, including collaborative filtering (CF) and recurrent neural networks, have incorporated the contextual information of users' trajectory data to predict the next POIs. CF methods do not consider the effect of sequential data on modeling, while the next-POI prediction problem is inherently a time sequence problem. Although recurrent models have been proposed for sequential data modeling, they have limitations, such as treating all contextual information as having a similar effect, even though the different kinds of context have separate effects. In the current study, a geographical temporal contextual information-extended attention gated recurrent unit (GTCI-EAGRU) architecture is proposed to separately consider the influence of geographical and temporal contextual information on next-POI recommendations. In this research, the GRU model was extended with three separate attention gates in the recurrent layer of the GTCI-EAGRU architecture (timestamp, geographical, and temporal contextual attention gates) to consider the contextual information of user trajectory data. Inspired by the matrix factorization assumption of CF approaches, a ranked list of POI recommendations is provided for each user. Moreover, a comprehensive evaluation was conducted using large-scale real-world datasets from three LBSNs: Gowalla, Brightkite, and Foursquare. The results reveal that the performance of GTCI-EAGRU was higher than that of competitive baseline methods in terms of Acc@10, by 42.11% on average across the three datasets.
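      The Acc@10 metric used above has a simple definition: the fraction of test cases whose true next POI appears in the top k of the ranked recommendation list. A generic sketch (the function name and data layout are illustrative, not taken from the paper):

```python
def acc_at_k(ranked_recs, ground_truth, k=10):
    """Fraction of users whose true next POI appears in the top-k
    entries of their ranked recommendation list."""
    hits = sum(1 for recs, truth in zip(ranked_recs, ground_truth)
               if truth in recs[:k])
    return hits / len(ground_truth)
```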
    • Open Access Article

      24 - Representing a Novel Expanded Version of Shor’s Algorithm and a Real-Time Experiment using IBM Q-Experience Platform
      Sepehr  Goodarzi Afshin Rezakhani Mahdi Maleki
      Data are stored in the memory of a classical computer in small units of classical bits, each of which can be either 0 or 1. On a quantum computer, however, the quantum state of each quantum bit (qubit) can be any superposition of 0 and 1, including the basis states themselves. By placing a qubit in a state that lies between the two-dimensional basis vectors |0⟩ = (1, 0)ᵀ and |1⟩ = (0, 1)ᵀ on the unit circle, a state called superposition, we can take advantage of the properties of this state: by placing many vectors of N-dimensional spaces in superposition, we can perform parallelization and factorization to obtain significant speedup. In fact, quantum computing takes advantage of quantum dynamical principles to process data, something classical computers cannot do given the limitations of the logical concepts behind them. In this paper, we expand a quantum algorithm to n qubits in a new way, and by implementing the circuits using the IBM Q Experience platform, we obtain practical results that are more clearly demonstrable. By expanding quantum algorithms and using linear algebra, we can achieve goals at a higher level that classical computers are unable to reach, such as machine learning problems with complicated models; extending the subject, we can mention applications in different sciences such as chemistry (predicting the structure of proteins with higher accuracy in less time), astronomy, and so on.
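      The uniform superposition mentioned above, obtained by applying a Hadamard gate to each of n qubits starting from |0...0⟩, gives every one of the 2ⁿ basis states the same amplitude. A plain state-vector sketch of that fact (not an IBM Q circuit):

```python
from math import sqrt

def uniform_superposition(n):
    """State vector after H on each of n qubits from |0...0>: all
    2**n basis states share amplitude 1/sqrt(2**n)."""
    dim = 2 ** n
    return [1 / sqrt(dim)] * dim

def probabilities(state):
    """Measurement probabilities are the squared amplitude magnitudes."""
    return [a * a for a in state]
```

      This exponential width of the state vector is the resource that algorithms like Shor's exploit, and also why classical simulation becomes infeasible as n grows.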
    • Open Access Article

      25 - Computational Model for Image Processing in the Minds of People with Visual Agnosia using Fuzzy Cognitive Map
      Elham Askari Sara Motamed
      Agnosia is a neurological condition that leads to an inability to name, recognize, and extract meaning from the visual, auditory, and sensory environment, even though the receptor organ is intact. Visual agnosia is the most common type of this disorder. People with agnosia have trouble communicating between the mind and the brain; as a result, they cannot understand the images they see. In this paper, a model based on the visual pathway is proposed: it first receives the visual stimulus and then, after perception, identifies the object, using an intelligent Fuzzy Cognitive Map to help improve image processing in the minds of these patients. First, the proposed model, inspired by the visual perception pathway, is designed. Then, appropriate attributes, including the texture and color of the images, are extracted, and the concept of the seen image is perceived using fuzzy cognitive mapping, meaning recognition, and the relationships between objects. This model reduces the difficulty of perceiving and recognizing objects in patients with visual agnosia. The results show that the proposed model, with 98.1% accuracy, performs better than other methods.
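      A Fuzzy Cognitive Map is a weighted directed graph of concepts whose activations are updated iteratively through a squashing function. A minimal generic update step (the weight matrix here is hypothetical, not the paper's map of visual concepts):

```python
from math import exp

def fcm_step(activations, weights):
    """One synchronous FCM update: a_i' = sigmoid(sum_j a_j * w[j][i]),
    where weights[j][i] is the causal influence of concept j on i."""
    def sigmoid(x):
        return 1.0 / (1.0 + exp(-x))
    n = len(activations)
    return [sigmoid(sum(activations[j] * weights[j][i] for j in range(n)))
            for i in range(n)]
```

      Iterating the step propagates activation along the causal links until the concept values settle, which is how an FCM turns extracted attributes (texture, color) into a stable interpretation of the seen object.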
    • Open Access Article

      26 - Optimization of Query Processing in Versatile Database Using Ant Colony Algorithm
      hasan Asil
      Nowadays, with the advancement of database information technology, databases have grown into large-scale distributed databases. Accordingly, database management systems are improved and optimized so that they respond to customer queries at lower cost. Query processing in database management systems is one of the important topics that attracts attention, and many techniques have been implemented for it; their purpose is to optimize query processing in the database. The main interest in query processing is making run-time adjustments to processing, or summarizing topics, by using new approaches. The aim of this research is to optimize query processing in the database using adaptive methods. The Ant Colony Optimization (ACO) algorithm is used for solving optimization problems; ACO relies on the deposited pheromone to select the optimal solution. In this article, to achieve adaptive hybrid query processing, the proposed algorithm is fundamentally divided into three parts: a separator, a replacement policy, and a query similarity detector. To improve the optimization, frequent adaptation, and correct selection of queries, the ant colony algorithm is applied; based on versatility (adaptability) scheduling, the algorithm attempts to group the queries sent to the database. The simulation results demonstrate that this method reduces the time spent in the database. One of its advantages is that it identifies frequent queries during high-traffic times and minimizes execution time; it reduces the system load during high-traffic periods for adaptive query processing and generally reduces the execution runtime, aiming to minimize cost. The rate of reduction of query cost in the database with this method is 2.7%. Because high-cost queries are versatile, this improvement is most evident in high-traffic times. In future studies, distributed databases can be optimized by adapting new system development methods.
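      The pheromone mechanism at the heart of ACO can be shown on a toy discrete choice, such as picking the cheapest of several candidate query plans. This is a generic sketch with made-up parameters, not the paper's three-part algorithm: ants pick options in proportion to pheromone, deposit an amount inversely proportional to cost, and pheromone evaporates each iteration.

```python
import random

def ant_colony_choose(costs, n_ants=30, n_iters=20, rho=0.3, seed=1):
    """Toy ACO over discrete options: returns the index with the most
    pheromone after reinforcement. rho is the evaporation rate."""
    random.seed(seed)
    tau = [1.0] * len(costs)                      # initial pheromone
    for _ in range(n_iters):
        deposits = [0.0] * len(costs)
        for _ in range(n_ants):
            i = random.choices(range(len(costs)), weights=tau)[0]
            deposits[i] += 1.0 / costs[i]         # cheaper -> more pheromone
        tau = [(1 - rho) * t + d for t, d in zip(tau, deposits)]
    return tau.index(max(tau))
```

      Cheap options accumulate pheromone faster than it evaporates, so the colony's choices concentrate on them; in the query-processing setting the "options" would be alternative executions of frequent queries.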
    • Open Access Article

      27 - Proposing an FCM-MCOA Clustering Approach Stacked with Convolutional Neural Networks for Analysis of Customers in Insurance Company
      Motahareh Ghavidel meisam Yadollahzadeh tabari Mehdi Golsorkhtabaramiri
      To create a customer-based marketing strategy, it is necessary to perform a proper analysis of customer data so that customers can be separated from one another or their future behavior predicted. The datasets related to customers in any business are usually high-dimensional, with many instances, and include both supervised and unsupervised attributes. For this reason, companies today try to satisfy their customers as much as possible, an issue that requires careful consideration of customers from several aspects. Data mining algorithms are among the practical methods businesses use to extract the required knowledge from customers' demographic and behavioral data. This paper presents a hybrid clustering algorithm using the Fuzzy C-Means (FCM) method and the Modified Cuckoo Optimization Algorithm (MCOA). Since customer data analysis plays a key role in ensuring a company's profitability, The Insurance Company (TIC) dataset is utilized for the experiments and performance evaluation. We compare the convergence of the proposed FCM-MCOA approach with some conventional optimization methods, such as the Genetic Algorithm (GA) and Invasive Weed Optimization (IWO). Moreover, we suggest a customer classifier using Convolutional Neural Networks (CNNs). Simulation results reveal that FCM-MCOA converges faster than conventional clustering methods. In addition, the results indicate that the accuracy of the CNN-based classifier is more than 98%. The CNN-based classifier converges after a couple of iterations, showing fast convergence in comparison with conventional classifiers such as Decision Tree (DT), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Naive Bayes (NB).
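      The FCM half of the hybrid assigns each customer a degree of membership in every cluster rather than a hard label. Its standard membership update can be sketched for 1-D data (a generic textbook formula, not the FCM-MCOA variant): u[i][k] = 1 / Σⱼ (d_ik / d_jk)^(2/(m-1)), where d_ik is the distance from point i to center k and m is the fuzzifier.

```python
def fcm_memberships(points, centers, m=2.0):
    """Fuzzy C-Means membership matrix for 1-D points:
    u[i][k] = 1 / sum_j (d_ik / d_jk) ** (2 / (m - 1)).
    Each row sums to 1; m > 1 controls the fuzziness."""
    u = []
    for p in points:
        dists = [max(abs(p - c), 1e-12) for c in centers]  # avoid /0
        row = [1.0 / sum((d_k / d_j) ** (2.0 / (m - 1.0)) for d_j in dists)
               for d_k in dists]
        u.append(row)
    return u
```

      Points near a center get membership close to 1 in that cluster and near 0 elsewhere; in the full algorithm these memberships and the centers are updated alternately, with MCOA used to escape poor local optima in that iteration.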