Clustering

Open Access Article

1 - Referral Traffic Analysis: A Case Study of the Iranian Students' News Agency (ISNA)
Roya Hassanian Esfahani Mohammad Javad Kargar

10.7508/jist.2016.01.006

Web traffic analysis is a well-known e-marketing activity. Today, most news agencies have entered the web and provide a variety of online services to their customers, and the number of online news consumers is increasing dramatically worldwide. A news website usually benefits from several acquisition channels, including organic search, paid search, referral links, direct hits, links from online social media, and e-mails. This article presents the results of an empirical study that analyzes the referral traffic of a news website through data mining techniques. The main methods include correlation analysis, outlier detection, clustering, and model performance evaluation. The results show no significant relationship between the amount of referral traffic coming from a referrer website and that website's popularity. Furthermore, the referrer websites in the study fit into three clusters when the K-means algorithm with squared Euclidean distance is applied. Performance evaluations confirm the significance of the model. Among the detected clusters, the most populated one was labeled "Automatic News Aggregator Websites" by the experts. The findings give a better understanding of the different referring behaviors, which account for around 15% of the overall traffic of the Iranian Students' News Agency (ISNA) website. They are also helpful for developing more efficient online marketing plans, business alliances, and corporate strategies.
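
The clustering step named in this abstract can be illustrated with a minimal K-means sketch that uses squared Euclidean distance. This is not the authors' implementation: the function names, the two-feature points, and the explicitly supplied initial centroids are all assumptions made for illustration.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, max_iter=100):
    """Plain K-means (Lloyd's algorithm).

    `initial_centroids` is passed in explicitly (an assumption of this
    sketch) so that the result is deterministic.
    """
    centroids = [tuple(c) for c in initial_centroids]
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_euclidean(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical referrer features, e.g. (referral visits, popularity score).
points = [(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10), (20, 1), (21, 1), (20, 2)]
labels, centroids = kmeans(points, [(1, 1), (10, 10), (20, 1)])
```

In practice the initial centroids would be chosen randomly or with a seeding heuristic, and the run repeated several times, since K-means only finds a local optimum.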

Open Access Article

2 - A Study on Clustering for Clustering Based Image De-noising
Hossein Bakhshi Golestani Mohsen Joneidi Mostafa Sadeghi

10.7508/jist.2014.04.001

In this paper, the problem of de-noising an image contaminated with Additive White Gaussian Noise (AWGN) is studied. This subject has been an open problem in signal processing for more than 50 years. We suggest a method based on global clustering of the blocks that make up the image. Because the type of clustering plays an important role in clustering-based de-noising methods, we address two questions about the clustering. First, which parts of the data should be considered for clustering? Second, which data clustering method is suitable for de-noising? Clustering is then exploited to learn an overcomplete dictionary. The de-noised image is obtained from the sparse decomposition of the noisy image blocks in terms of the dictionary atoms. Experimental results show that our dictionary learning framework outperforms its competitors in terms of de-noising performance and execution time.

Open Access Article

3 - On-road Vehicle detection based on hierarchical clustering using adaptive vehicle localization
Moslem Mohammadi Jenghara Hossein Ebrahimpour Komleh

10.7508/jist.2015.04.004

Vehicle detection is one of the important tasks in automatic driving. It is a hard problem on which many researchers have focused. Most commercial vehicle detection systems are based on radar, but these systems have some problems, for example with zigzag motions. Image processing techniques can overcome these problems. This paper introduces a method based on hierarchical clustering using low-level image features for on-road vehicle detection. Each vehicle is treated as a cluster. In traditional clustering methods, the threshold distance for each cluster is fixed, but in this paper the threshold is adaptive and varies according to the position of each cluster. The threshold measure is computed with a bivariate normal distribution. Sampling and teammate selection for each cluster are performed with a members-based weighted average. For this purpose, unlike other methods that use only horizontal or vertical lines, a full edge detection algorithm is used. Corners are an important feature of video images and are commonly used in vehicle detection systems; in this paper, Harris features are applied to detect the corners. The LISA data set is used to evaluate the proposed method, and several experiments are conducted to investigate the performance of the proposed algorithm. Experimental results show good performance compared to other algorithms.

Open Access Article

4 - Coverage Improving with Energy Efficient in Wireless Sensor Networks
Amir Pakmehr Ali Ghaffari

10.7508/jist.2017.17.008

Wireless sensor networks (WSNs) are formed by numerous sensor nodes that can sense different environmental phenomena and transfer the collected data to the sink. Network coverage is one of the main topics of discussion and one of the quality-of-service parameters in WSNs. In most applications, the sensor nodes are scattered randomly in the environment, which makes the node density high in some regions and low in others. As a result, some regions, called coverage holes, are not covered by any node of the network. Moreover, high-density regions cause extra overlapping, so the energy consumption of the network increases and its lifetime decreases. The proposed approach increases the lifetime of the network and improves its coverage by carefully selecting the most appropriate node as cluster head, forming clusters with a maximum length of two hops, and selecting some nodes as redundancy nodes to cover the holes created in the network. The proposed scheme is simulated using MATLAB software, and its performance is compared with the Learning Automata based Energy Efficient Coverage protocol (LAEEC). Simulation results show that the suggested approach performs better than LAEEC in terms of the average number of active nodes, the average remaining energy in nodes, the percentage of network coverage, and the number of control packets.

Open Access Article

5 - Preserving Data Clustering with Expectation Maximization Algorithm
Leila Jafar Tafreshi Farzin Yaghmaee

10.7508/jist.2016.03.004

Data mining and knowledge discovery are important technologies for business and research. Despite their benefits in areas such as marketing, business, and medical analysis, data mining techniques can also create new threats to privacy and information security. Therefore, a new class of data mining methods called privacy preserving data mining (PPDM) has been developed. The aim of research in this field is to develop techniques that can be applied to databases without violating the privacy of individuals. In this work, we introduce a new approach that uses fuzzy logic to preserve sensitive information in databases with both numerical and categorical attributes. We map a database into a new one that conceals private information while preserving mining benefits. In our proposed method, we apply fuzzy membership functions (MFs) such as Gaussian, P-shaped, Sigmoid, S-shaped, and Z-shaped to private data. We then cluster the modified datasets with the Expectation Maximization (EM) algorithm. Our experimental results show that using fuzzy logic to preserve data privacy yields valid clustering results while protecting sensitive information. The accuracy of the clustering algorithm on the fuzzy data is approximately equal to its accuracy on the original data and is better than that of the state-of-the-art methods in this field.
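
The membership functions listed in this abstract can be sketched as follows. The formulas are the common textbook definitions of the Gaussian, sigmoid, S-shaped, and Z-shaped functions; the parameter names, the example column, and the choice of parameters are assumptions, and the sketch does not claim to reproduce the authors' mapping or the EM clustering step.

```python
from math import exp


def gaussian_mf(x, mean, sigma):
    """Gaussian membership: 1 at the mean, decaying with distance."""
    return exp(-((x - mean) ** 2) / (2 * sigma ** 2))


def sigmoid_mf(x, slope, center):
    """Sigmoid membership: 0.5 at `center`, steepness set by `slope`."""
    return 1.0 / (1.0 + exp(-slope * (x - center)))


def s_shaped_mf(x, a, b):
    """S-shaped spline membership rising from 0 at `a` to 1 at `b` (a < b)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    mid = (a + b) / 2
    if x <= mid:
        return 2 * ((x - a) / (b - a)) ** 2
    return 1 - 2 * ((x - b) / (b - a)) ** 2


def z_shaped_mf(x, a, b):
    """Z-shaped membership: the mirror image of the S-shaped function."""
    return 1.0 - s_shaped_mf(x, a, b)


# Hypothetical sensitive numeric column (e.g. ages) replaced by membership values.
ages = [22, 35, 47, 60]
masked = [round(s_shaped_mf(age, 20, 60), 3) for age in ages]
```

Replacing a raw value with its membership degree hides the original number while keeping the relative ordering that a clustering algorithm relies on, which is the intuition behind the approach described above.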

Open Access Article

6 - Graph Based Feature Selection Using Symmetrical Uncertainty in Microarray Dataset
Soodeh Bakhshandeh azmi azmi Mohammad Teshnehlab

10.7508/jist.2019.01.004

Microarray data, with small samples and thousands of genes, poses a difficult challenge for researchers. Gene selection helps to select the most relevant genes from the original dataset in order to reduce the dimensionality of the microarray data and to increase prediction performance. In this paper, a new gene selection method is proposed based on a community detection technique and ranking of the best genes. In the first step, Symmetric Uncertainty is used to calculate the similarity between pairs of genes and between each gene and the class label, which allows the search space to be represented as a graph. Afterwards, the graph is divided into several clusters using a community detection algorithm. Finally, the genes are ranked and those with the highest ranks are selected as the best genes. This approach is a supervised/unsupervised filter-based gene selection method that minimizes the redundancy between genes and maximizes the relevance between genes and the class label. The performance of the proposed method is compared with thirteen well-known unsupervised/supervised gene selection approaches over six microarray datasets using four classifiers: SVM, DT, NB, and k-NN. The results show the advantages of the proposed approach.
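
Symmetric uncertainty, the similarity measure named in this abstract, is commonly defined as SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), where H is entropy and I is mutual information. The sketch below computes it for discrete variables; it assumes expression values have already been discretized (the abstract does not say how), and the function names are illustrative.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((count / n) * log2(count / n) for count in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant, so there is nothing to share
    joint = entropy(list(zip(x, y)))
    mutual_information = hx + hy - joint
    return 2 * mutual_information / (hx + hy)


# Hypothetical discretized gene expression levels and a class label.
gene_a = [0, 0, 1, 1]
gene_b = [0, 1, 0, 1]
label = [0, 0, 1, 1]
su_relevant = symmetric_uncertainty(gene_a, label)    # gene_a determines the label
su_irrelevant = symmetric_uncertainty(gene_b, label)  # gene_b is independent of it
```

A gene-gene SU value can serve as an edge weight (redundancy) and a gene-label SU value as a node score (relevance), which matches the graph representation described above.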

Open Access Article

7 - Energy Efficient Clustering Algorithm for Wireless Sensor Networks
Maryam Bavaghar Amin Mohajer Sarah Taghavi Motlagh

10.7508/jist.2019.04.001

In Wireless Sensor Networks (WSNs), sensor nodes are usually deployed with limited energy reserves in remote environments for long periods of time with little or no human intervention. This makes energy efficiency a challenging issue for both the design and the deployment of sensor networks. This paper presents a novel approach named Energy Efficient Clustering Algorithm (EECA) for Wireless Sensor Networks, which is based on a two-phase clustering model and provides maximum network coverage in an energy-efficient way. In this framework, an effective resource-aware load balancing approach is applied to configure the parameters autonomously in accordance with the signaling patterns, so that approximately the same bit rate is provided for each sensor. This resource-efficient clustering model can also form energy-balanced clusters, which increases network lifetime and ensures better network coverage. Simulation results show that EECA outperforms LEACH, LEA2C, and EECS with respect to network lifetime while achieving more network coverage. In addition to obtaining an optimal cluster size with minimum energy loss, the proposed approach suggests a new and better way of selecting cluster heads that reduces the energy consumption of the distributed nodes and thereby increases the operational reliability of sensor networks.

Open Access Article

8 - Overcoming the Link Prediction Limitation in Sparse Networks using Community Detection
Mohammad Pouya Salvati Jamshid Bagherzadeh Mohasefi Sadegh Sulaimany

10.52547/jist.9.35.183

20.1001.1.23221437.2021.9.35.4.3

Link prediction seeks to detect missing links and links that may be established in the future, given the network structure or node features. Numerous methods have been presented for improving the basic unsupervised neighbourhood-based methods of link prediction. A major issue confronted by all these methods is that many of the available networks are sparse. This results in a high volume of computation, longer processing times, higher memory requirements, and poorer results. This research presents a new, distinct method for link prediction based on community detection in large-scale sparse networks. The communities in the network are first identified, and link prediction is then performed within each community using neighbourhood-based methods. Next, a new method for link prediction is applied between the clusters in a specified manner to make maximal use of the network capacity. The community detection algorithms used are Best partition, Link community, Info map, and Girvan-Newman, and the datasets used in the experiments are Email, HEP, REL, Wikivote, Word, and PPI. Three measures are used to evaluate the proposed method: precision, computation time, and AUC. The results obtained over the different datasets demonstrate that extra calculations are avoided and precision is increased. Runtime is also reduced considerably. Moreover, in many cases the Best partition community detection method gives good results compared to the other community detection algorithms.
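
The within-community step described in this abstract can be sketched with the simplest neighbourhood-based score, the number of common neighbours. The sketch assumes the communities are already given (the detection algorithms themselves are not reproduced), that both candidate pairs and counted neighbours are restricted to one community, and that the function and variable names are illustrative. The between-cluster step of the paper is not covered.

```python
from itertools import combinations


def common_neighbour_scores(edges, communities):
    """Score non-adjacent node pairs inside each community by shared neighbours."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    scores = {}
    for community in communities:
        members = set(community)
        # Only pairs inside the same community are considered, which is
        # what keeps the number of candidate pairs small in a sparse network.
        for u, v in combinations(sorted(members), 2):
            if v in adjacency.get(u, set()):
                continue  # the link already exists
            shared = adjacency.get(u, set()) & adjacency.get(v, set()) & members
            if shared:
                scores[(u, v)] = len(shared)
    return scores


# Hypothetical toy network with two communities joined by the edge (4, 5).
edges = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5), (5, 6), (5, 7), (6, 7)]
communities = [{1, 2, 3, 4}, {5, 6, 7}]
scores = common_neighbour_scores(edges, communities)
```

With n nodes, scoring all pairs costs on the order of n² comparisons, whereas scoring only within communities costs roughly the sum of the squared community sizes, which is the source of the runtime savings the abstract reports.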

Open Access Article

9 - Reducing Energy Consumption in Sensor-Based Internet of Things Networks Based on Multi-Objective Optimization Algorithms
Mohammad sedighimanesh Hessam Zandhessami Mahmood Alborzi Mohammadsadegh Khayyatian

10.52547/jist.15639.10.39.180

20.1001.1.23221437.2022.10.39.1.5

Energy is an important parameter in establishing various types of communication in the sensor-based IoT. Sensors usually have low-energy, non-rechargeable batteries, since they are often used in places and applications where recharging is not possible. The most important objective of the present study is to minimize the energy consumption of sensors and increase the IoT network's lifetime by applying multi-objective optimization algorithms when selecting cluster heads and when routing between cluster heads to transfer data to the base station. In the present article, after the sensor nodes are distributed in the network, the type-2 fuzzy algorithm is employed to select the cluster heads, and the genetic algorithm is used to create a tree between the cluster heads and the base station. After the cluster heads are selected, the normal nodes become cluster members and send their data to the cluster head. After the cluster heads collect and aggregate the data, the data is transferred to the base station along the path specified by the genetic algorithm. The proposed algorithm was implemented with the MATLAB simulator and compared with the LEACH, MB-CBCCP, and DCABGA protocols. The simulation results indicate that the proposed algorithm performs better than these protocols in different environments. Because energy is limited in the sensor-based IoT and the sensors cannot be recharged in most applications, the use of multi-objective optimization algorithms in the design and implementation of routing and clustering algorithms has a significant impact on increasing the lifetime of these networks.

Open Access Article

10 - Energy Efficient Routing-Based Clustering Protocol Using Computational Intelligence Algorithms in Sensor-Based IoT
Mohammad sedighimanesh Hessam Zandhessami Mahmood Alborzi Mohammadsadegh Khayyatian

10.52547/jist.9.33.55

20.1001.1.23221437.2021.9.33.6.1

Background: The main limitation of wireless IoT sensor-based networks is their energy resource, which cannot be charged or replaced because, in most applications, the sensors are deployed in places where they are not accessible or rechargeable. Objective: The present article's main objective is to help reduce energy consumption in the sensor-based IoT network and thus increase the network’s lifetime. Cluster heads are used to send data to the base station. Methods: In the present paper, the type-1 fuzzy algorithm is employed to select cluster heads, and the type-2 fuzzy algorithm is used for routing from the cluster heads to the base station. After the cluster heads are selected using the type-1 fuzzy algorithm, the normal nodes become cluster members and send their data to their cluster head, and the cluster heads then transfer the collected data to the base station along the path determined by the type-2 fuzzy algorithm. Results: The proposed algorithm was implemented using the MATLAB simulator and compared with the LEACH, DEC, and DEEC protocols. The simulation results suggest that, among these algorithms, the proposed protocol achieves the longest network lifetime in homogeneous and heterogeneous environments. Conclusion: Due to the energy limitation in sensor-based IoT networks and the impossibility of recharging the sensors in most applications, the use of computational intelligence techniques in the design and implementation of these algorithms contributes considerably to reducing energy consumption and ultimately to increasing the network’s lifetime.

Open Access Article

11 - Cluster-based Coverage Scheme for Wireless Sensor Networks using Learning Automata
Ali Ghaffari Seyyed Keyvan Mousavi

10.52547/jist.9.35.197

20.1001.1.23221437.2021.9.35.7.6

Network coverage is one of the most important challenges in wireless sensor networks (WSNs). In a WSN, each sensor node covers a sensing area determined by its sensing range. In most applications, sensor nodes are randomly deployed in the environment, which makes the node density high in some areas and low in others. As a result, some areas, called coverage holes, are not covered by any sensor node. In addition, high-density areas lead to redundant overlapping, and as a result the network lifetime decreases. In this paper, a cluster-based scheme for the coverage problem of WSNs using learning automata is proposed. In the proposed scheme, each node creates the action and probability vectors of the learning automata for itself and its neighbors, determines its own status and that of all its neighbors, and finally sends them to the cluster head (CH). Afterward, each CH rewards or penalizes the vectors and sends the results back to the sender for updating. Among the received vectors, the CH node then selects the best action vector and broadcasts it as a message inside the cluster. Finally, each member changes its status in accordance with the vector included in the message received from its CH, and the active sensor nodes monitor the environment. The simulation results show that the proposed scheme improves network coverage and energy consumption.
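
The reward and penalty step mentioned in this abstract can be illustrated with the standard linear reward-penalty update for a learning automaton. The abstract does not state which update rule the authors use, so the rule, the learning rates `a` and `b`, and the two-action example (active or sleep) are assumptions of this sketch.

```python
def update_probabilities(probs, chosen, rewarded, a=0.1, b=0.1):
    """Linear reward-penalty update of an action probability vector.

    probs    -- current action probabilities (at least two, summing to 1)
    chosen   -- index of the action that was taken
    rewarded -- True if the environment rewarded the action
    a, b     -- reward and penalty rates (hypothetical values)
    """
    r = len(probs)
    updated = []
    for j, p in enumerate(probs):
        if rewarded:
            # Move probability mass toward the chosen action.
            updated.append(p + a * (1 - p) if j == chosen else (1 - a) * p)
        else:
            # Move probability mass away from the chosen action.
            updated.append((1 - b) * p if j == chosen else b / (r - 1) + (1 - b) * p)
    return updated


# Hypothetical node with two actions: index 0 = stay active, index 1 = sleep.
probs = [0.5, 0.5]
after_reward = update_probabilities(probs, chosen=0, rewarded=True)
after_penalty = update_probabilities(probs, chosen=0, rewarded=False)
```

Both branches keep the vector normalized, so it can be used directly to sample the node's next action.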

Open Access Article

12 - Word Sense Induction in Persian and English: A Comparative Study
Masood Ghayoomi

10.52547/jist.9.36.263

20.1001.1.23221437.2021.9.36.3.4

Words in natural language have forms and meanings, and there is not always a one-to-one match between them. This property causes words to have more than one meaning; as a result, a text processing system faces the challenge of determining the precise meaning of a target word in a sentence. Lexical resources or lexical databases, such as WordNet, can help, but because they are developed manually, they become outdated as time passes and the language changes. Moreover, lexical resources may be domain dependent, which makes them unusable for open-domain natural language processing tasks. These drawbacks are a strong motivation to use unsupervised machine learning approaches to induce word senses from natural data. To reach this goal, clustering can be used such that each cluster represents a sense. In this paper, we study the performance of a word sense induction model with respect to three variables: a) the target language: in our experiments, we run the induction process on Persian and English; b) the type of clustering algorithm: both parametric clustering algorithms, including hierarchical and partitioning, and non-parametric clustering algorithms, including probabilistic and density-based, are used to induce senses; c) the context of the target words used to create the vectors for clustering: the input vectors are created either from the whole sentence in which the target word occurs or from a limited window of words surrounding the target word. We evaluate the clustering performance externally. Moreover, we introduce a normalized, joint evaluation metric to compare the models. The experimental results for both Persian and English test data show that the window-based partitioning K-means algorithm obtains the best performance.

Open Access Article

13 - Foreground-Back ground Segmentation using K-Means Clustering Algorithm and Support Vector Machine
Masoumeh Rezaei mansoureh rezaei Masoud Rezaei

10.52547/jist.16507.11.41.65

20.1001.1.23221437.2023.11.41.3.8

Foreground-background image segmentation has been an important research problem. It is one of the main tasks in computer vision, and its purpose is to detect variations in image sequences. It provides candidate objects for further attentional selection, e.g., in video surveillance. In this paper, we introduce an automatic and efficient foreground-background segmentation method. The proposed method starts by detecting visually salient image regions with a saliency map that uses the Fourier transform and a Gaussian filter. Then, each point in the map is classified as salient or non-salient using a binary threshold. Next, a hole-filling operator is applied to fill holes in the resulting image, and the area-opening method is used to remove small objects from the image. For better separation of the foreground and background, erosion and dilation operators are applied to shrink and expand the resulting region. Afterward, the foreground and background samples are obtained. Because the number of samples is large, K-means clustering is used as a sampling technique to limit the computational effort in the region of interest. The K cluster centers of each region are used to train a Support Vector Machine (SVM). The SVM, as a powerful binary classifier, is used to segment the area of interest from the background. The proposed method is applied to a benchmark dataset consisting of 1000 images, and experimental results demonstrate its superiority over some other foreground-background segmentation methods in terms of ER, VI, GCE, and PRI.

Open Access Article

14 - Dynamic Tree- Based Routing: Applied in Wireless Sensor Network and IOT
Mehdi Khazaei

10.52547/jist.22504.10.39.191

20.1001.1.23221437.2022.10.39.4.8

The Internet of Things (IOT) has advanced in parallel with the wireless sensor network (WSN), and the WSN is an enabler of the IOT. Through the internet, the IOT connects defined objects in order to sense and supervise the environment. In some applications, the IOT reduces to a WSN with the same descriptions and limitations. Working with a WSN is limited by the energy, memory, and computational ability of the sensor nodes. Energy must therefore be consumed wisely if network reliability is to be preserved. Newly developed, effective hierarchical and clustering techniques aim to overcome these limitations. The method proposed in this article for reducing energy consumption is a tree-based hierarchical technique that uses clustering based on a dynamic structure. In this method, the location-based and time-based properties of the sensor nodes are used in a greedy method that forms the subtree leaves. The rest of the tree structure, up to the root, is formed by the base station by applying the concept of centrality from network theory. The simulation reveals that scalability and fairness in energy consumption improve compared to a similar method, which prolongs network lifetime and increases reliability.

Open Access Article

15 - Proposing an FCM-MCOA Clustering Approach Stacked with Convolutional Neural Networks for Analysis of Customers in Insurance Company
Motahareh Ghavidel meisam Yadollahzadeh tabari Mehdi Golsorkhtabaramiri

10.61186/jist.41465.12.45.62

To create a customer-based marketing strategy, it is necessary to analyze customer data properly so that customers can be separated from each other or their future behavior can be predicted. The datasets related to customers in any business are usually high-dimensional, contain many instances, and include both supervised and unsupervised ones. Companies today are trying to satisfy their customers as much as possible, which requires careful consideration of customers from several aspects. Data mining algorithms are one of the practical methods businesses use to extract the required knowledge from customers' demographic and behavioral data. This paper presents a hybrid clustering algorithm using the Fuzzy C-Means (FCM) method and the Modified Cuckoo Optimization Algorithm (MCOA). Since customer data analysis has a key role in ensuring a company's profitability, The Insurance Company (TIC) dataset is used for the experiments and performance evaluation. We compare the convergence of the proposed FCM-MCOA approach with some conventional optimization methods, such as Genetic Algorithm (GA) and Invasive Weed Optimization (IWO). Moreover, we suggest a customer classifier using Convolutional Neural Networks (CNNs). Simulation results reveal that FCM-MCOA converges faster than conventional clustering methods. In addition, the results indicate that the accuracy of the CNN-based classifier is more than 98%. The CNN-based classifier converges after a few iterations, which is fast in comparison with conventional classifiers such as Decision Tree (DT), Support Vector Machine (SVM), K-Nearest Neighborhood (KNN), and Naive Bayes (NB).
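
For readers unfamiliar with Fuzzy C-Means, the sketch below shows its standard membership update, in which a point's degree of membership in cluster i is 1 / Σ_k (d_i / d_k)^(2/(m-1)), with d_i the distance to center i and m the fuzzifier. It covers only this textbook step: how MCOA searches for the cluster centers is not described in the abstract and is not reproduced, and the function name and example values are assumptions.

```python
def fcm_memberships(point, centers, m=2.0):
    """Fuzzy C-Means membership degrees of one point for fixed centers.

    m is the fuzzifier (m > 1); m = 2 is a common default.
    """
    distances = [
        sum((p - c) ** 2 for p, c in zip(point, center)) ** 0.5
        for center in centers
    ]
    # A point lying exactly on one or more centers belongs fully to them.
    zero = [i for i, d in enumerate(distances) if d == 0.0]
    if zero:
        return [1.0 / len(zero) if i in zero else 0.0 for i in range(len(centers))]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]


# Hypothetical one-dimensional customer feature and two cluster centers.
memberships = fcm_memberships((2.5,), [(0.0,), (10.0,)])
```

Unlike K-means, each customer receives a graded membership in every cluster, and the degrees for one point always sum to 1.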