List of Subject Articles: Image Processing


    • Open Access Article

      1 - Eye Gaze Detection Based on Learning Automata by Using SURF Descriptor
      Hassan Farsi, Reza Nasiripour, Sajad Mohammadzadeh
      In the last decade, eye gaze detection has become one of the most important areas in image processing and computer vision. The performance of an eye gaze detection system depends on iris detection and recognition (IR), and iris recognition plays a very important role in person identification. The aim of this paper is to achieve a higher recognition rate compared to learning automata based methods. Iris retrieval based systems usually consist of several stages applied to the captured eye region: pre-processing, iris detection, normalization, feature extraction and classification. In this paper, a new method without the normalization step is proposed. The Speeded-Up Robust Features (SURF) descriptor is used to extract features from the iris images; the descriptor of each iris image creates a vector with 64 dimensions. For the classification step, a learning automata classifier is applied. The proposed method is tested on three well-known iris databases: UBIRIS, MMU and UPOL. It achieves a recognition rate of 100% for the UBIRIS and UPOL databases and 99.86% for the MMU iris database. The EER rates of the proposed method for the UBIRIS, UPOL and MMU databases are 0.00%, 0.00% and 0.008%, respectively. Experimental results show that the proposed learning automata classifier yields minimum classification error and improves precision and computation time.
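
      Since the method hinges on 64-dimensional SURF descriptors extracted without a normalization step, a minimal sketch of that extraction stage follows, assuming an opencv-contrib-python build that still ships the non-free xfeatures2d.SURF module. The Hessian threshold and the mean-pooling of per-keypoint descriptors into one vector are illustrative assumptions; the learning automata classifier is omitted.

      ```python
      # Sketch: 64-D SURF feature extraction for an iris image.
      # Requires opencv-contrib-python with the non-free SURF module enabled.
      import cv2
      import numpy as np

      def iris_surf_vector(image_path, hessian_threshold=400):
          """Return one 64-D vector summarizing the SURF descriptors of an iris image."""
          img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
          # extended=False keeps the standard 64-dimensional SURF descriptor
          surf = cv2.xfeatures2d.SURF_create(hessianThreshold=hessian_threshold,
                                             extended=False)
          _, descriptors = surf.detectAndCompute(img, None)
          if descriptors is None:
              return np.zeros(64)
          # Mean-pooling is an assumption; the paper feeds the descriptors
          # to a learning automata classifier instead.
          return descriptors.mean(axis=0)
      ```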
    • Open Access Article

      2 - Improvement in Accuracy and Speed of Image Semantic Segmentation via Convolution Neural Network Encoder-Decoder
      Hanieh Zamanian, Hassan Farsi, Sajad Mohammadzadeh
      Recent research on pixel-wise semantic segmentation uses deep neural networks to improve the accuracy and speed of these networks, increasing their efficiency in practical applications such as autonomous driving. These approaches use deep architectures to predict pixel labels, but the obtained results are often unsatisfactory, mainly due to max-pooling operators, which reduce the resolution of the feature maps. In this paper, we present a convolutional neural network composed of encoder-decoder segments based on the successful SegNet network. The encoder section has a depth of 2; its first part has 5 convolutional layers, each with 64 filters of size 3×3. In the decoding section, the dimensions of the decoding filters are adjusted according to the convolutions used at each encoding step. Thus, at each step, 64 filters of size 3×3 are used for coding, where the weights of these filters are adjusted by network training and adapted to the training data. Owing to its low depth of 2 and small number of parameters, the proposed network improves both speed and accuracy compared to popular networks such as SegNet and DeepLab. For the CamVid dataset, after a total of 60,000 iterations, we obtain 91% global accuracy, which indicates the improved efficiency of the proposed method.
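
      A loose PyTorch sketch of the described architecture follows, taking the abstract's figures at face value: a depth of 2, five 3×3 convolutions with 64 filters per encoder stage, and SegNet-style unpooling with stored max-pooling indices. Layer naming and batch-norm placement are assumptions, not the paper's exact design.

      ```python
      # Sketch: shallow SegNet-style encoder-decoder (depth 2, 5 convs per stage).
      import torch
      import torch.nn as nn

      def conv_block(in_ch, out_ch, n_convs):
          layers = []
          for i in range(n_convs):
              layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                         nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True)]
          return nn.Sequential(*layers)

      class ShallowSegNet(nn.Module):
          def __init__(self, n_classes):
              super().__init__()
              self.enc1 = conv_block(3, 64, 5)   # first encoder stage: 5 conv layers
              self.enc2 = conv_block(64, 64, 5)  # second stage (depth of 2)
              self.pool = nn.MaxPool2d(2, 2, return_indices=True)
              self.unpool = nn.MaxUnpool2d(2, 2)
              self.dec2 = conv_block(64, 64, 5)  # decoder mirrors the encoder
              self.dec1 = conv_block(64, 64, 5)
              self.classifier = nn.Conv2d(64, n_classes, 1)

          def forward(self, x):                    # H and W must be divisible by 4
              x, idx1 = self.pool(self.enc1(x))
              x, idx2 = self.pool(self.enc2(x))
              x = self.dec2(self.unpool(x, idx2))  # unpool restores resolution
              x = self.dec1(self.unpool(x, idx1))  # using the stored indices
              return self.classifier(x)            # per-pixel class scores
      ```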
    • Open Access Article

      3 - A Novel Method for Image Encryption Using Modified Logistic Map
      Ardalan Ghasemzadeh, Omid R.B. Speily
      With the development of the internet and social networks, interest in multimedia data, especially digital images, has increased among scientists. Due to their advantages, such as high speed as well as high security and complexity, chaotic functions have been extensively employed in image encryption. In this paper, a modified logistic map function is proposed, which results in higher scattering in the obtained results. Confusion and diffusion, the two main actions in cryptography, are not necessarily performed in a fixed order; each of these two functions can be applied to the image in any order, provided that the weighted sum of the applied functions does not exceed 10, where confusion has a coefficient of 1 and diffusion a coefficient of 2. To implement this method, a binary stack is used. The application of the binary stack and the pseudo-random numbers obtained from the modified chaotic function increases the complexity of the proposed encryption algorithm. The security key length, entropy value, NPCR and UACI values, and correlation coefficient analysis demonstrate the feasibility and validity of the proposed method. Analyzing the obtained results and comparing the algorithm to other investigated methods clearly verifies the high efficiency of the proposed method.
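
      As a rough illustration of the confusion/diffusion pipeline, the sketch below uses the standard logistic map as a stand-in for the paper's modified map, which the abstract does not specify. The seed, control parameter, and burn-in length are arbitrary assumptions; both stages are driven by the same chaotic stream, so only the key (x0, r) needs to be shared.

      ```python
      # Sketch: chaotic image encryption via permutation (confusion) + XOR (diffusion).
      import numpy as np

      def logistic_stream(x0, r, n, burn_in=1000):
          """Generate n values of the logistic map x <- r*x*(1-x)."""
          x = x0
          for _ in range(burn_in):              # discard the transient
              x = r * x * (1 - x)
          out = np.empty(n)
          for i in range(n):
              x = r * x * (1 - x)
              out[i] = x
          return out

      def encrypt(img, x0=0.37, r=3.99):
          flat = img.astype(np.uint8).ravel()
          s = logistic_stream(x0, r, flat.size)
          perm = np.argsort(s)                  # confusion: chaotic permutation
          key = (s * 256).astype(np.uint8)      # diffusion: chaotic keystream
          return (flat[perm] ^ key).reshape(img.shape)

      def decrypt(cipher, x0=0.37, r=3.99):
          flat = cipher.ravel()
          s = logistic_stream(x0, r, flat.size)
          perm = np.argsort(s)
          shuffled = flat ^ (s * 256).astype(np.uint8)
          plain = np.empty_like(shuffled)
          plain[perm] = shuffled                # invert the permutation
          return plain.reshape(cipher.shape)
      ```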
    • Open Access Article

      4 - Retinal Vessel Extraction Using Dynamic Threshold And Enhancement Image Filter From Retina Fundus
      Erwin Erwin, Tomi Kiyatmoko
      In the diagnosis of retinal disease, retinal vessels play an important role in determining certain conditions. Retinal vessels are important elements with a variety of shapes and sizes, and the pattern of the retinal blood vessels is essential for advanced diagnosis processes in medical retina applications such as detection, identification and classification. Improving image quality is therefore very important, with a focus on extracting or segmenting the retinal vessels so that better values of parameters such as accuracy, specificity, and sensitivity can be obtained to meet the requirements of advanced systems. Therefore, we conducted experiments to develop the extraction of retinal images, obtaining binary images of the retinal vessels for medical use by means of a dynamic threshold and a Butterworth bandpass filter. Using the DRIVE database, the method achieves an accuracy of 94.77%, a sensitivity of 54.48% and a specificity of 98.71%.
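
      A minimal sketch of the two stages named in the abstract, a frequency-domain Butterworth band-pass filter followed by a dynamic (locally adaptive) threshold, is given below. The cut-off frequencies, filter order, and threshold block size are illustrative assumptions rather than the authors' settings.

      ```python
      # Sketch: Butterworth band-pass filtering + adaptive threshold on a fundus image.
      import cv2
      import numpy as np

      def butterworth_bandpass(shape, low, high, order=2):
          rows, cols = shape
          u = np.arange(rows) - rows / 2
          v = np.arange(cols) - cols / 2
          D = np.sqrt(u[:, None] ** 2 + v[None, :] ** 2)    # distance from DC
          hp = 1 / (1 + (low / (D + 1e-6)) ** (2 * order))  # high-pass factor
          lp = 1 / (1 + (D / high) ** (2 * order))          # low-pass factor
          return hp * lp

      def extract_vessels(fundus_bgr):
          green = fundus_bgr[:, :, 1].astype(np.float64)    # vessels contrast best in green
          F = np.fft.fftshift(np.fft.fft2(green))
          H = butterworth_bandpass(green.shape, low=10, high=80)
          filtered = np.real(np.fft.ifft2(np.fft.ifftshift(F * H)))
          norm = cv2.normalize(filtered, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
          # dynamic threshold: compare each pixel with its local neighborhood mean
          return cv2.adaptiveThreshold(norm, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                       cv2.THRESH_BINARY_INV, 25, 2)
      ```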
    • Open Access Article

      5 - Body Field: Structured Mean Field with Human Body Skeleton Model and Shifted Gaussian Edge Potentials
      Sara Ershadi-Nasab, Shohreh Kasaei, Esmaeil Sanaei, Erfan Noury, Hassan Hafez-kolahi
      An efficient method for simultaneous human body part segmentation and pose estimation is introduced. A conditional random field with a fully-connected graphical model is used. Possible node (image pixel) labels comprise the human body parts and the background. In the human body skeleton model, the spatial dependencies among body parts are encoded in the definition of the pairwise energy functions of the conditional random field. Proper pairwise edge potentials between image pixels are defined according to the presence or absence of human body parts that are near each other. Various Gaussian kernels in position, color, and histogram of oriented gradients spaces are used to define the pairwise energy terms. Shifted Gaussian kernels are defined between each pair of body parts that are connected according to the human body skeleton model. As shifted Gaussian kernels impose a high computational cost on the inference, an efficient inference process is proposed via a mean field approximation method that uses high-dimensional shifted Gaussian filtering. The experimental results, evaluated on the challenging KTH Football, Leeds Sports Pose, HumanEva, and Penn-Fudan datasets, show that the proposed method increases the per-pixel accuracy measure for human body part segmentation and also improves the probability of correct parts metric of human body joint locations.
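
      The key modeling idea, a Gaussian edge potential whose positional term is offset by the expected displacement between two skeleton-connected parts, can be sketched in a few lines; the offset and bandwidth below are invented for illustration.

      ```python
      # Sketch: a *shifted* Gaussian pairwise potential. Unlike the usual dense-CRF
      # kernel, the Gaussian peaks when p_i - p_j matches the expected skeleton
      # offset between two connected body parts, not when the pixels coincide.
      import numpy as np

      def shifted_gaussian_potential(p_i, p_j, shift, theta_pos=10.0, w=1.0):
          d = np.asarray(p_i, float) - np.asarray(p_j, float) - np.asarray(shift, float)
          return w * np.exp(-float(d @ d) / (2.0 * theta_pos ** 2))

      # Example (assumed offset): head pixels expected ~40 px above torso pixels.
      affinity = shifted_gaussian_potential((50, 100), (50, 140), shift=(0, -40))
      print(affinity)   # 1.0: the displacement matches the skeleton offset exactly
      ```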
    • Open Access Article

      6 - A Two-Stage Multi-Objective Enhancement for Fused Magnetic Resonance Image and Computed Tomography Brain Images
      Leena Chandrashekar A, Sreedevi Asundi
      Magnetic Resonance Imaging (MRI) and Computed Tomography (CT) are the imaging techniques used for the detection of glioblastoma. However, a single imaging modality is never adequate to validate the presence of a tumor, and each imaging technique represents a different characteristic of the brain. Therefore, experts have to analyze each of the images independently, which requires more expertise and delays detection and diagnosis. Multimodal image fusion is the process of generating an image of high visual quality by fusing different images, but it introduces blocking effects, noise and artifacts into the fused image. Most enhancement techniques deal with contrast enhancement; however, enhancing image quality in terms of edges, entropy, and peak signal-to-noise ratio is also significant. Contrast Limited Adaptive Histogram Equalization (CLAHE) is a widely used enhancement technique. Its major drawback is that it only enhances pixel intensities and also requires the selection of operational parameters such as the clip limit, block size and distribution function. Particle Swarm Optimization (PSO) is an optimization technique used here to choose the CLAHE parameters, based on a multi-objective fitness function representing the entropy and edge information of the image. The proposed technique improves the visual quality of Laplacian pyramid fused MRI and CT images.
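
      A toy sketch of the optimization loop the abstract describes follows: a small particle swarm searching over CLAHE's clip limit and tile-grid size against an entropy-plus-edge fitness. The swarm hyperparameters, search bounds, and fitness weighting are assumptions, not the paper's values.

      ```python
      # Sketch: PSO tuning of CLAHE parameters on an 8-bit grayscale image.
      import cv2
      import numpy as np

      def fitness(gray, clip, tiles):
          clahe = cv2.createCLAHE(clipLimit=float(clip),
                                  tileGridSize=(int(tiles), int(tiles)))
          out = clahe.apply(gray)
          hist = np.bincount(out.ravel(), minlength=256) / out.size
          entropy = -np.sum(hist[hist > 0] * np.log2(hist[hist > 0]))
          edges = np.mean(np.abs(cv2.Sobel(out, cv2.CV_64F, 1, 0)) +
                          np.abs(cv2.Sobel(out, cv2.CV_64F, 0, 1)))
          return entropy + 0.01 * edges       # multi-objective: entropy + edge info

      def pso_clahe(gray, n_particles=10, n_iters=20):
          rng = np.random.default_rng(0)
          lo, hi = np.array([1.0, 2.0]), np.array([8.0, 16.0])  # clip, tiles
          pos = rng.uniform(lo, hi, (n_particles, 2))
          vel = np.zeros_like(pos)
          pbest, pbest_f = pos.copy(), np.array([fitness(gray, *p) for p in pos])
          gbest = pbest[pbest_f.argmax()].copy()
          for _ in range(n_iters):
              r1, r2 = rng.random((2, n_particles, 1))
              vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
              pos = np.clip(pos + vel, lo, hi)
              f = np.array([fitness(gray, *p) for p in pos])
              better = f > pbest_f
              pbest[better], pbest_f[better] = pos[better], f[better]
              gbest = pbest[pbest_f.argmax()].copy()
          return gbest                         # best (clip limit, tile count) found
      ```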
    • Open Access Article

      7 - Drone Detection by Neural Network Using GLCM and SURF Features
      Tanzia Ahmed, Tanvir Rahman, Bir Ballav Roy, Jia Uddin
      This paper presents a vision-based drone detection method. There are a number of studies on object detection that use different feature extraction methods, each applied separately in the experiments. In the proposed model, however, a hybrid feature extraction method combining SURF and GLCM is used to detect objects with a neural network, a combination that has not been experimented with before. Speeded-Up Robust Features (SURF) is a blob detection algorithm that extracts points of interest from an integral image, converting the image into a 2D vector. The Gray-Level Co-occurrence Matrix (GLCM) counts the occurrences of pairs of pixels in the same spatial relationship and represents them in an 8×8 matrix of the best possible attributes of an image. SURF is a popular method for fast feature extraction and matching of images, whereas GLCM extracts the best attributes of the images. In the proposed model, the images are first processed to fit the feature extraction methods; the SURF method then extracts features from these images into a 2D vector, and GLCM subsequently extracts the best possible features from that vector into an 8×8 matrix. Combining SURF and GLCM thus ensures the quality of the training dataset by not only extracting features quickly (with SURF) but also retaining the best points of interest (with GLCM). The extracted pattern features are used in a neural network for training and testing, with a pattern recognition algorithm as the machine learning tool. In the experimental evaluation, the performance of the proposed model is examined via the cross entropy of each instance and the percentage error. For the tested drone dataset, experimental results demonstrate improved performance over state-of-the-art models, exhibiting lower cross entropy and percentage error.
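
      A minimal sketch of the hybrid pipeline follows, assuming scikit-image (0.19 or later) for the co-occurrence matrix and an opencv-contrib build for SURF. Quantizing to 8 gray levels so the GLCM comes out 8×8, and mean-pooling the SURF descriptors, are illustrative choices.

      ```python
      # Sketch: hybrid SURF + GLCM feature vector for one grayscale image.
      import cv2
      import numpy as np
      from skimage.feature import graycomatrix

      def hybrid_features(gray):
          # SURF interest-point descriptors (needs opencv-contrib xfeatures2d)
          surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
          _, desc = surf.detectAndCompute(gray, None)
          surf_vec = desc.mean(axis=0) if desc is not None else np.zeros(64)

          # GLCM over 8 gray levels so the co-occurrence matrix is 8x8
          q = (gray // 32).astype(np.uint8)       # 256 levels -> 8 levels
          glcm = graycomatrix(q, distances=[1], angles=[0], levels=8,
                              symmetric=True, normed=True)
          glcm_vec = glcm[:, :, 0, 0].ravel()     # flatten the 8x8 matrix

          return np.concatenate([surf_vec, glcm_vec])  # input for the neural network
      ```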
    • Open Access Article

      8 - Human Activity Recognition based on Deep Belief Network Classifier and Combination of Local and Global Features
      Azar Mahmoodzadeh
      During the past decades, recognition of human activities has attracted the attention of numerous researchers due to its outstanding applications, including smart homes, health care, and monitoring of private and public places. Applied to video frames, this paper proposes a hybrid method that combines the features extracted from the images using the 'scale-invariant feature transform' (SIFT), 'histogram of oriented gradients' (HOG) and 'global invariant features transform' (GIST) descriptors, and classifies the activities by means of a deep belief network (DBN). First, in order to avoid ineffective features, a pre-processing stage is performed on each image in the dataset. Then, the mentioned descriptors extract several features from the image. Due to the problems of working with a large number of features, a small and distinguishing feature set is produced using the bag-of-words (BoW) technique. Finally, these reduced features are given to a deep belief network in order to recognize the human activities. Comparing the simulation results of the proposed approach with some existing methods on the standard PASCAL VOC Challenge 2010 database with nine different activities demonstrates an improvement in the accuracy, precision and recall measures (reaching 96.39%, 85.77% and 86.72%, respectively) with respect to the other compared human activity recognition methods.
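
      A minimal bag-of-words sketch over SIFT descriptors (one of the three descriptors the paper combines) is shown below; the vocabulary size is an assumption, HOG and GIST are omitted for brevity, and the DBN would consume the returned histograms.

      ```python
      # Sketch: SIFT bag-of-words histograms as compact image features.
      import cv2
      import numpy as np
      from sklearn.cluster import KMeans

      def bow_histograms(image_paths, vocab_size=100):
          sift = cv2.SIFT_create()
          per_image = []
          for p in image_paths:
              gray = cv2.imread(p, cv2.IMREAD_GRAYSCALE)
              _, desc = sift.detectAndCompute(gray, None)
              per_image.append(desc if desc is not None
                               else np.zeros((1, 128), np.float32))
          vocab = KMeans(n_clusters=vocab_size, n_init=4, random_state=0)
          vocab.fit(np.vstack(per_image))        # build the visual vocabulary
          hists = []
          for desc in per_image:
              words = vocab.predict(desc)        # map descriptors to visual words
              h = np.bincount(words, minlength=vocab_size).astype(float)
              hists.append(h / h.sum())          # normalized BoW histogram
          return np.array(hists)                 # one row per image, fed to the DBN
      ```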
    • Open Access Article

      9 - Farsi Font Detection using the Adaptive RKEM-SURF Algorithm
      Zahra Hossein-Nejad, Hamed Agahi, Azar Mahmoodzadeh
      Farsi font detection is considered the first stage in the Farsi optical character recognition (FOCR) of scanned printed texts. To this aim, this paper proposes an improved version of the speeded-up robust features (SURF) algorithm as the feature detector in the font recognition process. The SURF algorithm suffers from the creation of several redundant features during the detection phase. Thus, the presented version employs the redundant keypoint elimination method (RKEM) to enhance the matching performance of SURF by reducing unnecessary keypoints. Although the performance of the RKEM is acceptable in this task, it exploits a fixed experimental threshold value, which has a detrimental impact on the results. In this paper, an adaptive RKEM is proposed for the SURF algorithm which considers image type and distortion when adjusting the threshold value. This improved version is then applied to recognize Farsi fonts in texts: the proposed Adaptive RKEM-SURF detects the keypoints, SURF is used as the descriptor for the features, and the matching process is done using the nearest neighbor distance ratio. The proposed approach is compared with recently published FOCR algorithms to confirm its superiority. The method can be generalized to other languages such as Arabic and English.
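
      A loose sketch of redundant keypoint elimination follows: keep the strongest keypoint in each neighborhood and drop the rest. The rule tying the elimination radius to the image diagonal is an invented stand-in for the paper's adaptive, type- and distortion-aware threshold.

      ```python
      # Sketch: RKEM-style pruning of OpenCV (e.g. SURF) keypoints.
      import numpy as np

      def rkem(keypoints, radius):
          """Keep only the strongest keypoint within each `radius` neighborhood."""
          kps = sorted(keypoints, key=lambda k: k.response, reverse=True)
          kept = []
          for kp in kps:
              if all(np.hypot(kp.pt[0] - q.pt[0], kp.pt[1] - q.pt[1]) > radius
                     for q in kept):
                  kept.append(kp)
          return kept

      def adaptive_radius(img):
          # assumption: scale the elimination radius with the image diagonal
          h, w = img.shape[:2]
          return 0.005 * np.hypot(h, w)
      ```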
    • Open Access Article

      10 - DeepFake Detection using 3D-Xception Net with Discrete Fourier Transformation
      Adeep Biswas, Debayan Bhattacharya, Anil Kumar Kakelli
      Videos are popular for sharing content on social media and capturing the audience's attention. Artificial manipulation of videos is growing rapidly to make them flashy and interesting, but it can easily be misused to spread false information on social media platforms. DeepFake is a problematic method for the manipulation of videos in which artificial components are added to the video using emerging deep learning techniques. Due to the increasing accuracy of deep fake generation methods, artificially created videos are no longer easily detectable and pose a major threat to social media users. To address this growing problem, we propose a new method for detecting deep fake videos using a 3D Inflated Xception Net with Discrete Fourier Transformation. Xception Net was originally designed for application to 2D images only; the proposed method is the first attempt to use a 3D Xception Net for categorizing video-based data. The advantage of the proposed method is that it works on the whole video rather than a subset of frames. Our proposed model was tested on the popular Celeb-DF dataset and achieved better accuracy.
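
      A minimal PyTorch sketch of the two ingredients named in the title follows: an Xception-style depthwise-separable convolution inflated to 3D (time as the third axis), and a per-frame discrete Fourier transform providing a frequency-domain view of the video. Channel sizes are illustrative assumptions, not the paper's architecture.

      ```python
      # Sketch: 3D separable convolution block + per-frame DFT preprocessing.
      import torch
      import torch.nn as nn

      class SeparableConv3d(nn.Module):
          def __init__(self, in_ch, out_ch):
              super().__init__()
              # depthwise: one 3x3x3 filter per input channel (groups=in_ch)
              self.depthwise = nn.Conv3d(in_ch, in_ch, 3, padding=1, groups=in_ch)
              # pointwise: 1x1x1 convolution mixes channels, as in 2D Xception
              self.pointwise = nn.Conv3d(in_ch, out_ch, 1)

          def forward(self, x):                  # x: (batch, ch, frames, H, W)
              return self.pointwise(self.depthwise(x))

      def dft_magnitude(frames):                 # frames: (batch, ch, T, H, W)
          spec = torch.fft.fft2(frames)          # 2D FFT over the last two dims
          return torch.log1p(torch.abs(spec))    # log-magnitude spectrum per frame
      ```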
    • Open Access Article

      11 - Diagnosis of Gastric Cancer via Classification of the Tongue Images using Deep Convolutional Networks
      Elham Gholam, Seyed Reza Kamel Tabbakh, Maryam Khairabadi
      Gastric cancer is the second most common cancer worldwide, responsible for the death of many people. One of the issues regarding this disease is the absence of early and accurate detection. In the medical industry, gastric cancer is diagnosed by conducting numerous tests and imaging procedures, which are costly and time-consuming, so doctors are seeking a cost-effective and time-efficient alternative. One such solution is Chinese medicine, where diagnosis is made by observing changes in the tongue; detecting disease from the appearance and color of various sections of the tongue is one of the key components of traditional Chinese medicine. In this study, a method is presented which can localize the tongue surface regardless of the different poses of people in the images. In fact, if the localization of face components, especially the mouth, is done correctly, the components leading to the biggest distinction in the dataset can be used, which is favorable in terms of time and space complexity. Also, given the best estimation, the best features can be extracted relative to those components and the best possible accuracy can be achieved. The extraction of appropriate features in this study is done using deep convolutional neural networks. Finally, we use the random forest algorithm to train the proposed model and evaluate the criteria. Experimental results show that the average classification accuracy reaches approximately 73.78%, demonstrating the superiority of the proposed method compared to other methods.
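
      A rough sketch of the two-stage pipeline follows, with a pretrained ResNet-18 standing in for the paper's unspecified convolutional feature extractor and scikit-learn's random forest as the classifier; the tongue localization step is omitted.

      ```python
      # Sketch: deep convolutional features + random forest classification.
      import torch
      import torchvision.models as models
      import torchvision.transforms as T
      from sklearn.ensemble import RandomForestClassifier

      backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
      backbone.fc = torch.nn.Identity()          # expose the 512-D pooled features
      backbone.eval()

      preprocess = T.Compose([T.Resize((224, 224)), T.ToTensor(),
                              T.Normalize([0.485, 0.456, 0.406],
                                          [0.229, 0.224, 0.225])])

      def deep_features(pil_images):
          batch = torch.stack([preprocess(im) for im in pil_images])
          with torch.no_grad():
              return backbone(batch).numpy()     # (N, 512) feature matrix

      # train_imgs / train_labels are placeholders for the tongue-image dataset:
      # clf = RandomForestClassifier(n_estimators=200).fit(
      #     deep_features(train_imgs), train_labels)
      ```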
    • Open Access Article

      12 - Performance Analysis of Hybrid SOM and AdaBoost Classifiers for Diagnosis of Hypertensive Retinopathy
      Wiharto Wiharto, Esti Suryani, Murdoko Susilo
      The diagnosis of hypertensive retinopathy can be made by observing the tortuosity of the retinal vessels. Tortuosity is a feature that can reveal the characteristics of normal or abnormal blood vessels. This study aims to analyze the performance of a computer-aided diagnosis system for hypertensive retinopathy (CAD-RH) based on feature extraction from the tortuosity of the retinal blood vessels. The study uses a segmentation method based on self-organizing map (SOM) clustering, combined with feature extraction, feature selection, and the ensemble Adaptive Boosting (AdaBoost) classification algorithm. Feature extraction is performed using fractal analysis with the box-counting method, lacunarity with the gliding box method, and invariant moments. Feature selection is done using the information gain method: all produced features are ranked and then selected by referring to the gain value. The best system performance is obtained with 2 clusters, the fractal dimension, lacunarity with box sizes of 2^2 to 2^9, and invariant moments M1 and M3. Under these conditions, the system provides 84% sensitivity, 88% specificity, a positive likelihood ratio (LR+) of 7.0, and 86% area under the curve (AUC). The model also outperforms a number of ensemble algorithms, such as bagging and random forest. Referring to these results, it can be concluded that this model can serve as an alternative CAD-RH approach, with performance in the good category.
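
      Of the three feature families used, the box-counting fractal dimension is the simplest to sketch; the box sizes below are illustrative, and lacunarity and the invariant moments are omitted.

      ```python
      # Sketch: box-counting fractal dimension of a binary vessel mask.
      import numpy as np

      def box_counting_dimension(mask, sizes=(2, 4, 8, 16, 32, 64)):
          """mask: non-empty 2D boolean array (True = vessel pixel)."""
          counts = []
          for s in sizes:
              h, w = mask.shape
              trimmed = mask[:h - h % s, :w - w % s]       # tile the image evenly
              blocks = trimmed.reshape(h // s, s, w // s, s).any(axis=(1, 3))
              counts.append(blocks.sum())                  # boxes containing vessels
          # slope of log(count) vs log(1/size) estimates the fractal dimension
          return np.polyfit(np.log(1.0 / np.asarray(sizes)), np.log(counts), 1)[0]
      ```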