List of subject articles: Natural Language Processing


    • Open Access Article

      1 - An Improved Sentiment Analysis Algorithm Based on Appraisal Theory and Fuzzy Logic
      Azadeh Roustakiani Neda Abdolvand Saeedeh Rajaee Harandi
      Millions of comments and opinions are posted daily on websites such as Twitter or Facebook. Users share their opinions on various topics. People need to know the opinions of others in order to purchase consciously. Businesses also need customers’ opinions and big data analysis to continue offering customer-friendly services, manage customer complaints and suggestions, increase financial benefits, evaluate products, and support marketing and business development. With the growth of social media, the importance of sentiment analysis has increased, and it has become a very popular topic among computer scientists and researchers because of its many applications in market and customer feedback analysis. Most sentiment analysis methods are limited to splitting comments into three categories: negative, positive, and neutral. Appraisal Theory, however, considers other characteristics of opinion, such as attitude, graduation, and orientation, which result in more precise analysis. Therefore, this research proposes an algorithm that increases the accuracy of sentiment analysis by combining appraisal theory and fuzzy logic. The algorithm was tested on the Stanford dataset (25,000 movie reviews) and compared with a reliable dictionary, reaching an accuracy of 95%. The results of this research can help with managing customer complaints and suggestions, marketing and business development, and product testing.
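      The abstract does not give the algorithm's details; as an illustrative sketch only, combining a graduation (intensity) modifier with fuzzy category membership could look like the following. The triangular membership functions and the `GRADUATION` weights are assumptions, not the paper's actual parameters:

```python
def tri(x, a, b, c):
    """Triangular fuzzy membership: rises on [a, b], falls on [b, c], zero outside."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzy_sentiment(score):
    """Map an appraisal score in [-1, 1] to fuzzy category memberships."""
    return {
        "negative": tri(score, -1.5, -1.0, 0.0),
        "neutral":  tri(score, -0.5,  0.0, 0.5),
        "positive": tri(score,  0.0,  1.0, 1.5),
    }

def classify(score):
    """Pick the category with the highest membership degree."""
    m = fuzzy_sentiment(score)
    return max(m, key=m.get)

# Illustrative graduation (intensifier) weights, clamped back into [-1, 1].
GRADUATION = {"very": 1.5, "slightly": 0.5}

def graded_score(base, modifier=None):
    """Scale an appraisal score by a graduation (intensity) modifier."""
    return max(-1.0, min(1.0, base * GRADUATION.get(modifier, 1.0)))
```

      For example, `classify(graded_score(0.6, "very"))` yields "positive", since the intensifier pushes the score toward the peak of the positive membership function.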
    • Open Access Article

      2 - Farsi Conceptual Text Summarizer: A New Model in Continuous Vector Space
      Mohammad Ebrahim Khademi Mohammad Fakhredanesh Seyed Mojtaba Hoseini
      Traditional methods of summarization were very costly and time-consuming, which led to the emergence of automatic methods for text summarization. Extractive summarization is an automatic method for generating a summary by identifying the most important sentences of a text. In this paper, two innovative approaches are presented for summarizing Persian texts. In these methods, using a combination of deep learning and statistical methods, we cluster the concepts of the text and, based on the importance of the concepts in each sentence, extract the sentences that carry the greatest conceptual load. In the first, unsupervised method, without using any hand-crafted features, we achieved state-of-the-art results on the Pasokh single-document corpus compared to the best supervised Persian methods. To better understand the results, we evaluated the human summaries produced by the contributing authors of the Pasokh corpus as a measure of the success rate of the proposed methods; in terms of recall, the proposed methods achieved favorable results. In the second method, by introducing and increasing a title-effect coefficient, the average ROUGE-2 score increased by 0.4% on the Pasokh single-document corpus compared to the first method, and the average ROUGE-1 score increased by 3% on the Khabir news corpus.
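      The paper's models operate in a continuous vector space; as a purely frequency-based stand-in, the title-effect idea (boosting title terms when scoring sentences for extraction) can be sketched as follows, where `title_weight` and the scoring formula are illustrative assumptions, not the paper's method:

```python
import re
from collections import Counter

def summarize(text, title, k=2, title_weight=2.0):
    """Extract the k highest-scoring sentences; title terms count extra."""
    sentences = [s.strip() for s in re.split(r'[.!?]\s*', text) if s.strip()]
    freq = Counter(re.findall(r'\w+', text.lower()))
    title_terms = set(re.findall(r'\w+', title.lower()))

    def score(sent):
        toks = re.findall(r'\w+', sent.lower())
        if not toks:
            return 0.0
        # Average term frequency, with title terms boosted by title_weight.
        return sum(freq[t] * (title_weight if t in title_terms else 1.0)
                   for t in toks) / len(toks)

    ranked = sorted(sentences, key=score, reverse=True)[:k]
    # Emit the chosen sentences in their original document order.
    return [s for s in sentences if s in ranked]
```

      Increasing `title_weight` plays the role of the title-effect coefficient: sentences sharing vocabulary with the title climb in the ranking.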
    • Open Access Article

      3 - SGF (Semantic Graphs Fusion): A Knowledge-based Representation of Textual Resources for Text Mining Applications
      Morteza Jaderyan Hassan Khotanlou
      The proper representation of textual documents has been the greatest challenge in text mining applications. In this paper, a knowledge-based representation model for text analysis applications is introduced. The proposed functionalities of the system are achieved by integrating structured knowledge into the core components of the system. The semantic, lexical, syntactical, and structural features are identified by the pre-processing module. The enrichment module is introduced to identify contextually similar concepts and concept maps for improving the representation. The information content of documents and the enriched content are then fused (merged) into the graphical structure of a semantic network to form a unified and comprehensive representation of the documents. The 20Newsgroups and Reuters-21578 datasets are used for evaluation. The evaluation results suggest that the proposed method exhibits a high level of accuracy, recall, and precision. The results also indicate that even when only a small portion of the information content is available, the proposed method performs well in standard text mining applications.
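      As a minimal sketch of the fusion step only — not the paper's actual SGF model, which draws on structured knowledge bases — one can represent each information source as a weighted term graph and merge the graphs by summing edge weights:

```python
from collections import defaultdict

def build_graph(tokens, window=2):
    """Co-occurrence graph: edge weight counts how often two terms share a window."""
    g = defaultdict(float)
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            a, b = sorted((tokens[i], tokens[j]))
            if a != b:
                g[(a, b)] += 1.0
    return g

def fuse(*graphs):
    """Merge several term graphs into one unified graph by summing edge weights."""
    fused = defaultdict(float)
    for g in graphs:
        for edge, w in g.items():
            fused[edge] += w
    return fused
```

      In the paper's setting, one input graph would come from the document content and another from the knowledge-based enrichment; the fused graph is the unified representation.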
    • Open Access Article

      4 - DeepSumm: A Novel Deep Learning-Based Multi-Lingual Multi-Documents Summarization System
      Shima Mehrabi Seyed Abolghassem Mirroshandel Hamidreza Ahmadifar
      With the increasing amount of textual information accessible via the internet, a summarization system that can generate summaries of information on demand seems necessary. Summarization has long been studied by natural language processing researchers. Today, with improvements in processing power and the development of computational tools, efforts to improve the performance of summarization systems have continued, especially by utilizing more powerful learning algorithms such as deep learning methods. In this paper, a novel multi-lingual multi-document summarization system is proposed that works based on deep learning techniques, and it is among the first Persian summarization systems to use deep learning. The proposed system ranks sentences based on a set of predefined features using a deep artificial neural network. A comprehensive study of the effect of different features was also conducted to find the best possible feature combination. The performance of the proposed system is evaluated on standard baseline datasets in Persian and English. The results of the evaluations demonstrate the effectiveness and success of the proposed summarization system in both languages; it can be said that the proposed method achieves state-of-the-art performance in Persian and English.
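      Feature-based sentence ranking can be sketched as follows, with a hand-set linear scorer standing in for the paper's deep neural network; the feature names (`position`, `length`, `title_overlap`) and weights are plausible assumptions, not the paper's actual feature set:

```python
def sentence_features(sent, position, total, title_terms):
    """Compute a few simple per-sentence features, each scaled to [0, 1]."""
    toks = sent.lower().split()
    return {
        "position": 1.0 - position / max(total - 1, 1),   # earlier sentences score higher
        "length": min(len(toks) / 20.0, 1.0),             # saturating length feature
        "title_overlap": len(title_terms & set(toks)) / max(len(title_terms), 1),
    }

# Hand-set weights as a stand-in for a trained network's learned scoring.
WEIGHTS = {"position": 0.4, "length": 0.2, "title_overlap": 0.4}

def rank(sentences, title):
    """Return sentences ordered by their weighted feature score, best first."""
    title_terms = set(title.lower().split())
    scored = []
    for i, s in enumerate(sentences):
        f = sentence_features(s, i, len(sentences), title_terms)
        scored.append((sum(WEIGHTS[k] * v for k, v in f.items()), s))
    return [s for _, s in sorted(scored, reverse=True)]
```

      In the paper's system, the feature vector would instead be fed to a deep network whose output replaces the weighted sum; the top-ranked sentences form the summary.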
    • Open Access Article

      5 - Recognizing Transliterated English Words in Persian Texts
      Ali Hoseinmardy Saeedeh Momtazi
      One of the most important problems in text processing systems is the word mismatch problem, which results in limited access to the required information in information retrieval. The problem also appears when analyzing textual data such as news, leading to low accuracy in text classification and clustering. In this case, if the text-processing engine does not treat similar/related words as having the same sense, it may not be able to retrieve the appropriate result. Various statistical techniques have been proposed to bridge the vocabulary gap; e.g., if two words are frequently used in similar contexts, they have similar/related meanings. Synonyms and similar words, however, are only one category of related words that statistical approaches are expected to capture. Another category is the pair of an original word in one language and its transliteration from another language. This kind of related word is common in non-English languages: instead of using the original word from the target language, the writer may borrow the English word and merely transliterate it into the target language. Since this writing style is used in limited texts, the frequency of transliterated words is not as high as that of the original words; as a result, available corpus-based techniques are not able to capture their concept. In this article, we propose two different approaches to overcome this problem: (1) using neural network-based transliteration, and (2) using available tools for machine translation/transliteration, such as Google Translate and Behnevis. Our experiments on a dataset provided for this purpose show that the combination of the two approaches can detect English words with 89.39% accuracy.
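      As a simplified, rule-based stand-in for the approaches described (the article itself uses neural transliteration and external tools), detecting a transliterated English word can be sketched as romanizing the Persian token with a rough character map and checking an English lexicon. The character map and the tiny lexicon below are illustrative assumptions:

```python
# Illustrative subset of a Persian-to-Latin character map; short vowels that
# Persian script omits are simply dropped, which limits recall.
CHAR_MAP = {
    "ا": "a", "ب": "b", "پ": "p", "ت": "t", "ر": "r",
    "س": "s", "ک": "k", "ل": "l", "م": "m", "ن": "n",
    "و": "o", "ی": "i", "ه": "h", "د": "d", "ف": "f",
}

ENGLISH_LEXICON = {"computer", "internet", "film", "robot"}

def transliterate(word):
    """Romanize a Persian token character by character (unknown chars dropped)."""
    return "".join(CHAR_MAP.get(ch, "") for ch in word)

def is_transliterated_english(word, lexicon=ENGLISH_LEXICON):
    """Flag a Persian token whose rough romanization matches an English word."""
    return transliterate(word) in lexicon
```

      For instance, the Persian token فیلم romanizes to "film" and is flagged, while a native word such as کتاب ("book") is not; a neural model, as in the article, would handle the many spellings this crude map misses.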
    • Open Access Article

      6 - Utilizing Gated Recurrent Units to Retain Long Term Dependencies with Recurrent Neural Network in Text Classification
      Nidhi Chandra Laxmi Ahuja Sunil Kumar Khatri Himanshu Monga
      The classification of text is one of the key areas of research in natural language processing. Most organizations receive customer reviews and feedback on their products, which they want to process quickly in order to act on them. Manual review would take considerable time and effort and could affect product sales, so organizations leverage machine learning algorithms to process such text in real time. The gated recurrent unit (GRU), an extension of the recurrent neural network that adds a gating mechanism, provides such a capability. Recurrent neural networks (RNNs) have proven to be the main approach to sequence classification and are able to retain information from past steps and use it to adjust subsequent predictions. The GRU model mitigates gradient problems, allowing it to learn long-term dependencies in text data; one such use case is sentiment analysis, where long-term dependencies must be retained. This paper presents a text classification technique using sequential word embeddings processed by gated recurrent units, with sigmoid gating, in a recurrent neural network. It classifies text using the GRU method with a fixed-size embedding matrix for the text, which specifically helps the network capture long-term dependencies. We evaluated the GRU model on a movie review dataset, achieving a classification accuracy of 87%.
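      The gating mechanism described above follows the standard GRU update equations; a minimal scalar sketch (real implementations operate on vectors with learned weight matrices, here replaced by scalar weights for illustration) is:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU update for scalar input x and scalar hidden state h."""
    z = sigmoid(Wz * x + Uz * h)                 # update gate: how much to renew
    r = sigmoid(Wr * x + Ur * h)                 # reset gate: how much past to keep
    h_tilde = math.tanh(Wh * x + Uh * (r * h))   # candidate hidden state
    return (1.0 - z) * h + z * h_tilde           # gated interpolation
```

      Running `gru_step` over a sequence of embedded tokens and feeding the final hidden state to a classifier head is the shape of the classification pipeline the paper describes; because `z` can stay near zero, the old state passes through largely unchanged, which is what lets the network retain long-term dependencies.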