• List of Articles: Natural Language Processing

      • Open Access Article

        1 - Computing Semantic Similarity of Documents Based on Semantic Tensors
        Navid Bahrami, Amir H. Jadidinejad, Mojdeh Nazari
        Exploiting the semantic content of texts has always been an important and challenging problem in Natural Language Processing, owing to its wide range of applications such as finding documents related to a query, document classification, and computing the semantic similarity of documents. In this paper, a novel corpus-based approach for computing the semantic similarity of texts is proposed, using the Wikipedia corpus organized as a three-dimensional tensor. For this purpose, the semantic vectors of the words in a document are first obtained from the vector space derived from the words in Wikipedia articles; the semantic vector of the document is then formed from its word vectors. Measuring the semantic similarity of documents therefore reduces to comparing their semantic vectors. The vector space built from the Wikipedia corpus suffers from the curse of dimensionality because of its high-dimensional vectors: vectors in a high-dimensional space tend to be very similar to one another, which makes identifying the most appropriate semantic vector for a word meaningless. The proposed approach therefore mitigates the curse of dimensionality by reducing the dimensionality of the vector space through random indexing. Random indexing also significantly reduces the memory consumption of the proposed approach. Finally, the structured co-occurrence information captured by random indexing makes it possible to handle synonymous and polysemous words.
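        The random-indexing pipeline the abstract describes can be illustrated with a short sketch. The Python snippet below is a minimal, illustrative reconstruction, not the authors' code: the dimensionality, sparsity level, and all function names are assumptions. Each Wikipedia article receives a sparse random index vector, a word's semantic vector accumulates the index vectors of the articles it occurs in, a document vector is the sum of its word vectors, and documents are compared by cosine similarity.

```python
# Minimal sketch of random indexing for document similarity.
# DIM and NONZERO are illustrative choices, not the paper's settings.
import numpy as np
from collections import defaultdict

DIM = 2000        # reduced dimensionality of the vector space
NONZERO = 10      # number of nonzero (+1/-1) entries per index vector
rng = np.random.default_rng(42)

def index_vector():
    """Sparse ternary random vector: a few +1/-1 entries, rest zeros."""
    v = np.zeros(DIM)
    pos = rng.choice(DIM, size=NONZERO, replace=False)
    v[pos] = rng.choice([-1.0, 1.0], size=NONZERO)
    return v

def build_word_vectors(articles):
    """articles: list of token lists (e.g. tokenized Wikipedia articles).
    Each article gets a random index vector; a word's semantic vector is
    the sum of the index vectors of the articles it occurs in."""
    word_vecs = defaultdict(lambda: np.zeros(DIM))
    for tokens in articles:
        ctx = index_vector()
        for w in set(tokens):
            word_vecs[w] += ctx
    return word_vecs

def doc_vector(tokens, word_vecs):
    """Document vector = sum of its words' semantic vectors."""
    return sum((word_vecs[w] for w in tokens if w in word_vecs),
               np.zeros(DIM))

def cosine(a, b):
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return float(a @ b / (na * nb)) if na and nb else 0.0

# Toy usage with stand-in "articles":
wiki = [["tensor", "algebra", "vector"], ["football", "league", "goal"]]
wv = build_word_vectors(wiki)
d1 = doc_vector(["tensor", "vector"], wv)
d2 = doc_vector(["football", "goal"], wv)
print(cosine(d1, d2))
```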
      • Open Access Article

        2 - DeepSumm: A Novel Deep Learning-Based Multi-Lingual Multi-Documents Summarization System
        Shima Mehrabi, Seyed Abolghassem Mirroshandel, Hamidreza Ahmadifar
        With the increasing amount of textual information accessible via the internet, a summarization system that can generate summaries of information on user demand seems necessary. Summarization has long been studied by natural language processing researchers. Today, with improvements in processing power and the development of computational tools, efforts to improve the performance of summarization systems continue, especially by utilizing more powerful learning algorithms such as deep learning. In this paper, a novel multi-lingual multi-document summarization system based on deep learning techniques is proposed; it is among the first Persian summarization systems to use deep learning. The proposed system ranks sentences based on a set of predefined features using a deep artificial neural network. A comprehensive study of the effect of different features was also conducted to find the best possible feature combination. The performance of the proposed system is evaluated on standard baseline datasets in Persian and English. The evaluation results demonstrate the effectiveness of the proposed summarization system in both languages; the proposed method achieves state-of-the-art performance in Persian and English.
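        As a rough illustration of the score-and-rank pipeline the abstract outlines, the sketch below extracts a few surface features per sentence and scores each sentence with a small feedforward network. The feature set, network shape, and all names are assumptions for illustration, not the paper's actual architecture, which is deeper and trained on labeled data.

```python
# Illustrative extractive-summarization skeleton: featurize, score, rank.
import numpy as np

def features(sentence, position, doc_len, query_terms=()):
    """Toy surface features: length, relative position, query-term overlap."""
    tokens = sentence.lower().split()
    overlap = sum(t in query_terms for t in tokens)
    return np.array([len(tokens) / 50.0,
                     position / max(doc_len - 1, 1),
                     overlap / max(len(tokens), 1)])

class TinyRanker:
    """One-hidden-layer feedforward scorer (a stand-in for the deep model)."""
    def __init__(self, n_features, hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.1, (n_features, hidden))
        self.W2 = rng.normal(0, 0.1, hidden)

    def score(self, x):
        h = np.tanh(x @ self.W1)   # hidden layer
        return float(h @ self.W2)  # scalar salience score

def summarize(sentences, k=2, query_terms=("summary",)):
    ranker = TinyRanker(n_features=3)  # untrained; weights would be learned
    scored = [(ranker.score(features(s, i, len(sentences), query_terms)), i, s)
              for i, s in enumerate(sentences)]
    top = sorted(scored, reverse=True)[:k]
    # Emit the top-k sentences in their original document order.
    return [s for _, i, s in sorted(top, key=lambda t: t[1])]

print(summarize(["A summary helps readers.", "Filler text here.",
                 "Deep models rank sentences."]))
```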
      • Open Access Article

        3 - A Customized Web Spider for Why-QA Pairs Corpus Preparation
        Manvi Breja
        Given the growing body of research on improving the performance of non-factoid question answering systems, an open-domain non-factoid dataset is needed. Some datasets are available for non-factoid and even how-type questions, but no suitable dataset exists that comprises only open-domain why-type questions covering the full range of question formats. Why-questions play a significant role and are asked in every domain. They are more complex and difficult for a system to answer automatically, since why-questions seek the reasoning behind the task involved. They are prevalent and asked out of curiosity by real users, so answering them depends on the users' needs, knowledge, context, and experience. This paper develops a customized web crawler for gathering a set of why-questions from five popular question answering sources, viz. Answers.com, Yahoo! Answers, Suzan Verberne's open-source dataset, Quora, and Ask.com, irrespective of domain. Along with the questions, their category, document title, and appropriate answer candidates are also maintained in the dataset. The distribution of why-questions by type and category is also illustrated. To the best of our knowledge, this is the first sufficiently large dataset of 2,000 open-domain why-questions with their relevant answers, and it will help stimulate research aimed at improving the performance of non-factoid why-QA systems.
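        A minimal sketch of such a why-question spider is given below. The seed URL and CSS selector are hypothetical placeholders, since each target site needs its own extraction rules; the snippet only shows the core filtering idea: fetch a listing page, pull candidate questions, and keep those that start with "why".

```python
# Minimal why-question spider sketch. The URL list and the CSS selector
# are illustrative placeholders, not the paper's actual crawling rules.
import re
import requests
from bs4 import BeautifulSoup

WHY_PATTERN = re.compile(r"^\s*why\b", re.IGNORECASE)

SEED_PAGES = [
    "https://example.com/questions?page=1",  # stand-in for a QA site listing
]

def extract_questions(html):
    """Pull candidate question strings out of a listing page."""
    soup = BeautifulSoup(html, "html.parser")
    # Selector is hypothetical; each target site needs its own rule.
    return [a.get_text(strip=True) for a in soup.select("a.question-title")]

def crawl(pages=SEED_PAGES):
    dataset = []
    for url in pages:
        resp = requests.get(url, timeout=10)
        resp.raise_for_status()
        for q in extract_questions(resp.text):
            # Keep only well-formed why-questions.
            if WHY_PATTERN.match(q) and q.endswith("?"):
                dataset.append({"question": q, "source": url})
    return dataset

if __name__ == "__main__":
    for row in crawl():
        print(row["question"])
```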