    • List of Articles Corpus

      • Open Access Article

        1 - A Corpus for Evaluation of Cross Language Text Re-use Detection Systems
        Salar Mohtaj Habibollah Asghari
        In recent years, the availability of documents through the Internet, along with automatic translation systems, has increased plagiarism, especially across languages. Cross-lingual plagiarism occurs when the source or original text is in one language and the plagiarized or re-used text is in another. Various methods for automatic cross-language text re-use detection have been developed to assist human experts in analyzing documents for plagiarism cases. Evaluating the performance of these systems and algorithms requires standard evaluation resources. In constructing cross-lingual plagiarism detection corpora, the majority of earlier studies have concentrated on English and other European language pairs and have focused less on low-resource languages. In this paper, we investigate a method for constructing an English-Persian cross-language plagiarism detection corpus from parallel bilingual sentences, which are used to artificially generate passages with various degrees of paraphrasing. The plagiarized passages are inserted into topically related English and Persian Wikipedia articles in order to produce more realistic text documents. The proposed approach can be applied to other less-resourced languages. Both intrinsic and extrinsic evaluation methods were employed to evaluate the compiled corpus. The compiled corpus can therefore be included in an evaluation framework for assessing cross-language plagiarism detection systems. Our proposed corpus is free and publicly available for research purposes.
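The abstract above describes inserting generated passages into topically related host articles. The sketch below illustrates one way such an insertion step could record where each passage lands, so that a detector's output can later be compared with known spans. It is a minimal illustration only: the function name `insert_passages`, the paragraph-level granularity, and the random placement are assumptions, not the authors' procedure.

```python
import random


def insert_passages(host_paragraphs, passages, seed=0):
    """Insert each passage between host paragraphs at a random position.

    Hypothetical helper: returns the assembled document text and a list of
    (offset, length) character spans, in document order, marking where the
    inserted passages ended up.
    """
    rng = random.Random(seed)
    # Tag every block so inserted passages can be located after assembly.
    blocks = [(paragraph, False) for paragraph in host_paragraphs]
    for passage in passages:
        position = rng.randint(0, len(blocks))  # both ends inclusive
        blocks.insert(position, (passage, True))

    separator = "\n\n"
    spans = []
    offset = 0
    for index, (block, is_inserted) in enumerate(blocks):
        if index > 0:
            offset += len(separator)
        if is_inserted:
            spans.append((offset, len(block)))
        offset += len(block)

    text = separator.join(block for block, _ in blocks)
    return text, spans
```

Calling `insert_passages(host, passages)` yields the document text and the spans of the inserted material; slicing the text at each span returns the inserted passage exactly, which is what a span-level evaluation would need.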
      • Open Access Article

        2 - Designing a Semi-Intelligent Crawler for Creating a Persian Question Answering Corpus Called Popfa
        Hadi Sharifian Nasim Tohidi Chitra Dadkhah
        Question answering in natural language processing is an interesting field in which researchers can test their ability to tackle the difficult Turing test. Computer scientists are continually working to develop and improve question answering systems in various natural languages, especially English. In Persian, however, advancing these systems is not easy. The main problem is the scarcity of resources and corpora in this language. Thus, in this paper, a Persian question answering text corpus is created, which covers a wide range of topics (religion, midwifery, and issues related to youth marriage) and the question types commonly encountered in Persian language usage. The most important challenge was introducing a method for gathering data in Persian, as well as facilitating and expanding the data gathering process. Therefore, SIC (Semi-Intelligent Crawler) is proposed as a solution that can overcome this challenge: it crawls Persian websites, gathers text, and finally imports it into a database. The outcome of this research is a corpus called Popfa, which stands for POrsesh Pasokh (question answering) in FArsi. This corpus contains more than 53,000 standard questions and answers, and it has been evaluated with standard approaches. All the questions in Popfa are answered by specialists in two general areas: religious and medical questions. Researchers can therefore now use this corpus for research on Persian question answering.
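The crawl, gather, and import steps mentioned above can be illustrated with a small sketch using only the Python standard library. It is not the SIC crawler itself: the CSS class names `question` and `answer`, the `QAExtractor` class, the `store_pairs` helper, and the table layout are all hypothetical, and a real site would need its own extraction rules. Fetching pages over the network is deliberately left out.

```python
import sqlite3
from html.parser import HTMLParser

# Elements that never have a closing tag; ignored when tracking nesting.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class QAExtractor(HTMLParser):
    """Collect text inside elements whose class is 'question' or 'answer'.

    The class names are assumptions made for this illustration.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []        # completed (question, answer) tuples
        self._field = None     # 'question', 'answer', or None
        self._depth = 0        # nesting depth inside the current field
        self._buffer = []
        self._question = None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._field is not None:
            self._depth += 1
            return
        css_class = dict(attrs).get("class")
        if css_class in ("question", "answer"):
            self._field = css_class
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._field is None:
            return
        self._depth -= 1
        if self._depth == 0:
            # Collapse runs of whitespace left over from the markup.
            text = " ".join("".join(self._buffer).split())
            if self._field == "question":
                self._question = text
            elif self._question is not None:
                self.pairs.append((self._question, text))
                self._question = None
            self._field = None

    def handle_data(self, data):
        if self._field is not None:
            self._buffer.append(data)


def store_pairs(connection, pairs):
    """Import question-answer pairs into a (hypothetical) 'qa' table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS qa ("
        "id INTEGER PRIMARY KEY, question TEXT NOT NULL, answer TEXT NOT NULL)"
    )
    connection.executemany(
        "INSERT INTO qa (question, answer) VALUES (?, ?)", pairs
    )
    connection.commit()
```

Feeding a downloaded page to `QAExtractor().feed(...)` and passing the resulting `pairs` to `store_pairs` with a `sqlite3` connection mirrors the gather-then-import flow. Python strings are Unicode, so Persian text passes through unchanged.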
      • Open Access Article

        3 - Whispered Speech Emotion Recognition with Gender Detection using BiLSTM and DCNN
        Aniruddha Mohanty Ravindranath C. Cherukuri
        Emotions are human mental states at a particular instant in time concerning one’s circumstances, mood, and relationships with others. Identifying emotions from whispered speech is complicated, as the conversation might be confidential. The representation of the speech relies on the magnitude of its information. Whispered speech is intelligible, but it is a low-intensity signal and differs from normal speech, which makes emotion identification from it difficult. Both prosodic and spectral speech features help to identify emotions. Emotion identification in whispered speech uses prosodic features such as zero-crossing rate (ZCR) and pitch, and spectral features that include spectral centroid, chroma STFT, Mel-scale spectrogram, Mel-frequency cepstral coefficients (MFCC), Shifted Delta Cepstrum (SDC), and spectral flux. The proposed implementation has two parts. In the first step, a Bidirectional Long Short-Term Memory (BiLSTM) network identifies the gender from the speech sample using SDC and pitch. In the second step, a Deep Convolutional Neural Network (DCNN) model identifies the emotions. This implementation is evaluated on the wTIMIT corpus and achieves 98.54% accuracy. Emotions have a dynamic effect on genders, so this implementation performs better than traditional approaches. This approach can support the design of online learning management systems, various applications for mobile devices, checks on cyber-criminal activities, emotion detection for older people, automatic speaker identification and authentication, forensics, and surveillance.
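Two of the features named above, zero-crossing rate and spectral centroid, have simple textbook definitions that can be sketched in pure Python. The code below follows those general definitions and is not taken from the paper: the function names, the frame and hop lengths, and the naive DFT are assumptions chosen for readability rather than speed.

```python
import cmath
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ.

    Samples equal to zero are treated as non-negative.
    """
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def framewise_zcr(signal, frame_length=400, hop_length=160):
    """ZCR per frame; the default sizes are illustrative assumptions."""
    last_start = max(len(signal) - frame_length, 0)
    return [
        zero_crossing_rate(signal[start:start + frame_length])
        for start in range(0, last_start + 1, hop_length)
    ]


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency (Hz) of one frame.

    Uses a naive O(n^2) DFT over the non-negative frequency bins.
    """
    n = len(frame)
    if n == 0:
        return 0.0
    weighted, total = 0.0, 0.0
    for k in range(n // 2 + 1):
        coefficient = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(frame)
        )
        magnitude = abs(coefficient)
        weighted += (k * sample_rate / n) * magnitude
        total += magnitude
    return weighted / total if total > 0 else 0.0
```

For a pure tone whose frequency falls exactly on a DFT bin, the centroid coincides with the tone frequency, which makes a convenient sanity check. In practice an FFT-based library routine would replace the naive DFT.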