Opinion Mining in Persian Language Using Supervised AlgorithmsResearch Areas : Natural Language Processing
abdollah aghaei 2 (K.N. Toosi University of Technology)
Keywords: opinion mining, Persian, supervised algorithm, SentiWordNet, , , , ,
Rapid growth of Internet results in large amount of user-generated contents in social media, forums, blogs, and etc. Automatic analysis of this content is needed to extract valuable information from these contents. Opinion mining is a process of analyzing opinions, sentiments and emotions to recognize people’s preferences about different subjects. One of the main tasks of opinion mining is classifying a text document into positive or negative classes. Most of the researches in this field applied opinion mining for English language. Although Persian language is spoken in different countries, but there are few studies for opinion mining in Persian language. In this article, a comprehensive study of opinion mining for Persian language is conducted to examine performance of opinion mining in different conditions. First we create a Persian SentiWordNet using Persian WordNet. Then this lexicon is used to weight features. Results of applying three machine learning algorithms Support vector machine (SVM), naive Bayes (NB) and logistic regression are compared before and after weighting by lexicon. Experiments show support vector machine and logistic regression achieve better results in most cases and applying SO (semantic orientation) improves the accuracy of logistic regression. Increasing number of instances and using unbalanced dataset has a positive effect on the performance of opinion mining. Generally this research provides better results comparing to other researches in opinion mining of Persian language.