Product and service reviews play an important role in purchase decisions. When we are faced with many choices, opinion-based reviews help us narrow down the options and decide based on our needs. This is especially true online, where reviews are easily accessible. Amazon, TripAdvisor, Yelp, and Airbnb are a few of the companies where review-based decisions are very prominent. From a business point of view, positive reviews can result in significant financial benefits. This also creates opportunities for deception: fake reviews can be generated to garner positive opinion about a product or to discredit a business. To ensure the credibility of the reviews posted on a platform, it is important to use a robust detection model. In this post, we’ll talk about some methods for detecting fake reviews. The models discussed here fall into three categories: text-based, sentiment-based, and user-based.
The text-based approach to classifying fake and non-fake reviews is very similar to the ideas used in spam classification [1]. By creating linguistic n-gram features and training a supervised learning algorithm such as Naive Bayes or SVM, one can construct a classification model. This approach, of course, relies on the assumption that fake and non-fake reviews contain words with significantly different frequencies. If the spammers have little knowledge of the product, or no genuine interest in writing the reviews (for example, cheaply paid spammers), their reviews are more likely to be linguistically different from the genuine ones.
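As a rough illustration of the n-gram idea, here is a minimal sketch of a multinomial Naive Bayes classifier over word unigrams and bigrams, written with the Python standard library only. The class name `NaiveBayes`, the helper `ngrams`, and the four training reviews are all made up for this example; a real system would train on a labelled corpus and would more likely use an established machine learning library.

```python
import math
import re
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Return lowercase word n-grams (unigrams up to n_max-grams) of text."""
    words = re.findall(r"[a-z']+", text.lower())
    feats = []
    for n in range(1, n_max + 1):
        feats += [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.docs = Counter(labels)         # label -> number of reviews
        for text, label in zip(texts, labels):
            self.counts[label].update(ngrams(text))
        self.vocab = {g for c in self.counts.values() for g in c}
        return self

    def predict(self, text):
        total_docs = sum(self.docs.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.docs[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for gram in ngrams(text):
                if gram in self.vocab:  # skip n-grams never seen in training
                    score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up toy data, for illustration only.
train_texts = [
    "best product ever buy it now",
    "amazing amazing best deal buy now",
    "battery lasts two days and the screen is sharp",
    "the strap broke after a month but support replaced it",
]
train_labels = ["fake", "fake", "genuine", "genuine"]

model = NaiveBayes().fit(train_texts, train_labels)
prediction = model.predict("best deal ever buy now")
```

On this toy data the new review shares its n-grams only with the reviews labelled fake, so it is classified as fake. With so few examples the result says nothing about real-world accuracy; it only shows how word frequencies drive the decision.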
However, we have no reason to assume that a spammer won’t be careful enough to write reviews that are linguistically similar to the genuine ones, or motivated enough to write convincing fake opinions. In that case, purely text-based models won’t be successful, and we will need to incorporate more information.
Because fake reviews are created to boost positive opinion or to tarnish an image, they should carry strong positive or negative sentiment. Sentiment analysis can therefore be an important tool for separating out spam reviews. AFINN is a list of English words rated for valence with an integer value between -5 and 5 [2, 3]; the words were manually labelled by Finn Årup Nielsen in 2009-2011. In the AFINN model, the sentiment of a review is defined as the sum of the sentiment scores of the terms in the review. Though more sophisticated sentiment analysis methods can be employed, the static AFINN model should work well here because it contains scores for the terms that project very high and very low sentiment, and such words are likely to be prominent in fake reviews. Some of these words include ‘nice’, ‘great’, ‘awesome’, ‘amazing’, ‘bad’, ‘awful’, ‘helpful’, ‘shitty’, and so on.
If the spammer made an attempt to sound convincing by using words that have high positive or negative sentiment, this model can be very successful.
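The sum-of-scores rule can be sketched in a few lines. The dictionary `AFINN_SAMPLE` below is a tiny hand-typed excerpt in the AFINN style, and its values should be treated as illustrative; for real use, load the published word list instead. The function name `review_sentiment` is also just a name chosen for this sketch.

```python
import re

# Tiny hand-typed excerpt in the AFINN style (word -> integer valence).
# Treat these values as illustrative; load the published list for real use.
AFINN_SAMPLE = {
    "nice": 3, "great": 3, "awesome": 4, "amazing": 4,
    "helpful": 2, "bad": -3, "awful": -3,
}


def review_sentiment(text, lexicon=AFINN_SAMPLE):
    """Sum the valence of every lexicon word in the review; other words score 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)


score_pos = review_sentiment("Great hotel, awesome staff, amazing view!")
score_neg = review_sentiment("Awful room and bad service.")
score_neutral = review_sentiment("The room has two beds.")
```

Because the score is a plain sum, a long review packed with strongly rated words ends up with a very large magnitude, which is exactly the signal this model relies on.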
Even if a spammer’s numeric rating does not deviate much from the general consensus, the text of the review is likely to be overwhelmingly positive. In such cases, the sentiment scores of the reviews can help detect fake reviews. For example, you can compute the sentiment scores of all the 5-star reviews and check whether some of them have extremely high scores.
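One possible way to make "extremely high" concrete is a z-score cutoff within the group of 5-star reviews. This is an assumption of the sketch, not something the approach prescribes; the cutoff of 1.5, the small `LEXICON`, the function names, and the sample reviews are all hypothetical.

```python
import re
from statistics import mean, stdev

# Illustrative AFINN-style scores; load the published list for real use.
LEXICON = {"nice": 3, "great": 3, "awesome": 4, "amazing": 4, "helpful": 2}


def sentiment(text):
    """Sum of lexicon scores over the words of the review."""
    return sum(LEXICON.get(w, 0) for w in re.findall(r"[a-z']+", text.lower()))


def flag_extreme(reviews, stars=5, z_cutoff=1.5):
    """Return texts with the given rating whose sentiment score lies more
    than z_cutoff sample standard deviations above the group mean."""
    texts = [text for rating, text in reviews if rating == stars]
    scores = [sentiment(t) for t in texts]
    if len(scores) < 2:
        return []  # not enough data to estimate a spread
    mu, sigma = mean(scores), stdev(scores)
    if sigma == 0:
        return []  # all scores identical, nothing stands out
    return [t for t, s in zip(texts, scores) if (s - mu) / sigma > z_cutoff]


# Made-up (rating, text) pairs, for illustration only.
reviews = [
    (5, "Nice phone, works well."),
    (5, "Helpful manual and quick setup."),
    (5, "Great battery life."),
    (5, "Does what it says."),
    (5, "Nice screen, great camera."),
    (5, "Amazing awesome great great amazing awesome phone, nice nice!"),
    (2, "Awesome idea, but it broke."),
]

flagged = flag_extreme(reviews)
```

Here only the last 5-star review is flagged, since its score is far above the others in the group. A flagged review is a candidate for closer inspection, not proof of deception; genuine enthusiasts also write glowing reviews.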
The user-based model asserts that a spamming user displays abnormal behavior, and that it is possible to classify users as spammers and non-spammers. The user information can be extracted from their public profiles. The relevant features include:
A standard learning algorithm, such as SVM or Random Forests, trained on these features can create a classification model that separates fake reviewers from non-fake reviewers.
Other than these important features, there are some other features that can be extracted from the user’s profile, which can be used in detecting fake reviews.
In this post, we discussed three different categories of models for detecting online fake reviews. Though the basic text-mining approach should detect spam reviews with reasonable accuracy, a smart spammer can make it harder for this model to classify. The sentiment-based model and user behavior can help achieve better accuracy in filtering false opinions. We propose that a combination of these models can be very effective in detecting fake reviews.
[1] Paul Graham, “A Plan for Spam”
[2] More information and a link to download the AFINN wordlist is available here
[3] Finn Årup Nielsen, “Evaluation of a word list for sentiment analysis in microblogs”, Proceedings of the ESWC2011 Workshop on “Making Sense of Microposts: Big things come in small packages”, CEUR Workshop Proceedings 718, pp. 93-98, May 2011