
Twitter represents a fundamentally new instrument for making social measurements. Millions of people voluntarily express their opinions on almost any topic imaginable, which makes this data source incredibly valuable for both research and business. Numerous studies have used this data to address sociological, political, economic, and network-analytic questions. We can tap the vast amount of data from Twitter to gauge “public opinion” on a topic by aggregating the sentiment of individual tweets over time.

Sentiment analysis aims to determine how a certain person or group reacts to a specific topic. Traditionally, we would run surveys to gather data and then analyze it statistically. With Twitter, the process is to extract tweets that reference the desired topic, compute the sentiment polarity and strength of each tweet, and then aggregate the results across all such tweets. Companies use this information to gauge public opinion on their products and services and to make data-informed decisions.
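The three steps above (extract, score, aggregate) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `topic_sentiment`, the substring-based topic match, and the placeholder scorer `toy_score` are all hypothetical, and the tweets are made up. A real system would plug in a proper scoring method, such as one of the techniques described later in this article.

```python
from statistics import mean
from typing import Callable, Iterable, Optional


def topic_sentiment(
    tweets: Iterable[str],
    topic: str,
    score: Callable[[str], float],
) -> Optional[float]:
    """Average the per-tweet sentiment of tweets that mention `topic`.

    Returns None when no tweet mentions the topic.
    """
    needle = topic.lower()
    # Step 1: extract tweets referencing the topic (naive substring match).
    # Step 2: score each matching tweet with the supplied scoring function.
    scores = [score(text) for text in tweets if needle in text.lower()]
    # Step 3: aggregate the individual scores into one number.
    return mean(scores) if scores else None


def toy_score(text: str) -> float:
    """Placeholder scorer (an assumption): +1 for 'love', -1 for 'hate'."""
    lowered = text.lower()
    return float("love" in lowered) - float("hate" in lowered)


# Made-up tweets, for illustration only.
tweets = [
    "I love the new phone",
    "I hate the new phone battery",
    "Love this phone camera",
    "Nice weather today",
]

overall = topic_sentiment(tweets, "phone", toy_score)
print(overall)
```

Passing the scorer in as a parameter keeps the extraction and aggregation logic independent of how each tweet is scored.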

We can also track changes in users’ opinions on a topic over time, which lets us identify the events that caused these changes. One of the first sentiment studies on Twitter data examined public perception of Obama’s performance as President. Another (fun) example would be to explore how sentiment about the TV series “Game of Thrones” varies over time. The unpredictable episode “The Rains of Castamere” resulted in a lot of negative tweets and a spike in negative sentiment.
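Tracking opinion over time amounts to grouping scored tweets by date and looking for large jumps between consecutive days. The sketch below assumes each tweet has already been reduced to a (day, score) pair; the helper names `daily_sentiment` and `largest_shift` are hypothetical, and the dates and scores are invented for illustration rather than taken from real data.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def daily_sentiment(scored):
    """Group (day, score) pairs by day and return {day: mean score}, in date order."""
    by_day = defaultdict(list)
    for day, score in scored:
        by_day[day].append(score)
    return {day: mean(values) for day, values in sorted(by_day.items())}


def largest_shift(series):
    """Return (day, change) for the biggest day-over-day change, by absolute value."""
    days = list(series)
    changes = [
        (days[i], series[days[i]] - series[days[i - 1]])
        for i in range(1, len(days))
    ]
    return max(changes, key=lambda c: abs(c[1])) if changes else None


# Invented sample: (day the tweet was posted, sentiment score in [-1, 1]).
scored = [
    (date(2016, 1, 1), 0.5),
    (date(2016, 1, 1), 0.25),
    (date(2016, 1, 2), -0.75),
    (date(2016, 1, 2), -0.25),
    (date(2016, 1, 3), 0.0),
]

series = daily_sentiment(scored)
shift = largest_shift(series)
print(series)
print(shift)
```

The day returned by `largest_shift` is a candidate for "something happened here", which is where you would go looking for the triggering event.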

We can also look at the geocoded information in tweets and analyze the relationship between location and mood. For example, people in California may be happy about event X, while New Yorkers may not like it much.

Sentiment analysis employs natural language processing (NLP), text mining and computational linguistics to extract subjective information from the textual data.

Applications

Sentiment analysis techniques find applications in technology, finance, and research. Some important applications of sentiment analysis are:

  • predicting stock movements
  • computing movie ratings
  • discerning product satisfaction
  • analyzing political and non-political campaigns

Techniques

There are two broad categories of sentiment analysis techniques:

Lexical Methods: These techniques employ dictionaries of words annotated with their semantic polarity and sentiment strength. The annotations are then used to calculate a score for the polarity and/or sentiment of the document. This method usually gives high precision but low recall.
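To make the lexical approach concrete, here is a minimal sketch in Python. The six-word `LEXICON` and its strengths are invented for illustration; real sentiment lexicons are far larger and usually also handle negation and intensifiers, which this sketch ignores. The function names `lexicon_score` and `label` are likewise hypothetical.

```python
import re

# Hypothetical mini-lexicon: word -> polarity strength in [-1, 1].
LEXICON = {
    "good": 0.5,
    "great": 1.0,
    "love": 1.0,
    "bad": -0.5,
    "awful": -1.0,
    "hate": -1.0,
}


def lexicon_score(text, lexicon=LEXICON):
    """Return (summed polarity, number of lexicon words found) for `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = [lexicon[w] for w in words if w in lexicon]
    return sum(hits), len(hits)


def label(text):
    """Map the summed polarity to a coarse positive/negative/neutral label."""
    score, matched = lexicon_score(text)
    if matched == 0 or score == 0:
        return "neutral"
    return "positive" if score > 0 else "negative"


print(lexicon_score("Great phone, awful battery, but I love it"))
print(label("I hate waiting"))
print(label("This is meh"))
```

The last call shows where the low recall comes from: a word that is missing from the dictionary ("meh") contributes nothing, so the tweet falls through to neutral.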

Machine Learning Methods: Such techniques require creating a model by training the classifier with labeled examples. This means that you must first gather a dataset with examples for positive, negative and neutral classes, extract the features from the examples and then train the algorithm based on the examples. These methods are used mainly for computing the polarity of the document.
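The gather, extract, and train workflow can be illustrated with a tiny multinomial Naive Bayes classifier written with only the standard library. Naive Bayes is just one possible choice of classifier, picked here for brevity; the article does not prescribe one. The six training sentences are made up, the features are plain word counts, and the names `tokenize`, `train`, and `predict` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Feature extraction: lowercase whitespace-separated words."""
    return text.lower().split()


def train(examples):
    """Count classes and per-class words from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the most probable label, using add-one (Laplace) smoothing."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_logp = None, -math.inf
    for label, n in class_counts.items():
        logp = math.log(n / total)  # class prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are skipped
                logp += math.log((word_counts[label][word] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


# Made-up labeled examples covering the three classes.
examples = [
    ("i love this show", "positive"),
    ("what a great episode", "positive"),
    ("i hate this ending", "negative"),
    ("what an awful episode", "negative"),
    ("the episode airs tonight", "neutral"),
    ("watching the show now", "neutral"),
]

model = train(examples)
print(predict(model, "love this great show"))
print(predict(model, "awful ending"))
print(predict(model, "the show airs tonight"))
```

With so few examples the model is only a toy; in practice the quality of a learned classifier depends mostly on how much labeled data you can gather.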

The choice of method depends heavily on the application, the domain, and the language. Lexicon-based techniques with large dictionaries can achieve very good results. However, they require a lexicon, which is not available for every language.

Machine learning techniques, on the other hand, can also deliver good results, but they require labeled training data.

Here are some examples of companies that use sentiment analysis:

AlchemyAPI, based in Denver, is a really cool company that provides tools for analyzing sentiment toward an entity in a document or web page.

The Stock Sonar uses sentiment analysis of unstructured text to determine whether online press is being positive or negative towards businesses by identifying lexical sentiment as well as business events.

About the Author

Janu Verma is a Quantitative Researcher at the Buckler Lab, Cornell University, where he works on problems in bioinformatics and genomics. His background is in mathematics and machine learning, and he leverages tools from these areas to answer questions in biology. Janu holds a Masters in Theoretical Physics from the University of Cambridge in the UK, and dropped out of the mathematics PhD program (after 3 years) at Kansas State University. He also writes about data science, machine learning, and mathematics at Random Inferences.

