Computes the sentiment of tweets from US states in real time using Storm.
This repository contains an application built to demonstrate the Apache Storm distributed framework by performing real-time sentiment analysis of tweets originating from the U.S. The topology retrieves tweets originating from the US and computes the sentiment scores of states [based on tweets] continuously, i.e. until the topology is killed. The user has to explicitly kill the topology to exit the application.
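As a rough illustration of the continuous per-state scoring described above, the following plain-Java sketch keeps a running sentiment total for each state. It is not the repository's actual code: the class name `StateSentimentSketch`, its methods, and the sample scores are hypothetical, and in a real Storm topology this kind of state would more likely live inside a bolt.

```java
import java.util.Map;
import java.util.TreeMap;

public class StateSentimentSketch {

    // Running sentiment total per state; TreeMap keeps the states sorted by name.
    private final Map<String, Integer> totals = new TreeMap<>();

    /** Adds the score of one tweet to the running total of its state. */
    void record(String state, int tweetScore) {
        totals.merge(state, tweetScore, Integer::sum);
    }

    /** Returns the current total for a state, or 0 if no tweet has been seen yet. */
    int scoreOf(String state) {
        return totals.getOrDefault(state, 0);
    }

    public static void main(String[] args) {
        StateSentimentSketch sketch = new StateSentimentSketch();
        // Hypothetical (state, tweet score) pairs standing in for the live tweet stream.
        sketch.record("CA", 3);
        sketch.record("NY", -2);
        sketch.record("CA", -1);
        System.out.println("CA=" + sketch.scoreOf("CA") + ", NY=" + sketch.scoreOf("NY"));
    }
}
```

Because the totals are only ever updated incrementally, the same structure keeps working for as long as tweets arrive, which matches the run-until-killed behaviour of the topology.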
Apache Storm is an open source distributed real-time computation system, developed at BackType by Nathan Marz and team. It was open sourced by Twitter [after the BackType acquisition] in August, 2011. Storm became a top-level Apache project on 29th September, 2014.
This application was initially developed and tested with Storm v0.8.2 on Windows 7 in local mode; it was later updated and tested with Storm v0.9.3 on 1st January, 2015. The application may or may not work with versions earlier or later than Storm v0.9.3.
This application has been tested in:
You might also be interested in the extensions of this repo: StormTweetsSentimentD3Viz, which visualizes the Twitter sentiment of US states using a D3.js choropleth map and Highcharts column charts, and StormTweetsSentimentD3UKViz, a similar project for UK Twitter sentiment.
Edit config.properties and add your own values to complete the Twitter API integration; the values are listed on your Twitter Developer page.
The config.properties values can also be generated afresh and then populated here; copy them without any mistakes.
config.properties also holds the value used for reverse geocoding, i.e. resolving a location from latitude and longitude.
See the AFINN-111.txt file for the pre-computed sentiment scores of ~2500 words / phrases.
See AFINN-README.txt for more about AFINN, and also check the author's paper.
Also, please check pom.xml for more information on the various other dependencies of the project.
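To make the role of the AFINN word list concrete, here is a minimal sketch of lexicon-based scoring in plain Java. It assumes the tab-separated `term<TAB>score` layout of AFINN-111.txt; the class `AfinnScorerSketch`, its methods, the tokenization rule, and the three sample entries are hypothetical and are not taken from this repository. Multi-word phrases in the list are ignored for simplicity.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class AfinnScorerSketch {

    /** Parses lines of the form "term<TAB>score" into a lookup map; malformed lines are skipped. */
    static Map<String, Integer> parseLexicon(Iterable<String> lines) {
        Map<String, Integer> lexicon = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.split("\t");
            if (parts.length == 2) {
                lexicon.put(parts[0].trim().toLowerCase(Locale.ROOT),
                            Integer.parseInt(parts[1].trim()));
            }
        }
        return lexicon;
    }

    /** Sums the scores of the single-word terms found in the text; unknown words count as 0. */
    static int score(String text, Map<String, Integer> lexicon) {
        int total = 0;
        // Split on anything that is not a lowercase letter or apostrophe (an assumed, simple tokenizer).
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z']+")) {
            total += lexicon.getOrDefault(token, 0);
        }
        return total;
    }

    public static void main(String[] args) {
        // Illustrative entries only; the real file could be read with Files.readAllLines(...).
        Map<String, Integer> lexicon =
                parseLexicon(Arrays.asList("good\t3", "bad\t-3", "love\t3"));
        System.out.println(score("I love this, it is so good!", lexicon));
        System.out.println(score("Bad traffic today", lexicon));
    }
}
```

A positive total suggests a positive tweet and a negative total a negative one; how the topology itself combines or normalizes these scores is not shown here.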
This project uses Apache Maven to build and run the topology.
You need the following on your machine:
The rest of the required frameworks and libraries are downloaded by Maven the first time the build is invoked.
To build and run this topology, you must use Java 1.8.
Local mode can also be run on Windows without installing any additional software or framework.
Note: Please be sure to clear your temp folder, as every run adds a lot of temporary files to it.
In local mode, this application can be run from the command line by invoking:
mvn clean compile exec:java -Dexec.classpathScope=compile -Dexec.mainClass=org.p7h.storm.sentimentanalysis.topology.SentimentAnalysisTopology
or
mvn clean compile package && java -jar target/storm-sentiment-analysis-0.1-jar-with-dependencies.jar
Distributed mode requires a complete and proper Storm cluster setup. Please check the Apache Storm wiki for instructions on setting up a Storm cluster.
In distributed mode, after starting Nimbus and Supervisors on individual machines, this application can be executed on the master [or Nimbus] machine by invoking the following on the command line:
storm jar target/storm-sentiment-analysis-0.1.jar org.p7h.storm.sentimentanalysis.topology.SentimentAnalysisTopology SentimentAnalysis
Note: The repo's recent update to Storm v0.9.3 was not tested in cluster mode, but it should work as before if the cluster is set up correctly.
If you find any issues, please report them either by raising an issue here on GitHub or by alerting me on Twitter at @P7h. Or even better, please send a pull request.
Appreciate your help. Thanks!
Copyright © 2013-2015 Prashanth Babu.
Licensed under the Apache License, Version 2.0.