6/19/2017

Today was the first official day of my summer internship, although I will only begin going to UMD more regularly on Wednesday.

My previous attempts with Weka (the baseline system I have to test against) weren’t going well, so I decided to start fresh and reinstall it. After reinstalling, I also installed the EmoInt package, which gave me the resources I needed to run tests on the saifmohammad.com data.

Following the EmoInt README, I was able to run my first real test with Weka. First, I used the tweets_to_arff.py script to convert the data into the ARFF format that Weka accepts. Next, I loaded the training data and the testing data into Weka and applied the filters and classifier that were needed.
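To make the conversion step concrete, here is a minimal sketch of what turning tab-separated tweet data into ARFF text can look like. This is not the actual tweets_to_arff.py script. The function names (quote_arff, tweets_to_arff_text), the four-column layout (id, tweet, emotion, score), the attribute types, and the sample row are all my own assumptions for illustration.

```python
import io


def quote_arff(value):
    """Wrap a string in single quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def tweets_to_arff_text(tsv_lines, relation="tweets"):
    """Build ARFF text from tab-separated lines.

    Assumed (hypothetical) column layout: id, tweet, emotion, score.
    """
    out = io.StringIO()
    out.write("@relation " + relation + "\n\n")
    out.write("@attribute id numeric\n")
    out.write("@attribute tweet string\n")
    out.write("@attribute emotion string\n")
    out.write("@attribute score numeric\n\n")
    out.write("@data\n")
    for line in tsv_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 4:
            continue  # skip blank or malformed rows
        tweet_id, tweet, emotion, score = parts
        out.write(",".join([tweet_id, quote_arff(tweet), quote_arff(emotion), score]) + "\n")
    return out.getvalue()


# Hypothetical sample row, not taken from the real data set.
sample = ["1\tI can't wait for the weekend\tjoy\t0.5\n"]
print(tweets_to_arff_text(sample))
```

The tweet text has to be quoted and escaped because it can contain commas and apostrophes, which would otherwise break the comma-separated @data section.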

Current Filter: weka.filters.MultiFilter -F "weka.filters.unsupervised.attribute.TweetToEmbeddingsFeatureVector -I 2 -B C:/Users/amanj/wekafiles/packages/AffectiveTweets/resources/w2v.twitter.edinburgh.100d.csv.gz -S 0 -K 15 -L -O" -F "weka.filters.unsupervised.attribute.TweetToLexiconFeatureVector -I 2 -A -D -F -H -J -L -N -P -Q -R -T -U -O" -F "weka.filters.unsupervised.attribute.TweetToSentiStrengthFeatureVector -I 2 -U -O" -F "weka.filters.unsupervised.attribute.Reorder -R 5-last,4"

Current classifier: weka.classifiers.functions.LibLINEAR -S 12 -C 1.0 -E 0.001 -B 1.0 -L 0.1 -I 1000

The four filters I used in this test were TweetToEmbeddingsFeatureVector, TweetToLexiconFeatureVector, TweetToSentiStrengthFeatureVector, and Reorder. I used the LibLINEAR classifier, which will remain constant throughout my preliminary tests.
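The Reorder step is the least obvious of the four, so here is a small sketch of how I understand its "-R 5-last,4" argument: indices are 1-based, listed attributes are kept in the order given, and unlisted ones are dropped. The helper names (to_index, apply_reorder) and the attribute names are hypothetical; I am assuming the data starts with four attributes (id, tweet, emotion, score) and that the other filters append their features after them.

```python
def to_index(token, n_attributes):
    """Convert a 1-based index token ('first', 'last', or a number) to an int."""
    token = token.strip()
    if token == "first":
        return 1
    if token == "last":
        return n_attributes
    return int(token)


def apply_reorder(attributes, spec):
    """Return the attributes selected by a range spec such as '5-last,4'."""
    n = len(attributes)
    order = []
    for part in spec.split(","):
        if "-" in part:
            low, high = part.split("-")
            order.extend(range(to_index(low, n), to_index(high, n) + 1))
        else:
            order.append(to_index(part, n))
    # Indices in the spec are 1-based, Python lists are 0-based.
    return [attributes[i - 1] for i in order]


# Hypothetical attribute names: four original columns plus two generated features.
attributes = ["id", "tweet", "emotion", "score", "feat_a", "feat_b"]
print(apply_reorder(attributes, "5-last,4"))
# prints ['feat_a', 'feat_b', 'score']
```

Under these assumptions, the filter keeps only the generated features and moves the score to the last position, where Weka can use it as the value to predict.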

This test returned the following results:

inst#,actual,predicted,error
1,0.319,0.384,0.065
2,0.144,0.329,0.185
3,0.898,0.721,-0.177
4,0.271,0.434,0.163
...
760,0.417,0.523,0.106

=== Evaluation on test set ===

Time taken to test model on supplied test set: 0.41 seconds

=== Summary ===

Correlation coefficient                  0.625
Kendall's tau                            0.4454
Spearman's rho                           0.6155
Mean absolute error                      0.1069
Root mean squared error                  0.1358
Relative absolute error                 76.1346 %
Root relative squared error             79.0147 %
Total Number of Instances              760
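To check that I understand what these summary numbers mean, here is a rough sketch of how the correlation coefficient, mean absolute error, and root mean squared error can be computed from the actual and predicted columns. The function names are my own, and this is not Weka's code. The sample uses only the five rows shown above, so its output will not match the summary for all 760 instances.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average of |predicted - actual| over all instances."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared error."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def pearson_correlation(actual, predicted):
    """Pearson correlation between the actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Only the five rows visible in the output above (instances 1-4 and 760).
actual = [0.319, 0.144, 0.898, 0.271, 0.417]
predicted = [0.384, 0.329, 0.721, 0.434, 0.523]

# The error column in the output is predicted minus actual.
errors = [round(p - a, 3) for a, p in zip(actual, predicted)]
print(errors)
print(pearson_correlation(actual, predicted))
print(mean_absolute_error(actual, predicted))
print(root_mean_squared_error(actual, predicted))
```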

Tomorrow, I will finish my tests by trying word and character n-grams, POS tags, and other features, both with and without the other filters, to figure out which filters I want to use in my model.

Note: I will keep adding to the filters I use in my model. Even if I start with only four filters, I will eventually try adding more to get the best possible sentiment analysis system.