6/29/2017

The main task today was getting the word n-grams and character n-grams feature set to work so I could finish the ablation experiments. I hadn't been able to get it working in the past, but today I spotted the problem almost immediately.

The text index the filter was set to operate on was 3 (the default), which pointed at the emotion attribute, so the only "word" it would ever split into word and character n-grams was the emotion label. Changing the index to 2 pointed the filter at the tweet text instead.
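For reference, the fix is just the first -I (TextIndex) option on the sparse filter; with my ARFF files the tweet text is attribute 2, so a corrected configuration looks like this (the other options shown match the n-gram settings from my 6/20 configurations):

weka.filters.unsupervised.attribute.TweetToSparseFeatureVector -M 2 -I 2 -Q 3 -A -D 3 -E 5 -L -O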

The final ablation table:

[Image: AblationTable2.PNG]

As the table shows, the overall best-performing combination was word embeddings with all the lexicons. Using the word and character n-grams alongside the other filters proved slightly less accurate, possibly due to overfitting.

 

6/28/2017

Finally Weka!

It took so long to get Weka to work the way I wanted, but now it seems simpler than I thought. By Friday, I need to run ablation experiments to determine the importance of certain features in the baseline system. Saif Mohammad tested with word n-grams (WN), character n-grams (CN), word embeddings (WE), and lexicons (L). He used 11 lexicons; I will only test with 10, as I don't have access to one of them.

Back in Weka, the correct way to run models is to leave the preprocessing filter set to "None" and then, back on the Classify tab, select the filter set and classifier there. The configuration below describes one of the tests I ran.

weka.filters.MultiFilter -F "weka.filters.unsupervised.attribute.TweetToEmbeddingsFeatureVector -I 2 -B C:/Users/amanj/wekafiles/packages/AffectiveTweets/resources/w2v.twitter.edinburgh.100d.csv.gz -S 0 -K 15 -L -O" -F "weka.filters.unsupervised.attribute.TweetToLexiconFeatureVector -I 2 -A -D -F -H -J -L -N -P -Q -R -T -U -O" -F "weka.filters.unsupervised.attribute.TweetToSentiStrengthFeatureVector -I 2 -U -O" -F "weka.filters.unsupervised.attribute.TweetToSparseFeatureVector -M 0 -I 1 -Q 4 -D 3 -E 5 -L -F -G 0 -I 0" -F "weka.filters.unsupervised.attribute.Reorder -R 5-last,4"

This test used word embeddings and all the lexicons (SentiStrength is a lexicon that gets its own filter), with the LibLINEAR classifier using L2 regularization and L2 loss. I ran the other tests the same way, either removing all lexicons except the one I was testing or using only the word embeddings filter.
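The same run can be reproduced from the command line with Weka's FilteredClassifier, which is what setting a filter on the Classify tab amounts to. This is a minimal sketch: it assumes weka.jar plus the AffectiveTweets and LibLINEAR packages are installed, the train/test file names are placeholders, and the -F argument is the full MultiFilter string above (abbreviated here):

java -cp weka.jar weka.Run weka.classifiers.meta.FilteredClassifier -t anger-train.arff -T anger-test.arff -F "weka.filters.MultiFilter ..." -W weka.classifiers.functions.LibLINEAR -- -S 12 -C 1.0 -E 0.001 -B 1.0 -L 0.1 -I 1000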

After running the tests, the ablation experiments were mostly done.

[Image: AblationTable1]
Note: The scores calculated are the Pearson correlation scores.

Tomorrow I will finish running the experiments with word n-grams and character n-grams, and then run those filters together with word embeddings or lexicons. Lastly, there will be the final model, which uses all the filters, to see how the model does with every feature enabled.

I also finished reading the EmoInt paper. It familiarized me with their process and gave me ideas for applications of the regression/deep learning model I will create.

 

6/21/2017

I took some time away from the TweetToSparseFeatureVector today and focused on getting scores for all the different training data sets: Anger, Fear, Joy, and Sadness. I ran my MultiFilter on Fear, Joy, and Sadness, since I had already run it on Anger.

=== Anger Summary ===

Correlation coefficient 0.625
Kendall's tau 0.4454
Spearman's rho 0.6155
Mean absolute error 0.1069
Root mean squared error 0.1358
Relative absolute error 76.1346 %
Root relative squared error 79.0147 %
Total Number of Instances 760

=== Fear Summary ===

Correlation coefficient 0.6216
Kendall's tau 0.4382
Spearman's rho 0.6087
Mean absolute error 0.1275
Root mean squared error 0.1575
Relative absolute error 77.1929 %
Root relative squared error 78.3684 %
Total Number of Instances 995

=== Joy Summary ===

Correlation coefficient 0.636
Kendall's tau 0.4603
Spearman's rho 0.6435
Mean absolute error 0.1348
Root mean squared error 0.1688
Relative absolute error 73.702 %
Root relative squared error 77.5147 %
Total Number of Instances 714

=== Sadness Summary ===

Correlation coefficient 0.7094
Kendall's tau 0.5229
Spearman's rho 0.7116
Mean absolute error 0.1142
Root mean squared error 0.1431
Relative absolute error 67.2283 %
Root relative squared error 70.4165 %
Total Number of Instances 673

The next step was to compare this system to the Weka Baseline system created by the task creators. If my system is on par with (or better than) the Weka Baseline system, I can use it as the benchmark to compare my linear regression and, eventually, deep learning models against.

System comparison (each cell is Pearson / Spearman):

System                  Average          Anger            Fear             Joy              Sadness
Weka Baseline System    0.648 / 0.641    0.639 / 0.615    0.652 / 0.635    0.654 / 0.662    0.648 / 0.651
My System (4 filters)   0.648 / 0.6448   0.625 / 0.6155   0.6216 / 0.6087  0.636 / 0.6435   0.7094 / 0.7116

As the table shows, my system is not far off on the individual emotions (in fact, it beat the Weka Baseline system on Sadness); the average Pearson score ties the baseline, and the average Spearman score is slightly higher. While I could already use this system to test my future models against, I want to try one more time to add the TweetToSparseFeatureVector, as I think it will benefit my system.
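Once I move to Python, the same correlation metrics Weka reports can be computed with SciPy. A minimal sketch (SciPy is my choice, not something the baseline uses; the four values are copied from the prediction output in the 6/19 entry below):

from scipy.stats import pearsonr, spearmanr, kendalltau

# Gold and predicted intensity scores (first four instances from the
# 6/19 anger test run; the real lists would hold all 760 instances).
gold = [0.319, 0.144, 0.898, 0.271]
pred = [0.384, 0.329, 0.721, 0.434]

print("Pearson:  %.4f" % pearsonr(gold, pred)[0])
print("Spearman: %.4f" % spearmanr(gold, pred)[0])
print("Kendall:  %.4f" % kendalltau(gold, pred)[0])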

Additionally, today I looked into some tools that I could use when building my future Sentiment Analysis model. I found a Python module called the Natural Language Toolkit (NLTK), which makes NLP in Python much simpler. There are other cool tools such as H2O.ai and Turi's GraphLab Create (which I have used in the past).
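As a quick example of why NLTK looks handy for this project, its TweetTokenizer keeps Twitter-specific tokens like hashtags and @mentions intact (a minimal sketch; the tweet is made up):

from nltk.tokenize import TweetTokenizer

# preserve_case=False lowercases; reduce_len=True shortens "soooo"-style
# character runs, both useful for normalizing tweets.
tok = TweetTokenizer(preserve_case=False, reduce_len=True)
print(tok.tokenize("@friend I'm SOOOO angry right now!!! #fuming :("))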

By Friday, I hope to have the preliminary testing with Weka done, and then hopefully I can get started with my own model next week.

6/20/2017

I had already made a MultiFilter model for the test data, so I decided to try some of the other filters.

The TweetToSparseFeatureVector, provided by the AffectiveTweets package for Weka, supports a variety of features, such as character n-grams, word n-grams, POS tags, etc. The first configuration I tried was

weka.filters.unsupervised.attribute.TweetToSparseFeatureVector -M 2 -I 3 -R -Q 3 -A -D 3 -E 5 -L -O -F -G 2 -I 2

which ran with these settings:

[Image: TweetToSparseFeatureVector settings]

However, Weka was not working the way it was supposed to (or rather, I could not get it to build models for the things I needed), so I wrote out just the filter configurations I would use, so I could test them on the UMD machines or see if my mentor, Dr. Carpuat, would know what to do with them.

  • weka.filters.unsupervised.attribute.TweetToSparseFeatureVector -M 2 -I 3 -R -Q 3 -A -D 3 -E 5 -L -O -F -G 2 -I 2 (Word N-Grams, Character N-Grams, Negations, POS Tags, Brown Clusters)
  • weka.filters.unsupervised.attribute.TweetToSparseFeatureVector -M 2 -I 3 -R -L -O (Only Negations)
  • weka.filters.unsupervised.attribute.TweetToSparseFeatureVector -M 2 -I 3 -A -D x -E y -L -O (Only Character N-Grams)
  • weka.filters.unsupervised.attribute.TweetToSparseFeatureVector -M 2 -I 3 -Q 3 -L -O (Only Word N-Grams)
  • weka.filters.unsupervised.attribute.TweetToSparseFeatureVector -M 2 -I 3 -A -D x -E y -Q 3 -L -O (Word N-Grams and Character N-Grams)
  • weka.filters.unsupervised.attribute.TweetToSparseFeatureVector -M 2 -I 3 -A -D x -E y -Q 3 -G z -L -O (Word N-Grams, Character N-Grams, POS Tags)
  • weka.filters.unsupervised.attribute.TweetToSparseFeatureVector -M 2 -I 3 -R -A -D x -E y -Q 3 -L -O (Word N-Grams, Character N-Grams, Negations)

I plan on building my Sentiment Analysis system in Python rather than Java, and I want to use the following features: Word N-Grams, Character N-Grams, Negations, POS Tags, and Brown Clusters (see the sketch after the notes below). Later I want to build on the system using some of the higher-end filters that Weka provides (or their Python counterparts), and hopefully some other ones too.

Note: x, y, and z are placeholders for the integer values that can go there.

Note 2: Here are the different options for the filter:

  • -I TextIndex (This is the first I in the config)
  • -R Negation
  • -A Character N-grams -D Min -E Max
  • -I MaxDim for clustNGram (This is the last I in the config)
  • -F Frequency Weights
  • -G MaxDim for POS N-gram
  • -L toLowerCase
  • -Q Word N-gram MaxDim
  • -O CleanToken
  • -M minAttDocs
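
Since the Python system will need word and character n-grams, here is a minimal sketch of how those features could be extracted. Using scikit-learn is my assumption (nothing in the baseline requires it), and the tweets are made up; the ngram_range arguments play the roles of -Q (word n-gram max) and -D/-E (character n-gram min/max) above:

from sklearn.feature_extraction.text import CountVectorizer

tweets = ["This is SO unfair!", "what a great day :)"]

# Word 1- to 3-grams, analogous to -Q 3 with -L (lowercase).
word_vec = CountVectorizer(analyzer="word", ngram_range=(1, 3), lowercase=True)

# Character 3- to 5-grams, analogous to -A -D 3 -E 5.
char_vec = CountVectorizer(analyzer="char", ngram_range=(3, 5), lowercase=True)

print(word_vec.fit_transform(tweets).shape)  # (2, number of word n-grams)
print(char_vec.fit_transform(tweets).shape)  # (2, number of char n-grams)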

6/19/2017

Today was the first official day of my summer internship, although I will start going to UMD more regularly starting Wednesday.

My previous attempts with Weka (the baseline system I have to test against) weren't going so well, so I decided to start fresh and reinstall it. Upon reinstalling, I also installed the EmoInt package, which gave me the resources to test on the saifmohammad.com data.

Through the EmoInt ReadMe, I was able to run my first real test with Weka. First, I used the tweets_to_arff.py program to convert the data to the ARFF format that Weka accepts. Next, I fed the training and testing data into Weka and applied the filters and classifier that were needed.
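For reference, the converted ARFF looks roughly like this; the attribute names are my paraphrase and the data row is invented, but the layout matters: the tweet is attribute 2 (which the -I 2 options below point at) and the intensity score is attribute 4 (which the Reorder filter moves to the end):

@relation anger-training

@attribute id string
@attribute tweet string
@attribute emotion string
@attribute score numeric

@data
'10000','Just got cut off on the highway... so mad right now','anger',0.562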

Current Filter: weka.filters.MultiFilter -F "weka.filters.unsupervised.attribute.TweetToEmbeddingsFeatureVector -I 2 -B C:/Users/amanj/wekafiles/packages/AffectiveTweets/resources/w2v.twitter.edinburgh.100d.csv.gz -S 0 -K 15 -L -O" -F "weka.filters.unsupervised.attribute.TweetToLexiconFeatureVector -I 2 -A -D -F -H -J -L -N -P -Q -R -T -U -O" -F "weka.filters.unsupervised.attribute.TweetToSentiStrengthFeatureVector -I 2 -U -O" -F "weka.filters.unsupervised.attribute.Reorder -R 5-last,4"

Current classifier: weka.classifiers.functions.LibLINEAR -S 12 -C 1.0 -E 0.001 -B 1.0 -L 0.1 -I 1000

The four filters I used in this test were TweetToEmbeddingsFeatureVector, TweetToLexiconFeatureVector, TweetToSentiStrengthFeatureVector, and Reorder (the Reorder setting -R 5-last,4 keeps the generated feature attributes and moves the intensity score, attribute 4, to the end, where Weka expects the class attribute by default). I used the LibLINEAR classifier (-S 12, i.e., L2-regularized L2-loss support vector regression), which will remain constant through my preliminary tests.

This test returned the following results:

inst#,actual,predicted,error
1,0.319,0.384,0.065
2,0.144,0.329,0.185
3,0.898,0.721,-0.177
4,0.271,0.434,0.163
...
760,0.417,0.523,0.106

=== Evaluation on test set ===

Time taken to test model on supplied test set: 0.41 seconds

=== Summary ===

Correlation coefficient 0.625
Kendall's tau 0.4454
Spearman's rho 0.6155
Mean absolute error 0.1069
Root mean squared error 0.1358
Relative absolute error 76.1346 %
Root relative squared error 79.0147 %
Total Number of Instances 760

Tomorrow I will finish my tests, using word and character n-grams, POS tags, etc., both with and without the other filters, to figure out which filters I want to use in my model.

Note: The filters I use in my model will be added to over time. Even if I only use four filters to start, I will eventually try adding more and more filters to get the best possible Sentiment Analysis system.