It took a long time to get Weka to work the way I wanted, but now it seems simpler than I thought. Before Friday, I needed to run ablation experiments to determine the importance of certain features in the baseline system. Saif Mohammad tested with word ngrams (WN), character ngrams (CN), word embeddings (WE), and lexicons (L); he used 11 lexicons, but I will only test 10, as I don’t have access to one of them.
Back in Weka, the way to run models correctly is to leave the preprocessing filter set to “None”, then come back to the Classify tab and apply the desired filter set and classifier there. The configuration below describes one of the tests I ran.
weka.filters.MultiFilter -F "weka.filters.unsupervised.attribute.TweetToEmbeddingsFeatureVector -I 2 -B C:/Users/amanj/wekafiles/packages/AffectiveTweets/resources/w2v.twitter.edinburgh.100d.csv.gz -S 0 -K 15 -L -O" -F "weka.filters.unsupervised.attribute.TweetToLexiconFeatureVector -I 2 -A -D -F -H -J -L -N -P -Q -R -T -U -O" -F "weka.filters.unsupervised.attribute.TweetToSentiStrengthFeatureVector -I 2 -U -O" -F "weka.filters.unsupervised.attribute.TweetToSparseFeatureVector -M 0 -I 1 -Q 4 -D 3 -E 5 -L -F -G 0 -I 0" -F "weka.filters.unsupervised.attribute.Reorder -R 5-last,4"
This test used word embeddings and all the lexicons (SentiStrength is a lexicon that has its own filter), and used the LibLINEAR classifier with L2 regularization and L2 loss. I ran the other tests the same way, either removing all lexicons except the one I was testing, or using only the word embeddings filter.
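For reference, the classifier side of the setup looks roughly like the line below. This is a sketch, not the exact string I used: the -S solver index and the -C/-E values here are assumptions, since the mapping from solver index to "L2-regularized L2-loss" depends on the version of the LibLINEAR wrapper, so check its option documentation before copying this.

```
weka.classifiers.functions.LibLINEAR -S 1 -C 1.0 -E 0.001
```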
After running the tests, the ablation experiments were mostly done.
Note: The scores calculated are the Pearson correlation scores.
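As a sanity check on the metric, Pearson correlation between predicted and gold intensity scores is easy to compute outside Weka. A minimal sketch in NumPy (the function name `pearson` is mine; `scipy.stats.pearsonr` gives the same value):

```python
import numpy as np

def pearson(pred, gold):
    """Pearson correlation between two equal-length score vectors."""
    x = np.asarray(pred, dtype=float)
    y = np.asarray(gold, dtype=float)
    xm, ym = x - x.mean(), y - y.mean()  # center both vectors
    # covariance over the product of standard deviations
    return float((xm @ ym) / np.sqrt((xm @ xm) * (ym @ ym)))
```

A perfectly linear prediction gives 1.0, and a perfectly inverted one gives -1.0, which is a quick way to verify the implementation.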
Tomorrow I will finish running experiments with word ngrams and character ngrams, and then run those filters combined with word embeddings or lexicons. Lastly, there will be the final model, which uses all the filters, to see how the model does with every feature included.
I also finished reading the EmoInt paper. It familiarized me with their process and gave me ideas for applications of the regression/deep learning model I will create.