6/20/2017

I had already made a Multi-Filter model for the test data, so I decided to try some of the other Filters.

The TweetToSparseFeatureVector filter, provided by the AffectiveTweets package for Weka, can extract a variety of features, such as Character N-Grams, Word N-Grams, and POS Tags. The first TweetToSparseFeatureVector configuration I tried was

weka.filters.unsupervised.attribute.TweetToSparseFeatureVector -M 2 -I 3 -R -Q 3 -A -D 3 -E 5 -L -O -F -G 2 -I 2

which ran with these settings:

[Image: TweetToSparseFeatureVector]

However, Weka was not working the way it was supposed to (or rather, I could not get it to build models for the things I needed). So I wrote out just the filter configurations I would use, so that I could test them on the UMD machines or ask my mentor, Dr. Carpuat, what to do with them.

  • Word N-Grams, Character N-Grams, Negations, POS Tags, Brown Clusters: weka.filters.unsupervised.attribute.TweetToSparseFeatureVector -M 2 -I 3 -R -Q 3 -A -D 3 -E 5 -L -O -F -G 2 -I 2
  • Only Negations: weka.filters.unsupervised.attribute.TweetToSparseFeatureVector -M 2 -I 3 -R -L -O
  • Only Character N-Grams: weka.filters.unsupervised.attribute.TweetToSparseFeatureVector -M 2 -I 3 -A -D x -E y -L -O
  • Only Word N-Grams: weka.filters.unsupervised.attribute.TweetToSparseFeatureVector -M 2 -I 3 -Q 3 -L -O
  • Word N-Grams and Character N-Grams: weka.filters.unsupervised.attribute.TweetToSparseFeatureVector -M 2 -I 3 -A -D x -E y -Q 3 -L -O
  • Word N-Grams, Character N-Grams, POS Tags: weka.filters.unsupervised.attribute.TweetToSparseFeatureVector -M 2 -I 3 -A -D x -E y -Q 3 -L -O -G z
  • Word N-Grams, Character N-Grams, Negations: weka.filters.unsupervised.attribute.TweetToSparseFeatureVector -M 2 -I 3 -R -A -D x -E y -Q 3 -L -O
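Since these configurations differ only in which flags are switched on, a small helper could generate them instead of typing each one by hand. The following is only a sketch: the function name and its parameters are my own invention, the flag order simply copies the first configuration above, and it uses only the flags that appear in this entry. It does not check what the filter does by default when a flag is left out.

```python
FILTER_CLASS = "weka.filters.unsupervised.attribute.TweetToSparseFeatureVector"


def build_filter_options(word_ngrams=None, char_ngrams=None, negation=False,
                         pos_ngrams=None, cluster_ngrams=None,
                         freq_weights=False, min_att_docs=2, text_index=3):
    """Assemble a filter string from the chosen features (hypothetical helper).

    word_ngrams    -- max word n-gram size for -Q, or None to leave -Q out
    char_ngrams    -- (min, max) sizes for -A -D -E, or None to leave them out
    pos_ngrams     -- value for -G, or None
    cluster_ngrams -- value for the last -I, or None
    """
    parts = ["-M", str(min_att_docs), "-I", str(text_index)]
    if negation:
        parts.append("-R")
    if word_ngrams is not None:
        parts += ["-Q", str(word_ngrams)]
    if char_ngrams is not None:
        low, high = char_ngrams
        parts += ["-A", "-D", str(low), "-E", str(high)]
    # -L and -O appear in every configuration in this entry, so always add them.
    parts += ["-L", "-O"]
    if freq_weights:
        parts.append("-F")
    if pos_ngrams is not None:
        parts += ["-G", str(pos_ngrams)]
    if cluster_ngrams is not None:
        parts += ["-I", str(cluster_ngrams)]
    return FILTER_CLASS + " " + " ".join(parts)


if __name__ == "__main__":
    # Reproduces the negation-only configuration.
    print(build_filter_options(negation=True))
```

Calling it with every feature enabled (word n-grams 3, character n-grams 3 to 5, negation, frequency weights, POS 2, clusters 2) gives back the first configuration I tried.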

I plan on building my Sentiment Analysis system in Python rather than Java, and I want to use the following features: Word N-Grams, Character N-Grams, Negations, POS Tags, and Brown Clusters. Later I want to extend the system with some of the higher-end filters that Weka provides (or their Python counterparts), and hopefully some others too.
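To get a feel for what the Python version might look like, here is a rough sketch of three of those features (Word N-Grams, Character N-Grams, and Negations) using only the standard library. Everything in it is an assumption on my part: the tokenizer, the list of negation words, and the "_NEG" suffix convention are illustrative and are not how AffectiveTweets implements these features. POS Tags and Brown Clusters are left out because they need an external tagger and a cluster file.

```python
import re
from collections import Counter

# Assumed, illustrative negation cues; not the list AffectiveTweets uses.
NEGATION_WORDS = {"not", "no", "never", "cannot"}
PUNCTUATION = set(".,!?;:")


def tokenize(text, lowercase=True):
    """Split text into word tokens and single punctuation marks."""
    if lowercase:
        text = text.lower()
    return re.findall(r"[\w']+|[.,!?;:]", text)


def mark_negation(tokens):
    """Append _NEG to tokens between a negation cue and the next punctuation mark."""
    marked, negated = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negated = False
            marked.append(tok)
        elif tok in NEGATION_WORDS or tok.endswith("n't"):
            negated = True
            marked.append(tok)
        else:
            marked.append(tok + "_NEG" if negated else tok)
    return marked


def word_ngrams(tokens, max_n):
    """All word n-grams for n = 1..max_n, joined with spaces."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def char_ngrams(text, min_n, max_n):
    """All character n-grams for n = min_n..max_n."""
    return [text[i:i + n]
            for n in range(min_n, max_n + 1)
            for i in range(len(text) - n + 1)]


def extract_features(text, word_max=3, char_min=3, char_max=5):
    """Count word n-grams (with negation marking) and character n-grams."""
    tokens = mark_negation(tokenize(text))
    features = Counter("w=" + g for g in word_ngrams(tokens, word_max))
    features.update("c=" + g for g in char_ngrams(text.lower(), char_min, char_max))
    return features


if __name__ == "__main__":
    print(mark_negation(tokenize("I don't like this, but ok")))
```

The "w=" and "c=" prefixes just keep word and character features from colliding in the same counter. The default sizes (3 for words, 3 to 5 for characters) are copied from the first Weka configuration above.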

Note: x, y, and z are placeholders for integer values.

Note 2: Here are the different options for the filter:

  • -I TextIndex (This is the first I in the config)
  • -R Negation
  • -A Character N-grams, with -D Min and -E Max
  • -I MaxDim for clustNGram (This is the last I in the config)
  • -F Frequency Weights
  • -G MaxDim for POS N-gram
  • -L toLowerCase
  • -Q Word N-gram MaxDim
  • -O CleanToken
  • -M minAttDocs
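Because -I shows up twice with two different meanings, the option strings are easy to misread. The sketch below turns one of them into a labelled dictionary, following my reading of the list above (first -I is the text index, any later -I is the cluster n-gram size). The function name and the dictionary keys are made up for illustration, and this is not how Weka itself parses options. It expects concrete integers, so the x, y, z placeholders need to be filled in first.

```python
# Flags that take an integer value, and flags that are simple switches,
# labelled according to the option list in this entry (keys are hypothetical).
VALUE_FLAGS = {
    "-M": "min_att_docs",
    "-Q": "word_ngram_max_dim",
    "-D": "char_ngram_min",
    "-E": "char_ngram_max",
    "-G": "pos_ngram_max_dim",
}
SWITCH_FLAGS = {
    "-R": "negation",
    "-A": "char_ngrams",
    "-L": "to_lower_case",
    "-O": "clean_tokens",
    "-F": "frequency_weights",
}


def parse_options(option_string):
    """Label each flag in a filter string, telling the two -I flags apart."""
    tokens = option_string.split()
    # Skip a leading class name such as weka.filters...TweetToSparseFeatureVector.
    if tokens and not tokens[0].startswith("-"):
        tokens = tokens[1:]

    parsed = {}
    seen_i = 0
    i = 0
    while i < len(tokens):
        flag = tokens[i]
        if flag == "-I":
            # First -I is the text index; a later -I is the cluster n-gram size.
            key = "text_index" if seen_i == 0 else "clust_ngram_max_dim"
            seen_i += 1
            parsed[key] = int(tokens[i + 1])
            i += 2
        elif flag in VALUE_FLAGS:
            parsed[VALUE_FLAGS[flag]] = int(tokens[i + 1])
            i += 2
        elif flag in SWITCH_FLAGS:
            parsed[SWITCH_FLAGS[flag]] = True
            i += 1
        else:
            raise ValueError("Unknown option: " + flag)
    return parsed


if __name__ == "__main__":
    print(parse_options("-M 2 -I 3 -R -Q 3 -A -D 3 -E 5 -L -O -F -G 2 -I 2"))
```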