MuellerLeonard/Word-embeddings_CNN

Sentiment analysis with word embeddings and Convolutional Neural Networks

Datasets:

  1. "Twitter" dataset
    • 61,529 tweets with 300 feature dimensions and a vocabulary of 73,562 words
    • Crawled from the Twitter hashtags: 'ADBE', 'GOOGL', 'AMZN', 'AAPL', 'ADSK', 'BKNG', 'EXPE', 'INTC', 'MSFT', 'NFLX', 'NVDA', 'PYPL', 'SBUX', 'TSLA', 'XEL', 'positive', 'bad' and 'sad'
    • Each tweet is labelled with one of 3 sentiments: positive, negative or neutral
  2. Pre-trained English word vectors trained with fastText on Common Crawl and Wikipedia
  3. Pre-trained word vectors trained with GloVe on Twitter data

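Pre-trained GloVe vectors ship as a plain text file, one word per line followed by its vector components. A minimal loading sketch; the tiny in-memory sample below stands in for the real downloaded file, whose name and dimensionality depend on which GloVe release is used:

```python
import io

# A tiny in-memory stand-in for a GloVe .txt file; each line is:
# <word> <dim_1> <dim_2> ... <dim_n>
sample = io.StringIO(
    "good 0.1 0.2 0.3\n"
    "bad -0.1 -0.2 -0.3\n"
)

def load_glove(handle):
    """Parse GloVe's plain-text format into {word: list of floats}."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

embeddings = load_glove(sample)
print(len(embeddings))  # 2 words in the toy sample
```

With the real file, open it with `open(path, encoding="utf-8")` and pass the handle to the same function.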
How to run the code:

  1. The code is written as Google Colab notebooks
  2. Download the following files:
    • "Twitter" dataset, found in the folder files
    • fastText word vectors (see link); they need to be reduced to 200 dimensions first:

      ```python
      import fasttext
      import fasttext.util

      # Load the downloaded model and shrink it to 200 dimensions
      model = fasttext.load_model('path to the model file')
      fasttext.util.reduce_model(model, 200)

      # Verify the reduction
      model.get_dimension()  # 200
      ```

    • GloVe word vectors (see link)
    • sentqs_dataset.npz link: https://cloud.fhws.de/index.php/s/8BwgMykHEf9BwAd
  3. Open the latest notebook:
    • Experiments: the experiments for the thesis were performed here
    • ft_text_classification: demo file for sentiment analysis
  4. Link the datasets needed to run the code
  5. For the Skip-gram sentence embeddings, use one quarter of the "Twitter" dataset to avoid running out of RAM
    • the reduced dataset can be found in the folder files
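The .npz dataset can be opened with numpy.load. A sketch of the pattern; the array names "X" and "y" below are illustrative, not the real keys — inspect data.files on the downloaded sentqs_dataset.npz to see what it actually contains:

```python
import os
import tempfile
import numpy as np

# Build a small stand-in archive; the real sentqs_dataset.npz comes from
# the link above. The array names "X" and "y" are only illustrative.
path = os.path.join(tempfile.mkdtemp(), "sentqs_dataset.npz")
np.savez(path, X=np.zeros((4, 300)), y=np.array([0, 1, 2, 1]))

data = np.load(path)
print(data.files)          # lists the arrays stored in the archive
X, y = data["X"], data["y"]
print(X.shape, y.shape)
```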

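To make the CNN side concrete: a one-dimensional convolution slides each filter over word positions, with every window spanning the full embedding dimension, and global max pooling then keeps the strongest response per filter. A NumPy sketch with random values and illustrative shapes, not the notebook's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

seq_len, embed_dim = 10, 200      # words per tweet, embedding size (illustrative)
n_filters, width = 4, 3           # number of filters, window of words per filter

sentence = rng.normal(size=(seq_len, embed_dim))        # one embedded tweet
filters = rng.normal(size=(n_filters, width, embed_dim))

# Slide each filter over word positions; each window covers the whole
# embedding dimension, so every position yields a single value.
feature_maps = np.empty((n_filters, seq_len - width + 1))
for f in range(n_filters):
    for t in range(seq_len - width + 1):
        feature_maps[f, t] = np.sum(sentence[t:t + width] * filters[f])

# Global max pooling: one feature per filter, fed to the classifier.
features = feature_maps.max(axis=1)
print(features.shape)  # (4,)
```

A two-dimensional CNN instead uses small filters that also slide across the embedding dimension, producing 2-D feature maps.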
About

Using the word embeddings fastText, Skip-gram and GloVe as the basis for a one-dimensional and a two-dimensional Convolutional Neural Network, performing sentiment analysis on a "Twitter" dataset.
