NLP-Tools/sample.txt at master · devspidr/NLP-Tools · GitHub



🔠 1️⃣ Text Preprocessing

    text_cleaning.py

        Remove punctuation, lowercase text, remove stopwords, and apply stemming or lemmatization.

    stopwords_removal.py

        Load stopwords from NLTK or a custom list and remove them from texts.
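A minimal standard-library sketch of the cleaning and stopword steps above. The small STOPWORDS set and the function names are hypothetical placeholders; in practice the list would come from NLTK (`nltk.corpus.stopwords.words("english")`, after downloading the corpus) or a custom file. Stemming/lemmatization is deliberately left out; NLTK's `PorterStemmer` or spaCy's `token.lemma_` would typically fill that step.

```python
import string

# Hypothetical stopword list for illustration only; a real script would load
# NLTK's stopwords corpus or a project-specific list instead.
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to"}


def clean_text(text):
    """Lowercase the text and delete ASCII punctuation characters."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Keep only tokens that are not in the stopword set."""
    return [tok for tok in tokens if tok not in stopwords]


def preprocess(text):
    """Clean, split on whitespace, and drop stopwords (no stemming here)."""
    return remove_stopwords(clean_text(text).split())


print(preprocess("The cat sat on the mat!"))  # ['cat', 'sat', 'mat']
```

Stripping punctuation before splitting is the simplest ordering, but it merges contractions ("don't" becomes "dont"); tokenizing first avoids that if it matters.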

📊 2️⃣ Feature Extraction

    bag_of_words.py

        Implement a simple Bag-of-Words vectorizer.

    tfidf_vectorizer.py

        TF-IDF vectorizer that weights term counts by inverse document frequency.

    word_embeddings.py

        Load pre-trained embeddings (Word2Vec, GloVe) and get vectors for words/sentences.
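A small pure-Python sketch of the Bag-of-Words and TF-IDF vectorizers, assuming documents are already tokenized into lists of strings. The helper names are hypothetical. It uses one textbook TF-IDF variant, tf = count / document length and idf = ln(N / df), without smoothing or normalization; scikit-learn's `TfidfVectorizer`, for comparison, uses a smoothed idf and L2-normalizes rows by default, so its numbers will differ.

```python
import math
from collections import Counter


def build_vocabulary(docs):
    """Map each distinct token to a column index (alphabetical order)."""
    return {tok: i for i, tok in enumerate(sorted({t for doc in docs for t in doc}))}


def bag_of_words(docs, vocab):
    """One raw-count vector per document; tokens outside vocab are ignored."""
    vectors = []
    for doc in docs:
        vec = [0] * len(vocab)
        for tok, count in Counter(doc).items():
            if tok in vocab:
                vec[vocab[tok]] = count
        vectors.append(vec)
    return vectors


def tfidf(docs, vocab):
    """TF-IDF with tf = count / doc length and unsmoothed idf = ln(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each token at least once.
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        vec = [0.0] * len(vocab)
        for tok, count in Counter(doc).items():
            if tok in vocab:
                tf = count / len(doc)
                idf = math.log(n_docs / df[tok])
                vec[vocab[tok]] = tf * idf
        vectors.append(vec)
    return vectors


docs = [["cat", "sat"], ["cat", "ran"]]
vocab = build_vocabulary(docs)   # {'cat': 0, 'ran': 1, 'sat': 2}
print(bag_of_words(docs, vocab))  # [[1, 0, 1], [1, 1, 0]]
```

Note that "cat" appears in every document, so its idf is ln(2/2) = 0 and it gets zero TF-IDF weight: words common to the whole corpus carry no discriminating signal.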

📚 3️⃣ Similarity Measures

    cosine_similarity.py ✅ (already implemented)

        Compute cosine similarity between two vectors (e.g., TF-IDF or embedding vectors).

    jaccard_similarity.py

        Implement Jaccard similarity for sets of tokens.
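A sketch of Jaccard similarity on token sets, with a plain cosine similarity alongside for contrast. This is not the repository's existing cosine_similarity.py, whose code isn't shown here. Returning 1.0 for two empty sets and 0.0 for a zero-length vector are conventions chosen for this sketch.

```python
import math


def jaccard_similarity(tokens_a, tokens_b):
    """|A ∩ B| / |A ∪ B| computed over the sets of tokens."""
    a, b = set(tokens_a), set(tokens_b)
    if not a and not b:
        return 1.0  # convention assumed here: two empty sets count as identical
    return len(a & b) / len(a | b)


def cosine_similarity(vec_a, vec_b):
    """Dot product divided by the product of the Euclidean norms."""
    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(x * x for x in vec_a))
    norm_b = math.sqrt(sum(y * y for y in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # undefined for a zero vector; 0.0 chosen for this sketch
    return dot / (norm_a * norm_b)


print(jaccard_similarity(["the", "cat", "sat"], ["the", "cat", "ran"]))  # 0.5
```

Jaccard ignores how often a token occurs and only asks whether it is shared, whereas cosine works on count or weight vectors, so repeated terms pull two texts closer.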

✂️ 4️⃣ Tokenization & Sentence Splitting

    tokenizer.py

        Custom word and sentence tokenizers.
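A regex-based sketch of custom word and sentence tokenizers. The splitting rules are assumptions made for this example: a sentence ends at `.`, `!` or `?` followed by whitespace, and a word is a run of letters or digits with an optional internal apostrophe. Abbreviations such as "Dr." will be split incorrectly; NLTK's `sent_tokenize`/`word_tokenize` handle those cases better.

```python
import re

# Assumed rule: split on whitespace that directly follows ., ! or ?
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")
# Assumed rule: letters/digits, optionally with one internal apostrophe (I'm, don't)
_WORD = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?")


def sentence_tokenize(text):
    """Split text into sentences at terminal punctuation followed by whitespace."""
    return [s for s in _SENTENCE_BOUNDARY.split(text.strip()) if s]


def word_tokenize(text):
    """Return word tokens, dropping punctuation."""
    return _WORD.findall(text)


for sentence in sentence_tokenize("Hello there! How are you? I'm fine."):
    print(word_tokenize(sentence))
# ['Hello', 'there']
# ['How', 'are', 'you']
# ["I'm", 'fine']
```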

📈 5️⃣ Text Classification / Topic Modeling

    naive_bayes_classifier.py

        Naive Bayes classifier for text (spam detection, sentiment).

    lda_topic_modeling.py

        LDA (Latent Dirichlet Allocation) for extracting topics.
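A minimal sketch of a multinomial Naive Bayes text classifier with add-alpha (Laplace) smoothing, written from scratch so the probability arithmetic is visible. The class name, the toy spam/ham data, and the choice to skip unseen words at prediction time are assumptions for illustration. LDA is not sketched here; libraries such as gensim (`LdaModel`) or scikit-learn (`LatentDirichletAllocation`) are the usual route.

```python
import math
from collections import Counter


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes over token lists, with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        class_counts = Counter(labels)
        # log P(class), estimated from label frequencies
        self.log_prior = {c: math.log(class_counts[c] / len(docs)) for c in self.classes}
        # Per-class token counts, used for log P(token | class)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, tok, c):
        # (count + alpha) / (total words in class + alpha * vocabulary size)
        count = self.word_counts[c][tok]
        denom = self.total_words[c] + self.alpha * len(self.vocab)
        return math.log((count + self.alpha) / denom)

    def predict(self, doc):
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for tok in doc:
                if tok in self.vocab:  # skip words never seen during training
                    score += self._log_likelihood(tok, c)
            scores[c] = score
        return max(scores, key=scores.get)


docs = [["win", "money", "now"], ["cheap", "money"],
        ["meeting", "tomorrow"], ["project", "meeting", "notes"]]
labels = ["spam", "spam", "ham", "ham"]
clf = NaiveBayesTextClassifier().fit(docs, labels)
print(clf.predict(["money", "now"]))  # spam
```

Summing log-probabilities instead of multiplying raw probabilities avoids floating-point underflow on long documents, and smoothing keeps a single unseen (word, class) pair from zeroing out a class.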

🏷️ 6️⃣ NER & POS Tagging

    pos_tagger.py

        Part-of-speech tagging with NLTK or spaCy.

    ner_spacy.py

        Named Entity Recognition using spaCy.

🧠 7️⃣ Deep Learning-based NLP

    sentiment_analysis_lstm.py

        Sentiment analysis using an LSTM model.

    transformer_embeddings.py

        Use BERT/DistilBERT embeddings for text.

🔎 8️⃣ Utilities & Visualizations

    wordcloud_generator.py

        Generate word clouds for quick insights.

    frequency_analysis.py

        Plot word/phrase frequency distributions.
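A dependency-free sketch of the counting behind frequency analysis: top words, top n-grams ("phrases"), and a text bar chart as a stand-in for a real plot. The function names are hypothetical; an actual script would more likely pass these counts to matplotlib (e.g. `plt.bar`) or to the `wordcloud` package's `WordCloud.generate_from_frequencies`.

```python
from collections import Counter


def word_frequencies(tokens, top_n=10):
    """Most frequent tokens as (token, count) pairs, highest first."""
    return Counter(tokens).most_common(top_n)


def ngram_frequencies(tokens, n=2, top_n=10):
    """Most frequent n-grams, each joined into a space-separated phrase."""
    ngrams = zip(*(tokens[i:] for i in range(n)))  # sliding window of length n
    return Counter(" ".join(gram) for gram in ngrams).most_common(top_n)


def text_histogram(freqs, width=30):
    """Render (term, count) pairs as a horizontal text bar chart."""
    if not freqs:
        return ""
    max_count = max(count for _, count in freqs)
    lines = []
    for term, count in freqs:
        bar = "#" * max(1, round(width * count / max_count))
        lines.append(f"{term:<15} {bar} {count}")
    return "\n".join(lines)


tokens = "the cat sat on the mat the end".split()
print(text_histogram(word_frequencies(tokens, top_n=3)))
```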