Sentiment Analysis Tool

A Python application that automatically analyzes comments from CSV files, categorizes them using AI, and generates comprehensive sentiment reports.

Limitation Note: This project is a proof of concept (POC) and, in some steps, processes the entire comments dataset at once. Supplying very large datasets may lead to excessive token usage and reduced output quality.

Features

Automatic Category Detection: AI-powered analysis to identify relevant categories from your comments
Batch Processing: Classifies comments in batches (10 per batch by default) rather than one request per comment
Detailed Reporting: Generates comprehensive markdown reports with statistics and insights
CSV Support: Works with any CSV file containing a "Comment" column
Real-time Progress: Shows processing progress as comments are analyzed

Project Structure

├── main.py                 # Main application entry point
├── requirements.txt        # Python dependencies
├── source/                # Core modules
│   ├── define_categories.py    # AI-powered category definition
│   ├── classify_comments.py   # Comment classification logic
│   ├── generate_report.py     # Report generation
│   └── output_formats.py      # Output formatting utilities
├── data/                  # Input CSV files directory
│   └── ECLIPSE_ RISING.csv    # Sample data file
├── prompts/              # AI prompt templates (if any)
└── Sentiment Report.md   # Generated output report

Prerequisites

Python 3.8 or higher
OpenAI API key (for AI-powered categorization)

Installation

Clone or download this repository

git clone <repository-url>
cd <project-directory>

Create a virtual environment (recommended)

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

Install dependencies
```
pip install -r requirements.txt
```
Set up your OpenAI API key

Create a .env file in the project root:
```
echo "OPENAI_API_KEY=your_api_key_here" > .env
```
Replace your_api_key_here with your actual OpenAI API key.
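
To confirm the key is in place before running the tool, a small standard-library check can help. This is only a sketch: the helper names `parse_env` and `api_key_is_configured` are hypothetical, and it makes no claim about how main.py itself loads the .env file.

```python
import os
from pathlib import Path


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def api_key_is_configured(env_path=".env"):
    """True if OPENAI_API_KEY is set (environment or .env) and is not the placeholder."""
    path = Path(env_path)
    file_values = parse_env(path.read_text(encoding="utf-8")) if path.exists() else {}
    key = os.environ.get("OPENAI_API_KEY") or file_values.get("OPENAI_API_KEY", "")
    return bool(key) and key != "your_api_key_here"


if __name__ == "__main__":
    print("API key configured:", api_key_is_configured())
```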

Usage

Quick Start

To run the application with the sample data:

python main.py

This processes the included data/ECLIPSE_ RISING.csv file and writes the results to Sentiment Report.md.

Using Your Own Data

Prepare your CSV file
- Ensure your CSV has a column named "Comment"
- Place the file in the data/ directory

Update the file path in main.py

# Edit line 42 in main.py
report = main("data/YOUR_FILE.csv")

Run the application
```
python main.py
```

CSV Format Requirements

Your CSV file must contain at least one column named "Comment". Example:

ID,Comment
1,"This is a positive comment"
2,"This is a negative comment"
3,"This is a neutral comment"
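
Before spending API credits, it can be worth checking that a file meets this requirement. The sketch below uses only the standard `csv` module; `load_comments` is a hypothetical helper, not a function from this repository.

```python
import csv
import io


def load_comments(csv_file, column="Comment"):
    """Return the non-empty values of the comment column from an open CSV file."""
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(
            f'CSV must contain a "{column}" column; found {reader.fieldnames}'
        )
    # Skip rows where the comment cell is missing or blank.
    return [row[column].strip() for row in reader if row[column] and row[column].strip()]


sample = io.StringIO(
    'ID,Comment\n'
    '1,"This is a positive comment"\n'
    '2,"This is a negative comment"\n'
)
print(load_comments(sample))
```

For a file on disk, open it with `open(path, newline="", encoding="utf-8")` and pass the file object to the helper.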

How It Works

1. Data Loading: Reads comments from the specified CSV file
2. Category Definition: AI analyzes a sample of comments to automatically identify relevant categories
3. Comment Classification: Each comment is classified into one of the identified categories or marked as an outlier
4. Report Generation: Creates a detailed markdown report with:
   - Category definitions and descriptions
   - Comment distribution across categories
   - Statistical analysis
   - Key insights and trends
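
The flow described above can be sketched end to end with the AI calls replaced by simple stand-ins. Everything here is an assumption made for illustration: the function names, the fixed category list, and the keyword rules do not mirror the code in source/, where the model performs category definition and classification.

```python
from collections import Counter


def define_categories(sample_comments):
    """Stand-in for the AI step: returns a fixed, hypothetical category list."""
    return ["Praise", "Concern"]


def classify_comment(comment, categories):
    """Stand-in classifier based on keywords; the real tool asks the model."""
    text = comment.lower()
    if "love" in text and "Praise" in categories:
        return "Praise"
    if "worried" in text and "Concern" in categories:
        return "Concern"
    return "Outlier"  # comments that fit no category


def build_report(categories, labels):
    """Render the comment distribution as a small markdown table."""
    counts = Counter(labels)
    total = len(labels)
    lines = ["# Sentiment Report", "", "| Category | Comments | Share |", "|---|---|---|"]
    for name in categories + ["Outlier"]:
        share = counts[name] / total if total else 0.0
        lines.append(f"| {name} | {counts[name]} | {share:.0%} |")
    return "\n".join(lines)


def run_pipeline(comments, sample_size=50):
    categories = define_categories(comments[:sample_size])        # category definition
    labels = [classify_comment(c, categories) for c in comments]  # classification
    return build_report(categories, labels)                       # report generation


print(run_pipeline(["I love this", "I am worried", "buy now", "love it"]))
```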

Output

The application generates a Sentiment Report.md file containing:

Executive Summary: Overview of the analysis
Category Breakdown: Detailed statistics for each category
Key Insights: AI-generated insights about the comment patterns
Recommendations: Actionable recommendations based on the analysis

Customization

Modifying Categories

The categories are automatically generated, but you can influence them by:

Modifying the prompts in the source/define_categories.py file
Adjusting the sample size used for category detection
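
How the sample is drawn is not documented here. One possible approach, shown purely as an assumption, is a seeded random sample so that repeated runs see the same comments; `sample_comments`, `sample_size`, and `seed` are hypothetical names.

```python
import random


def sample_comments(comments, sample_size=50, seed=0):
    """Return up to sample_size comments, chosen reproducibly."""
    if len(comments) <= sample_size:
        return list(comments)
    # A dedicated Random instance keeps the global random state untouched.
    return random.Random(seed).sample(comments, sample_size)


comments = [f"comment {n}" for n in range(200)]
subset = sample_comments(comments, sample_size=50)
print(len(subset))
```

A larger sample gives the model more context for proposing categories at the cost of more tokens per request.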

Changing Output Format

Modify source/generate_report.py to change the report structure
Edit source/output_formats.py to customize the markdown formatting

Batch Size

Comments are processed in batches of 10 by default. To change this:

# In main.py, line 21, change the batch size:
for comments_batch in [comments[i:i+BATCH_SIZE] for i in range(0, len(comments), BATCH_SIZE)]:
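
The slicing expression above can also be written as a small generator, which avoids building the full list of batches up front. This is an illustrative sketch: `iter_batches` is a hypothetical helper, and `BATCH_SIZE = 10` simply restates the default mentioned above.

```python
BATCH_SIZE = 10  # default batch size stated in this README


def iter_batches(items, batch_size=BATCH_SIZE):
    """Yield consecutive slices of items, each at most batch_size long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


comments = [f"comment {n}" for n in range(25)]
print([len(batch) for batch in iter_batches(comments)])  # [10, 10, 5]
```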

API Usage

The application uses OpenAI's API for:

Automatic category detection from comment samples
Individual comment classification
Report generation and insight creation

Make sure you have sufficient API credits for your dataset size.
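
A rough request count can help when budgeting credits. The estimate below rests on assumptions that this README does not confirm: one classification request per batch, plus one request each for category detection and report generation. `estimate_requests` and `overhead_requests` are hypothetical names, and token usage per request is not modeled.

```python
import math


def estimate_requests(num_comments, batch_size=10, overhead_requests=2):
    """Estimate API requests: one per batch plus fixed overhead (assumed)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches = math.ceil(num_comments / batch_size)
    return batches + overhead_requests


print(estimate_requests(200))  # 22 under the assumptions above
```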

Sample Data

The included ECLIPSE_ RISING.csv contains 200 sample comments about a fictional TV show adaptation, including:

Fan reactions and excitement
Casting suggestions and preferences
Concerns about adaptation quality
Spam and promotional content
Social media interactions

This makes it a useful example of mixed-sentiment social media data for testing the tool.
