This project involved cleaning and normalizing a dataset of reported device issues using SQL and text-processing techniques. Vectorization methods such as TF-IDF and Word2Vec were explored alongside clustering approaches, including DBSCAN, K-Means, neural network embeddings, and hierarchical clustering, to group similar issues. Clustering quality was evaluated with the Silhouette Score, and a manual review of N-grams helped sort the issues into distinct categories that met the client's requirements.
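As a rough illustration of the workflow described above, the following sketch vectorizes a few issue descriptions with TF-IDF, groups them with average-linkage hierarchical clustering, scores the grouping with the silhouette coefficient, and lists frequent N-grams for manual review. It is not the project's actual code: the sample issue strings are invented, every function name is hypothetical, and it uses only the Python standard library so it runs without extra dependencies. The real project may well have relied on libraries such as scikit-learn or gensim instead.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return L2-normalized TF-IDF vectors as sparse dicts {term: weight}."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine_distance(a, b):
    """1 - cosine similarity for two L2-normalized sparse vectors."""
    return 1.0 - sum(w * b.get(t, 0.0) for t, w in a.items())


def agglomerative(vectors, k):
    """Average-linkage hierarchical clustering, merged down to k clusters."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        pairs = [(linkage(clusters[x], clusters[y]), x, y)
                 for x in range(len(clusters))
                 for y in range(x + 1, len(clusters))]
        _, x, y = min(pairs)            # closest pair of clusters
        clusters[x] += clusters.pop(y)  # y > x, so index x stays valid
    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def silhouette_score(vectors, labels):
    """Mean silhouette coefficient under cosine distance."""
    clusters = {}
    for i, label in enumerate(labels):
        clusters.setdefault(label, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette score needs at least two clusters")

    def mean_dist(i, members):
        return sum(cosine_distance(vectors[i], vectors[j]) for j in members) / len(members)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:                     # singleton clusters score 0 by convention
            scores.append(0.0)
            continue
        a = mean_dist(i, own)           # cohesion: mean distance within own cluster
        b = min(mean_dist(i, m) for lab, m in clusters.items() if lab != label)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def top_ngrams(texts, n=2, limit=3):
    """Most frequent word N-grams, useful for naming a cluster by hand."""
    counts = Counter()
    for text in texts:
        toks = tokenize(text)
        counts.update(" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1))
    return counts.most_common(limit)


# Invented sample data, standing in for the cleaned issue descriptions.
issues = [
    "Screen flickers after update",
    "Screen flickers on startup",
    "Battery drains fast when idle",
    "Battery drains fast overnight",
]
vectors = tfidf_vectors(issues)
labels = agglomerative(vectors, k=2)
score = silhouette_score(vectors, labels)
print(labels, round(score, 3))
for label in sorted(set(labels)):
    members = [issues[i] for i, lab in enumerate(labels) if lab == label]
    print(label, top_ngrams(members, n=2, limit=1))
```

Hierarchical clustering is used here only because it is deterministic on such a small sample. K-Means or DBSCAN would slot into the same place, with the silhouette score and N-gram review applied to their labels in the same way.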
j50ju/Jira-Data-Mining