LearningNLP/docs/unsupervised_learning.md at main · FrancescoPaoloL/LearningNLP

[Unsupervised Learning]

Document Clustering

This technique groups documents into clusters based on the similarity of their content; the objective is to discover inherent structure without predefined categories or labels. There are several methods for doing this. Two popular ones are:
- K-means clustering: an unsupervised machine learning algorithm that partitions a dataset into 'k' distinct subgroups, or clusters. The algorithm iteratively assigns each data point to the cluster whose centroid is closest, then recomputes each centroid as the mean of its assigned points, which reduces the total within-cluster variance. To show how it works, I've written a script that implements this algorithm from scratch.
- Affinity Propagation: a clustering algorithm designed to identify representative points ("exemplars") within a dataset. Unlike traditional clustering methods, where the number of clusters must be specified beforehand, Affinity Propagation determines the number of clusters and the exemplars simultaneously. Simply put, the algorithm iteratively exchanges messages between data points to find a set of exemplars that maximizes a combination of similarity and preference values. Each remaining data point is then assigned to its most similar exemplar, forming the clusters. To illustrate how it works, I've created a script that uses this algorithm and visualizes the results for clarity.
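
The assign-then-update loop of K-means can be sketched in a few lines of plain Python. This is only a minimal illustration, not the script mentioned above: the function names (`kmeans`, `squared_distance`, `mean_point`), the toy 2-D points, and the random initialization from the data are all assumptions made for the example. For real document clustering, the points would be document vectors (for instance TF-IDF vectors) rather than hand-written coordinates.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, max_iters=100, seed=0):
    """Hypothetical helper: returns (labels, centroids) for the given points."""
    rng = random.Random(seed)
    # Assumption: start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the loop has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_point(members)
    return labels, centroids


# Toy data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 1.5), (2.0, 2.0),
          (8.0, 8.0), (8.5, 8.5), (9.0, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

On this toy data the first three points end up sharing one label and the last three the other; which group gets label 0 depends on the random initialization.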

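The message exchange in Affinity Propagation can be sketched in a similar way. Again, this is a rough illustration under stated assumptions rather than the script mentioned above: similarity is taken as the negative squared Euclidean distance, the preference is set to the median of the pairwise similarities (a common default), the loop runs for a fixed number of iterations instead of checking convergence, and the fallback used when no exemplar emerges is a simplification of my own. The function name `affinity_propagation` and the toy points are hypothetical.

```python
from statistics import median


def affinity_propagation(points, damping=0.5, iterations=200):
    """Hypothetical helper: returns (labels, exemplars) as point indices."""
    n = len(points)
    # Similarity s[i][k]: negative squared Euclidean distance (assumption).
    s = [[-sum((x - y) ** 2 for x, y in zip(p, q)) for q in points]
         for p in points]
    # Preference s[k][k]: median of the off-diagonal similarities (assumption).
    preference = median(s[i][k] for i in range(n) for k in range(n) if i != k)
    for i in range(n):
        s[i][i] = preference

    r = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    a = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)

    for _ in range(iterations):
        # Responsibility: how well suited k is as exemplar for i,
        # compared with the best competing candidate.
        for i in range(n):
            for k in range(n):
                best_other = max(a[i][j] + s[i][j] for j in range(n) if j != k)
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - best_other)
        # Availability: how much support candidate k has gathered from others.
        for k in range(n):
            positive = [max(0.0, r[i][k]) for i in range(n)]
            support = sum(positive) - positive[k]  # excludes k's own message
            for i in range(n):
                if i == k:
                    new = support
                else:
                    new = min(0.0, r[k][k] + support - positive[i])
                a[i][k] = damping * a[i][k] + (1 - damping) * new

    # A point is an exemplar when its self-responsibility plus
    # self-availability is positive.
    exemplars = [k for k in range(n) if r[k][k] + a[k][k] > 0]
    if not exemplars:
        # Simplifying fallback (assumption): keep the single best candidate.
        exemplars = [max(range(n), key=lambda k: r[k][k] + a[k][k])]
    # Exemplars label themselves; other points pick the most similar exemplar.
    labels = [i if i in exemplars else max(exemplars, key=lambda k: s[i][k])
              for i in range(n)]
    return labels, exemplars


points = [(1.0, 1.0), (1.5, 1.5), (2.0, 2.0),
          (8.0, 8.0), (8.5, 8.5), (9.0, 9.0)]
labels, exemplars = affinity_propagation(points)
print(exemplars)
print(labels)
```

Note that the number of clusters is never passed in: it falls out of the messages, and lowering the preference generally yields fewer exemplars while raising it yields more.
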
return to readme
