# HierachiralClustering.py
"""
Hierarchical clustering discovers natural groupings in data by building a tree
of clusters without predefined labels. The dendrogram visualizes the merge
process and helps choose the number of clusters. Agglomerative clustering is
intuitive and widely used in fields like bioinformatics, marketing, and
exploratory data analysis for understanding relationships between data points.

This example demonstrates both the linkage/dendrogram approach (SciPy) and the
direct clustering approach (scikit-learn), with plots to interpret the results.
"""
# Import the necessary libraries
import numpy as np # For numerical operations and array handling
import matplotlib.pyplot as plt # For plotting data and dendrogram
from scipy.cluster.hierarchy import dendrogram, linkage # For hierarchical clustering and dendrogram plotting
from sklearn.cluster import AgglomerativeClustering # For agglomerative hierarchical clustering implementation
# Step 1: Create sample data points (two variables x and y)
x = [4, 5, 10, 4, 3, 11, 14, 6, 10, 12]
y = [21, 19, 24, 17, 16, 25, 24, 22, 21, 21]
# Combine x and y into a list of coordinate pairs (2D points)
data = list(zip(x, y))
# Step 2: Visualize the data points with a scatter plot
plt.scatter(x, y)
plt.title("Scatter plot of data points")
plt.xlabel("X values")
plt.ylabel("Y values")
plt.show()
# Step 3: Perform hierarchical/agglomerative clustering using SciPy's linkage function
# 'ward' linkage minimizes variance within clusters
# 'euclidean' distance measures straight-line distance between points
linkage_data = linkage(data, method='ward', metric='euclidean')
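# For reference, each row of the linkage matrix records one merge as
# [cluster_i, cluster_j, distance, point_count]; n points produce n - 1 merges,
# so 10 points yield a (9, 4) matrix. Printing a few rows makes the dendrogram
# below easier to read.
print("Linkage matrix shape:", linkage_data.shape)
print("First three merges:\n", linkage_data[:3])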
# Step 4: Plot the dendrogram to visualize the hierarchical clustering
dendrogram(linkage_data)
plt.title("Dendrogram of hierarchical clustering")
plt.xlabel("Sample index")
plt.ylabel("Distance")
plt.show()
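# A small sketch: instead of eyeballing the dendrogram, the same tree can be
# cut programmatically into flat labels with SciPy's fcluster, which makes the
# SciPy result directly comparable to the scikit-learn labels computed below.
from scipy.cluster.hierarchy import fcluster
scipy_labels = fcluster(linkage_data, t=2, criterion='maxclust')  # labels are 1-based
print("SciPy flat cluster labels:", scipy_labels)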
# Step 5: Use scikit-learn's AgglomerativeClustering to cluster the data into 2 clusters
# Note: the distance parameter is named 'metric' (formerly 'affinity', removed in scikit-learn 1.4)
hierarchical_cluster = AgglomerativeClustering(n_clusters=2, metric='euclidean', linkage='ward')
# Fit the model and predict cluster labels for each data point
labels = hierarchical_cluster.fit_predict(data)
# Step 6: Visualize the clustered data points with colors indicating cluster membership
plt.scatter(x, y, c=labels)
plt.title("Agglomerative Clustering Result")
plt.xlabel("X values")
plt.ylabel("Y values")
plt.show()
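# A quick sanity check on the fitted labels: count the points per cluster.
# np.bincount works here because AgglomerativeClustering returns labels as
# non-negative integers (0 .. n_clusters - 1).
counts = np.bincount(labels)
print("Points per cluster:", counts)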