This repository contains an academic workshop for a data mining course. It includes practical assignments that explore both supervised and unsupervised learning in depth.
The supervised learning objectives are:
- Visualizing the dataset
- Using the `naive bayes` model and learning its principles
- Implementing a method that splits the dataset into training and test datasets (a manual implementation of the sklearn `train_test_split` function)
- Training the model using different training dataset sizes
- Calculating errors and scores in each case
- Cross validation
- Using the `Random Forest` model
You can find the notebook here : https://github.com/BenrhayemRacem/GL4_TP_DATA_MINING/tree/supervised_learning
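
The manual split mentioned in the objectives above could look roughly like the sketch below. It is not taken from the notebook: the function name `manual_train_test_split`, its parameters, and the rounding rule for the test set size are assumptions made for illustration, and it uses only the Python standard library, so it is not guaranteed to select the same samples as sklearn's `train_test_split`.

```python
import random


def manual_train_test_split(X, y, test_size=0.25, seed=None):
    """Shuffle the samples and split them into train and test parts.

    Hypothetical helper for illustration; not the notebook's implementation.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of samples")
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be a fraction strictly between 0 and 1")

    # Shuffle the indices rather than the data so X and y stay aligned.
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # reproducible when seed is set

    # Assumption: the test set size is the rounded fraction of the samples.
    n_test = round(len(X) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Toy data: 10 samples with one feature each and alternating labels.
X = [[i] for i in range(10)]
y = [i % 2 for i in range(10)]
X_train, X_test, y_train, y_test = manual_train_test_split(
    X, y, test_size=0.2, seed=0
)
print(len(X_train), len(X_test))
```

Calling the function with several `test_size` values is one way to train the model on different training dataset sizes and compare the resulting errors and scores.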
The unsupervised learning objectives are:
- Visualizing the dataset
- Using the `kmeans` model and learning its principles
- Calculating the `silhouette score`
- Drawing the `dendrogram` with the hierarchical agglomerative clustering (HAC) algorithm
- Using Principal Component Analysis (PCA)
- Using an Agglomerative Clustering (AGNES) and drawing its dendrogram
- Comparing HAC and Agglomerative Clustering results with the kmeans using crosstab
- Implementing a manual DIANA (DIvisive ANAlysis) approach based on kmeans
You can find the notebook here : https://github.com/BenrhayemRacem/GL4_TP_DATA_MINING/tree/unsupervised_learning
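
To make the silhouette score from the list above more concrete, here is a minimal pure-Python sketch of how it can be computed by hand with Euclidean distance. The function name `silhouette_score_manual` and the toy points are assumptions for illustration; the notebook itself presumably relies on a library implementation.

```python
from math import dist


def silhouette_score_manual(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance).

    Hypothetical helper for illustration; not the notebook's implementation.
    """
    # Group the points by cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # convention: a singleton cluster contributes 0
        # a: mean distance to the other members of the same cluster
        # (the distance from the point to itself is 0, so it adds nothing).
        a = sum(dist(point, other) for other in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(point, other) for other in members) / len(members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Two well-separated groups of points.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score_manual(points, [0, 0, 1, 1])  # matches the groups
bad = silhouette_score_manual(points, [0, 1, 0, 1])   # mixes the groups
print(good, bad)
```

A score close to 1 indicates compact, well-separated clusters, while a negative score suggests that points sit closer to another cluster than to their own.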