This repository contains a collection of Jupyter notebooks covering machine learning algorithms and techniques, with examples and practical implementations. The notebooks are exercises from the Artificial Intelligence 2024 course at the UIB (Universitat de les Illes Balears), taught by Miquel Miró Nicolau, Gabriel Moyà Alcover, and Dr. Javier Varona Gómez of the XAI (Explainable Artificial Intelligence) research group.
1_ML_i_Perceptró.ipynb: Introduction to Machine Learning concepts and Perceptron implementation with practical activities.
2_Regr_Pràctica.ipynb and 2_Regressió_i_correlació.ipynb: Regression and correlation notebooks covering data exploration, correlation matrices, and model implementation, plus a practice notebook with exercises.
3_Regressió_Logística_i_K-Fold.ipynb and 3_RegrLog_Pràctica.ipynb: Logistic regression with K-Fold cross-validation, plus practical exercises covering model training, accuracy-based evaluation, and confusion matrix visualization.
4_SVM.ipynb and 4_SVM_Pràctica.ipynb: Support Vector Machine implementation with both linear and non-linear kernels, visualization of decision boundaries, and hyperparameter tuning using cross-validation. Includes practical exercises comparing SVM with other classification models (Perceptron, Logistic Regression).
5_Neteja_de_dades_i_DT.ipynb: Data cleaning techniques, including handling missing values, categorical data encoding, feature scaling, and noise reduction, together with decision trees (DT).
ML_assignment.ipynb: Final course assignment applying multiple machine learning algorithms to the forest cover type dataset. The notebook includes data preprocessing (resampling for class balance and PCA for dimensionality reduction), hyperparameter optimization for various models (Perceptron, Logistic Regression, SVM, Decision Tree, Random Forest), and comprehensive model evaluation using confusion matrices and classification metrics.
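As a rough illustration of the assignment's structure, the sketch below chains feature scaling, PCA, and a classifier into a single cross-validated pipeline. It uses scikit-learn's `load_wine` as a small stand-in for the forest cover type data; none of this code is taken from the notebooks.

```python
# Hypothetical sketch of the assignment workflow: scaling, PCA, and a classifier
# evaluated with cross-validation, on a small stand-in dataset.
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

pipeline = Pipeline([
    ("scaler", StandardScaler()),            # feature scaling
    ("pca", PCA(n_components=5)),            # dimensionality reduction
    ("clf", LogisticRegression(max_iter=1000)),
])

scores = cross_val_score(pipeline, X, y, cv=5)
print("Mean CV accuracy:", scores.mean())
```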
- Perceptron: Simple neural network implementation for linear classification problems
- Logistic Regression: Probabilistic classification for binary and multi-class problems
- Support Vector Machines: Classification with both linear and non-linear kernels for optimal decision boundaries
- Decision Trees: Tree-based classification with conditional branching
- Random Forest: Ensemble method combining multiple decision trees for improved performance
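The snippet below is an illustrative comparison of these five models on a synthetic dataset; the dataset and hyperparameters are arbitrary and not taken from the notebooks.

```python
# Illustrative comparison of the classifiers listed above on a toy dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "Perceptron": Perceptron(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM (RBF kernel)": SVC(kernel="rbf"),
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(n_estimators=100),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")
```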
K-Fold cross-validation is implemented in several notebooks to evaluate model performance. The technique divides the dataset into k subsets (folds) and, in turn, uses each fold as the test set while training on the remaining data.
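A minimal sketch of this procedure with scikit-learn's `KFold` (illustrative; the notebooks may implement it differently):

```python
# Each fold serves once as the test set; the remaining folds are used for training.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=0)

scores = []
for train_idx, test_idx in kf.split(X):
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))

print("Fold accuracies:", np.round(scores, 3))
print("Mean accuracy:", np.mean(scores))
```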
- Accuracy scoring
- Confusion matrices
- Classification reports
- Precision, recall, and F1-score
- ROC curves and AUC analysis
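The sketch below shows how these metrics can be computed with scikit-learn, assuming a fitted binary classifier; the dataset and model are placeholders, not taken from the notebooks.

```python
# Accuracy, confusion matrix, classification report, and ROC AUC for a binary classifier.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
y_prob = clf.predict_proba(X_test)[:, 1]

print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))   # precision, recall, F1-score
print("ROC AUC:", roc_auc_score(y_test, y_prob))
```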
- Grid search for hyperparameter tuning
- Custom product dictionary for parameter combinations
- Cross-validation based optimization
- Regularization parameter selection
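As an illustration, the sketch below tunes an SVM with `GridSearchCV` and also enumerates the same grid manually with `itertools.product`, in the spirit of the "custom product dictionary" approach; the actual notebook code may differ.

```python
# Grid search via GridSearchCV, plus a manual grid enumerated with itertools.product.
from itertools import product

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Built-in grid search with cross-validation
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
print("Best params:", search.best_params_)

# Manual alternative: enumerate every parameter combination
best_params, best_score = None, 0.0
for C, kernel in product(param_grid["C"], param_grid["kernel"]):
    score = cross_val_score(SVC(C=C, kernel=kernel), X, y, cv=5).mean()
    if score > best_score:
        best_params, best_score = {"C": C, "kernel": kernel}, score
print("Best manual params:", best_params, "accuracy:", round(best_score, 3))
```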
- Handling missing values
- Categorical data encoding
- Feature scaling and normalization
- Dimensionality reduction using PCA
- Class balancing and resampling techniques
- Noise reduction methods
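A compact sketch of several of these steps (imputation of missing values, one-hot encoding, scaling, and PCA) on a made-up DataFrame; column names and values are invented for illustration.

```python
# Preprocessing sketch: impute, encode, scale, and reduce dimensionality.
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "age":    [25, np.nan, 40, 33],
    "income": [30000, 45000, np.nan, 52000],
    "city":   ["Palma", "Inca", "Palma", "Manacor"],
})

# Handle missing values in the numeric columns
num = SimpleImputer(strategy="mean").fit_transform(df[["age", "income"]])
# Encode the categorical column
cat = OneHotEncoder().fit_transform(df[["city"]]).toarray()
# Scale the combined feature matrix
X = StandardScaler().fit_transform(np.hstack([num, cat]))
# Reduce dimensionality with PCA
X_pca = PCA(n_components=2).fit_transform(X)
print(X_pca.shape)
```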
- Decision boundary visualization
- Feature correlation heatmaps
- Model performance comparison plots
- Learning curves
- Hyperparameter effect visualization
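For example, a decision boundary can be visualized by evaluating a classifier on a dense grid over a 2-D feature space, as in the following sketch (illustrative only, not the notebooks' plotting code):

```python
# Decision boundary plot for an SVM on a 2-D toy dataset.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
clf = SVC(kernel="rbf").fit(X, y)

# Evaluate the classifier on a grid covering the feature space
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contourf(xx, yy, Z, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors="k")
plt.title("SVM decision boundary (RBF kernel)")
plt.show()
```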
The notebooks contain comments and explanations in Catalan.
