A comprehensive Python framework for Near-Infrared (NIR) spectral data analysis, featuring preprocessing pipelines and a comparison of machine learning models (PLS, SVM, CNN). The project also includes complete LaTeX templates for writing the accompanying research paper.
- Modular Architecture: A loosely coupled, highly cohesive design with separate modules for data loading, preprocessing, and modeling.
- Preprocessing Pipeline:
  - Savitzky-Golay (SG) Smoothing
  - Standard Normal Variate (SNV)
  - Multiplicative Scatter Correction (MSC)
- Machine Learning Models:
  - PLS (Partial Least Squares): A traditional chemometrics baseline.
  - SVM (Support Vector Machine): Non-linear classification with an RBF kernel.
  - CNN (1D Convolutional Neural Network): A deep learning approach with automatic feature extraction.
- Data Handling:
  - Automatic download of the open-source Peach Spectra dataset.
  - Synthetic data generation as a fallback when the dataset cannot be downloaded (e.g. offline), so the pipeline can always be tested.
- Publication-Ready: Includes LaTeX templates for both English and Chinese research papers.
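MSC is the only preprocessing method listed above that the default pipeline does not apply. The sketch below shows the standard formulation: each spectrum is regressed on a reference spectrum (here the mean spectrum), and the fitted offset and slope are removed. The function name `msc` and its signature are assumptions for illustration; the actual code in `src/preprocessing.py` may differ.

```python
from typing import Optional

import numpy as np


def msc(spectra: np.ndarray, reference: Optional[np.ndarray] = None) -> np.ndarray:
    """Multiplicative Scatter Correction (hypothetical helper, rows = samples).

    Each spectrum x_i is fitted as x_i ≈ a_i + b_i * r, where r is the
    reference spectrum (the mean spectrum by default). The corrected
    spectrum (x_i - a_i) / b_i has the additive offset and the
    multiplicative scaling caused by light scattering removed.
    """
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, x in enumerate(spectra):
        # A degree-1 polyfit returns the coefficients as [slope, intercept]
        b, a = np.polyfit(ref, x, deg=1)
        corrected[i] = (x - a) / b
    return corrected
```

Spectra that differ only by an offset and a scale factor collapse onto the reference after correction, which is exactly the scatter effect MSC is meant to remove.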
spectrum-analysis-with-ml/
├── data/ # Data storage (downloaded or generated CSV files)
├── paper/ # LaTeX source for research paper
│ ├── sections/ # Modularized LaTeX sections
│ ├── main.tex # English paper entry point
│ └── main_zh.tex # Chinese paper entry point
├── results/ # Generated plots and logs
├── src/ # Source code
│ ├── config.py # Global configuration
│ ├── data_loader.py # Data fetching and generation
│ ├── models.py # Model definitions (PLS, SVM, CNN)
│ ├── preprocessing.py # Signal processing algorithms
│ ├── visualization.py # Plotting functions
│ └── run_preprocessing.py # Data preprocessing script
├── main.py # Main execution script (training)
├── requirements.txt # Python dependencies
└── README.md # This file
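`data_loader.py` handles both fetching and generation, falling back to synthetic spectra when the Peach dataset is unavailable. The repository's generator is not shown here, so the following is only a plausible sketch. The function name `make_synthetic_spectra`, the wavelength range, the band positions, the Brix range, and the `Brix` / `wl…` column layout are all assumptions.

```python
import numpy as np
import pandas as pd


def make_synthetic_spectra(n_samples: int = 60, n_wavelengths: int = 200,
                           seed: int = 0) -> pd.DataFrame:
    """Generate toy NIR-like spectra with a Brix-like target (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    wl = np.linspace(1100, 2300, n_wavelengths)  # wavelength grid in nm (assumed range)
    brix = rng.uniform(8.0, 16.0, n_samples)     # assumed sugar-content range
    # A broad water band plus a "sugar" band whose height scales with Brix
    water = np.exp(-((wl - 1450) / 60) ** 2)
    sugar = np.exp(-((wl - 2100) / 80) ** 2)
    spectra = water[None, :] + 0.05 * brix[:, None] * sugar[None, :]
    # Simulated scatter: random additive offset and multiplicative scale per sample,
    # plus a small amount of white noise
    offset = rng.normal(0.0, 0.05, (n_samples, 1))
    scale = rng.normal(1.0, 0.05, (n_samples, 1))
    spectra = offset + scale * spectra + rng.normal(0.0, 0.005, spectra.shape)
    df = pd.DataFrame(spectra, columns=[f"wl{w:.0f}" for w in wl])
    df.insert(0, "Brix", brix)  # target in the first column
    return df
```

Embedding the target in one absorption band and adding per-sample offsets and scales gives the preprocessing and the models something realistic to work on during offline testing.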
Ensure you have Python 3.8+ installed.
# Clone the repository (if applicable)
# git clone ...
# Install dependencies
pip install -r requirements.txt
# Or using uv (Recommended)
uv sync
The pipeline has two steps:
Step 1: Preprocess Data
python -m src.run_preprocessing
# Or with uv
uv run python -m src.run_preprocessing
This will:
- Download the Peach Spectra dataset (or generate synthetic data if offline)
- Apply SG smoothing + SNV preprocessing
- Save processed data to data/peach_spectra_processed.csv
- Generate raw and processed spectra plots
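The SG + SNV step can be sketched with SciPy's `savgol_filter` followed by per-spectrum standardization. The helper name `sg_snv` and the defaults `window_length=11`, `polyorder=2` are assumptions; the project presumably reads its actual values from `src/config.py`.

```python
import numpy as np
from scipy.signal import savgol_filter


def sg_snv(spectra, window_length: int = 11, polyorder: int = 2) -> np.ndarray:
    """Savitzky-Golay smoothing followed by Standard Normal Variate (hypothetical helper)."""
    X = np.asarray(spectra, dtype=float)
    # Smooth each spectrum along the wavelength axis (rows = samples)
    X = savgol_filter(X, window_length=window_length, polyorder=polyorder, axis=1)
    # SNV: centre and scale each spectrum by its own mean and standard deviation
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, keepdims=True)
    return (X - mean) / std
```

Smoothing comes first so that SNV's per-spectrum standard deviation is not inflated by high-frequency noise. After SNV, every spectrum has zero mean and unit standard deviation.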
Step 2: Train Models
python main.py
# Or with uv
uv run main.py
This will:
- Load preprocessed data
- Train and evaluate PLS, SVM, and CNN models
- Generate comparison plots and confusion matrices
- Print evaluation metrics to the console
To generate the research paper PDF, you need a LaTeX distribution (e.g., TeX Live, MiKTeX).
English Version:
cd paper
pdflatex main.tex
# If the paper has references, run: bibtex main
pdflatex main.tex
pdflatex main.tex  # a second pass resolves citations and cross-references
Chinese Version:
cd paper
xelatex main_zh.tex # Use xelatex for better Chinese character support
The project is configured to use the Peach Spectra dataset from nirpyresearch.
- Features: NIR absorbance values, one per measured wavelength.
- Target: Brix values (sugar content), discretized into 3 classes for classification tasks.
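The README does not say how the Brix values are split into 3 classes. One common choice, assumed here, is equal-frequency binning (tertiles) with `pandas.qcut`. The helper name `brix_to_classes` is hypothetical.

```python
import numpy as np
import pandas as pd


def brix_to_classes(brix, n_classes: int = 3) -> np.ndarray:
    """Bin continuous Brix values into equal-frequency classes 0..n_classes-1 (assumed scheme)."""
    # labels=False makes qcut return integer bin codes instead of Interval labels
    return pd.qcut(pd.Series(brix), q=n_classes, labels=False).to_numpy()
```

Equal-frequency bins keep the three classes roughly balanced, which simplifies stratified splitting. Fixed Brix thresholds, such as agronomic cut-offs for low, medium, and high sweetness, would be the alternative if class boundaries need a physical meaning.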