celynnmoonlight/spectrum-analysis-with-ml

Spectrum Analysis with Machine Learning

A Python framework for near-infrared (NIR) spectral data analysis. It provides preprocessing pipelines and compares three machine learning models (PLS, SVM, and a 1D CNN). The project also includes LaTeX templates for writing up the results as a research paper.

Features

  • Modular Architecture: Loosely coupled, highly cohesive design with separate modules for data loading, preprocessing, and modeling.
  • Preprocessing Pipeline:
    • Savitzky-Golay (SG) Smoothing
    • Standard Normal Variate (SNV)
    • Multiplicative Scatter Correction (MSC)
  • Machine Learning Models:
    • PLS (Partial Least Squares): Traditional chemometrics baseline.
    • SVM (Support Vector Machine): Non-linear classification with RBF kernel.
    • CNN (1D Convolutional Neural Network): Deep learning approach for automatic feature extraction.
  • Data Handling:
    • Automatic download of open-source datasets (Peach Spectra).
    • Fallback to synthetic data generation when the download is unavailable, so the pipeline can still be run and tested offline.
  • Publication Ready: Includes LaTeX templates for both English and Chinese research papers.
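
To make the preprocessing methods listed above concrete, here is a minimal sketch of SG smoothing, SNV, and MSC using NumPy and SciPy. The function names, the default window and polynomial order, and the toy spectra are assumptions for illustration; the actual code in src/preprocessing.py may be organized differently.

```python
import numpy as np
from scipy.signal import savgol_filter


def sg_smooth(spectra, window_length=11, polyorder=2):
    """Savitzky-Golay smoothing along the wavelength axis (one spectrum per row)."""
    return savgol_filter(spectra, window_length=window_length, polyorder=polyorder, axis=1)


def snv(spectra):
    """Standard Normal Variate: center and scale each spectrum individually."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std


def msc(spectra, reference=None):
    """Multiplicative Scatter Correction against a reference (default: mean spectrum)."""
    if reference is None:
        reference = spectra.mean(axis=0)
    corrected = np.empty_like(spectra, dtype=float)
    for i, row in enumerate(spectra):
        # Fit row ~ intercept + slope * reference, then undo offset and gain.
        slope, intercept = np.polyfit(reference, row, deg=1)
        corrected[i] = (row - intercept) / slope
    return corrected


# Toy spectra (illustrative only): one band with per-sample gain, offset, and noise.
rng = np.random.default_rng(0)
axis = np.linspace(0.0, 1.0, 200)
band = np.exp(-((axis - 0.5) ** 2) / 0.02)
gain = rng.uniform(0.8, 1.2, size=(5, 1))
offset = rng.uniform(-0.1, 0.1, size=(5, 1))
raw = gain * band + offset + rng.normal(0.0, 0.01, size=(5, 200))

processed = snv(sg_smooth(raw))   # SG smoothing followed by SNV
scatter_corrected = msc(raw)      # MSC as an alternative scatter correction
print(processed.shape, scatter_corrected.shape)
```

SNV normalizes each spectrum using its own statistics, whereas MSC needs a reference spectrum, so with MSC the reference computed on the training set would normally be reused for new samples.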

Project Structure

spectrum-analysis-with-ml/
├── data/                   # Data storage (downloaded or generated CSV files)
├── paper/                  # LaTeX source for research paper
│   ├── sections/           # Modularized LaTeX sections
│   ├── main.tex            # English paper entry point
│   └── main_zh.tex         # Chinese paper entry point
├── results/                # Generated plots and logs
├── src/                    # Source code
│   ├── config.py           # Global configuration
│   ├── data_loader.py      # Data fetching and generation
│   ├── models.py           # Model definitions (PLS, SVM, CNN)
│   ├── preprocessing.py    # Signal processing algorithms
│   ├── visualization.py    # Plotting functions
│   └── run_preprocessing.py # Data preprocessing script
├── main.py                 # Main execution script (training)
├── requirements.txt        # Python dependencies
└── README.md               # This file

Quick Start

1. Installation

Ensure you have Python 3.8+ installed.

# Clone the repository (if applicable)
# git clone ...

# Install dependencies
pip install -r requirements.txt

# Or using uv (Recommended)
uv sync

2. Run Analysis

The pipeline has two steps:

Step 1: Preprocess Data

python -m src.run_preprocessing
# Or with uv
uv run python -m src.run_preprocessing

This will:

  • Download the Peach Spectra dataset (or generate synthetic data if offline)
  • Apply SG smoothing + SNV preprocessing
  • Save processed data to data/peach_spectra_processed.csv
  • Generate raw and processed spectra plots
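
The download-or-fallback behavior described above could look roughly like the following sketch. Everything here is hypothetical: `load_spectra`, `generate_synthetic_spectra`, the wavelength range, and the shape of the synthetic band are illustrative stand-ins, not the functions in src/data_loader.py.

```python
import numpy as np


def generate_synthetic_spectra(n_samples=60, n_wavelengths=200, seed=0):
    """Hypothetical generator: one absorption band whose height tracks a fake Brix value."""
    rng = np.random.default_rng(seed)
    wavelengths = np.linspace(1100.0, 2300.0, n_wavelengths)  # nm, illustrative range
    brix = rng.uniform(8.0, 18.0, size=n_samples)
    band = np.exp(-((wavelengths - 1700.0) ** 2) / (2 * 80.0 ** 2))
    noise = rng.normal(0.0, 0.005, size=(n_samples, n_wavelengths))
    spectra = 0.5 + 0.02 * brix[:, None] * band + noise
    return spectra, brix


def load_spectra(fetch):
    """Call `fetch`; if it fails with an I/O error, fall back to synthetic data."""
    try:
        return fetch(), "downloaded"
    except OSError:  # urllib.error.URLError is a subclass of OSError
        return generate_synthetic_spectra(), "synthetic"


def offline_fetch():
    """Stand-in for a real download function, simulating a machine with no network."""
    raise OSError("network unreachable")


(spectra, brix), source = load_spectra(offline_fetch)
print(source, spectra.shape, brix.shape)
```

Passing the download function in as an argument keeps the fallback logic testable without network access.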

Step 2: Train Models

python main.py
# Or with uv
uv run main.py

This will:

  • Load preprocessed data
  • Train and evaluate PLS, SVM, and CNN models
  • Generate comparison plots and confusion matrices
  • Print evaluation metrics to the console

Paper Generation

To generate the research paper PDF, you need a LaTeX distribution (e.g., TeX Live, MiKTeX).

English Version:

cd paper
pdflatex main.tex
# Run bibtex if you have references
pdflatex main.tex

Chinese Version:

cd paper
xelatex main_zh.tex  # Use xelatex for better Chinese character support

Dataset

The project is configured to use the Peach Spectra dataset from nirpyresearch.

  • Features: NIR absorbance values measured across wavelengths.
  • Target: Brix values (sugar content), discretized into 3 classes for classification tasks.
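
One plausible way to turn continuous Brix values into three classes is equal-frequency (tertile) binning, sketched below. The binning rule used by this project is not stated here, so the quantile-based thresholds, the `discretize_brix` name, and the sample values are assumptions.

```python
import numpy as np


def discretize_brix(brix, n_classes=3):
    """Bin continuous Brix values into equal-frequency classes 0..n_classes-1."""
    # Interior quantile edges, e.g. the 1/3 and 2/3 quantiles for three classes.
    edges = np.quantile(brix, np.linspace(0.0, 1.0, n_classes + 1)[1:-1])
    return np.digitize(brix, edges)


brix = np.array([8.2, 9.1, 10.4, 11.0, 11.7, 12.3, 13.5, 14.8, 16.0])
labels = discretize_brix(brix)
print(labels)  # [0 0 0 1 1 1 2 2 2]
```

Equal-frequency bins keep the classes balanced. Fixed thresholds (for example, domain-specific sweetness cutoffs) would be an alternative when the class boundaries need a physical interpretation.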

License

MIT License

About

A Python framework for NIR spectral analysis that preprocesses spectra (SG, SNV, MSC) and applies machine learning models (PLS, SVM, 1D CNN) for classification and regression.
