Spectrum Analysis with Machine Learning

A comprehensive Python framework for Near-Infrared (NIR) spectral data analysis, featuring a preprocessing pipeline and a side-by-side comparison of machine learning models (PLS, SVM, CNN). The project also includes LaTeX templates (English and Chinese) for writing up the results as a research paper.

Features

Modular Architecture: Data loading, preprocessing, and modeling live in separate modules (low coupling, high cohesion).
Preprocessing Pipeline:
- Savitzky-Golay (SG) Smoothing
- Standard Normal Variate (SNV)
- Multiplicative Scatter Correction (MSC)
Machine Learning Models:
- PLS (Partial Least Squares): A traditional chemometrics baseline.
- SVM (Support Vector Machine): Non-linear classification with an RBF kernel.
- CNN (1D Convolutional Neural Network): Deep learning approach for automatic feature extraction.
Data Handling:
- Automatic download of open-source datasets (Peach Spectra).
- Synthetic data generation as a fallback when the dataset cannot be downloaded (e.g. offline), so the pipeline can always be tested.
Publication Ready: Includes LaTeX templates for both English and Chinese research papers.
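MSC, listed among the preprocessing steps above, removes scatter effects by regressing each spectrum onto a reference spectrum (usually the mean) and undoing the fitted offset and gain. The sketch below is a minimal NumPy version under that standard definition; it is not the project's src/preprocessing.py, and the function name msc and the toy data are illustrative assumptions.

```python
from typing import Optional

import numpy as np


def msc(spectra: np.ndarray, reference: Optional[np.ndarray] = None) -> np.ndarray:
    """Multiplicative Scatter Correction (hypothetical sketch).

    spectra: array of shape (n_samples, n_wavelengths).
    reference: spectrum to correct against; defaults to the mean spectrum.
    """
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        # Least-squares fit of row ~= offset + slope * ref
        slope, offset = np.polyfit(ref, row, deg=1)
        # Undo the additive offset and the multiplicative scaling
        corrected[i] = (row - offset) / slope
    return corrected


# Toy example: one underlying shape distorted by different gains and offsets
base = np.sin(np.linspace(0.0, 3.0, 50)) + 1.0
spectra = np.vstack([1.0 * base, 2.0 * base + 0.5, 0.5 * base - 0.2])
corrected = msc(spectra)
```

Because every toy spectrum is an exact affine transform of the same shape, MSC maps all of them onto the mean reference spectrum; on real data the corrected spectra keep their chemically meaningful differences.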

Project Structure

spectrum-analysis-with-ml/
├── data/                   # Data storage (downloaded or generated CSV files)
├── paper/                  # LaTeX source for research paper
│   ├── sections/           # Modularized LaTeX sections
│   ├── main.tex            # English paper entry point
│   └── main_zh.tex         # Chinese paper entry point
├── results/                # Generated plots and logs
├── src/                    # Source code
│   ├── config.py           # Global configuration
│   ├── data_loader.py      # Data fetching and generation
│   ├── models.py           # Model definitions (PLS, SVM, CNN)
│   ├── preprocessing.py    # Signal processing algorithms
│   ├── visualization.py    # Plotting functions
│   └── run_preprocessing.py # Data preprocessing script
├── main.py                 # Main execution script (training)
├── requirements.txt        # Python dependencies
└── README.md               # This file

Quick Start

1. Installation

Ensure you have Python 3.8+ installed.

# Clone the repository (if applicable)
# git clone ...

# Install dependencies
pip install -r requirements.txt

# Or, using uv (recommended)
uv sync

2. Run Analysis

The pipeline has two steps:

Step 1: Preprocess Data

python -m src.run_preprocessing
# Or with uv
uv run python -m src.run_preprocessing

This will:

- Download the Peach Spectra dataset (or generate synthetic data if offline)
- Apply SG smoothing + SNV preprocessing
- Save processed data to data/peach_spectra_processed.csv
- Generate raw and processed spectra plots
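The SG + SNV step above can be sketched as follows: Savitzky-Golay smoothing fits a low-order polynomial in a sliding window along the wavelength axis, and SNV then centers and scales each spectrum by its own mean and standard deviation. This is a minimal sketch using scipy.signal.savgol_filter; the window_length=11 and polyorder=2 values, the function names, and the synthetic spectra are assumptions, not the settings in the project's src/config.py.

```python
import numpy as np
from scipy.signal import savgol_filter


def sg_smooth(spectra, window_length=11, polyorder=2):
    # Smooth each spectrum (row) along the wavelength axis (axis=1)
    return savgol_filter(np.asarray(spectra, dtype=float), window_length, polyorder, axis=1)


def snv(spectra):
    # Standard Normal Variate: per-spectrum centering and scaling
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std


# Hypothetical noisy spectra: one absorption band with random gain and baseline offset
rng = np.random.default_rng(0)
wavelengths = np.linspace(1100, 2300, 200)
band = np.exp(-((wavelengths - 1700) / 150) ** 2)
gains = rng.uniform(0.8, 1.2, size=(5, 1))
offsets = rng.uniform(0.1, 0.5, size=(5, 1))
raw = gains * band + offsets + rng.normal(0.0, 0.01, size=(5, wavelengths.size))

processed = snv(sg_smooth(raw))
```

Smoothing before SNV keeps high-frequency noise from inflating each spectrum's standard deviation, which is why this order is commonly used.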

Step 2: Train Models

python main.py
# Or with uv
uv run main.py

This will:

- Load preprocessed data
- Train and evaluate PLS, SVM, and CNN models
- Generate comparison plots and confusion matrices
- Print evaluation metrics to the console

Paper Generation

To generate the research paper PDF, you need a LaTeX distribution (e.g., TeX Live, MiKTeX).

English Version:

cd paper
pdflatex main.tex
# If the paper has references, run BibTeX, then pdflatex twice to resolve citations
bibtex main
pdflatex main.tex
pdflatex main.tex

Chinese Version:

cd paper
xelatex main_zh.tex  # Use xelatex for better Chinese character support

Dataset

The project is configured to use the Peach Spectra dataset from nirpyresearch.

Features: NIR absorbance values, one column per wavelength.
Target: Brix values (sugar content), discretized into 3 classes for classification tasks.
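The README does not specify how Brix is discretized into 3 classes. One plausible approach, assumed here, is equal-frequency binning (tertiles) with pandas.qcut, so each class gets roughly the same number of samples. The function name and Brix values below are made up for illustration and are not taken from the dataset.

```python
import pandas as pd


def discretize_brix(brix: pd.Series, n_classes: int = 3) -> pd.Series:
    # Equal-frequency binning: 0 = low, 1 = medium, 2 = high sugar content
    return pd.qcut(brix, q=n_classes, labels=False)


# Illustrative Brix readings (hypothetical, not from the Peach Spectra dataset)
brix = pd.Series([9.1, 10.4, 11.0, 12.3, 13.5, 14.2, 15.0, 16.8, 18.1])
labels = discretize_brix(brix)
print(labels.tolist())  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Fixed agronomic thresholds (pd.cut with explicit bin edges) are an alternative when class boundaries should carry a physical meaning rather than balance the classes.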

License

MIT License
