SECOM Sensor Data Analysis & Failure Prediction

This project uses the SECOM (Semiconductor Manufacturing) dataset, a set of high-dimensional sensor readings, to predict process failures.

This repository contains a comprehensive data science pipeline for analyzing semiconductor manufacturing process data. The goal is to handle a highly imbalanced dataset and high-dimensional sensor readings to accurately predict "Fail" (1) vs "Pass" (-1) outcomes.

📌 Project Overview

In semiconductor manufacturing, monitoring sensors is crucial for quality control. This project explores the SECOM dataset from the UCI Machine Learning Repository, which consists of 1,567 examples, each with 591 features (sensor readings).

📂 File Structure

uci_sensor_data1.ipynb: The primary research notebook containing data cleaning, exploratory analysis, and model experimentation.
uci_sensor_data1.py: The production-ready Python script converted from the notebook for easier deployment and batch processing.

🛠️ Technical Workflow

1. Data Cleaning & Preprocessing

Handling Missing Values: Identification of sensors with high null-value percentages. Sensors with excessive missing data are dropped, while others are imputed (Median/Mean).
Constant Feature Removal: Dropping sensors that show zero variance (constants), as they provide no predictive power.
Imbalance Handling: The dataset is heavily skewed towards "Pass" results. The project implements techniques like SMOTE (Synthetic Minority Over-sampling Technique) to balance the classes before training.
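
The three steps above can be sketched roughly as follows. This is an illustration, not the notebook's actual code: the function names `clean_sensors` and `smote_like`, the 50% missing-value threshold, and the toy data are all assumptions. `smote_like` is a simplified NumPy version of the SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours); in practice the `SMOTE` class from imbalanced-learn would normally be used instead.

```python
import numpy as np
import pandas as pd


def clean_sensors(df, max_missing=0.5):
    """Drop sparse and constant sensor columns, then median-impute the rest."""
    # 1. Drop sensors whose share of missing values exceeds the threshold.
    keep = df.columns[df.isna().mean() <= max_missing]
    df = df[keep]
    # 2. Fill the remaining gaps with each sensor's median.
    df = df.fillna(df.median())
    # 3. Drop constant sensors (a single unique value carries no signal).
    return df.loc[:, df.nunique() > 1]


def smote_like(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by interpolating between a
    minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise Euclidean distances within the minority class.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[picked] - X_min[base])


# Toy frame standing in for the sensor table (hypothetical values).
toy = pd.DataFrame({
    "s1": [1.0, 2.0, np.nan, 4.0],
    "s2": [5.0, 5.0, 5.0, 5.0],           # constant
    "s3": [np.nan, np.nan, np.nan, 1.0],  # 75% missing
})
cleaned = clean_sensors(toy)
synthetic = smote_like(np.array([[0, 0], [1, 0], [0, 1]]), n_new=10, k=2)
```

Oversampling is usually applied to the training split only, so that the test set keeps the real class ratio.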

2. Feature Selection & Dimensionality Reduction

Given the 500+ features, the project uses:

Correlation Analysis: To remove highly redundant sensors.
PCA (Principal Component Analysis): Reducing dimensionality while retaining as much of the variance as possible, to improve model efficiency and reduce noise.
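
A minimal sketch of both steps, assuming a 0.95 correlation cutoff and a 95% explained-variance target (neither value is stated by the project). The helper names `drop_correlated` and `reduce_with_pca` and the random demo data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def drop_correlated(df, threshold=0.95):
    """Drop the later column of every pair whose |Pearson r| exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


def reduce_with_pca(df, variance=0.95):
    """Standardise, then keep the fewest components explaining `variance`."""
    scaled = StandardScaler().fit_transform(df)
    # A float n_components in (0, 1) selects components by explained variance.
    pca = PCA(n_components=variance, svd_solver="full")
    return pca.fit_transform(scaled), pca


# Random demo data: "a_copy" is almost a rescaled duplicate of "a".
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
demo = pd.DataFrame({
    "a": a,
    "a_copy": 2.0 * a + 0.01 * rng.normal(size=200),
    "b": b,
})
filtered = drop_correlated(demo)
components, pca = reduce_with_pca(filtered)
print(list(filtered.columns), components.shape)
```

Scaling before PCA matters here because sensors are recorded in different units, and unscaled PCA would be dominated by the sensors with the largest numeric range.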

3. Statistical Analysis

VIF (Variance Inflation Factor): Used to detect multicollinearity among sensor readings.
Visual Analysis: Utilizing histograms and boxplots to identify sensor drifts and outliers.
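
For reference, the VIF of feature j is 1 / (1 - R_j^2), where R_j^2 comes from regressing feature j on all other features. The sketch below computes it with NumPy only; the function name `vif` and the demo data are assumptions, and the project itself may rely on a library routine instead.

```python
import numpy as np


def vif(X):
    """Return the variance inflation factor of every column of X."""
    X = np.asarray(X, dtype=float)
    n_rows, n_cols = X.shape
    factors = np.empty(n_cols)
    for j in range(n_cols):
        target = X[:, j]
        # Regress column j on all other columns plus an intercept.
        others = np.column_stack([np.ones(n_rows), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        residual = target - others @ coef
        r_squared = 1.0 - residual.var() / target.var()
        # VIF_j = 1 / (1 - R_j^2); perfectly collinear columns give inf.
        factors[j] = np.inf if r_squared >= 1.0 else 1.0 / (1.0 - r_squared)
    return factors


rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = x1 + x2 + 0.05 * rng.normal(size=500)  # nearly a linear combination

vif_independent = vif(np.column_stack([x1, x2]))
vif_collinear = vif(np.column_stack([x1, x2, x3]))
print(vif_independent, vif_collinear)
```

Constant columns must be removed first, since their variance is zero and the ratio above is undefined. A common rule of thumb treats VIF values above 5 or 10 as a sign of problematic multicollinearity.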

4. Machine Learning Models

The project evaluates several classifiers to find the best fit for high-dimensional sensor data:

Logistic Regression (Baseline)
Random Forest Classifier
XGBoost / LightGBM (gradient-boosted trees, which can be weighted toward the minority class, e.g. via scale_pos_weight)
Support Vector Machines (SVM)
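
A comparison along these lines could look like the sketch below. It is an assumption-laden illustration rather than the project's training code: it uses synthetic data from `make_classification` instead of the SECOM file, labels of 0/1 instead of -1/1, only two of the listed models, and `class_weight="balanced"` in place of SMOTE to keep the example dependent on scikit-learn alone.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned sensor matrix: about 7% positives ("Fail").
X, y = make_classification(n_samples=1500, n_features=40, n_informative=8,
                           weights=[0.93, 0.07], random_state=0)

# Stratify so both splits keep the same Pass/Fail ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

models = {
    "logistic_regression": make_pipeline(
        StandardScaler(),
        LogisticRegression(class_weight="balanced", max_iter=1000)),
    "random_forest": RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    # Recall and F1 are computed for the positive ("Fail") class.
    scores[name] = {
        "recall": recall_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
    }
    print(name, scores[name])
```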

🚀 Getting Started

Prerequisites

Ensure you have the following libraries installed:

pip install pandas numpy seaborn matplotlib scikit-learn imbalanced-learn xgboost

Usage

Clone the repository:

git clone https://github.com/gokilanr/Secom.git

Run the notebook or the Python script:
```
python uci_sensor_data1.py
```

📊 Results

The final model focuses on maximizing Recall and F1-Score rather than just Accuracy, ensuring that potential manufacturing failures are not missed (minimizing False Negatives).
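
To illustrate why Accuracy alone is misleading on imbalanced data, the sketch below computes the metrics from confusion-matrix counts. The counts are hypothetical and are not results from this project; `failure_metrics` is an illustrative helper name.

```python
def failure_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 with "Fail" as the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts: 1000 production runs, 70 of them true failures.
# A model that always predicts "Pass" misses every failure.
always_pass = failure_metrics(tp=0, fp=0, fn=70, tn=930)
# A model that flags more runs catches 56 of the 70 failures.
catches_most = failure_metrics(tp=56, fp=120, fn=14, tn=810)

print(always_pass)
print(catches_most)
```

With these counts the always-"Pass" model has the higher accuracy (0.93 versus 0.866) but a recall of 0, while the second model reaches a recall of 0.8.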
