DataPilot AI

Your Intelligent Data Analysis Copilot




📑 Table of Contents

🌟 Overview • ✨ Features • 🏗️ Architecture • 🚀 Quick Start • 📖 Documentation
🤖 AutoML • 📈 Time Series • 🔮 Explainability • 🧹 Preprocessing • 🧠 AI Insights
📊 Visualizations • 🐳 Docker • 🗺️ Roadmap • 🤝 Contributing • ❓ FAQ • 💖 Support

🌟 Overview

🎯 What is DataPilot AI?

DataPilot AI is a comprehensive, production-ready data science framework that transforms how you work with data. It combines the power of automated machine learning, explainable AI, and intelligent insights into one seamless toolkit.

"From raw data to actionable insights in minutes, not hours."

Whether you're a data scientist seeking to accelerate workflows, a business analyst needing quick insights, or a developer integrating ML into applications — DataPilot AI has you covered.

10+ ML Algorithms • Auto Preprocessing
SHAP & LIME • Time Series Forecasting
Interactive Dashboard • One-Click Reports

🎪 Key Highlights

┌─────────────────────────────────────────────────────────────────────────────────┐
│                                                                                 │
│   🔥 ZERO-CONFIG AUTOML        📊 BEAUTIFUL VISUALIZATIONS    🧠 AI INSIGHTS   │
│   Train 10+ models with        Publication-ready charts       Smart pattern    │
│   one line of code             with Plotly & Seaborn          detection        │
│                                                                                 │
│   ⚡ BLAZING FAST              🔍 EXPLAINABLE AI              🌐 WEB DASHBOARD │
│   Optimized algorithms         SHAP & LIME built-in          Streamlit UI      │
│   with XGBoost & LightGBM      for model transparency        no coding needed  │
│                                                                                 │
└─────────────────────────────────────────────────────────────────────────────────┘

✨ Features

🔍 Exploratory Data Analysis

Automated Statistical Analysis

  • 📊 Distribution & outlier detection
  • 🔗 Correlation heatmaps
  • 📈 Missing value analysis
  • 📋 Data quality profiling
  • 🎯 Target variable insights

🤖 AutoML Pipeline

Zero-Config Model Training

  • ⚡ 10+ ML algorithms
  • 🎛️ Hyperparameter tuning
  • 📊 Model leaderboard
  • 💾 One-click export
  • 🔄 Cross-validation

📈 Time Series

Forecasting & Analysis

  • 📉 Trend decomposition
  • 🔮 ARIMA/Prophet/ETS
  • ⚠️ Anomaly detection
  • 📅 Seasonality analysis
  • 📊 Confidence intervals

🔮 Model Explainability

Transparent ML Decisions

  • 🎯 SHAP value analysis
  • 🍋 LIME explanations
  • 📊 Feature importance
  • 📈 Partial dependence
  • 🔍 Individual predictions

🧹 Data Preprocessing

Smart Data Cleaning

  • 🔧 Missing value imputation
  • 📏 Feature scaling
  • 🏷️ Categorical encoding
  • 🎯 Outlier treatment
  • ⚖️ Class balancing

🧠 AI Insights

Intelligent Recommendations

  • 🔍 Pattern detection
  • ⚠️ Quality issue alerts
  • 💡 Feature suggestions
  • 📝 Auto report generation
  • 🎯 Actionable insights

🛠️ Tech Stack

🔧 Core Technologies

Python Pandas NumPy SciPy

🤖 Machine Learning & AI

Scikit-learn XGBoost LightGBM CatBoost

📊 Visualization

Plotly Matplotlib Seaborn

🌐 Web & DevOps

Streamlit Docker GitHub Actions

🔍 Explainability

SHAP LIME Statsmodels


🏗️ Architecture

System Overview

                                    ┌─────────────────────────────────────────┐
                                    │           📥 DATA INPUT                 │
                                    │   CSV • Excel • Parquet • DataFrame     │
                                    └─────────────────┬───────────────────────┘
                                                      │
                                                      ▼
┌─────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                                                 │
│                                    🧠 DATAPILOT AI CORE                                        │
│                                                                                                 │
│  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐       │
│  │  🧹 PREPROCESS   │  │  📊 ANALYSIS     │  │  🤖 ML ENGINE    │  │  🔮 EXPLAINER    │       │
│  │                  │  │                  │  │                  │  │                  │       │
│  │ • Missing Values │  │ • EDA            │  │ • AutoML         │  │ • SHAP Values    │       │
│  │ • Outliers       │  │ • Statistics     │  │ • 10+ Algorithms │  │ • LIME           │       │
│  │ • Encoding       │  │ • Time Series    │  │ • Hyperparameter │  │ • Feature Import │       │
│  │ • Scaling        │  │ • AI Insights    │  │ • CV & Tuning    │  │ • PDP Plots      │       │
│  └──────────────────┘  └──────────────────┘  └──────────────────┘  └──────────────────┘       │
│                                                                                                 │
└─────────────────────────────────────────────────────────────────────────────────────────────────┘
                                                      │
                    ┌─────────────────────────────────┼─────────────────────────────────┐
                    │                                 │                                 │
                    ▼                                 ▼                                 ▼
        ┌───────────────────┐             ┌───────────────────┐             ┌───────────────────┐
        │  🎨 DASHBOARD     │             │  💻 CLI           │             │  🐍 PYTHON API    │
        │  Streamlit UI     │             │  Command Line     │             │  Programmatic     │
        │  No-Code          │             │  Automation       │             │  Full Control     │
        └───────────────────┘             └───────────────────┘             └───────────────────┘

📁 Project Structure

📦 DataPilot-AI/
│
├── 🧠 src/                           # Core library modules
│   ├── __init__.py                   # Package exports
│   ├── ai_insights.py                # 🧠 AI-powered insights engine
│   ├── automl.py                     # 🤖 Automated machine learning
│   ├── data_preprocessing.py         # 🧹 Data cleaning & transformation
│   ├── eda.py                        # 📊 Exploratory data analysis
│   ├── explainability.py             # 🔮 SHAP & LIME integrations
│   ├── ml_models.py                  # 📈 ML model training (1200+ lines)
│   ├── time_series.py                # ⏰ Time series forecasting
│   ├── visualization.py              # 🎨 Data visualization utilities
│   ├── report_generator.py           # 📝 Automated report generation
│   └── data_generator.py             # 🎲 Synthetic data generation
│
├── 🎨 dashboard/
│   └── app.py                        # 🌐 Streamlit web interface (550+ lines)
│
├── 🧪 tests/                         # Test suite
│   ├── test_automl.py
│   ├── test_eda.py
│   └── test_preprocessing.py
│
├── 📊 data/
│   └── sample_data.csv               # Sample dataset
│
├── ⚙️ Configuration Files
│   ├── pyproject.toml                # Modern Python config
│   ├── requirements.txt              # Dependencies
│   ├── setup.py                      # Package setup
│   ├── Dockerfile                    # Container config
│   └── .github/workflows/ci.yml      # CI/CD pipeline
│
├── 📚 Documentation
│   ├── README.md                     # You are here! 📍
│   ├── CONTRIBUTING.md               # Contribution guide
│   ├── CHANGELOG.md                  # Version history
│   ├── CODE_OF_CONDUCT.md            # Community guidelines
│   └── SECURITY.md                   # Security policy
│
└── 💻 cli.py                         # Command-line interface

🚀 Quick Start

⚡ Get up and running in under 2 minutes!

Prerequisites

| Requirement | Version | Notes |
|---|---|---|
| 🐍 Python | 3.9+ | 3.11 recommended |
| 📦 pip | Latest | Package manager |
| 🔧 Git | Any | For cloning |

Installation Options

🎯 Option 1: Quick Install (Recommended)

# Clone the repository
git clone https://github.com/VivekGhantiwala/DataPilot-AI.git
cd DataPilot-AI

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install package
pip install -e .

📦 Option 2: Dependencies Only

# Clone and install
git clone https://github.com/VivekGhantiwala/DataPilot-AI.git
cd DataPilot-AI

pip install -r requirements.txt

🐳 Option 3: Docker

# Build image
docker build -t datapilot-ai .

# Run container
docker run -p 8501:8501 datapilot-ai

⚙️ Option 4: With Extras

# Install with all optional dependencies
pip install -e ".[all]"

# Or specific extras
pip install -e ".[dev]"      # Development tools
pip install -e ".[ml]"       # Extra ML libraries
pip install -e ".[dashboard]" # Dashboard dependencies

✅ Verify Installation

# Check CLI
python cli.py --help

# Run tests
pytest tests/ -v

# Launch dashboard
python cli.py dashboard

🎉 You're ready to go!


📖 Documentation

Choose Your Path

| 🎨 No Code | ⌨️ CLI | 🐍 Python API |
|---|---|---|
| Interactive Dashboard | Command Line | Full Programmatic Control |
| Upload & Click | Script Automation | Custom Workflows |
| Visual Results | Pipeline Integration | Production Ready |

🎨 Interactive Dashboard

Launch the beautiful Streamlit dashboard for a zero-code experience:

# Using CLI
python cli.py dashboard

# Or directly
streamlit run dashboard/app.py

Then open http://localhost:8501 in your browser.

Dashboard Features: 📊 Data Overview • 📈 EDA • 🔧 Preprocessing • 🤖 ML Training • 🧠 AI Insights • 📥 Export


⌨️ Command Line Interface

# 📊 Exploratory Data Analysis
python cli.py analyze -i data.csv -t target_column -o report.txt

# 🤖 AutoML Training
python cli.py automl -i data.csv -t target --task classification --max-models 10

# 📈 Time Series Forecasting
python cli.py timeseries -i sales.csv --date-column date --value-column sales -f 30

# 🧹 Data Preprocessing
python cli.py preprocess -i raw.csv -o clean.csv --scale --encode

# 📝 Generate Report
python cli.py report -i data.csv -o report.html --title "Analysis Report"

🐍 Python API Examples

🤖 AutoML Pipeline

from src import AutoML
import pandas as pd

# Load your data
data = pd.read_csv("your_data.csv")
X = data.drop(columns=["target"])
y = data["target"]

# Initialize AutoML - it's that simple! 🚀
automl = AutoML(
    task="classification",  # or "regression", "auto"
    max_models=10,          # Number of models to try
    cv_folds=5              # Cross-validation folds
)

# Train all models
automl.fit(X, y)

# View results
print(automl.get_leaderboard())
print(automl.summary())

# Make predictions on new data (same feature columns as X)
predictions = automl.predict(X_new)

# Save best model
automl.save("best_model.pkl")

Output:

╔═══════════════════════════════════════════════════════════════════╗
║                        🏆 Model Leaderboard                        ║
╚═══════════════════════════════════════════════════════════════════╝

 Rank │ Model               │ Accuracy │ F1 Score │ ROC-AUC │ Time(s)
──────┼─────────────────────┼──────────┼──────────┼─────────┼─────────
  1   │ XGBoost             │  0.9421  │  0.9385  │  0.9712 │   2.3
  2   │ LightGBM            │  0.9398  │  0.9362  │  0.9689 │   1.1
  3   │ Random Forest       │  0.9356  │  0.9318  │  0.9645 │   4.7

📊 Exploratory Data Analysis

from src import ExploratoryAnalysis
import pandas as pd

# Load data
data = pd.read_csv("your_data.csv")

# Run comprehensive EDA
eda = ExploratoryAnalysis(data)
eda.print_report()

# Get specific insights
correlations = eda.correlation_analysis()
missing = eda.missing_value_analysis()
outliers = eda.detect_outliers_summary()
stats = eda.get_statistics()

📈 Time Series Forecasting

from src import TimeSeriesAnalyzer
import pandas as pd

# Load time series data
data = pd.read_csv("sales_data.csv")

# Initialize analyzer
ts = TimeSeriesAnalyzer(
    data=data,
    date_column="date",
    value_column="sales"
)

# Run analysis
ts.analyze()

# Generate 30-day forecast
forecast = ts.forecast(periods=30, method="auto")

# Visualize results
ts.plot_forecast(forecast)

# Get detailed report
print(ts.generate_report())

🔮 Model Explainability

from src import ModelExplainer
from sklearn.ensemble import RandomForestClassifier

# Train your model (X_train / y_train prepared beforehand)
model = RandomForestClassifier()
model.fit(X_train, y_train)

# Create explainer
explainer = ModelExplainer(
    model=model,
    X_train=X_train,
    feature_names=X_train.columns.tolist(),
    task="classification"
)

# Explain a single prediction
explanation = explainer.explain_prediction(X_test.iloc[0])

# SHAP summary plot
explainer.plot_shap_summary(X_test)

# Feature importance
importance = explainer.get_feature_importance()

# Generate report
report = explainer.generate_report(X_test)

🧹 Data Preprocessing

from src import DataPreprocessor

# Initialize preprocessor
prep = DataPreprocessor()

# Load data
prep.load_data("raw_data.csv")

# Full pipeline (one-liner!)
clean_data = prep.preprocess_pipeline(
    handle_missing=True,
    remove_dups=True,
    handle_outliers_flag=True,
    scale=True,
    encode=True
)

# Or step-by-step with full control
prep.handle_missing_values(strategy="auto")
prep.remove_duplicates()
prep.handle_outliers(method="clip")  # or "iqr", "zscore"
prep.scale_features(method="standard")  # or "minmax", "robust"
prep.encode_categorical(method="onehot")  # or "label"

🧠 AI-Powered Insights

from src import AIInsights
import pandas as pd

data = pd.read_csv("your_data.csv")

# Initialize AI insights engine
ai = AIInsights(data)

# Generate automated report
report = ai.generate_automated_report(target_column="target")
print(report)

# Detect data quality issues
issues = ai.detect_data_quality_issues()

# Get smart recommendations
recommendations = ai.generate_recommendations(task_type="classification")

# Quick insights summary
quick = ai.get_quick_insights()

🤖 AutoML Pipeline

🚀 End-to-End Automated Machine Learning

Zero Config • 10+ Algorithms • Auto Tuning

🔄 Pipeline Flow

┌──────────────────────────────────────────────────────────────────────────────────────┐
│                                                                                      │
│                            🤖 AUTOML PIPELINE WORKFLOW                               │
│                                                                                      │
│   ┌─────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌──────┐ │
│   │  DATA   │───▶│  AUTO       │───▶│  MODEL      │───▶│  HYPER      │───▶│DEPLOY│ │
│   │  INPUT  │    │  PREPROCESS │    │  SELECTION  │    │  TUNING     │    │      │ │
│   └─────────┘    └─────────────┘    └─────────────┘    └─────────────┘    └──────┘ │
│       │                │                  │                  │                │     │
│       ▼                ▼                  ▼                  ▼                ▼     │
│   CSV/Excel      • Missing Values    • Task Detection   • Grid Search    • Export  │
│   DataFrame      • Encoding          • Algorithm Pool   • Random Search  • Predict │
│   Parquet        • Scaling           • Cross-Valid      • Best Params    • Serve   │
│                                                                                      │
└──────────────────────────────────────────────────────────────────────────────────────┘
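The "HYPER TUNING" stage in the diagram corresponds to standard grid or random search over each candidate's hyperparameters. As a rough illustration (not DataPilot AI's internal code), the same idea in plain scikit-learn:

```python
# Illustrative sketch of the hyperparameter-tuning stage using
# scikit-learn's GridSearchCV -- a stand-in, not DataPilot's internals.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, n_features=10, random_state=42)

grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=3,                    # cross-validation, as in the pipeline diagram
    scoring="accuracy",
)
grid.fit(X, y)

print(grid.best_params_)     # best hyperparameters found by the search
print(round(grid.best_score_, 3))
```

Random search (`RandomizedSearchCV`) trades exhaustive coverage for speed on larger grids.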

📊 Supported Algorithms

🎯 Classification Models

| # | Algorithm | Library | Best For |
|---|---|---|---|
| 1 | 🌲 Random Forest | scikit-learn | Robust baseline |
| 2 | 🚀 XGBoost | xgboost | High performance |
| 3 | ⚡ LightGBM | lightgbm | Large datasets |
| 4 | 🐱 CatBoost | catboost | Categorical data |
| 5 | 📈 Logistic Regression | scikit-learn | Interpretable |
| 6 | 🎯 SVM | scikit-learn | High dimensions |
| 7 | 🏠 KNN | scikit-learn | Non-parametric |
| 8 | 🌳 Decision Tree | scikit-learn | Explainable |
| 9 | 🔄 AdaBoost | scikit-learn | Adaptive |
| 10 | 🌿 Extra Trees | scikit-learn | Variance reduction |

📈 Regression Models

| # | Algorithm | Library | Best For |
|---|---|---|---|
| 1 | 📏 Linear Regression | scikit-learn | Simple baseline |
| 2 | 🎚️ Ridge | scikit-learn | L2 regularization |
| 3 | 🎚️ Lasso | scikit-learn | L1 regularization |
| 4 | 🌲 Random Forest | scikit-learn | Non-linear |
| 5 | 🚀 XGBoost | xgboost | High performance |
| 6 | ⚡ LightGBM | lightgbm | Speed |
| 7 | 🔗 ElasticNet | scikit-learn | L1+L2 combined |
| 8 | 🎯 SVR | scikit-learn | Kernel methods |
| 9 | 📊 Gradient Boosting | scikit-learn | Sequential |
| 10 | 🌿 Extra Trees | scikit-learn | Ensemble |
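Conceptually, ranking these candidates comes down to cross-validating each one and sorting by score. A minimal sketch of that idea in plain scikit-learn (illustration only; AutoML's actual implementation may differ):

```python
# Minimal model-leaderboard sketch: cross-validate each candidate
# and rank by mean score. Two models shown for brevity.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, random_state=0)

candidates = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

leaderboard = sorted(
    ((name, cross_val_score(m, X, y, cv=5).mean())
     for name, m in candidates.items()),
    key=lambda item: item[1],
    reverse=True,
)
for rank, (name, score) in enumerate(leaderboard, 1):
    print(f"{rank}. {name}: {score:.4f}")
```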

⏰ Time Series Algorithms

| Method | Library | Best For | Features |
|---|---|---|---|
| 📈 ARIMA | statsmodels | Non-seasonal data | Trend, differencing |
| 🔄 SARIMA | statsmodels | Seasonal patterns | Seasonality components |
| 📊 Exponential Smoothing | statsmodels | Trend + seasonality | Level, trend, season |
| 🔮 Prophet | prophet | Business forecasts | Holidays, events |

🏆 Sample Leaderboard Output

╔═══════════════════════════════════════════════════════════════════════════════════╗
║                           🏆 AUTOML MODEL LEADERBOARD                             ║
╠═══════════════════════════════════════════════════════════════════════════════════╣
║                                                                                   ║
║  Rank │ Model               │ Accuracy │ Precision │ Recall │ F1-Score │ AUC     ║
║ ──────┼─────────────────────┼──────────┼───────────┼────────┼──────────┼─────────║
║   🥇  │ XGBoost             │  0.9421  │   0.9398  │ 0.9445 │  0.9421  │ 0.9712  ║
║   🥈  │ LightGBM            │  0.9398  │   0.9375  │ 0.9422 │  0.9398  │ 0.9689  ║
║   🥉  │ Random Forest       │  0.9356  │   0.9334  │ 0.9379 │  0.9356  │ 0.9645  ║
║   4   │ CatBoost            │  0.9312  │   0.9290  │ 0.9335 │  0.9312  │ 0.9612  ║
║   5   │ Gradient Boosting   │  0.9289  │   0.9267  │ 0.9312 │  0.9289  │ 0.9601  ║
║   6   │ Extra Trees         │  0.9234  │   0.9212  │ 0.9257 │  0.9234  │ 0.9567  ║
║   7   │ AdaBoost            │  0.9156  │   0.9134  │ 0.9179 │  0.9156  │ 0.9523  ║
║   8   │ SVM                 │  0.9089  │   0.9067  │ 0.9112 │  0.9089  │ 0.9478  ║
║   9   │ KNN                 │  0.8945  │   0.8923  │ 0.8968 │  0.8945  │ 0.9389  ║
║  10   │ Logistic Regression │  0.8823  │   0.8801  │ 0.8846 │  0.8823  │ 0.9312  ║
║                                                                                   ║
╚═══════════════════════════════════════════════════════════════════════════════════╝
                     ✅ Best Model: XGBoost (Accuracy: 94.21%)

📊 Visualization Gallery

Sample Outputs

┌────────────────────────────────────────────────────────────────────────────┐
│                                                                            │
│   ╔════════════════════════════════════════════════════════════════════╗   │
│   ║               📊 EXPLORATORY DATA ANALYSIS REPORT                  ║   │
│   ╚════════════════════════════════════════════════════════════════════╝   │
│                                                                            │
│   📌 Dataset Overview                                                      │
│   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   │
│   • Total Rows: 10,000                                                     │
│   • Total Columns: 25                                                      │
│   • Memory Usage: 2.4 MB                                                   │
│   • Numerical Columns: 18                                                  │
│   • Categorical Columns: 7                                                 │
│                                                                            │
│   📈 Missing Values Analysis                                               │
│   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   │
│   • income: 5.0% missing                                                   │
│   • credit_score: 3.0% missing                                             │
│                                                                            │
│   🔗 Top Correlations                                                      │
│   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   │
│   • income ↔ savings: 0.89                                                 │
│   • age ↔ years_employed: 0.76                                             │
│                                                                            │
└────────────────────────────────────────────────────────────────────────────┘

Available Visualizations

| Type | Description | Method |
|---|---|---|
| 📊 Distribution | Histograms with KDE | `plot_distribution()` |
| 📈 Box Plots | Outlier visualization | `plot_boxplot()` |
| 🔥 Heatmaps | Correlation matrices | `plot_correlation_heatmap()` |
| 🎯 Scatter | Relationship analysis | `plot_scatter()` |
| 📉 Time Series | Temporal patterns | `plot_time_series()` |
| 🏆 Feature Importance | Model insights | `plot_feature_importance()` |
| 🎨 Dashboard | Multi-plot overview | `create_dashboard()` |
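DataPilot's own methods wrap Plotly and Seaborn; as a dependency-light stand-in, the core of a correlation heatmap is just a pandas correlation matrix rendered with matplotlib:

```python
# Stand-in for plot_correlation_heatmap(): pandas corr + matplotlib.
# Not DataPilot's implementation -- just the underlying idea.
import matplotlib
matplotlib.use("Agg")          # headless backend; safe on servers/CI
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(100, 4)), columns=list("abcd"))
df["e"] = df["a"] * 0.9 + rng.normal(0, 0.1, 100)   # strongly correlated pair

corr = df.corr()
fig, ax = plt.subplots()
im = ax.imshow(corr, vmin=-1, vmax=1, cmap="coolwarm")
ax.set_xticks(range(len(corr)), corr.columns)
ax.set_yticks(range(len(corr)), corr.columns)
fig.colorbar(im)
fig.savefig("correlation_heatmap.png")
```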

🐳 Docker Deployment

🚢 Deploy Anywhere with Docker

# Build the image
docker build -t datapilot-ai .

# Run with default settings
docker run -p 8501:8501 datapilot-ai

# Run with data volume
docker run -p 8501:8501 -v $(pwd)/data:/app/data datapilot-ai

# Run with environment variables
docker run -p 8501:8501 -e DEBUG=false datapilot-ai

Docker Compose

version: '3.8'
services:
  datapilot:
    build: .
    ports:
      - "8501:8501"
    volumes:
      - ./data:/app/data
    environment:
      - DEBUG=false
      - AUTOML_MAX_MODELS=10

🗺️ Roadmap

📅 Development Timeline

✅ Completed (v1.0.0)

| Feature | Status | Version |
|---|---|---|
| 🔍 Core EDA Module | ✅ Done | v1.0 |
| 🤖 AutoML Pipeline | ✅ Done | v1.0 |
| 🧹 Data Preprocessing | ✅ Done | v1.0 |
| 📈 Time Series Analysis | ✅ Done | v1.0 |
| 🔮 Model Explainability | ✅ Done | v1.0 |
| 🎨 Streamlit Dashboard | ✅ Done | v1.0 |
| 💻 CLI Interface | ✅ Done | v1.0 |
| 🐳 Docker Support | ✅ Done | v1.0 |

🔄 In Progress (v1.1.0)

| Feature | Progress | Expected |
|---|---|---|
| 🧠 Deep Learning Integration | 🟡🟡🟡⚪⚪ 60% | Q1 2026 |
| 💬 NLP Query Interface | 🟡🟡⚪⚪⚪ 40% | Q1 2026 |
| 📊 Advanced Visualizations | 🟡🟡🟡🟡⚪ 80% | Q1 2026 |

📋 Planned (v2.0.0+)

| Feature | Priority | Timeline |
|---|---|---|
| ☁️ Cloud Deployment Templates | 🔴 High | Q2 2026 |
| ⚡ Real-time Streaming Analysis | 🔴 High | Q2 2026 |
| 🗄️ Feature Store Integration | 🟠 Medium | Q3 2026 |
| 🔄 MLOps Pipeline | 🟠 Medium | Q3 2026 |
| 📈 Model Monitoring | 🟡 Normal | Q4 2026 |
| 🧪 A/B Testing Framework | 🟡 Normal | Q4 2026 |

📊 Overall Progress

Completed    ████████████████████████████████░░░░░░░░  80%
In Progress  ████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░  30%
Planned      ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░   0%

🤝 Contributing

We 💖 Contributors!

Quick Start for Contributors

# 1. Fork and clone
git clone https://github.com/YOUR_USERNAME/DataPilot-AI.git
cd DataPilot-AI

# 2. Create feature branch
git checkout -b feature/amazing-feature

# 3. Install dev dependencies
pip install -e ".[dev]"

# 4. Make changes and test
pytest tests/ -v
black src/ tests/
flake8 src/ tests/

# 5. Commit with conventional commits
git commit -m "feat(automl): add amazing feature"

# 6. Push and create PR
git push origin feature/amazing-feature

Contribution Types

| Type | Description | Label |
|---|---|---|
| 🐛 Bug Fix | Fix existing issues | `bug` |
| ✨ Feature | New functionality | `enhancement` |
| 📚 Docs | Documentation improvements | `documentation` |
| 🧪 Tests | Add or improve tests | `testing` |
| 🎨 Style | Code style/formatting | `style` |
| ♻️ Refactor | Code improvements | `refactor` |

📖 Read the full guide: CONTRIBUTING.md


❓ Frequently Asked Questions

🤔 What makes DataPilot AI different from other AutoML tools?

DataPilot AI combines AutoML, Explainable AI, Time Series, and AI Insights in one unified toolkit. Unlike tools that focus on just model training, we provide end-to-end coverage from data exploration to model explanation.

🐍 What Python versions are supported?

We support Python 3.9, 3.10, 3.11, and 3.12. Python 3.11 is recommended for best performance.

💾 Can I use my own models with the explainability module?

Yes! The ModelExplainer class works with any scikit-learn compatible model. Just pass your trained model, and you'll get SHAP/LIME explanations instantly.
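SHAP and LIME require their own packages; as a dependency-light illustration of the same "explain any fitted estimator" idea, scikit-learn's `permutation_importance` is also model-agnostic (this is a stand-in, not the `ModelExplainer` API):

```python
# Model-agnostic feature importance without SHAP/LIME installed:
# scikit-learn's permutation_importance works on any fitted estimator.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
result = permutation_importance(model, X_test, y_test, n_repeats=10,
                                random_state=0)
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature_{i}: {result.importances_mean[i]:.3f}")
```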

🌐 Can I deploy the dashboard to the cloud?

Absolutely! The Streamlit dashboard can be deployed to:

  • Streamlit Cloud (free tier available)
  • Heroku
  • AWS/GCP/Azure with Docker
  • Any platform supporting Docker containers

📊 What data formats are supported?

  • CSV (.csv)
  • Excel (.xlsx, .xls)
  • Parquet (.parquet)
  • JSON (.json)
  • Pandas DataFrames (programmatic)
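Each of these formats maps to a standard pandas reader, so loading is uniform regardless of source. A small round-trip sketch (file names are illustrative):

```python
# Each supported format maps to a standard pandas reader.
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "value": [10.5, 20.1, 30.7]})

# CSV round trip
df.to_csv("sample.csv", index=False)
loaded_csv = pd.read_csv("sample.csv")

# JSON round trip
df.to_json("sample.json", orient="records")
loaded_json = pd.read_json("sample.json")

# Excel needs openpyxl; Parquet needs pyarrow or fastparquet:
#   pd.read_excel("file.xlsx"); pd.read_parquet("file.parquet")
print(loaded_csv.equals(df))
```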

⚡ How fast is the AutoML pipeline?

Speed depends on data size and model count, but typical benchmarks:

  • 1,000 rows, 10 models: ~30 seconds
  • 10,000 rows, 10 models: ~2-5 minutes
  • 100,000 rows, 10 models: ~10-20 minutes

LightGBM and XGBoost are particularly optimized for speed.

🔧 Can I customize the preprocessing pipeline?

Yes! You can either use the one-liner preprocess_pipeline() or chain individual methods (handle_missing_values(), scale_features(), etc.) for full control.


💖 Support & Sponsorship

If DataPilot AI helps your work, consider supporting!


Star • Fork • Sponsor



📬 Get in Touch

Report Bug • Request Feature • Discussions

Email • GitHub


📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

MIT License

Copyright (c) 2024 DataPilot AI Contributors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software...

🙏 Acknowledgments

Built with Amazing Open Source Projects

Scikit-learn XGBoost LightGBM SHAP LIME Streamlit Plotly






Thanks for visiting! Star ⭐ this repo if you found it helpful!


Made with ❤️ by Vivek Ghantiwala and the DataPilot AI Community


