DataPilot AI is a comprehensive, production-ready data science framework that transforms how you work with data. It combines the power of automated machine learning, explainable AI, and intelligent insights into one seamless toolkit.
"From raw data to actionable insights in minutes, not hours."
Whether you're a data scientist seeking to accelerate workflows, a business analyst needing quick insights, or a developer integrating ML into applications — DataPilot AI has you covered.
10+ ML Algorithms • Auto Preprocessing • SHAP & LIME • Time Series Forecasting • Interactive Dashboard • One-Click Reports
🎪 Key Highlights
- 🔥 **Zero-Config AutoML**: train 10+ models with one line of code
- 📊 **Beautiful Visualizations**: publication-ready charts with Plotly & Seaborn
- 🧠 **AI Insights**: smart pattern detection
- ⚡ **Blazing Fast**: optimized algorithms with XGBoost & LightGBM
- 🔍 **Explainable AI**: SHAP & LIME built in for model transparency
- 🌐 **Web Dashboard**: Streamlit UI, no coding needed
```bash
# Clone and install
git clone https://github.com/VivekGhantiwala/DataPilot-AI.git
cd DataPilot-AI
pip install -r requirements.txt
```
🐳 Option 3: Docker
```bash
# Build image
docker build -t datapilot-ai .

# Run container
docker run -p 8501:8501 datapilot-ai
```
⚙️ Option 4: With Extras
```bash
# Install with all optional dependencies
pip install -e ".[all]"

# Or specific extras
pip install -e ".[dev]"        # Development tools
pip install -e ".[ml]"         # Extra ML libraries
pip install -e ".[dashboard]"  # Dashboard dependencies
```
```python
from src import AutoML
import pandas as pd

# Load your data
data = pd.read_csv("your_data.csv")
X = data.drop(columns=["target"])
y = data["target"]

# Initialize AutoML - it's that simple! 🚀
automl = AutoML(
    task="classification",  # or "regression", "auto"
    max_models=10,          # Number of models to try
    cv_folds=5              # Cross-validation folds
)

# Train all models
automl.fit(X, y)

# View results
print(automl.get_leaderboard())
print(automl.summary())

# Make predictions
predictions = automl.predict(X_new)

# Save best model
automl.save("best_model.pkl")
```
```python
from src import ExploratoryAnalysis
import pandas as pd

# Load data
data = pd.read_csv("your_data.csv")

# Run comprehensive EDA
eda = ExploratoryAnalysis(data)
eda.print_report()

# Get specific insights
correlations = eda.correlation_analysis()
missing = eda.missing_value_analysis()
outliers = eda.detect_outliers_summary()
stats = eda.get_statistics()
```
📈 Time Series Forecasting
```python
from src import TimeSeriesAnalyzer
import pandas as pd

# Load time series data
data = pd.read_csv("sales_data.csv")

# Initialize analyzer
ts = TimeSeriesAnalyzer(
    data=data,
    date_column="date",
    value_column="sales"
)

# Run analysis
ts.analyze()

# Generate 30-day forecast
forecast = ts.forecast(periods=30, method="auto")

# Visualize results
ts.plot_forecast(forecast)

# Get detailed report
print(ts.generate_report())
```
🔮 Model Explainability
```python
from src import ModelExplainer
from sklearn.ensemble import RandomForestClassifier

# Train your model
model = RandomForestClassifier()
model.fit(X_train, y_train)

# Create explainer
explainer = ModelExplainer(
    model=model,
    X_train=X_train,
    feature_names=X_train.columns.tolist(),
    task="classification"
)

# Explain a single prediction
explanation = explainer.explain_prediction(X_test.iloc[0])

# SHAP summary plot
explainer.plot_shap_summary(X_test)

# Feature importance
importance = explainer.get_feature_importance()

# Generate report
report = explainer.generate_report(X_test)
```
🧹 Data Preprocessing
```python
from src import DataPreprocessor

# Initialize preprocessor
prep = DataPreprocessor()

# Load data
prep.load_data("raw_data.csv")

# Full pipeline (one-liner!)
clean_data = prep.preprocess_pipeline(
    handle_missing=True,
    remove_dups=True,
    handle_outliers_flag=True,
    scale=True,
    encode=True
)

# Or step-by-step with full control
prep.handle_missing_values(strategy="auto")
prep.remove_duplicates()
prep.handle_outliers(method="clip")       # or "iqr", "zscore"
prep.scale_features(method="standard")    # or "minmax", "robust"
prep.encode_categorical(method="onehot")  # or "label"
```
🧠 AI-Powered Insights
```python
from src import AIInsights
import pandas as pd

data = pd.read_csv("your_data.csv")

# Initialize AI insights engine
ai = AIInsights(data)

# Generate automated report
report = ai.generate_automated_report(target_column="target")
print(report)

# Detect data quality issues
issues = ai.detect_data_quality_issues()

# Get smart recommendations
recommendations = ai.generate_recommendations(task_type="classification")

# Quick insights summary
quick = ai.get_quick_insights()
```
```bash
# Build the image
docker build -t datapilot-ai .

# Run with default settings
docker run -p 8501:8501 datapilot-ai

# Run with data volume
docker run -p 8501:8501 -v $(pwd)/data:/app/data datapilot-ai

# Run with environment variables
docker run -p 8501:8501 -e DEBUG=false datapilot-ai
```
🤔 What makes DataPilot AI different from other AutoML tools?
DataPilot AI combines AutoML, Explainable AI, Time Series, and AI Insights in one unified toolkit. Unlike tools that focus on just model training, we provide end-to-end coverage from data exploration to model explanation.
🐍 What Python versions are supported?
We support Python 3.9, 3.10, 3.11, and 3.12. Python 3.11 is recommended for best performance.
💾 Can I use my own models with the explainability module?
Yes! The ModelExplainer class works with any scikit-learn compatible model. Just pass your trained model, and you'll get SHAP/LIME explanations instantly.
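Because compatibility only requires the standard scikit-learn estimator interface, you can get a quick model-agnostic baseline with scikit-learn's own `permutation_importance` before handing the same model to `ModelExplainer`. A minimal sketch on sklearn's bundled iris data (this uses plain scikit-learn, not the DataPilot API):

```python
from sklearn.datasets import load_iris
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Any scikit-learn compatible estimator works; LogisticRegression here
X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Model-agnostic importances: shuffle each feature, measure the score drop
result = permutation_importance(model, X_test, y_test,
                                n_repeats=10, random_state=42)
for name, mean in zip(X.columns, result.importances_mean):
    print(f"{name}: {mean:.3f}")
```

The same fitted `model` object is exactly what you would pass as the `model=` argument to `ModelExplainer`.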
🌐 Can I deploy the dashboard to the cloud?
Absolutely! The Streamlit dashboard can be deployed to:
- Streamlit Cloud (free tier available)
- Heroku
- AWS/GCP/Azure with Docker
- Any platform supporting Docker containers
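For the Docker route, a minimal Dockerfile sketch is enough to get started. The entry-point path (`app.py` below) is an assumption; adjust it to wherever the repo's Streamlit script actually lives:

```dockerfile
# Minimal sketch; the app.py entry point is an assumption, not the repo layout
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.address=0.0.0.0"]
```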
📊 What data formats are supported?
- CSV (`.csv`)
- Excel (`.xlsx`, `.xls`)
- Parquet (`.parquet`)
- JSON (`.json`)
- Pandas DataFrames (programmatic)
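In practice these formats all route through pandas readers, so dispatching on the file extension is straightforward. A small loader sketch (the `load_any` helper is illustrative, not part of the DataPilot API):

```python
import os
import tempfile
from pathlib import Path

import pandas as pd

# Illustrative helper, not part of DataPilot AI's API
def load_any(path: str) -> pd.DataFrame:
    suffix = Path(path).suffix.lower()
    readers = {
        ".csv": pd.read_csv,
        ".xlsx": pd.read_excel,
        ".xls": pd.read_excel,
        ".parquet": pd.read_parquet,
        ".json": pd.read_json,
    }
    if suffix not in readers:
        raise ValueError(f"Unsupported format: {suffix}")
    return readers[suffix](path)

# Round-trip check with a CSV file
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
tmp = os.path.join(tempfile.gettempdir(), "sample.csv")
df.to_csv(tmp, index=False)
loaded = load_any(tmp)
print(loaded.shape)  # (2, 2)
```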
⚡ How fast is the AutoML pipeline?
Speed depends on data size and model count, but typical benchmarks:
- 1,000 rows, 10 models: ~30 seconds
- 10,000 rows, 10 models: ~2-5 minutes
- 100,000 rows, 10 models: ~10-20 minutes
LightGBM and XGBoost are particularly optimized for speed.
🔧 Can I customize the preprocessing pipeline?
Yes! You can either use the one-liner preprocess_pipeline() or chain individual methods (handle_missing_values(), scale_features(), etc.) for full control.
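If you need behavior the built-ins don't cover, the same steps can also be reproduced with a plain scikit-learn pipeline and swapped in where needed. A rough equivalent of missing-value handling plus scaling plus one-hot encoding, using scikit-learn directly (the column names are examples, not a DataPilot convention):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Example data; column names are illustrative
df = pd.DataFrame({
    "age": [25, None, 47, 33],
    "income": [50_000, 62_000, None, 48_000],
    "city": ["NY", "SF", "NY", "LA"],
})

numeric = ["age", "income"]
categorical = ["city"]

# Impute + scale numeric columns; impute + one-hot encode categoricals
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical),
])

out = preprocess.fit_transform(df)
print(out.shape)  # 4 rows, 2 scaled numerics + 3 one-hot city columns
```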
💖 Support & Sponsorship
If DataPilot AI helps your work, consider supporting the project!
📬 Get in Touch
📜 License
This project is licensed under the MIT License - see the LICENSE file for details.
MIT License
Copyright (c) 2024 DataPilot AI Contributors
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software...
🙏 Acknowledgments
Built with Amazing Open Source Projects
Thanks for visiting! Star ⭐ this repo if you found it helpful!