This repository is a comprehensive collection of resources, code, and datasets for performing Exploratory Data Analysis (EDA) on a variety of real-world datasets. It is designed for data science learners and practitioners who want to explore, visualize, and gain insights from data using Python and Jupyter Notebooks.
| File/Folder | Description |
|---|---|
| Plotters.ipynb | Notebook for general plotting and visualization techniques. |
| Problem1.ipynb | EDA notebook for Problem 1 (see notebook for details). |
| Pronlem2.ipynb | EDA notebook for Problem 2 (typo: should be 'Problem2'). |
| Pronlem3.ipynb | EDA notebook for Problem 3 (typo: should be 'Problem3'). |
| Problem4.ipynb | EDA notebook for Problem 4. |
| Problem5.ipynb | EDA notebook for Problem 5. |
| Problem6.ipynb | EDA notebook for Problem 6. |
| Tesla.csv | Tesla stock data for analysis. |
| ipl_batting.csv | IPL cricket batting data. |
| netflix_titles.csv | Netflix titles dataset. |
| train.csv | Generic training dataset (context in notebooks). |
| weather.csv | Weather data for EDA. |
| world_population.csv | World population statistics. |
| all_stocks_5yr.csv | 5 years of stock data for analysis. |
| Connections.csv | LinkedIn connections data. |
| DelayedFlights.csv | US flight delay dataset. |
| insurance.csv | Insurance data for EDA. |
| IRIS.csv | Classic Iris flower dataset. |
| Problem7.ipynb | EDA notebook for Problem 7. |
| Problem8.ipynb | EDA notebook for Problem 8. |
| Problem9.ipynb | EDA notebook for Problem 9. |
| Problem10.ipynb | EDA notebook for Problem 10. |
| Problem11.ipynb | EDA notebook for Problem 11. |
| Rich_Media.csv | LinkedIn post/media data. |
| LICENSE | License file for this repository. |
| README.md | This documentation file. |
Each notebook is self-contained and focuses on a specific dataset or EDA technique. A typical notebook includes:
- Data loading and cleaning
- Exploratory visualizations (histograms, scatter plots, bar charts, etc.)
- Statistical summaries
- Insights and observations
Refer to the top of each notebook for a summary of its purpose and the dataset it uses.
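The loading, cleaning, and summary steps listed above can be sketched roughly as follows. This is a minimal illustration, not code taken from the notebooks: the inline data, the column names, the `load_and_clean` helper, and the choice of median imputation are all assumptions made for the example. Plotting is left out to keep the sketch short.

```python
import io

import pandas as pd

# Hypothetical inline data standing in for one of the repository's CSV files.
RAW_CSV = """city,temperature,humidity
Pune,31.5,40
Delhi,,55
Delhi,,55
Mumbai,29.0,78
"""


def load_and_clean(source) -> pd.DataFrame:
    """Read CSV data, drop exact duplicate rows, and fill missing numeric values."""
    df = pd.read_csv(source)
    df = df.drop_duplicates().reset_index(drop=True)
    numeric_cols = df.select_dtypes(include="number").columns
    # Median imputation is one common choice, not necessarily what the notebooks use.
    df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
    return df


df = load_and_clean(io.StringIO(RAW_CSV))
summary = df.describe()    # count, mean, std, min, quartiles, max per numeric column
missing = df.isna().sum()  # remaining missing values per column
print(summary)
print(missing)
```

To try this on a real file, pass a path such as `"weather.csv"` to `load_and_clean` instead of the in-memory buffer. The actual column names will differ from the hypothetical ones used here.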
The repository includes several CSV files for hands-on EDA practice. These datasets cover topics such as stock prices, sports analytics, entertainment, weather, and demographics. You can use them directly in the provided notebooks or for your own analysis.
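Before opening a notebook, it can be useful to see which columns each CSV file provides. The sketch below uses only the standard library. The `csv_headers` helper is a hypothetical name, and it assumes that the first row of every file is a header row and that the files are readable as UTF-8 (undecodable bytes are replaced rather than raising an error).

```python
import csv
from pathlib import Path


def csv_headers(folder: str = ".") -> dict[str, list[str]]:
    """Map each CSV file name in `folder` to its list of column names."""
    headers = {}
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="", encoding="utf-8", errors="replace") as handle:
            # The first row is assumed to be the header; empty files yield [].
            headers[path.name] = next(csv.reader(handle), [])
    return headers


if __name__ == "__main__":
    for name, columns in csv_headers().items():
        print(f"{name}: {', '.join(columns)}")
```

Run from the repository root, this prints one line per dataset, which makes it easier to pick a file for practice.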
- Clone the repository: `git clone https://github.com/Dhanas3kar/EDA.git`, then `cd EDA`.
- Set up a Python environment:
  - It is recommended to use a virtual environment. Create one with `python -m venv venv`, then activate it with `venv\Scripts\activate` on Windows or `source venv/bin/activate` on macOS/Linux.
- Install required packages:
  - Most notebooks use `pandas`, `matplotlib`, and `seaborn`. Install them with: `pip install pandas matplotlib seaborn`
- Open notebooks:
  - Use JupyterLab, Jupyter Notebook, or VS Code to open and run the `.ipynb` files.
- Run the notebooks cell by cell to see the analysis and visualizations.
- Modify the code to try your own EDA ideas or apply techniques to new datasets.
- Use the provided datasets for practice or coursework.
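Before running the notebooks, it can help to confirm that the packages named in the installation step are importable from the active environment. The following is a small sketch using the standard library. `REQUIRED` and `missing_packages` are hypothetical names, and individual notebooks may need packages beyond these three.

```python
from importlib.util import find_spec

# Packages named in the installation step; individual notebooks may need more.
REQUIRED = ("pandas", "matplotlib", "seaborn")


def missing_packages(names=REQUIRED) -> list[str]:
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
        print("Install them with: pip install " + " ".join(missing))
    else:
        print("All required packages are available.")
```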
Contributions are welcome! You can:
- Fix typos or improve documentation
- Add new datasets or notebooks
- Suggest new EDA techniques or visualizations
To contribute, fork the repository, make your changes, and submit a pull request.
This project is licensed under the terms of the MIT License. See the LICENSE file for details.
For questions, suggestions, or collaboration, please open an issue or contact the repository owner via GitHub.