This project provides a comprehensive exploratory data analysis (EDA) of a crop yield dataset. It focuses on identifying key patterns and trends in agricultural productivity, rainfall, and pesticide usage across various countries and years. The notebook showcases core competencies in data science, data analytics, and interactive data visualization using Python.
To explore and analyze crop yield data to uncover:
- Yearly and country-wise trends in crop yield
- Relationships between yield, rainfall, and pesticide usage
- Comparative insights across different crops and regions
- Data Acquisition & Integration
  Retrieved the dataset using `kagglehub`, showcasing the ability to access and manage external data sources.
- Data Cleaning & Preprocessing
  - Removed irrelevant columns
  - Identified missing values using `.info()` and `.isnull().sum()`
  - Ensured clean data for further analysis
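
The acquisition and cleaning steps above could look roughly like the following sketch. The helper names (`download_dataset`, `clean`), the dataset handle you would pass in, and every column name other than `Area`, `Item`, and `Year` are assumptions made for illustration; the tiny in-memory frame stands in for the real CSV so the example runs offline.

```python
import pandas as pd


def download_dataset(handle: str) -> str:
    """Download a Kaggle dataset and return the local directory path.

    `handle` is whatever "owner/dataset-name" string the notebook uses;
    it is not specified here, so it is left as a parameter.
    """
    import kagglehub  # imported lazily so the rest of the sketch runs offline

    return kagglehub.dataset_download(handle)


def clean(df: pd.DataFrame, drop_cols: list[str]) -> pd.DataFrame:
    """Drop irrelevant columns; missing columns in drop_cols are ignored."""
    return df.drop(columns=drop_cols, errors="ignore")


# Stand-in data (hypothetical columns apart from Area, Item, Year).
raw = pd.DataFrame(
    {
        "Unnamed: 0": [0, 1, 2],
        "Area": ["India", "Brazil", "Kenya"],
        "Item": ["Maize", "Maize", "Wheat"],
        "Year": [2000, 2000, 2001],
        "pesticides_tonnes": [120.0, None, 45.0],
    }
)

cleaned = clean(raw, drop_cols=["Unnamed: 0"])

cleaned.info()                     # dtypes and non-null counts per column
missing = cleaned.isnull().sum()   # number of missing values per column
print(missing)
```

How the missing values are then handled (dropped, imputed, or left alone) depends on the real data and is not shown here.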
- Exploratory Data Analysis (EDA)
  - Counted unique countries and crop items
  - Analyzed crop distribution using `value_counts()`
  - Aggregated data by `Area`, `Item`, and `Year` to generate insights
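
As a minimal sketch of the EDA steps above, the snippet below counts unique values, inspects crop frequency, and aggregates by the grouping columns. The yield column name `hg/ha_yield` and the sample values are assumptions; only `Area`, `Item`, and `Year` come from the project description.

```python
import pandas as pd

# Small illustrative frame; the real dataset is much larger.
df = pd.DataFrame(
    {
        "Area": ["India", "India", "Brazil", "Brazil", "Kenya"],
        "Item": ["Maize", "Wheat", "Maize", "Maize", "Wheat"],
        "Year": [2000, 2000, 2000, 2001, 2001],
        "hg/ha_yield": [20000, 25000, 30000, 32000, 15000],  # assumed column name
    }
)

# Count unique countries and crop items.
n_countries = df["Area"].nunique()
n_items = df["Item"].nunique()

# Crop distribution: how many rows each crop contributes.
item_counts = df["Item"].value_counts()

# Aggregations by Area and by Year.
yield_by_area = df.groupby("Area")["hg/ha_yield"].sum().sort_values(ascending=False)
yield_by_year = df.groupby("Year")["hg/ha_yield"].mean()

print(n_countries, n_items)
print(item_counts)
print(yield_by_area)
print(yield_by_year)
```

Whether to aggregate with a sum or a mean is a judgment call: sums favor countries with more rows in the dataset, while means are easier to compare across countries.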
- Statistical & Correlation Analysis
  Developed a `crop_country` function to measure the correlation between yield and pesticide usage, useful for insight generation and hypothesis validation.
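
The notebook's own `crop_country` implementation is not reproduced here, so the version below is only a guess at its shape: filter to one crop and one country, then compute the Pearson correlation between yield and pesticide usage. The signature, the default column names, and the sample data are all assumptions.

```python
import pandas as pd


def crop_country(
    df: pd.DataFrame,
    crop: str,
    country: str,
    yield_col: str = "hg/ha_yield",          # assumed column name
    pesticide_col: str = "pesticides_tonnes",  # assumed column name
) -> float:
    """Pearson correlation between yield and pesticide use for one crop in one country."""
    subset = df[(df["Item"] == crop) & (df["Area"] == country)]
    return subset[yield_col].corr(subset[pesticide_col])


# Illustrative data: India's maize yield rises linearly with pesticide use.
df = pd.DataFrame(
    {
        "Area": ["India", "India", "India", "Brazil", "Brazil"],
        "Item": ["Maize", "Maize", "Maize", "Maize", "Maize"],
        "Year": [2000, 2001, 2002, 2000, 2001],
        "hg/ha_yield": [10000, 20000, 30000, 50000, 40000],
        "pesticides_tonnes": [1.0, 2.0, 3.0, 9.0, 7.0],
    }
)

r = crop_country(df, crop="Maize", country="India")
print(round(r, 3))
```

A correlation computed on a handful of yearly rows is a starting point for a hypothesis, not evidence of a causal effect of pesticides on yield.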
- Interactive Charts
  Created interactive visualizations using Plotly, suitable for reports and presentations.
- Visual Features Include:
  - Bar charts of crop frequency
  - Subplots comparing yield, rainfall, and pesticide usage across the top 10 countries
  - Yearly trend visualizations for both countries and crops
  - Subplots built with `make_subplots` for cohesive visual storytelling
- Customization for Clarity
  - Titles, axes, colors, labels, and layout tailored for easy interpretation
  - Clear separation of insights via subplot grids and legends
The analysis provides decision-making support in:
- Identifying countries with high or low productivity
- Evaluating environmental and chemical factors affecting yield
- Informing agricultural strategy and sustainability practices
| Tool/Library | Purpose |
|---|---|
| Python | Programming language |
| Pandas | Data manipulation & aggregation |
| Plotly | Interactive data visualization |
| KaggleHub | Dataset acquisition |
| NumPy | Numerical computation |