A Data-Driven Analysis of Temporal Sales Dynamics and Demand Peaks
This project investigates temporal purchasing behavior within a retail bakery environment using transactional sales data. The objective is to identify patterns in customer demand across time and evaluate how these patterns can inform operational decision-making.
The analysis provides insight into how temporal dynamics influence retail performance and demonstrates how data-driven approaches can support optimization of:
- Inventory management
- Staffing allocation
- Production scheduling
| Category | Techniques |
|---|---|
| Data Preprocessing | Cleaning, type conversion, missing value handling |
| Feature Engineering | Time-based feature extraction (hourly segmentation) |
| Exploratory Analysis | Distribution analysis, correlation matrices, trend visualization |
| Statistical Testing | Relationship analysis between price, quantity, and time |
| Clustering | K-Means (manual implementation + scikit-learn) |
| Classification | K-NN (manual implementation + scikit-learn) |
| Dimensionality Reduction | Principal Component Analysis (PCA) |
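The manual K-NN entry in the table above can be pictured with a small standard-library sketch. This is an illustration under assumptions, not the notebook's code: `knn_predict`, the sample points (quantity, unit price), and the labels `"small"` and `"bulk"` are all hypothetical.

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(p, query), label) for p, label in zip(train_points, train_labels)
    )
    # Count the labels of the k closest points and return the most frequent one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical (quantity, unit_price) pairs with made-up class labels.
train_points = [(1, 0.9), (2, 1.0), (1, 1.2), (10, 4.5), (12, 5.0), (11, 4.8)]
train_labels = ["small", "small", "small", "bulk", "bulk", "bulk"]

print(knn_predict(train_points, train_labels, (2, 1.1)))
print(knn_predict(train_points, train_labels, (9, 4.0)))
```

`math.dist` requires Python 3.8 or later. The library counterpart would be scikit-learn's `KNeighborsClassifier`.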
transaction-pattern-analysis/
├── transaction_pattern_analysis.ipynb # Main analysis notebook
├── Bakery sales.csv # Raw dataset (234,005 transactions)
├── Bakery_Sales1.json.zip # JSON export
└── README.md # Project documentation
The dataset contains 234,005 transactional records from a French retail bakery spanning from January 2021 to September 2022.
| Field | Description |
|---|---|
| date | Transaction date |
| time | Transaction time (HH:MM) |
| ticket_number | Unique transaction identifier |
| article | Product name |
| Quantity | Number of items purchased |
| unit_price | Price per unit (€) |
Source: Kaggle - French Bakery Daily Sales
Note: Data cleaning is performed directly within the notebook to ensure transparency and reproducibility.
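As a rough illustration of the type-conversion and missing-value steps, the sketch below cleans a few raw records with the standard library only. The sample rows, the date format (`YYYY-MM-DD`), and the price format (`"0,90 €"` with a comma decimal) are assumptions made for the example. `parse_price` and `parse_row` are hypothetical helpers, not functions from the notebook.

```python
from datetime import datetime

def parse_price(raw):
    """Convert a price string such as '0,90 €' to a float (assumed format)."""
    cleaned = raw.replace("€", "").replace(",", ".").strip()
    return float(cleaned)

def parse_row(row):
    """Return a typed copy of one raw record, or None if a field is missing or invalid."""
    try:
        return {
            # Assumed date format; adjust the pattern to match the actual file.
            "timestamp": datetime.strptime(f"{row['date']} {row['time']}", "%Y-%m-%d %H:%M"),
            "ticket_number": int(float(row["ticket_number"])),
            "article": row["article"].strip(),
            "Quantity": int(float(row["Quantity"])),
            "unit_price": parse_price(row["unit_price"]),
        }
    except (KeyError, ValueError, TypeError, AttributeError):
        return None

# Hypothetical raw records; the third has an empty time field and is dropped.
raw_rows = [
    {"date": "2021-01-02", "time": "08:38", "ticket_number": "150040.0",
     "article": "BAGUETTE", "Quantity": "1.0", "unit_price": "0,90 €"},
    {"date": "2021-01-02", "time": "08:38", "ticket_number": "150040.0",
     "article": "PAIN AU CHOCOLAT", "Quantity": "3.0", "unit_price": "1,20 €"},
    {"date": "2021-01-02", "time": "", "ticket_number": "150041.0",
     "article": "CROISSANT", "Quantity": "2.0", "unit_price": "1,10 €"},
]

clean_rows = [r for r in (parse_row(row) for row in raw_rows) if r is not None]
print(len(clean_rows), "of", len(raw_rows), "rows kept")
```

Dropping invalid rows is only one option. Depending on the analysis, imputing or flagging them may be preferable.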
The analysis reveals clear peak demand periods throughout the day, enabling targeted staffing and inventory decisions.
Peak Hours: Morning rush (8-10 AM) and afternoon (12-2 PM)
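The hourly segmentation behind these peaks can be sketched as follows. This is a minimal standard-library example on made-up `(time, quantity)` pairs. `quantity_by_hour` and `peak_hours` are hypothetical names, and the notebook may aggregate differently (for example by ticket count or revenue).

```python
from collections import defaultdict
from datetime import datetime

def quantity_by_hour(records):
    """Sum quantities sold per hour of day from (time 'HH:MM', quantity) pairs."""
    totals = defaultdict(int)
    for time_str, quantity in records:
        hour = datetime.strptime(time_str, "%H:%M").hour
        totals[hour] += quantity
    return dict(totals)

def peak_hours(totals, top_n=2):
    """Return the top_n hours ranked by total quantity, busiest first."""
    return sorted(totals, key=totals.get, reverse=True)[:top_n]

# Hypothetical sample, not drawn from the dataset.
sample = [("08:15", 4), ("08:50", 3), ("09:05", 6),
          ("12:30", 5), ("13:10", 2), ("16:45", 1)]

totals = quantity_by_hour(sample)
print(peak_hours(totals))
```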
K-Means clustering segments transactions into distinct behavioral groups based on quantity and pricing patterns.
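For readers unfamiliar with the algorithm, here is a minimal sketch of a manual K-Means (Lloyd's algorithm) on `(quantity, unit_price)` pairs. It is not the notebook's implementation. The `kmeans` function, its parameters, and the sample points are assumptions for illustration.

```python
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as initial centroids
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return centroids, labels

# Hypothetical (quantity, unit_price) pairs forming two loose groups.
points = [(1, 0.9), (2, 1.0), (1, 1.2), (10, 4.5), (12, 5.0), (11, 4.8)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Because the features are on different scales, standardizing them before clustering is usually worth considering. The scikit-learn counterpart is `sklearn.cluster.KMeans`.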
Principal Component Analysis reduces dimensionality while preserving variance, revealing underlying structure in transaction data.
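To make the idea concrete, the sketch below runs PCA on just two features, where the covariance matrix is 2x2 and its eigen-decomposition has a closed form. This is a simplified standard-library illustration, not the notebook's method. `pca_2d` and the sample points are hypothetical, and the features are left unstandardized for brevity.

```python
import math

def pca_2d(points):
    """PCA for two features via the closed-form eigen-decomposition of the 2x2 covariance matrix."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Sample covariance matrix entries [[a, b], [b, c]].
    a = sum(dx * dx for dx, _ in centered) / (n - 1)
    c = sum(dy * dy for _, dy in centered) / (n - 1)
    b = sum(dx * dy for dx, dy in centered) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix.
    half_trace = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    lam1, lam2 = half_trace + radius, half_trace - radius
    # Unit eigenvector for the larger eigenvalue: the direction of maximum variance.
    angle = 0.5 * math.atan2(2 * b, a - c)
    pc1 = (math.cos(angle), math.sin(angle))
    # Project the centered data onto the first principal component.
    scores = [dx * pc1[0] + dy * pc1[1] for dx, dy in centered]
    explained = lam1 / (lam1 + lam2)  # share of total variance kept by PC1
    return pc1, scores, explained

# Hypothetical (quantity, unit_price) pairs.
points = [(1, 0.9), (2, 1.0), (1, 1.2), (10, 4.5), (12, 5.0), (11, 4.8)]
pc1, scores, explained = pca_2d(points)
print(f"PC1 direction: {pc1}, explained variance ratio: {explained:.4f}")
```

With more than two features, a general eigen-solver or scikit-learn's `sklearn.decomposition.PCA` would be the practical choice.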
| Tool | Purpose |
|---|---|
| Python | Core programming language |
| pandas | Data manipulation & analysis |
| NumPy | Numerical computing |
| Matplotlib | Data visualization |
| Seaborn | Statistical visualization |
| scikit-learn | Machine learning |
- Python 3.8+
- Jupyter Notebook or Google Colab
- Clone the repository

      git clone https://github.com/anaya33/transaction-pattern-analysis.git
      cd transaction-pattern-analysis

- Install dependencies

      pip install pandas numpy matplotlib seaborn scikit-learn

- Launch the notebook

      jupyter notebook transaction_pattern_analysis.ipynb

- Open Google Colab
- Upload transaction_pattern_analysis.ipynb
- Upload Bakery sales.csv or connect via GitHub
- Run cells sequentially
Contributions are welcome! Here's how you can help:
- Bug Reports — Found an issue? Open a detailed bug report
- Feature Requests — Have ideas for new analyses? Share them!
- New Visualizations — Add compelling charts or dashboards
- Additional ML Models — Implement other clustering/classification algorithms
- Documentation — Improve explanations or add tutorials
- Code Optimization — Enhance performance or code quality
- Fork the repository
- Create a feature branch

      git checkout -b feature/your-feature-name

- Commit your changes with clear messages

      git commit -m "Add: description of your changes"

- Push to your branch

      git push origin feature/your-feature-name

- Open a Pull Request with a detailed description
| Difficulty | Task |
|---|---|
| Easy | Add more visualizations (box plots, violin plots) |
| Easy | Improve code comments and documentation |
| Medium | Implement DBSCAN or hierarchical clustering |
| Medium | Add time series forecasting (ARIMA, Prophet) |
| Medium | Create an interactive dashboard (Plotly/Dash) |
| Advanced | Build a recommendation system for products |
| Advanced | Deploy as a web application |
This project is licensed under the MIT License — see the LICENSE file for details.
- Dataset provided by Matthieu Gimbert on Kaggle
- Developed as part of Advanced Data Analytics (ITEC 4220) coursework
Star this repo if you found it helpful!