ayushcodes13/TradeSync-ML-vs-Rule-Based-Trading-System

📊 XGBoost Stock Trading Notebook Documentation

Welcome to the documentation for the Jupyter notebook xgboost.ipynb. The notebook implements, visualizes, and compares a rule-based strategy and a machine learning (XGBoost) strategy for predicting the next-day movement (up/down) of selected Indian stocks from technical indicators. It is designed to help you explore the interplay between technical signals and modern ML models in stock trading.


🗂️ Index

  1. Overview
  2. Data Collection & Initialization
  3. Data Visualization
  4. Technical Indicator Preprocessing
  5. Feature Engineering
  6. Model Training & Backtesting
  7. Agreement Analysis
  8. Evaluation Metrics
  9. Trading Pipeline Flowchart
  10. XGBoost Model API (Model Asset)
  11. Best Practices & Limitations
  12. References

1. Overview

This notebook demonstrates a quantitative trading pipeline that answers the question:

Can simple technical signals be improved upon by machine learning for next-day price direction prediction?

It does this across three large Indian stocks:

  • RELIANCE.NS (Reliance Industries)
  • HDFCBANK.NS (HDFC Bank)
  • INFY.NS (Infosys)

The notebook fetches historical prices, computes technical indicators, creates trading signals, and compares a rule-based trading approach with an XGBoost classifier, then assesses where they agree (consensus signals).


2. Data Collection & Initialization

Key Steps:

  • Imports: yfinance, pandas, numpy, matplotlib, seaborn, plotly, scikit-learn, ta (technical analysis), xgboost, and utility modules.
  • Stock Selection:
    symbols = ["RELIANCE.NS", "HDFCBANK.NS", "INFY.NS"]
  • Fetching Metadata:
    • Loops through each symbol, downloads metadata (name, market, sector), and prints it.
    • Uses yfinance Ticker.info.

Example Output:

Symbol: RELIANCE.NS
Name: RELIANCE INDUSTRIES LTD
Market: in_market
Sector: Energy
------------------------------

3. Data Visualization

  • Price & Volume Chart: For each symbol, fetches 5 years of historical OHLCV data and plots Close price and Volume on dual y-axes.
  • Tabular Preview: Displays the first and last 5 rows for a 6-month window.

Visualization Example:

  • The plot shows the evolution of price and volume, highlighting regime changes, trends, and volatility.

4. Technical Indicator Preprocessing

For each stock, the notebook computes:

  • RSI (14 days)
  • MACD and MACD Signal
  • SMA 20, SMA 50
  • EMA 20
  • Volatility: 10-day rolling std of Close
  • Buy_Signal: (RSI < 35) and (SMA_20 > SMA_50)
  • Target: Next-day close > today's close (binary classification)

Returns a DataFrame with these features and signals.
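The moving-average and volatility indicators above can be sketched in plain Python. This is a minimal illustration, not the notebook's code: the helper names `sma`, `ema`, and `rolling_std` are hypothetical, the short window of 3 is only for readability, and the notebook's ta library may handle warm-up periods differently. RSI and MACD are left out to keep the sketch short.

```python
from statistics import stdev


def sma(values, window):
    """Simple moving average; None until a full window is available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


def ema(values, span):
    """Exponential moving average with alpha = 2 / (span + 1), seeded with the first value."""
    if not values:
        return []
    alpha = 2 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def rolling_std(values, window):
    """Rolling sample standard deviation (window >= 2); None until a full window is available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(stdev(values[i + 1 - window : i + 1]))
    return out


# Made-up closing prices, for illustration only.
closes = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0]
sma_3 = sma(closes, 3)
ema_3 = ema(closes, 3)
vol_3 = rolling_std(closes, 3)
```

With the real windows, the same helpers would be called as `sma(closes, 20)`, `sma(closes, 50)`, `ema(closes, 20)`, and `rolling_std(closes, 10)`.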


5. Feature Engineering

  • Feature Set:
    • RSI, MACD, MACD_Signal, SMA_20, EMA_20, Volatility
  • Target:
    • Next-day price movement (binary: 1 = up, 0 = down)
  • Rule-based Logic:
    • Buy signal when RSI < 35 and SMA_20 > SMA_50

6. Model Training & Backtesting

ML (XGBoost) Approach

  • Train/Test Split:
    • Train: All data before 1 Feb 2025
    • Test: 1 Feb 2025 – 31 Jul 2025
  • Hyperparameter Search: Selects XGBoost hyperparameters by cross-validating with TimeSeriesSplit and scoring on precision.
  • Metrics: Accuracy, Precision, Recall, F1, Confusion Matrix.
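The date-based split and the idea behind time-series cross-validation can be illustrated without any ML library. In this sketch, `split_by_date` and `expanding_window_folds` are hypothetical helpers; the fold generator mimics the expanding-window layout that scikit-learn's TimeSeriesSplit produces by default, but it is an approximation for illustration rather than the notebook's code.

```python
from datetime import date


def split_by_date(rows, test_start, test_end):
    """Train on rows before test_start; test on rows within [test_start, test_end]."""
    train = [r for r in rows if r["date"] < test_start]
    test = [r for r in rows if test_start <= r["date"] <= test_end]
    return train, test


def expanding_window_folds(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs where each training set ends before its test set."""
    test_size = n_samples // (n_splits + 1)
    first_test_start = n_samples - n_splits * test_size
    for start in range(first_test_start, n_samples, test_size):
        yield list(range(start)), list(range(start, start + test_size))


# Made-up rows, for illustration only.
rows = [
    {"date": date(2025, 1, 30)},
    {"date": date(2025, 1, 31)},
    {"date": date(2025, 2, 3)},
    {"date": date(2025, 7, 31)},
    {"date": date(2025, 8, 1)},
]
train_rows, test_rows = split_by_date(rows, date(2025, 2, 1), date(2025, 7, 31))
folds = list(expanding_window_folds(n_samples=10, n_splits=3))
```

Every fold trains only on observations that precede its validation block, which is what keeps future prices from leaking into hyperparameter selection.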

Example Output:

Best Parameters: {'learning_rate': 0.01, 'max_depth': 4, 'n_estimators': 50}
Accuracy: 48.36%
Precision: 48.98%
Recall: 78.69%
F1-Score: 60.38%
  • Confusion Matrix: Plotted for each stock.

Rule-Based Strategy

  • Simulates trading whenever the logic signal is True.
  • Trade_Return: (next day's close - today's close) / today's close - 0.1% (transaction cost)
  • Performance metrics: total trades, win ratio, average profit/loss, cumulative return.
  • Cumulative return plot: Shows compounding of rule-based trades.
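The rule-based simulation can be sketched as follows. The function names and sample prices are hypothetical; the return formula and the 0.1% cost follow the Trade_Return definition above, and reporting a win ratio of 0 when there are no trades is an assumption of this sketch.

```python
TRANSACTION_COST = 0.001  # 0.1% per trade, per the Trade_Return definition


def trade_returns(closes, signals, cost=TRANSACTION_COST):
    """Return one net return per signal day: buy at today's close, sell at the next close."""
    returns = []
    for i in range(len(closes) - 1):  # the last day has no next close
        if signals[i]:
            returns.append((closes[i + 1] - closes[i]) / closes[i] - cost)
    return returns


def summarize(returns):
    """Total trades, win ratio, average return, and compounded cumulative return."""
    total = len(returns)
    wins = sum(1 for r in returns if r > 0)
    growth = 1.0
    for r in returns:
        growth *= 1 + r
    return {
        "total_trades": total,
        "win_ratio": wins / total if total else 0.0,
        "avg_return": sum(returns) / total if total else 0.0,
        "cumulative_return": growth - 1,
    }


# Made-up prices and signals, for illustration only.
closes = [100.0, 102.0, 101.0, 105.0]
signals = [True, False, True, False]
returns = trade_returns(closes, signals)
summary = summarize(returns)
```

Compounding multiplies the per-trade growth factors, so a sequence of small losses after costs erodes the cumulative return even when the gross price moves are close to flat.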

7. Agreement Analysis

  • Consensus trades: Days when both the ML model and the rule-based logic say 'buy'.
  • Agreement Metrics:
    • Total agreement trades
    • Profitable agreement trades
    • Win ratio
    • Average return
    • Cumulative return plot for agreement-only trades
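A minimal sketch of the consensus filter, assuming the ML predictions and rule signals are aligned day by day. The names `agreement_days` and `agreement_stats` and all sample values are hypothetical; the 0.1% cost is carried over from the rule-based strategy.

```python
def agreement_days(ml_predictions, rule_signals):
    """Indices of days where the model predicts up (1) and the rule also says buy."""
    return [
        i
        for i, (pred, rule) in enumerate(zip(ml_predictions, rule_signals))
        if pred == 1 and rule
    ]


def agreement_stats(closes, days, cost=0.001):
    """Net next-day returns on agreement days, with trade count, wins, and win ratio."""
    returns = [
        (closes[i + 1] - closes[i]) / closes[i] - cost
        for i in days
        if i + 1 < len(closes)  # skip a final day with no next close
    ]
    wins = sum(1 for r in returns if r > 0)
    return {
        "total": len(returns),
        "profitable": wins,
        "win_ratio": wins / len(returns) if returns else 0.0,
        "avg_return": sum(returns) / len(returns) if returns else 0.0,
    }


# Made-up values, for illustration only.
ml_predictions = [1, 1, 0, 1]
rule_signals = [True, False, True, True]
closes = [100.0, 102.0, 101.0, 105.0, 104.0]

days = agreement_days(ml_predictions, rule_signals)
stats = agreement_stats(closes, days)
```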

8. Evaluation Metrics

| Metric | Description |
|---|---|
| Accuracy | Correct predictions / total predictions |
| Precision | Correct “up” predictions / all “up” predictions |
| Recall | Correct “up” predictions / all actual “up” instances |
| F1-Score | Harmonic mean of precision and recall |
| Win Ratio | Profitable trades / total trades |
| Avg Profit/Loss | Average % gain/loss per trade |
| Cumulative Return | Compound return from following the strategy |
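The classification metrics above can be computed directly from confusion-matrix counts. The counts in this sketch are not reported in the source: they are an assumption, inferred as one set of counts (122 test days) that reproduces the example percentages in Section 6 (48.36%, 48.98%, 78.69%, 60.38%). The function name is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 for the positive ("up") class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Assumed counts (inferred, not taken from the notebook output).
metrics = classification_metrics(tp=48, fp=50, fn=13, tn=11)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Under these assumed counts, the model predicts "up" on 98 of 122 days, which would explain why recall is high while accuracy stays below 50%.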

9. Trading Pipeline Flowchart

Below is a diagram showing the pipeline from data preprocessing to trade execution.

flowchart TD
    A[Raw Stock Price Data] --> B[Technical Indicator Calculation]
    B --> C[Feature Engineering]
    C --> D1["ML Model (XGBoost) Prediction"]
    C --> D2[Rule-Based Logic Signal]
    D1 --> E[Signal Agreement Analysis]
    D2 --> E
    E --> F{Consensus?}
    F -- Yes --> G["Simulated Trade (Agreement)"]
    F -- No --> H[No Trade]
    G --> I[Backtest Performance Metrics]
    H --> I

10. XGBoost Model API (Model Asset)

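For each stock, the trained XGBoost model is saved as a JSON asset (e.g., INFY_model.json). To use a model, load it and call predict() with the six features listed in Section 5: RSI, MACD, MACD_Signal, SMA_20, EMA_20, and Volatility. The specification below describes the same prediction as a POST request and its response.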

{
    "title": "Stock Direction Prediction (XGBoost Model)",
    "description": "Predicts next-day price direction (up/down) based on technical indicators.",
    "method": "POST",
    "baseUrl": "http://localhost:8501",
    "endpoint": "/predict",
    "headers": [
        {
            "key": "Content-Type",
            "value": "application/json",
            "required": true
        }
    ],
    "bodyType": "json",
    "requestBody": "{\n  \"RSI\": 31.27,\n  \"MACD\": -5.33,\n  \"MACD_Signal\": -4.21,\n  \"SMA_20\": 1567.32,\n  \"EMA_20\": 1571.88,\n  \"Volatility\": 11.09\n}",
    "responses": {
        "200": {
            "description": "Prediction result",
            "body": "{\n  \"prediction\": 0\n}"
        }
    }
}

Here, 0 = down, 1 = up.
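A request matching the specification above could be built with the Python standard library as sketched here. This assumes a service implementing the spec is running at the stated base URL; the source does not show the server code, and the function names are hypothetical. The example only constructs the request, and `predict_direction` is what would actually send it.

```python
import json
from urllib import request

BASE_URL = "http://localhost:8501"  # baseUrl from the specification
ENDPOINT = "/predict"               # endpoint from the specification


def build_predict_request(features):
    """Build a POST request carrying the six indicator values as a JSON body."""
    body = json.dumps(features).encode("utf-8")
    return request.Request(
        BASE_URL + ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_direction(features, timeout=5.0):
    """Send the request and return the prediction (requires the service to be running)."""
    with request.urlopen(build_predict_request(features), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["prediction"]


# Feature values copied from the requestBody example in the specification.
features = {
    "RSI": 31.27,
    "MACD": -5.33,
    "MACD_Signal": -4.21,
    "SMA_20": 1567.32,
    "EMA_20": 1571.88,
    "Volatility": 11.09,
}
req = build_predict_request(features)
```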


11. Best Practices & Limitations

Best Practices

  • Always use a strictly out-of-sample backtest period.
  • Tune ML model hyperparameters using time-series cross-validation.
  • Incorporate transaction costs in backtests.
  • Benchmark ML against simple rules so that an overfit model is not mistaken for a real edge.

Limitations

  • No risk management: Position sizing, stop-losses, and slippage are not modeled.
  • No walk-forward retraining: Models are not retrained during the backtest.
  • No feature selection or ensembling: Only basic technical indicators are used.
  • No capital constraints or realistic brokerage simulation.
  • The backtest window is a small sample, so results may not generalize.

12. References


📝 Summary Table

| Stock | ML Model Accuracy | Rule-Based Win Ratio | Agreement Trades | Agreement Win Ratio |
|---|---|---|---|---|
| INFY.NS | 48.36% | 0.00% | 2 | 0.00% |
| RELIANCE.NS | 50.82% | 40.00% | 3 | 33.33% |
| HDFCBANK.NS | 45.90% | 0.00% | 0 | 0.00% |

📌 Key Takeaways

  • Both the XGBoost model and the rule-based logic are very noisy for next-day direction, even after tuning and filtering: ML accuracy ranged from 45.90% to 50.82% across the three stocks.
  • "Agreement" (consensus) trades are rare (0 to 3 per stock in this test window) and not consistently profitable.
  • Visualization shows that signals are sparse and that gains are often offset by losses, especially after transaction costs.
  • Despite the power of ML, overfitting and the limited predictability of short-term price movements remain major challenges.

This notebook provides a full, reproducible pipeline for technical trading ML research and illustrates how little predictive edge these signals showed in liquid stocks over the test window. Feel free to use it as a base for deeper studies, such as adding more features or a more realistic trading simulation.
