Real-time handwritten digit recognition using CNN. Draw digits on canvas, get instant predictions via FastAPI + TensorFlow backend.

CodeNinjaSarthak/LiveDigit


LiveDigit

A web application for handwritten digit recognition. Users draw digits (0-9) on an HTML canvas, and a convolutional neural network classifies the input after each stroke.

LiveDigit-demo.mp4

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                          Browser                                │
│  ┌───────────────┐    ┌──────────────────────────────────────┐ │
│  │  280×280      │    │  Prediction Display                  │ │
│  │  Canvas       │───▶│  - Primary digit + confidence        │ │
│  │  (drawing)    │    │  - Top-5 predictions                 │ │
│  └───────────────┘    │  - Prediction history (localStorage) │ │
│         │             └──────────────────────────────────────┘ │
└─────────┼───────────────────────────────────────────────────────┘
          │ POST /predict (base64 PNG)
          ▼
┌─────────────────────────────────────────────────────────────────┐
│                     FastAPI Backend                             │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │ Preprocessing Pipeline:                                    │ │
│  │ 1. Decode base64 → grayscale                              │ │
│  │ 2. Threshold (pixels < 20 set to 0)                       │ │
│  │ 3. Find bounding box, crop tightly                        │ │
│  │ 4. Resize to fit 20×20 (preserve aspect ratio, LANCZOS)   │ │
│  │ 5. Center in 28×28 using center of mass (scipy.ndimage)   │ │
│  │ 6. Normalize to [0, 1]                                    │ │
│  └────────────────────────────────────────────────────────────┘ │
│         │                                                       │
│         ▼                                                       │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │ TensorFlow SavedModel (CNN)                                │ │
│  │ Input: (1, 28, 28, 1) → Output: (1, 10) softmax            │ │
│  └────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
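The geometric steps of the pipeline above (threshold, bounding box, crop, center-of-mass placement, normalization) can be sketched without any imaging library. This is a minimal illustration, not the code in backend/main.py: the function names are hypothetical, images are plain lists of rows, and the LANCZOS resize (step 4) is omitted, so the glyph is assumed to already fit inside 20×20. The center of mass is the intensity-weighted mean index, which is the quantity scipy.ndimage.center_of_mass computes.

```python
def threshold(img, cutoff=20):
    """Step 2: set pixels below the cutoff to 0 (img is a list of rows of 0-255 ints)."""
    return [[p if p >= cutoff else 0 for p in row] for row in img]


def bounding_box(img):
    """Step 3: (top, bottom, left, right) of the non-zero pixels, bottom/right exclusive."""
    rows = [i for i, row in enumerate(img) if any(row)]
    if not rows:
        return None  # blank canvas, nothing to classify
    cols = [j for j in range(len(img[0])) if any(row[j] for row in img)]
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1


def crop(img, box):
    """Step 3 (continued): cut the image down to the bounding box."""
    top, bottom, left, right = box
    return [row[left:right] for row in img[top:bottom]]


def center_of_mass(img):
    """Intensity-weighted mean (row, col) index of a non-blank image."""
    total = sum(sum(row) for row in img)
    r = sum(i * sum(row) for i, row in enumerate(img)) / total
    c = sum(j * p for row in img for j, p in enumerate(row)) / total
    return r, c


def center_in_canvas(glyph, size=28):
    """Steps 5 and 6: paste the glyph so its center of mass sits near the
    canvas center, scaling pixel values from 0-255 to [0, 1]."""
    h, w = len(glyph), len(glyph[0])
    com_r, com_c = center_of_mass(glyph)
    middle = (size - 1) / 2
    # Clamp the offsets so the glyph never leaves the canvas.
    top = max(0, min(size - h, round(middle - com_r)))
    left = max(0, min(size - w, round(middle - com_c)))
    canvas = [[0.0] * size for _ in range(size)]
    for i, row in enumerate(glyph):
        for j, p in enumerate(row):
            canvas[top + i][left + j] = p / 255.0
    return canvas


# Usage: a 4x6 bright block plus one faint noise pixel on a 40x40 image.
raw = [[0] * 40 for _ in range(40)]
raw[0][0] = 10  # below the cutoff, removed by threshold()
for i in range(5, 9):
    for j in range(10, 16):
        raw[i][j] = 255
clean = threshold(raw)
canvas = center_in_canvas(crop(clean, bounding_box(clean)))
```

Thresholding before the bounding box matters: without it, the faint pixel at (0, 0) would stretch the crop to cover most of the image.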

Project Structure

LiveDigit/
├── backend/
│   └── main.py              # FastAPI server, preprocessing, inference
├── frontend/
│   ├── index.html           # UI structure
│   ├── script.js            # Drawing, API calls, state management
│   └── style.css            # Styles
├── model/
│   └── mnist_model/         # TensorFlow SavedModel (required)
├── train/
│   └── handwriting-recognizer.ipynb  # Model training (Kaggle environment)
└── requirements.txt

API

POST /predict

Request:

{
  "image": "data:image/png;base64,iVBORw0KGgo...",
  "debug": false
}

Response:

{
  "digit": 7,
  "confidence": 0.9823,
  "top_predictions": [
    {"digit": 7, "confidence": 0.9823},
    {"digit": 1, "confidence": 0.0098},
    {"digit": 9, "confidence": 0.0041},
    {"digit": 4, "confidence": 0.0022},
    {"digit": 2, "confidence": 0.0008}
  ],
  "cropped_image": null,
  "final_image": null
}

When debug is true, the response includes cropped_image and final_image as base64-encoded PNGs showing the intermediate preprocessing steps; otherwise both fields are null.
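The response shape can be derived from the model's 10-element softmax output by ranking the class probabilities. The sketch below is one way to do that; the function names are hypothetical, and rounding to four decimals is an assumption based on the example values above.

```python
def top_predictions(probs, k=5):
    """Rank class probabilities and keep the k most likely digits."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)[:k]
    return [{"digit": digit, "confidence": round(float(p), 4)} for digit, p in ranked]


def build_response(probs):
    """Assemble a /predict-style payload with debug images left as None."""
    top = top_predictions(probs)
    return {
        "digit": top[0]["digit"],
        "confidence": top[0]["confidence"],
        "top_predictions": top,
        "cropped_image": None,
        "final_image": None,
    }


# Usage: probabilities indexed by digit 0-9 (values chosen to match the example).
probs = [0.00016, 0.0098, 0.0008, 0.00016, 0.0022,
         0.00016, 0.00016, 0.9823, 0.00016, 0.0041]
response = build_response(probs)
```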

Errors:

| Status | Meaning |
|--------|---------|
| 400 | Invalid image data |
| 500 | Inference failure |
| 503 | Model not loaded |

GET /

Serves frontend/index.html.

GET /static/*

Serves static assets from frontend/.

Model

A CNN with three convolutional blocks (Conv2D, max pooling, dropout) followed by two dense layers, using L2 regularization (λ=0.0001) and dropout, trained on the Kaggle MNIST dataset.

| Layer | Configuration | Output Shape |
|-------|---------------|--------------|
| Conv2D | 32 filters, 3×3, ReLU, same padding | 28×28×32 |
| MaxPooling2D | 2×2 | 14×14×32 |
| Dropout | 0.2 | 14×14×32 |
| Conv2D | 64 filters, 3×3, ReLU, same padding | 14×14×64 |
| MaxPooling2D | 2×2 | 7×7×64 |
| Dropout | 0.3 | 7×7×64 |
| Conv2D | 128 filters, 3×3, ReLU, same padding | 7×7×128 |
| MaxPooling2D | 2×2 | 3×3×128 |
| Dropout | 0.3 | 3×3×128 |
| Flatten | - | 1152 |
| Dense | 64 units, ReLU | 64 |
| Dropout | 0.2 | 64 |
| Dense | 10 units, softmax | 10 |
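The output shapes listed above can be checked with a few lines of arithmetic. The sketch below assumes a 28×28×1 input, that each 2×2 pooling layer floors odd dimensions (which is how 7×7 becomes 3×3), and that every Conv2D and Dense layer has a bias term; the parameter count is derived from those assumptions and is not stated in the source.

```python
def conv_same(shape, filters):
    """3x3 convolution with 'same' padding keeps height and width."""
    h, w, _ = shape
    return (h, w, filters)


def max_pool(shape, pool=2):
    """Pooling without padding floors odd dimensions (7 -> 3)."""
    h, w, c = shape
    return (h // pool, w // pool, c)


def conv_params(in_channels, filters, kernel=3):
    """Weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return inputs * units + units


shape = (28, 28, 1)
shapes = []
params = 0
for filters in (32, 64, 128):
    params += conv_params(shape[2], filters)
    shape = conv_same(shape, filters)
    shape = max_pool(shape)  # dropout leaves the shape unchanged
    shapes.append(shape)

flattened = shape[0] * shape[1] * shape[2]
params += dense_params(flattened, 64) + dense_params(64, 10)

print(shapes)     # [(14, 14, 32), (7, 7, 64), (3, 3, 128)]
print(flattened)  # 1152
print(params)     # 167114
```

Dropout and L2 regularization add no trainable parameters, so they do not appear in the count.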

Training configuration:

  • Optimizer: Adam
  • Loss: Sparse categorical cross-entropy
  • Callbacks: EarlyStopping (patience=10), ReduceLROnPlateau (factor=0.2, patience=5)
  • Batch size: 128
  • Train/val split: 80/20
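To see how the two callbacks interact, the following is a simplified simulation rather than the Keras implementation. It assumes both callbacks monitor validation loss, share a single "best" value, and ignore min_delta, cooldown, and min_lr; the starting learning rate of 0.001 is also an assumption, since the source only names the optimizer.

```python
def simulate_callbacks(val_losses, lr=1e-3, factor=0.2,
                       lr_patience=5, stop_patience=10):
    """Return (epoch, learning rate after the epoch) pairs until training stops."""
    best = float("inf")
    lr_wait = 0    # epochs without improvement since the last LR cut
    stop_wait = 0  # epochs without improvement since the best epoch
    history = []
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            lr_wait = 0
            stop_wait = 0
        else:
            lr_wait += 1
            stop_wait += 1
            if lr_wait >= lr_patience:
                lr *= factor  # ReduceLROnPlateau-style cut
                lr_wait = 0
        history.append((epoch, lr))
        if stop_wait >= stop_patience:
            break  # EarlyStopping-style halt
    return history


# Usage: validation loss improves until epoch 1, then stalls.
history = simulate_callbacks([1.0, 0.5] + [0.6] * 12)
```

Under these assumptions the learning rate is cut to 20% after five stalled epochs, which gives training five more epochs at the lower rate before early stopping ends the run.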

Frontend

Drawing:

  • 280×280 canvas, white strokes (20px) on dark background (#0a0a0f)
  • Stroke smoothing via weighted averaging (factor: 0.3) and quadratic Bézier curves
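The smoothing described above can be illustrated with a short Python analogue of the frontend logic. This is a sketch under assumptions: the factor 0.3 is taken to be the weight given to each new raw point (the source does not say which side it weights), and the function names are hypothetical. The Bézier formula itself is the standard quadratic form.

```python
def smooth_points(points, factor=0.3):
    """Move each smoothed point a fraction of the way toward the raw input."""
    if not points:
        return []
    smoothed = [points[0]]
    for x, y in points[1:]:
        px, py = smoothed[-1]
        smoothed.append((px + factor * (x - px), py + factor * (y - py)))
    return smoothed


def quad_bezier(p0, p1, p2, t):
    """Point on the quadratic Bezier curve with control point p1, for t in [0, 1]."""
    a, b, c = (1 - t) ** 2, 2 * (1 - t) * t, t ** 2
    return (a * p0[0] + b * p1[0] + c * p2[0],
            a * p0[1] + b * p1[1] + c * p2[1])


# Usage: a sudden 10px jump is damped to a 3px step, then drawn as a curve.
path = smooth_points([(0, 0), (10, 0), (10, 10)])
midpoint = quad_bezier((0, 0), (5, 10), (10, 0), 0.5)
```

A smaller factor gives smoother but laggier strokes, since the drawn line trails the pointer by more.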

Prediction behavior:

  • API called 100ms after last draw event (debounced)
  • Request timeout: 5 seconds
  • Retry on timeout: up to 2 attempts
  • Loading spinner displayed during request
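The debounce and retry behavior can be modeled in Python as follows. The real logic lives in frontend/script.js and is not shown in this document, so treat this as an illustrative analogue: the function names are hypothetical, and "up to 2 attempts" is read here as two retries after the first request times out.

```python
DEBOUNCE_MS = 100
TIMEOUT_SECONDS = 5.0
MAX_RETRIES = 2  # assumption: retries in addition to the first attempt


def debounced_fire_times(event_times_ms, delay_ms=DEBOUNCE_MS):
    """Times at which a request fires, given sorted draw-event timestamps.

    A request fires delay_ms after an event only if no newer event
    arrives before that timer expires.
    """
    fires = []
    for i, t in enumerate(event_times_ms):
        is_last = i + 1 == len(event_times_ms)
        if is_last or event_times_ms[i + 1] - t > delay_ms:
            fires.append(t + delay_ms)
    return fires


def call_with_retry(send, timeout=TIMEOUT_SECONDS, max_retries=MAX_RETRIES):
    """Call send(timeout); retry only when it raises TimeoutError."""
    for attempt in range(max_retries + 1):
        try:
            return send(timeout)
        except TimeoutError:
            if attempt == max_retries:
                raise  # give up after the final retry


# Usage: five draw events in two bursts produce two requests.
print(debounced_fire_times([0, 30, 60, 400, 450]))  # [160, 550]
```

Retrying only on timeouts keeps client errors such as a 400 response from being resent unchanged.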

Display:

  • Primary prediction with confidence percentage and progress bar
  • Top-5 predictions with horizontal confidence bars
  • Prediction history (last 20) persisted to localStorage

Accessibility:

  • ARIA labels on interactive elements
  • Keyboard shortcuts: Escape (clear canvas), Alt+C (clear), Alt+D (toggle debug)
  • Screen reader announcements via live region

Connection status:

  • Visual indicator (green/red dot)
  • Responds to browser online/offline events

Installation

Requires Python 3.10+.

pip install -r requirements.txt

Dependencies: fastapi, uvicorn, tensorflow 2.16.2, pillow, scipy, python-multipart

Running

python backend/main.py

Or with auto-reload:

uvicorn backend.main:app --reload --port 8000

Access at http://localhost:8000

Model Setup

The server expects a TensorFlow SavedModel at model/mnist_model/. Without it, the server starts but /predict returns HTTP 503.

To train a new model, use train/handwriting-recognizer.ipynb. The notebook is configured for Kaggle (expects data at /kaggle/input/digit-recognizer/). For local training, update the file paths.

Limitations

  • Single digit only: No segmentation for multiple digits
  • Input assumptions: Expects white strokes on dark background; the preprocessing pipeline does not invert colors
  • MNIST distribution: Accuracy depends on how closely the drawn digit matches MNIST style (centered, isolated)
  • History storage: Thumbnails stored as full-size data URLs (not compressed)
  • Training notebook: Hardcoded Kaggle paths require modification for local use
  • No authentication: API is open; not suitable for public deployment without additional security

Development Notes

Design decisions:

  • Vanilla JavaScript: No framework dependencies; the frontend is small enough that a framework would add complexity without benefit.
  • Debounced prediction: Prediction triggers 100ms after drawing stops, balancing responsiveness with avoiding excessive API calls during active drawing.
  • MNIST-style preprocessing: The backend replicates the preprocessing used in the original MNIST dataset (crop, resize to 20×20, center by mass in 28×28) to maximize compatibility with the trained model.
  • Top-5 predictions: Provides transparency into model confidence distribution, useful for understanding misclassifications.
  • Debug mode: Exposes preprocessing steps to help diagnose why certain digits are misclassified.

Scope:

This is a demonstration project showing end-to-end ML deployment with a browser frontend. It is not optimized for production use (no rate limiting, no model versioning, no telemetry).

License

MIT License

Copyright (c) 2025

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
