LiveDigit

A web application for handwritten digit recognition. Users draw digits (0-9) on an HTML canvas, and a convolutional neural network classifies the input after each stroke.

Demo video: LiveDigit-demo.mp4

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                          Browser                                │
│  ┌───────────────┐    ┌──────────────────────────────────────┐ │
│  │  280×280      │    │  Prediction Display                  │ │
│  │  Canvas       │───▶│  - Primary digit + confidence        │ │
│  │  (drawing)    │    │  - Top-5 predictions                 │ │
│  └───────────────┘    │  - Prediction history (localStorage) │ │
│         │             └──────────────────────────────────────┘ │
└─────────┼───────────────────────────────────────────────────────┘
          │ POST /predict (base64 PNG)
          ▼
┌─────────────────────────────────────────────────────────────────┐
│                     FastAPI Backend                             │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │ Preprocessing Pipeline:                                    │ │
│  │ 1. Decode base64 → grayscale                              │ │
│  │ 2. Threshold (pixels < 20 set to 0)                       │ │
│  │ 3. Find bounding box, crop tightly                        │ │
│  │ 4. Resize to fit 20×20 (preserve aspect ratio, LANCZOS)   │ │
│  │ 5. Center in 28×28 using center of mass (scipy.ndimage)   │ │
│  │ 6. Normalize to [0, 1]                                    │ │
│  └────────────────────────────────────────────────────────────┘ │
│         │                                                       │
│         ▼                                                       │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │ TensorFlow SavedModel (CNN)                                │ │
│  │ Input: (1, 28, 28, 1) → Output: (1, 10) softmax            │ │
│  └────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
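The six preprocessing steps in the diagram can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in backend/main.py: the function name `preprocess`, the blank-canvas handling, and the clamping of the paste offset are hypothetical choices made for the example.

```python
import base64
import io

import numpy as np
from PIL import Image
from scipy import ndimage


def preprocess(data_url: str) -> np.ndarray:
    """Turn a base64 PNG (bare or as a data URL) into a (1, 28, 28, 1) float array."""
    # 1. Decode base64 -> grayscale
    b64 = data_url.split(",", 1)[-1]
    img = Image.open(io.BytesIO(base64.b64decode(b64))).convert("L")
    pixels = np.asarray(img, dtype=np.uint8).copy()

    # 2. Threshold: pixels below 20 are set to 0
    pixels[pixels < 20] = 0

    # 3. Bounding box of the remaining ink, cropped tightly
    rows, cols = np.nonzero(pixels)
    if rows.size == 0:
        # Assumption: a blank canvas maps to an all-zero input
        return np.zeros((1, 28, 28, 1), dtype=np.float32)
    crop = np.ascontiguousarray(
        pixels[rows.min():rows.max() + 1, cols.min():cols.max() + 1]
    )

    # 4. Resize so the longer side is 20 px, preserving aspect ratio (LANCZOS)
    h, w = crop.shape
    scale = 20.0 / max(h, w)
    new_w, new_h = max(1, round(w * scale)), max(1, round(h * scale))
    resized = Image.fromarray(crop).resize((new_w, new_h), Image.LANCZOS)
    small = np.asarray(resized, dtype=np.float32)
    if small.sum() == 0:
        return np.zeros((1, 28, 28, 1), dtype=np.float32)

    # 5. Paste into 28x28 so the center of mass lands near the canvas center (13.5, 13.5)
    cy, cx = ndimage.center_of_mass(small)
    top = min(max(int(round(13.5 - cy)), 0), 28 - new_h)
    left = min(max(int(round(13.5 - cx)), 0), 28 - new_w)
    canvas = np.zeros((28, 28), dtype=np.float32)
    canvas[top:top + new_h, left:left + new_w] = small

    # 6. Normalize to [0, 1] and add batch and channel dimensions
    return (canvas / 255.0).reshape(1, 28, 28, 1)
```

The returned array has the (1, 28, 28, 1) shape the model input expects, so it could be passed to the loaded model directly.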

Project Structure

LiveDigit/
├── backend/
│   └── main.py              # FastAPI server, preprocessing, inference
├── frontend/
│   ├── index.html           # UI structure
│   ├── script.js            # Drawing, API calls, state management
│   └── style.css            # Styles
├── model/
│   └── mnist_model/         # TensorFlow SavedModel (required)
├── train/
│   └── handwriting-recognizer.ipynb  # Model training (Kaggle environment)
└── requirements.txt

API

`POST /predict`

Request:

{
  "image": "data:image/png;base64,iVBORw0KGgo...",
  "debug": false
}
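As a rough illustration of how the `image` field might be validated before preprocessing, the sketch below strips the data URL prefix, decodes the base64 payload, and checks the PNG signature. The helper name `decode_image_field` and the acceptance of bare base64 strings are assumptions for this example; the actual checks in backend/main.py may differ.

```python
import base64
import binascii

# Every PNG file starts with this 8-byte signature.
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def decode_image_field(image: str) -> bytes:
    """Return raw PNG bytes from a data URL or a bare base64 string.

    Raises ValueError for input that a server would reasonably answer with 400.
    """
    if image.startswith("data:"):
        # Split "data:image/png;base64" from the payload after the first comma.
        header, sep, payload = image.partition(",")
        if not sep or not header.endswith(";base64"):
            raise ValueError("expected a base64 data URL")
    else:
        payload = image
    try:
        raw = base64.b64decode(payload, validate=True)
    except binascii.Error as exc:
        raise ValueError("image is not valid base64") from exc
    if not raw.startswith(PNG_MAGIC):
        raise ValueError("decoded data is not a PNG")
    return raw
```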

Response:

{
  "digit": 7,
  "confidence": 0.9823,
  "top_predictions": [
    {"digit": 7, "confidence": 0.9823},
    {"digit": 1, "confidence": 0.0098},
    {"digit": 9, "confidence": 0.0041},
    {"digit": 4, "confidence": 0.0022},
    {"digit": 2, "confidence": 0.0008}
  ],
  "cropped_image": null,
  "final_image": null
}

When debug is true, the response includes cropped_image and final_image as base64 PNGs showing the intermediate preprocessing steps.
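To make the response shape concrete, here is a minimal sketch of how a 10-element softmax vector could be turned into the fields shown above. The function names and the rounding to four decimals are assumptions chosen to match the sample response, not a description of the backend code.

```python
def top_predictions(probs, k=5):
    """Rank digit indices by probability and keep the k most likely."""
    ranked = sorted(range(len(probs)), key=lambda d: probs[d], reverse=True)[:k]
    return [{"digit": d, "confidence": round(float(probs[d]), 4)} for d in ranked]


def build_response(probs):
    """Assemble a /predict-style response body with debug images left empty."""
    top = top_predictions(probs)
    return {
        "digit": top[0]["digit"],
        "confidence": top[0]["confidence"],
        "top_predictions": top,
        "cropped_image": None,  # only populated in debug mode
        "final_image": None,
    }
```

The primary `digit` is simply the first entry of `top_predictions`, which is why the two confidence values agree in the sample response.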

Errors:

Status	Meaning
400	Invalid image data
500	Inference failure
503	Model not loaded
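For scripting against the endpoint outside the browser, a client could look roughly like this. It uses only the Python standard library; the helper names, the default base URL, and the wording of the error message are assumptions for illustration.

```python
import base64
import json
import urllib.error
import urllib.request

# Status meanings taken from the error table above.
ERROR_MEANINGS = {400: "invalid image data", 500: "inference failure", 503: "model not loaded"}


def build_payload(png_bytes: bytes, debug: bool = False) -> dict:
    """Wrap raw PNG bytes in the JSON body that POST /predict expects."""
    data_url = "data:image/png;base64," + base64.b64encode(png_bytes).decode("ascii")
    return {"image": data_url, "debug": debug}


def predict(png_bytes: bytes, base_url: str = "http://localhost:8000", timeout: float = 5.0) -> dict:
    """Send one prediction request and return the parsed JSON response."""
    request = urllib.request.Request(
        base_url + "/predict",
        data=json.dumps(build_payload(png_bytes)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        meaning = ERROR_MEANINGS.get(err.code, "unexpected status")
        raise RuntimeError(f"/predict failed: {err.code} ({meaning})") from err
```

With the server running, `predict(open("seven.png", "rb").read())` would return a dictionary with the fields shown in the sample response; `seven.png` is a placeholder file name.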

`GET /`

Serves frontend/index.html.

`GET /static/*`

Serves static assets from frontend/.

Model

A CNN with three convolutional blocks followed by two dense layers, using L2 regularization (λ=0.0001) and dropout, trained on the Kaggle MNIST dataset.

Layer	Configuration	Output Shape
Conv2D	32 filters, 3×3, ReLU, same padding	28×28×32
MaxPooling2D	2×2	14×14×32
Dropout	0.2	14×14×32
Conv2D	64 filters, 3×3, ReLU, same padding	14×14×64
MaxPooling2D	2×2	7×7×64
Dropout	0.3	7×7×64
Conv2D	128 filters, 3×3, ReLU, same padding	7×7×128
MaxPooling2D	2×2	3×3×128
Dropout	0.3	3×3×128
Flatten	—	1152
Dense	64 units, ReLU	64
Dropout	0.2	64
Dense	10 units, softmax	10
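The output shapes in the table follow from two rules: a 3×3 convolution with same padding keeps the spatial size, and 2×2 max pooling halves it, rounding down. The short check below walks through that arithmetic; the helper names are made up for the example.

```python
def conv_same(shape, filters):
    """Same padding with stride 1 keeps height and width; channels become `filters`."""
    height, width, _ = shape
    return (height, width, filters)


def max_pool(shape, size=2):
    """Pooling without padding floors odd sizes, so 7 becomes 3."""
    height, width, channels = shape
    return (height // size, width // size, channels)


shape = (28, 28, 1)
trace = []
for filters in (32, 64, 128):
    shape = conv_same(shape, filters)
    trace.append(shape)
    shape = max_pool(shape)
    trace.append(shape)

# Flatten multiplies the remaining dimensions: 3 * 3 * 128
flat_units = shape[0] * shape[1] * shape[2]
```

Dropout layers are omitted from the trace because they do not change the shape.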

Training configuration:

Optimizer: Adam
Loss: Sparse categorical cross-entropy
Callbacks: EarlyStopping (patience=10), ReduceLROnPlateau (factor=0.2, patience=5)
Batch size: 128
Train/val split: 80/20
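Putting the layer table and the training configuration together, a Keras definition might look like the sketch below. Several details are assumptions because the README does not state them: which layers carry the L2 penalty (here, the convolutional and hidden dense layers), the default Adam learning rate, the callbacks monitoring validation loss, the use of `validation_split` for the 80/20 split, and the `epochs` default.

```python
import tensorflow as tf


def build_model() -> tf.keras.Model:
    """CNN matching the layer table: three conv blocks, then a small dense head."""
    l2 = tf.keras.regularizers.l2(1e-4)  # assumption: applied to conv and hidden dense layers
    layers = tf.keras.layers
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28, 1)),
        layers.Conv2D(32, 3, padding="same", activation="relu", kernel_regularizer=l2),
        layers.MaxPooling2D(2),
        layers.Dropout(0.2),
        layers.Conv2D(64, 3, padding="same", activation="relu", kernel_regularizer=l2),
        layers.MaxPooling2D(2),
        layers.Dropout(0.3),
        layers.Conv2D(128, 3, padding="same", activation="relu", kernel_regularizer=l2),
        layers.MaxPooling2D(2),
        layers.Dropout(0.3),
        layers.Flatten(),
        layers.Dense(64, activation="relu", kernel_regularizer=l2),
        layers.Dropout(0.2),
        layers.Dense(10, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


def train(model, images, labels, epochs=50):
    """Fit with the listed callbacks; `epochs` is a hypothetical upper bound."""
    callbacks = [
        tf.keras.callbacks.EarlyStopping(patience=10),
        tf.keras.callbacks.ReduceLROnPlateau(factor=0.2, patience=5),
    ]
    return model.fit(
        images,
        labels,
        batch_size=128,
        epochs=epochs,
        validation_split=0.2,  # holds out the last 20% of the data for validation
        callbacks=callbacks,
    )
```

`images` is expected as floats in [0, 1] with shape (N, 28, 28, 1) and `labels` as integer class ids, which is what sparse categorical cross-entropy requires.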

Frontend

Drawing:

280×280 canvas, white strokes (20px) on dark background (#0a0a0f)
Stroke smoothing via weighted averaging (factor: 0.3) and quadratic Bézier curves
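The two smoothing techniques can be illustrated in a few lines. The sketch is in Python for consistency with the backend, although the real implementation lives in frontend/script.js. The exact weighting formula is an assumption: here each smoothed point moves 30% of the way toward the raw pointer position.

```python
def smooth_point(prev, raw, factor=0.3):
    """Weighted average: move the previous smoothed point a fraction toward the raw one."""
    return (
        prev[0] + factor * (raw[0] - prev[0]),
        prev[1] + factor * (raw[1] - prev[1]),
    )


def quadratic_bezier(start, control, end, t):
    """Evaluate B(t) = (1-t)^2 * start + 2(1-t)t * control + t^2 * end for t in [0, 1]."""
    u = 1.0 - t
    return (
        u * u * start[0] + 2 * u * t * control[0] + t * t * end[0],
        u * u * start[1] + 2 * u * t * control[1] + t * t * end[1],
    )


def midpoint(a, b):
    """Halfway point between two points, a common choice of curve endpoint."""
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)
```

A common way to combine them on a canvas, and only a guess at what the frontend does, is to draw each segment from the previous midpoint to the next midpoint with the smoothed point in between as the control point, which removes the corners a polyline would show.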

Prediction behavior:

API called 100ms after last draw event (debounced)
Request timeout: 5 seconds
Retry on timeout: up to 2 attempts
Loading spinner displayed during request
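The debounce and retry behavior can be sketched as below, again in Python rather than the frontend's JavaScript. The class and function names are hypothetical, and "up to 2 attempts" is read here as two retries after the first request, which is an assumption.

```python
import threading


class Debouncer:
    """Run `fn` only after `delay_s` seconds have passed without a new trigger."""

    def __init__(self, delay_s, fn):
        self._delay_s = delay_s
        self._fn = fn
        self._timer = None
        self._lock = threading.Lock()

    def trigger(self, *args):
        with self._lock:
            # A new draw event cancels the pending call and restarts the countdown.
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self._delay_s, self._fn, args=args)
            self._timer.daemon = True
            self._timer.start()


def call_with_retry(request_fn, retries=2):
    """Call `request_fn`, retrying only when it raises TimeoutError."""
    last_error = None
    for _ in range(1 + retries):
        try:
            return request_fn()
        except TimeoutError as exc:
            last_error = exc
    raise last_error
```

With `Debouncer(0.1, send_prediction)`, where `send_prediction` is a placeholder, a burst of draw events results in a single call 100 ms after the last one.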

Display:

Primary prediction with confidence percentage and progress bar
Top-5 predictions with horizontal confidence bars
Prediction history (last 20) persisted to localStorage
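The capped history amounts to a bounded queue that is serialized after each update. A minimal Python sketch of the idea follows; the newest-first ordering and the entry fields are assumptions, and the JSON string stands in for what the browser would write to localStorage.

```python
import json
from collections import deque

HISTORY_LIMIT = 20


def new_history():
    """A deque with maxlen silently drops the oldest entry once full."""
    return deque(maxlen=HISTORY_LIMIT)


def add_to_history(history, entry):
    """Insert the newest entry first and return the serialized history."""
    history.appendleft(entry)
    return json.dumps(list(history))
```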

Accessibility:

ARIA labels on interactive elements
Keyboard shortcuts: Escape (clear canvas), Alt+C (clear), Alt+D (toggle debug)
Screen reader announcements via live region

Connection status:

Visual indicator (green/red dot)
Responds to browser online/offline events

Installation

Requires Python 3.10+.

pip install -r requirements.txt

Dependencies: fastapi, uvicorn, tensorflow 2.16.2, pillow, scipy, python-multipart

Running

python backend/main.py

Or with auto-reload:

uvicorn backend.main:app --reload --port 8000

Access at http://localhost:8000

Model Setup

The server expects a TensorFlow SavedModel at model/mnist_model/. Without it, the server starts but /predict returns HTTP 503.
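A quick pre-flight check can confirm the model directory is in place before starting the server. The sketch relies on the fact that a TensorFlow SavedModel directory contains a `saved_model.pb` file; the function names and the idea of mapping the result to a status code are illustrative, not taken from backend/main.py.

```python
from pathlib import Path


def saved_model_present(model_dir) -> bool:
    """True if `model_dir` looks like a SavedModel directory."""
    path = Path(model_dir)
    return path.is_dir() and (path / "saved_model.pb").is_file()


def expected_predict_status(model_dir) -> int:
    """Status /predict would be expected to return before any input validation."""
    return 200 if saved_model_present(model_dir) else 503
```

Running `saved_model_present("model/mnist_model")` from the repository root returns False when the model has not been added yet, which corresponds to the 503 case described above.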

To train a new model, use train/handwriting-recognizer.ipynb. The notebook is configured for Kaggle (expects data at /kaggle/input/digit-recognizer/). For local training, update the file paths.

Limitations

Single digit only: No segmentation for multiple digits
Input assumptions: Expects white strokes on dark background; the preprocessing pipeline does not invert colors
MNIST distribution: Accuracy depends on how closely the drawn digit matches MNIST style (centered, isolated)
History storage: Thumbnails stored as full-size data URLs (not compressed)
Training notebook: Hardcoded Kaggle paths require modification for local use
No authentication: API is open; not suitable for public deployment without additional security

Development Notes

Design decisions:

Vanilla JavaScript: No framework dependencies; the frontend is small enough that a framework would add complexity without benefit.
Debounced prediction: Prediction triggers 100ms after drawing stops, balancing responsiveness with avoiding excessive API calls during active drawing.
MNIST-style preprocessing: The backend replicates the preprocessing used in the original MNIST dataset (crop, resize to 20×20, center by mass in 28×28) to maximize compatibility with the trained model.
Top-5 predictions: Provides transparency into model confidence distribution, useful for understanding misclassifications.
Debug mode: Exposes preprocessing steps to help diagnose why certain digits are misclassified.

Scope:

This is a demonstration project showing end-to-end ML deployment with a browser frontend. It is not optimized for production use (no rate limiting, no model versioning, no telemetry).

License

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

CodeNinjaSarthak/LiveDigit

Folders and files

Latest commit

History

Repository files navigation

LiveDigit

Architecture

Project Structure

API

POST /predict

GET /

GET /static/*

Model

Frontend

Installation

Running

Model Setup

Limitations

Development Notes

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`POST /predict`

`GET /`

`GET /static/*`

Packages