mohanlal99/pdf2sheet-auto

📄 PDF2Sheet Auto

Automated Invoice Extraction System

Transform PDF invoices into structured spreadsheet data with AI-powered extraction


📌 Overview

PDF2Sheet Auto is a full-stack web application that automatically extracts data from PDF invoices and converts it into structured spreadsheet records. It helps businesses eliminate manual data entry and manage invoices efficiently.

The system supports both PDF uploads and email-based ingestion, applies intelligent extraction logic, and syncs the results with spreadsheets.


🔄 Workflow

Upload / Forward PDF
        ↓
PDF Parsing & Text Extraction
        ↓
Data Recognition & Mapping
        ↓
Validation & Confidence Scoring
        ↓
Spreadsheet Export

  1. Upload invoice PDF or forward it via email
  2. System parses the document
  3. Important fields are extracted
  4. Vendor mapping is applied
  5. Data is pushed to spreadsheets
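
The extraction step above can be sketched with a few regular expressions. This is a minimal illustration, not the project's actual logic: the `FIELD_PATTERNS` table, the `extractFields` helper, and the "fraction of fields found" confidence score are all assumptions made for the example. In the application the input text would come from the PDF parsing step.

```javascript
// Hypothetical patterns for three common invoice fields.
// Real invoices vary by vendor, so these are illustrative only.
const FIELD_PATTERNS = {
  invoiceNumber: /invoice\s*(?:no\.?|number|#)\s*[:\-]?\s*([A-Z0-9\-]+)/i,
  invoiceDate: /date\s*[:\-]?\s*(\d{4}-\d{2}-\d{2})/i,
  total: /total\s*(?:due)?\s*[:\-]?\s*\$?\s*([\d,]+\.\d{2})/i,
};

// Returns the matched fields plus a naive confidence score:
// the fraction of known fields that were found in the text.
function extractFields(text) {
  const fields = {};
  let found = 0;
  for (const [name, pattern] of Object.entries(FIELD_PATTERNS)) {
    const match = text.match(pattern);
    fields[name] = match ? match[1] : null;
    if (match) found += 1;
  }
  const confidence = found / Object.keys(FIELD_PATTERNS).length;
  return { fields, confidence };
}

const sample = "Invoice No: INV-1042\nDate: 2024-03-15\nTotal: $1,250.00";
const result = extractFields(sample);
console.log(result.fields, result.confidence);
```

A low score from a function like this is one plausible signal for routing an invoice to a manual review queue.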

✨ Features

Core Features

  • 📤 PDF Upload & Email Ingestion
  • 🤖 Intelligent Field Extraction
  • 📊 Google Sheets / Excel Integration
  • 🏪 Vendor Templates & Mapping
  • ✅ Manual Review Queue
  • 🔍 Duplicate Detection
  • 📈 Confidence Scoring
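
One simple way to approach duplicate detection is to fingerprint the uploaded bytes. The sketch below assumes a SHA-256 hash of the file content and an in-memory `Set`; the `fingerprint` function and `DuplicateDetector` class are hypothetical names, and the project may well use a different strategy (for example, comparing extracted invoice numbers or relying on a database index).

```javascript
const crypto = require("node:crypto");

// Hex-encoded SHA-256 digest of the raw PDF bytes.
function fingerprint(pdfBytes) {
  return crypto.createHash("sha256").update(pdfBytes).digest("hex");
}

// Remembers fingerprints seen so far (in memory only, for illustration).
class DuplicateDetector {
  constructor() {
    this.seen = new Set();
  }

  // Returns true if identical bytes were checked before;
  // otherwise records the fingerprint and returns false.
  isDuplicate(pdfBytes) {
    const hash = fingerprint(pdfBytes);
    if (this.seen.has(hash)) return true;
    this.seen.add(hash);
    return false;
  }
}

const detector = new DuplicateDetector();
const firstCheck = detector.isDuplicate(Buffer.from("invoice A"));
const secondCheck = detector.isDuplicate(Buffer.from("invoice A"));
console.log(firstCheck, secondCheck);
```

Byte-level hashing only catches exact re-uploads; the same invoice exported twice with different metadata would produce different fingerprints.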

Dashboard

  • Real-time processing statistics
  • Top vendors analytics
  • Pending & failed alerts
  • Monthly usage tracking

User Management

  • JWT Authentication
  • Role-based access
  • User preferences
  • Responsive UI


📸 Screenshots

  • 📊 Dashboard
  • 📤 Upload Invoice
  • 📄 Invoice Management
  • 🏪 Vendor Mapping

🛠️ Tech Stack

Backend

Technology    Purpose
Node.js       Runtime
Express.js    API Framework
MongoDB       Database
Mongoose      ODM
JWT           Authentication
pdf-parse     PDF Processing
Multer        File Upload
Nodemailer    Email Handling

Frontend

Technology      Purpose
React 18        UI Framework
Vite            Build Tool
Tailwind CSS    Styling
React Router    Routing
Zustand         State Management
Axios           API Client

📁 Project Structure

PDF2Sheet/
├── backend/
│   ├── server.js
│   └── src/
│       ├── config/
│       ├── models/
│       ├── controllers/
│       ├── services/
│       ├── routes/
│       ├── middleware/
│       └── utils/
│
├── frontend/
│   ├── index.html
│   └── src/
│       ├── components/
│       ├── pages/
│       ├── store/
│       ├── services/
│       └── utils/
│
├── README.md
└── LICENSE

🚀 Installation

Prerequisites

  • Node.js v18+
  • MongoDB v6+
  • Git
  • npm / yarn

Clone Repository

git clone https://github.com/mohanlal99/pdf2sheet-auto.git
cd pdf2sheet-auto

Backend Setup

cd backend
npm install
cp .env.example .env
npm run dev

Check API:

curl http://localhost:5000/health

Frontend Setup

# In a second terminal, starting from the repository root
cd frontend
npm install
echo "VITE_API_URL=http://localhost:5000/api/v1" > .env
npm run dev

Open:

http://localhost:5173

🔐 Environment Variables

Backend (.env)

NODE_ENV=development
PORT=5000

MONGODB_URI=mongodb://localhost:27017/pdf2sheet

JWT_SECRET=your_secret_key
JWT_EXPIRE=7d

FRONTEND_URL=http://localhost:5173

UPLOAD_PATH=./uploads
MAX_FILE_SIZE=10485760
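
A small loader can validate these values at startup so that a missing secret or a malformed size fails fast. The following is a sketch only: `loadConfig` is a hypothetical helper, and the defaults simply mirror the sample values above (`MAX_FILE_SIZE` is in bytes, so 10485760 is 10 MiB).

```javascript
// Reads the backend settings from an env-like object and applies
// the defaults shown in the sample .env file.
function loadConfig(env = process.env) {
  const maxFileSize = Number.parseInt(env.MAX_FILE_SIZE ?? "10485760", 10);
  if (!Number.isInteger(maxFileSize) || maxFileSize <= 0) {
    throw new Error("MAX_FILE_SIZE must be a positive integer (bytes)");
  }
  if (!env.JWT_SECRET) {
    throw new Error("JWT_SECRET is required");
  }
  return {
    nodeEnv: env.NODE_ENV ?? "development",
    port: Number.parseInt(env.PORT ?? "5000", 10),
    mongodbUri: env.MONGODB_URI ?? "mongodb://localhost:27017/pdf2sheet",
    jwtSecret: env.JWT_SECRET,
    jwtExpire: env.JWT_EXPIRE ?? "7d",
    frontendUrl: env.FRONTEND_URL ?? "http://localhost:5173",
    uploadPath: env.UPLOAD_PATH ?? "./uploads",
    maxFileSize,
  };
}

// Example with an explicit object instead of process.env.
const config = loadConfig({ JWT_SECRET: "dev-only-secret", PORT: "8080" });
console.log(config.port, config.maxFileSize);
```

Passing the environment in as a parameter keeps the function easy to exercise in tests without touching `process.env`.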

Frontend (.env)

VITE_API_URL=http://localhost:5000/api/v1

📚 API Documentation

Base URL

http://localhost:5000/api/v1

Authentication

Method    Endpoint          Description
POST      /auth/register    Register
POST      /auth/login       Login
GET       /auth/me          Get Profile

Invoices

Method    Endpoint            Description
GET       /invoices           List Invoices
GET       /invoices/:id       Get Invoice
POST      /invoices/upload    Upload PDF
PUT       /invoices/:id       Update
DELETE    /invoices/:id       Delete
GET       /invoices/stats     Stats

Upload Example:

curl -X POST http://localhost:5000/api/v1/invoices/upload \
 -H "Authorization: Bearer TOKEN" \
 -F "pdf=@invoice.pdf"
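
The same upload can be issued from Node.js 18+, which ships global `fetch`, `FormData`, and `Blob`. This is a sketch under assumptions: `buildUploadRequest` and `uploadInvoice` are hypothetical helpers, the form field name `pdf` is taken from the curl example above, and the response is assumed to be JSON.

```javascript
// Builds the URL and fetch options for a multipart upload.
// No Content-Type header is set: fetch adds the multipart boundary itself.
function buildUploadRequest(baseUrl, token, pdfBytes, filename) {
  const form = new FormData();
  const blob = new Blob([pdfBytes], { type: "application/pdf" });
  form.append("pdf", blob, filename);
  return {
    url: `${baseUrl}/invoices/upload`,
    options: {
      method: "POST",
      headers: { Authorization: `Bearer ${token}` },
      body: form,
    },
  };
}

// Sends the request and parses the body (assumed here to be JSON).
async function uploadInvoice(baseUrl, token, pdfBytes, filename) {
  const { url, options } = buildUploadRequest(baseUrl, token, pdfBytes, filename);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Upload failed with HTTP ${response.status}`);
  }
  return response.json();
}

// Usage (requires a running backend and a valid token):
// const fs = require("node:fs");
// uploadInvoice("http://localhost:5000/api/v1", "TOKEN",
//   fs.readFileSync("invoice.pdf"), "invoice.pdf").then(console.log);
```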

Vendor Mapping

Method    Endpoint            Description
GET       /vendor-maps        List
POST      /vendor-maps        Create
PUT       /vendor-maps/:id    Update
DELETE    /vendor-maps/:id    Delete

Dashboard

Method    Endpoint                  Description
GET       /dashboard/stats          Statistics
GET       /dashboard/activity       Activity
GET       /dashboard/attention      Alerts
GET       /dashboard/top-vendors    Vendors

🗄️ Database Schema (Summary)

User

{
  email: String,
  password: String,
  name: String,
  company: String,
  role: { type: String, enum: ["user", "admin"] },
  subscription: Object
}

Invoice

{
  userId: ObjectId,
  status: String,
  pdfFile: Object,
  extractedData: Object,
  confidence: Object,
  processingTime: Number
}

VendorMap

{
  vendorName: String,
  fieldMappings: Array,
  category: String
}
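
To illustrate how a VendorMap might drive the spreadsheet export, the sketch below turns extracted invoice data into a row keyed by column name. The shape of each `fieldMappings` entry (`sourceField`, `targetColumn`) is an assumption, since the summary above only states that it is an array, and `applyVendorMap` is a hypothetical helper.

```javascript
// Builds one spreadsheet row from extracted data using a vendor's mappings.
// Fields missing from the extracted data become empty cells.
function applyVendorMap(extractedData, vendorMap) {
  const row = {};
  for (const { sourceField, targetColumn } of vendorMap.fieldMappings) {
    row[targetColumn] = extractedData[sourceField] ?? "";
  }
  return row;
}

// Hypothetical vendor template.
const vendorMap = {
  vendorName: "Acme Corp",
  category: "Supplies",
  fieldMappings: [
    { sourceField: "invoiceNumber", targetColumn: "Invoice #" },
    { sourceField: "total", targetColumn: "Amount" },
    { sourceField: "dueDate", targetColumn: "Due Date" },
  ],
};

const row = applyVendorMap(
  { invoiceNumber: "INV-1042", total: "1,250.00" },
  vendorMap
);
console.log(row);
```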

🌐 Deployment

Backend (Railway / Render)

npm install
npm start

Add environment variables in dashboard.


Frontend (Vercel / Netlify)

npm run build

Deploy dist/ folder.


🐳 Docker (Optional)

Backend Dockerfile

FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
EXPOSE 5000
CMD ["npm","start"]

docker-compose.yml

version: '3.8'
services:
  backend:
    build: ./backend
    ports:
      - "5000:5000"
    environment:
      - MONGODB_URI=mongodb://mongo:27017/pdf2sheet
    depends_on:
      - mongo

  mongo:
    image: mongo:6
    volumes:
      - mongo_data:/data/db

volumes:
  mongo_data:

🀝 Contributing

  1. Fork the repo
  2. Create feature branch
  3. Commit changes
  4. Push branch
  5. Open Pull Request
git checkout -b feature/new-feature
git commit -m "Add feature"
git push origin feature/new-feature

📜 License

MIT License © 2024 PDF2Sheet Auto


📞 Support
