# JobShield

AI-powered fraud detection for students. Paste any suspicious job offer, internship message, or recruitment email and get an instant risk score — backed by semantic embeddings, 11 fraud signals, a self-organizing scam cluster database, and a self-learning scoring engine.
---
- The Problem
- What JobShield Does
- Features at a Glance
- Tech Stack
- System Architecture
- The 11 Fraud Signals
- Scoring System
- Analysis Pipeline — Step by Step
- Clustering & Scam Intelligence Database
- Self-Learning Weight Engine
- Chrome Extension
- Admin Panel & The Training Loop
- Project Structure
- Database Collections
- API Reference
- Environment Variables
- Installation & Setup
- Atlas Vector Search Setup
- Seeding the Database
- Running the Project
- Creating an Admin Account
- Color Palette & UI Design
- Known Limitations
## The Problem

India reported over 11 lakh (1.1 million) cybercrime cases in 2023, with job fraud ranking among the top three categories. The victims are overwhelmingly students and recent graduates — first-time job seekers with no prior experience recognizing scam patterns.
The typical attack:
- Student finds a listing on Internshala, LinkedIn, or receives a WhatsApp message
- Recruiter asks for a "registration fee," "security deposit," or "onboarding contribution" of ₹500–₹2000
- Student pays via UPI or bank transfer
- Recruiter disappears
The gap: Existing tools like ScamAdviser or Google Safe Browsing check URLs and domain reputation — they don't read the actual content of the message. A scammer using a Gmail address and WhatsApp leaves no URL to check.
JobShield was built for exactly this scenario.
## What JobShield Does

JobShield analyzes the raw text of any job offer, internship message, or recruitment communication and returns:
- A 0–100 fraud risk score
- A human-readable explanation of every signal that fired
- A classification: Low Risk / Suspicious / High Risk / Likely Scam
There are two core flows:
| Flow | Endpoint | Saves to DB? | Purpose |
|---|---|---|---|
| Check | `POST /api/check` | No | Quick ephemeral scan — paste and check |
| Report | `POST /api/reports` | Yes | Submit a scam — adds to database, improves the system |
## Features at a Glance

| Feature | Description |
|---|---|
| Semantic similarity | 384-dim embeddings catch scam variants and paraphrases — not just exact matches |
| 11 fraud signals | Across 4 categories: semantic, linguistic, identity, and infrastructure |
| Self-organizing clusters | Every report automatically merges into, attaches to, or creates a scam cluster |
| Self-learning weights | Logistic regression trains on admin-verified reports every 6 hours |
| Chrome extension | Auto-extracts text from Gmail and Internshala, shows result inline |
| NER model | Detects brand impersonation and domain-org mismatches using BERT-NER |
| Admin pipeline | Full review queue, cluster verification, reputation scoring, manual retraining |
| Combination bonus | Multiple weak signals together trigger a compound score boost |
| Floor boosts | Confirmed scam matches always produce "Likely Scam" regardless of other signals |
| Graceful degradation | Every external API call fails gracefully — the pipeline always completes |
## Tech Stack

| Layer | Technology |
|---|---|
| Frontend | React 18, Vite |
| Backend | Node.js 20, Express, ES Modules |
| Database | MongoDB Atlas (M0 free tier) |
| Vector Search | Atlas Vector Search — ANN cosine similarity |
| Embedding Model | sentence-transformers/all-MiniLM-L6-v2 (384-dim) via HuggingFace |
| NER Model | dslim/bert-base-NER via HuggingFace |
| Inference API | HuggingFace Inference API (router.huggingface.co) |
| Domain Intelligence | WhoisXML API (with MongoDB cache) |
| Authentication | JWT (jsonwebtoken + bcryptjs) |
| Chrome Extension | Manifest V3, content scripts, service worker |
| ML Engine | Logistic regression implemented from scratch in Node.js — no ML libraries |
## The 11 Fraud Signals

Signals are divided into four categories. Each carries a default weight — once enough admin-verified data is collected, the logistic regression engine learns better weights from real data.
### Semantic signals

These two signals are mutually exclusive — only the higher-confidence one fires.
| # | Signal | Default Weight | Trigger |
|---|---|---|---|
| 1 | Confirmed scam match | 35 | Cosine similarity ≥ 0.95 against a verified cluster |
| 2 | High similarity match | 25 | Cosine similarity 0.85–0.95 against any cluster |
Signal 1 also applies a floor boost — any text triggering it scores a minimum of 72/100 (Likely Scam), regardless of other signals.
### Identity signals

| # | Signal | Default Weight | Trigger |
|---|---|---|---|
| 3 | Domain mismatch | 20 | NER detects an ORG name that doesn't match the sender's domain |
| 4 | Young domain | 18 | WHOIS shows domain registered < 90 days ago |
| 6 | Big brand impersonation | 15 | NER detects PayPal, Amazon, Google, HDFC, Flipkart, Paytm etc. |
### Linguistic signals

| # | Signal | Default Weight | Trigger |
|---|---|---|---|
| 5 | Payment language | 22 | Regex detects: registration fee, UPI payment, advance deposit, security deposit etc. |
| 9 | Telegram present | 14 | Regex detects t.me/ link or @handle (extremely common in Indian job scams) |
| 8 | Free email provider | 12 | Sender email is Gmail, Yahoo, Outlook, Rediffmail, Hotmail etc. |
| 11 | Urgency detected | 10 × urgencyScore | Keywords: "urgent", "act now", "expires today", "limited seats" etc. |
### Infrastructure signals

| # | Signal | Default Weight | Trigger |
|---|---|---|---|
| 7 | Suspicious TLD | 12 | Domain ends in .xyz, .top, .click, .tk, .loan, .win, .bid etc. |
| 10 | Previously reported | 16 | Same domain or cluster pattern reported 3+ times |
### Combination bonus

When 3 or more weak signals fire together, a compound bonus is applied:
| Weak signals firing | Bonus added |
|---|---|
| 3 | +15 |
| 4 | +20 |
| 5 | +25 |
Weak signals counted: freeEmailProvider, telegramPresent, urgencyDetected, suspiciousTLD, paymentLanguage.
This reflects that multiple weak signals together are significantly more suspicious than their individual weights suggest. A message from a Gmail address, asking for UPI payment, with a Telegram link, and urgency language is a textbook Indian job scam — even if each signal alone seems minor.
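As a sketch of this lookup (signal names are taken from the tables above; the actual logic in scoringService.js may differ in detail):

```javascript
// Weak signals counted toward the combination bonus
const WEAK_SIGNALS = [
  "freeEmailProvider", "telegramPresent", "urgencyDetected",
  "suspiciousTLD", "paymentLanguage",
];

// +15 / +20 / +25 for 3 / 4 / 5 weak signals firing together
function combinationBonus(signals) {
  const firing = WEAK_SIGNALS.filter((name) => signals[name]).length;
  if (firing >= 5) return 25;
  if (firing === 4) return 20;
  if (firing === 3) return 15;
  return 0;
}

// A Gmail sender asking for UPI payment, with a Telegram link and urgency:
combinationBonus({
  freeEmailProvider: true, telegramPresent: true,
  urgencyDetected: true, paymentLanguage: true,
}); // 4 weak signals → +20
```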
## Scoring System

```
Step 1 — Evaluate all 11 signals
         Sum triggered weights → rawScore

Step 2 — Add combination bonus (if 3+ weak signals fired)

Step 3 — Normalize against theoretical maximum
         maxPossible     = sum of ALL weights if every signal fired
         normalizedScore = (rawScore / maxPossible) × 100

Step 4 — Apply floor boosts
         confirmedScamMatch          → minimum score 72 (Likely Scam)
         highSimilarityMatch         → minimum score 55 (High Risk)
         paymentLanguage + telegram  → minimum score 60 (High Risk)
         paymentLanguage + freeEmail → minimum score 60 (High Risk)

Step 5 — Clamp to 0–100
```
| Score Range | Classification | Meaning |
|---|---|---|
| 0 – 25 | Low Risk | No significant signals — appears legitimate |
| 26 – 50 | Suspicious | Some signals fired — proceed with caution |
| 51 – 70 | High Risk | Multiple strong signals — likely fraudulent |
| 71 – 100 | Likely Scam | High confidence fraud — do not engage |
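Putting the five steps and the classification bands together, a minimal sketch (helper names are hypothetical; weights are the defaults from the signal tables, before any learning):

```javascript
// Default weights from the signal tables. Urgency is simplified to a boolean
// here; the real engine scales it as 10 × urgencyScore.
const WEIGHTS = {
  confirmedScamMatch: 35, highSimilarityMatch: 25, domainMismatch: 20,
  youngDomain: 18, paymentLanguage: 22, bigBrandMentioned: 15,
  suspiciousTLD: 12, freeEmailProvider: 12, telegramPresent: 14,
  previouslyReported: 16, urgencyDetected: 10,
};

function score(signals, bonus = 0) {
  // Steps 1–2: sum triggered weights, add combination bonus
  const raw = Object.keys(WEIGHTS)
    .filter((k) => signals[k])
    .reduce((sum, k) => sum + WEIGHTS[k], 0) + bonus;
  // Step 3: normalize against the theoretical maximum
  const maxPossible = Object.values(WEIGHTS).reduce((a, b) => a + b, 0);
  let s = Math.round((raw / maxPossible) * 100);
  // Step 4: floor boosts
  if (signals.confirmedScamMatch) s = Math.max(s, 72);
  else if (signals.highSimilarityMatch) s = Math.max(s, 55);
  if (signals.paymentLanguage && (signals.telegramPresent || signals.freeEmailProvider))
    s = Math.max(s, 60);
  // Step 5: clamp
  return Math.min(100, Math.max(0, s));
}

function classify(s) {
  if (s <= 25) return "Low Risk";
  if (s <= 50) return "Suspicious";
  if (s <= 70) return "High Risk";
  return "Likely Scam";
}

score({ confirmedScamMatch: true }); // floor boost lifts it to 72 → "Likely Scam"
```

Note how the floor boosts do most of the work for confirmed matches: a single +35 signal normalizes to well under 25, but the floor guarantees the "Likely Scam" band.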
## Analysis Pipeline — Step by Step

Every request to /api/check or /api/reports runs the same pipeline in analysisService.js.
Text is hashed with SHA-256. The hash is looked up in EmbeddingCache (MongoDB, 7-day TTL auto-expiry). On a cache miss, the text is sent to HuggingFace:
Model: sentence-transformers/all-MiniLM-L6-v2
Output: float[384] — a vector representing the semantic meaning of the text
This is the foundation of the entire similarity system. Two messages that mean the same thing — even with completely different words — will have numerically similar vectors. The embedding is reused in Step 3 for Atlas Vector Search.
Three tasks run simultaneously:
Regex Extraction (regexService.js) — Synchronous, no external calls, instant. Scans text with compiled regex patterns and extracts:
- Payment language (UPI, registration fee, advance deposit, bank transfer, etc.)
- Telegram links and @handles
- Email addresses and their domains (checks against free provider list)
- URLs and their TLDs (checks against suspicious TLD list)
- Urgency keywords (normalized to 0–1 score: triggered / 3, capped at 1)
- Primary domain (first URL domain found, or email domain)
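A sketch of the extraction logic with simplified, illustrative patterns (the real lists in regexService.js are longer):

```javascript
// Illustrative patterns only — not the full production lists
const TELEGRAM_RE = /(?:t\.me\/\w+|@\w{4,})/i;
const PAYMENT_RE  = /registration fee|security deposit|advance deposit|upi/i;
const URGENCY_WORDS = ["urgent", "act now", "expires today", "limited seats"];

// Urgency normalized to a 0–1 score: triggered keywords / 3, capped at 1
function urgencyScore(text) {
  const lower = text.toLowerCase();
  const triggered = URGENCY_WORDS.filter((w) => lower.includes(w)).length;
  return Math.min(triggered / 3, 1);
}

const msg = "Urgent! Limited seats. Pay the registration fee via UPI: t.me/hrdesk";
PAYMENT_RE.test(msg);  // true
TELEGRAM_RE.test(msg); // true
urgencyScore(msg);     // 2 keywords triggered / 3 ≈ 0.67
```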
NER Model (nerService.js) — Calls dslim/bert-base-NER via HuggingFace Inference API. Extracts ORG and PER entity groups. Checks ORGs against a hardcoded list of major brands. Fails gracefully — NER failure skips signals 3 and 6 without crashing the pipeline.
Domain Mismatch (regexService.js) — Uses NER ORG entities and the primary domain extracted by regex. If any detected ORG name does not appear in the sender's domain string, domainMismatch = true. A message claiming to be from "Google" but using a domain like jobs-portal.com triggers this signal.
WHOIS Lookup (domainService.js) — Checks the DomainIntelligence MongoDB collection first. On a cache miss, calls the WHOIS API, stores the result with domain age in days and registrar name. Fails gracefully — WHOIS failure skips signal 4.
Atlas Vector Search (similarityService.js) — Queries the ScamClusters collection using the embedding from Step 1. Returns the 5 nearest clusters by cosine similarity. The closest match is evaluated against the 0.95 and 0.85 thresholds for signals 1 and 2. Also counts total reports matching the same domain for signal 10.
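The Atlas query could look like the following aggregation sketch. The index and field names come from the setup section later in this README; `numCandidates` is an assumed tuning value, and the query vector would be the embedding from Step 1:

```javascript
// Placeholder 384-dim embedding — in the pipeline this comes from Step 1
const queryVector = new Array(384).fill(0);

const pipeline = [
  {
    $vectorSearch: {
      index: "cluster_vector_index",
      path: "clusterEmbedding",
      queryVector,
      numCandidates: 100, // ANN candidate pool (assumed value)
      limit: 5,           // 5 nearest clusters
    },
  },
  {
    $project: {
      representativeText: 1,
      verified: 1,
      score: { $meta: "vectorSearchScore" }, // similarity of each match
    },
  },
];
```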
weightService.js returns current weights from memory. No DB call, no API call. Under 1ms.
- < 20 labelled reports: Returns calibrated hardcoded defaults
- 20+ labelled reports: Returns logistic regression weights, retrained every 6 hours
- Blend ratio: 20% learned / 80% defaults at 20 samples → 80% learned / 20% defaults at 150+ samples
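A sketch of the blend, using the bucket boundaries described in the Self-Learning Weight Engine section (function names are hypothetical):

```javascript
// Fraction of the final weight taken from the learned model, by sample count
function blendRatio(sampleCount) {
  if (sampleCount < 20) return 0;    // too little data: defaults only
  if (sampleCount < 50) return 0.2;
  if (sampleCount < 150) return 0.5;
  return 0.8;                        // never 100% learned
}

function blendWeight(learned, fallback, sampleCount) {
  const r = blendRatio(sampleCount);
  return Math.round(r * learned + (1 - r) * fallback);
}

blendWeight(19, 14, 200); // 0.8 × 19 + 0.2 × 14 = 18 — mostly learned
```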
scoringService.js applies weights to triggered signals, computes the combination bonus, normalizes against the theoretical maximum, and applies floor boosts.
explanationBuilder.js maps triggered signal flags to an ordered array of plain-English sentences. This is what users see — a clear explanation of exactly why a message scored the way it did.
## Clustering & Scam Intelligence Database

Every report submitted via /api/reports is routed through clusterService.js, which compares its embedding against all existing scam clusters and makes one of three decisions.
```
New report embedding arrives
        │
        ▼
Atlas Vector Search → find closest existing cluster
        │
Compare similarityScore of best match
        │
   ┌────┴─────────────────────┬──────────────────────┐
   │                          │                      │
 ≥ 0.95                  0.85 – 0.95               < 0.85
   │                          │                      │
   ▼                          ▼                      ▼
 MERGE                     ATTACH                 CREATE
 Update centroid           reportCount++          New cluster born
 (running average)         lastReportedAt         from this embedding
 reportCount++             Centroid UNCHANGED     reportCount = 1
 averageRiskScore updated  Prevents drift         verified = false
 Signal 1 fires (+35)      Signal 2 fires (+25)   No signal fires
```
When a report merges (≥ 0.95 similarity), the cluster centroid shifts toward the new embedding:

```
newCentroid[i] = (oldCentroid[i] × (count − 1) + newEmbedding[i]) / count
```

Over time the centroid represents the true semantic center of all scams in that cluster, making future similarity searches more accurate. The centroid is intentionally not updated for ATTACH operations (0.85–0.95) — this prevents variant wordings from drifting a cluster away from its original identity.
The first time a new scam template appears, it creates a cluster with verified: false. The second time a similar message arrives, it merges in. By the third or fourth report, an admin can review and set verified: true — which activates Signal 1 (+35, floor boost to 72 minimum) for every future similar message.
The database is self-organizing and self-improving. The 500th report benefits from the intelligence of all 499 before it.
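The three-way decision can be sketched as follows (function name hypothetical; thresholds from the diagram above):

```javascript
// MERGE / ATTACH / CREATE decision for an incoming report embedding
function integrateReport(cluster, embedding, similarity) {
  if (!cluster || similarity < 0.85) {
    // CREATE — a brand-new cluster, unverified until admin review
    return { clusterEmbedding: [...embedding], reportCount: 1, verified: false };
  }
  if (similarity >= 0.95) {
    // MERGE — running-average centroid update
    const n = cluster.reportCount + 1;
    cluster.clusterEmbedding = cluster.clusterEmbedding.map(
      (c, i) => (c * (n - 1) + embedding[i]) / n
    );
    cluster.reportCount = n;
    return cluster;
  }
  // ATTACH — count the report, but leave the centroid untouched (no drift)
  cluster.reportCount += 1;
  return cluster;
}
```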
## Self-Learning Weight Engine

weightService.js implements logistic regression from scratch in Node.js — no Python, no ML libraries, no GPU, no external dependencies.
```
Admin marks reports as "verified-scam" or "rejected"
        │
        ▼
Every 6 hours (or POST /api/admin/retrain):

  1. Pull all verified-scam reports → label 1
     Pull all rejected reports     → label 0

  2. For each report, build 11-dimensional feature vector:
     [confirmedScamMatch, highSimilarityMatch, domainMismatch,
      youngDomain, paymentLanguage, bigBrandMentioned,
      suspiciousTLD, freeEmailProvider, telegramPresent,
      previouslyReported, urgencyScore]

  3. Run gradient descent (1000 epochs, lr=0.05):
     Minimise binary cross-entropy loss
     Find weights w[] that best separate scam (1) from clean (0)

  4. Scale coefficients → integer scoring weights summing to ~170

  5. Apply dynamic blend with calibrated defaults:
     < 50 samples    → 20% learned + 80% defaults
     50–150 samples  → 50% learned + 50% defaults
     150+ samples    → 80% learned + 20% defaults

  6. Store in memory → all subsequent requests use these weights
```
Signals that consistently appear in verified scam reports — but not in rejected (clean) reports — receive higher weights. Signals that appear in both get lower weights. The scoring system adapts to the actual distribution of scams in your dataset, not a theoretical assumption.
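A minimal from-scratch version of the same training step, shown on toy two-feature data instead of the 11-dimensional signal vectors (the hyperparameters match the flow above; everything else is a sketch):

```javascript
const sigmoid = (z) => 1 / (1 + Math.exp(-z));

// Stochastic gradient descent on binary cross-entropy loss
function train(X, y, epochs = 1000, lr = 0.05) {
  let w = new Array(X[0].length).fill(0);
  let b = 0;
  for (let e = 0; e < epochs; e++) {
    for (let i = 0; i < X.length; i++) {
      const p = sigmoid(X[i].reduce((s, x, j) => s + x * w[j], b));
      const err = p - y[i]; // gradient of BCE w.r.t. the logit
      w = w.map((wj, j) => wj - lr * err * X[i][j]);
      b -= lr * err;
    }
  }
  return { w, b };
}

// Feature 0 (telegramPresent-like) separates scams; feature 1 is noise
const X = [[1, 1], [1, 0], [0, 1], [0, 0]];
const y = [1, 1, 0, 0];
const model = train(X, y);
// model.w[0] ends up strongly positive; model.w[1] stays near zero
```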
Example output after training:

```
Learned weights active:
  confirmedScamMatch : 38  (default 35, ↑ +3)
  telegramPresent    : 19  (default 14, ↑ +5)
  freeEmailProvider  :  9  (default 12, ↓ -3)
  paymentLanguage    : 26  (default 22, ↑ +4)
```
The harder side of building training data is collecting non-scam examples — most users only submit messages they're suspicious of. On server startup, seedCleanExamples() automatically labels any pending reports that scored under 20 with no payment/telegram/brand signals as rejected, bootstrapping the non-scam training set without any manual work.
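The auto-labeling rule can be sketched as a predicate over the Report fields (the real seedCleanExamples() implementation may apply additional checks):

```javascript
// A pending report is a clean-training candidate when it scored low and
// fired none of the hard signals (payment / telegram / brand).
function isCleanCandidate(report) {
  const s = report.structuredSignals;
  return (
    report.status === "pending" &&
    report.riskScore < 20 &&
    !s.paymentLanguage && !s.telegramPresent && !s.bigBrandMentioned
  );
}
```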
## Chrome Extension

The Chrome extension brings JobShield to the point of attack — inside Gmail and Internshala — without asking the user to copy and paste anything.
```
popup.html / popup.js  ← Login screen + analysis screen
background.js          ← Service worker: handles all API calls
content.js             ← Injected into Gmail + Internshala pages
config.js              ← Single source of truth for API and app URLs
icons/                 ← icon16.png, icon48.png, icon128.png
```
- User opens a suspicious email in Gmail or a job listing on Internshala
- Clicks the JobShield icon in the Chrome toolbar
- Clicks "Auto-extract text from this page" — `content.js` reads the DOM
- Clicks "Analyse" — `background.js` calls `POST /api/check`
- Risk score and explanation appear in the popup
- A floating badge overlays the page itself, showing the result in context
| Site | Primary Selectors | Fallback |
|---|---|---|
| Gmail | `.a3s.aiL`, `.a3s`, `[data-message-id]` | `[role="textbox"]` |
| Internshala | `.internship_details`, `.job-detail-section`, `#internship_detail`, `.detail_view` | 10+ additional selectors, then full-page text scraper |
| Generic | `main`, `article`, `[role="main"]`, `.content` | `body` (first 5000 chars) |
- CSP compliance: No inline `onclick` handlers anywhere — all events wired via `addEventListener`. Required by Manifest V3's strict Content Security Policy.
- Double injection guard: `window.__jobshieldInjected` prevents the content script from re-registering listeners if injected multiple times.
- Graceful injection errors: `ensureContentScript()` wraps injection in try/catch. The popup always renders even if the page disallows injection.
- JWT persistence: Stored in `chrome.storage.local` — survives browser restarts.
- Timeout safety: All `chrome.tabs.sendMessage` calls are protected against missing content scripts with proper error handling.
- Open Chrome → `chrome://extensions`
- Enable Developer mode (top-right toggle)
- Click Load unpacked → select the `extension/` folder
- JobShield icon appears in the toolbar

After any code change, click the refresh icon on the extension card in `chrome://extensions`.
## Admin Panel & The Training Loop

The admin panel is not just a moderation dashboard — it is the mechanism that trains the entire scoring system.
| Action | Effect on system |
|---|---|
| View all reports | Filter by status, classification, date; sort by any field |
| Mark verified-scam | `user.verifiedReports++` · reputationScore +10 · `cluster.verified = true` |
| Mark rejected | `user.rejectedReports++` · reputationScore −5 |
| Verify a cluster | `cluster.verified = true` — activates Signal 1 (+35) for ALL future similar messages |
| Unverify a cluster | Deactivates Signal 1 for that cluster |
| Force retrain | `POST /api/admin/retrain` — immediately reruns logistic regression |
```
User submits suspicious message
        │
        ▼
Pipeline runs → report saved
High-risk reports auto-flagged (score > 70) → enter admin queue
        │
        ▼
Admin reviews → marks verified-scam or rejected
        │
        ├── Cluster verified
        │     → Signal 1 activates for all future similar messages
        │     → Protects every future user who sends a similar scam
        │
        └── 20+ verified + 20+ rejected reports reached
              → Logistic regression retrains (every 6h or on demand)
              → Weights shift to reflect real data
              → Scoring improves for all future analyses
              → More reports correctly classified
              → Admin efficiency improves
              → Loop continues
```
Every admin review action makes the entire system more accurate for every future user.
Users accumulate a `reputationScore` based on report quality:
- +10 for each report admin-confirms as a real scam
- -5 for each report admin-rejects as not a scam
High-reputation users are more trustworthy community contributors. This surfaces over time in the admin panel.
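A sketch of the reputation bookkeeping applied when an admin resolves a report (function name hypothetical; values from the list above):

```javascript
// Adjust a user's counters and reputation when their report is reviewed
function applyReview(user, status) {
  if (status === "verified-scam") {
    user.verifiedReports += 1;
    user.reputationScore += 10;
  } else if (status === "rejected") {
    user.rejectedReports += 1;
    user.reputationScore -= 5;
  }
  return user;
}
```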
## Database Collections

### User

```js
{
  email: String,            // unique, indexed
  password: String,         // bcrypt hashed, never returned in responses
  role: String,             // "user" | "admin"
  reputationScore: Number,  // starts 0, +10 verified / -5 rejected
  totalReports: Number,
  verifiedReports: Number,
  rejectedReports: Number,
  createdAt: Date
}
```

### Report

```js
{
  userId: ObjectId,         // ref: User, indexed
  rawText: String,
  textHash: String,         // SHA-256, indexed
  embedding: [Number],      // 384-dim (excluded from list endpoints)
  structuredSignals: {
    paymentLanguage: Boolean,
    domainMismatch: Boolean,
    bigBrandMentioned: Boolean,
    suspiciousTLD: Boolean,
    freeEmailProvider: Boolean,
    telegramPresent: Boolean,
    previouslyReported: Boolean,
    urgencyScore: Number,   // used as continuous feature in weight learning
  },
  domain: String,
  domainAgeDays: Number,
  registrar: String,
  similarityScore: Number,
  riskScore: Number,
  classification: String,   // enum: Low Risk | Suspicious | High Risk | Likely Scam
  explanation: [String],
  clusterId: ObjectId,      // ref: ScamCluster
  status: String,           // pending | auto-flagged | verified-scam | rejected
  location: String,
  paymentMethod: String,
  createdAt: Date
}
```

### ScamCluster

```js
{
  clusterEmbedding: [Number],  // 384-dim centroid — running average of merged embeddings
  representativeText: String,  // most recent merged text (truncated to 500 chars)
  reportCount: Number,         // total reports merged or attached
  averageRiskScore: Number,
  verified: Boolean,           // true activates Signal 1 (+35) for future matches
  dominantDomain: String,
  dominantBrand: String,
  firstReportedAt: Date,
  lastReportedAt: Date
}
```

### EmbeddingCache

```js
{
  textHash: String,    // SHA-256, unique
  embedding: [Number], // 384-dim
  createdAt: Date      // TTL index: auto-deleted after 7 days
}
```

### DomainIntelligence

```js
{
  domain: String,     // unique
  ageDays: Number,
  registrar: String,
  flagCount: Number,  // increments on each report referencing this domain
  createdAt: Date
}
```

## API Reference

### Auth

| Method | Endpoint | Auth | Body | Response |
|---|---|---|---|---|
| POST | `/api/auth/register` | None | `{ email, password }` | `{ token, user }` |
| POST | `/api/auth/login` | None | `{ email, password }` | `{ token, user }` |
| GET | `/api/auth/me` | JWT | — | `{ user }` |
### Analysis

| Method | Endpoint | Auth | Body | Response |
|---|---|---|---|---|
| POST | `/api/check` | JWT | `{ text }` | `{ riskScore, classification, explanation[], signals{} }` |
| POST | `/api/reports` | JWT | `{ text, location?, paymentMethod? }` | `{ reportId, riskScore, classification, explanation[], status }` |
### Reports

| Method | Endpoint | Auth | Query Params | Description |
|---|---|---|---|---|
| GET | `/api/reports` | JWT | `?page=1&limit=10` | Own report history (paginated) |
| GET | `/api/reports/:id` | JWT | — | Single report detail |
### Dashboard

| Method | Endpoint | Auth | Description |
|---|---|---|---|
| GET | `/api/dashboard/stats` | JWT | Personal stats, recent reports, classification breakdown |
### Admin

| Method | Endpoint | Auth | Body / Query | Description |
|---|---|---|---|---|
| GET | `/api/admin/reports` | JWT + Admin | `?status=&classification=&page=&sortBy=&order=` | All reports with filters |
| GET | `/api/admin/reports/:id` | JWT + Admin | — | Single report detail |
| PATCH | `/api/admin/reports/:id` | JWT + Admin | `{ status }` | Update status — triggers user reputation adjustment |
| GET | `/api/admin/clusters` | JWT + Admin | `?verified=true&page=` | All clusters |
| PATCH | `/api/admin/clusters/:id` | JWT + Admin | `{ verified }` | Verify/unverify cluster |
| POST | `/api/admin/retrain` | JWT + Admin | — | Force immediate logistic regression retraining |
Example response from `POST /api/check`:

```json
{
  "riskScore": 85,
  "classification": "Likely Scam",
  "explanation": [
    "Matches a verified scam template (99.4% similarity)",
    "Message requests payment, deposit, or fee upfront",
    "Sender domain doesn't match the organisation named in the message",
    "This pattern has been reported 7 times by other users",
    "Multiple scam signals detected together — elevated risk"
  ],
  "signals": {
    "confirmedScamMatch": true,
    "highSimilarityMatch": false,
    "similarityScore": 0.994,
    "paymentLanguage": false,
    "domainMismatch": true,
    "domainAgeDays": 2573,
    "bigBrandMentioned": false,
    "suspiciousTLD": false,
    "freeEmailProvider": false,
    "telegramPresent": false,
    "previouslyReported": true,
    "urgencyDetected": false,
    "urgencyScore": 0
  },
  "cached": true
}
```

## Environment Variables

```bash
# MongoDB Atlas
MONGODB_URI=mongodb+srv://<user>:<password>@<cluster>.mongodb.net/<dbname>

# JWT
JWT_SECRET=your_strong_secret_here
JWT_EXPIRES_IN=7d

# HuggingFace — free account at huggingface.co → Settings → Access Tokens
HUGGINGFACE_API_KEY=hf_xxxxxxxxxxxxxxxxxxxx
HF_NER_MODEL=dslim/bert-base-NER

# WHOIS — free account at whoisxmlapi.com (500 queries/month free)
WHOIS_API_KEY=your_whois_api_key

# Similarity thresholds (these are the defaults — omit to use defaults)
SIMILARITY_CONFIRMED_THRESHOLD=0.95
SIMILARITY_HIGH_THRESHOLD=0.85

# Server
PORT=5000
CLIENT_URL=http://localhost:5173
```

The Chrome extension reads its URLs from `config.js`:

```js
const JOBSHIELD_CONFIG = {
  API_BASE_URL: 'http://localhost:5000/api', // → your deployed API URL for production
  APP_URL: 'http://localhost:5173',          // → your deployed app URL for production
}
```

## Installation & Setup

Prerequisites:

- Node.js 18+
- A MongoDB Atlas account — free M0 tier is sufficient
- A HuggingFace account — free, for the Inference API key
- A WhoisXML API account — free tier gives 500 queries/month
```bash
# 1. Clone
git clone https://github.com/yourusername/jobshield.git
cd jobshield

# 2. Install dependencies
cd server && npm install
cd ../client && npm install

# 3. Configure environment
cd ../server
cp .env.example .env
# Edit .env and fill in all required values
```

See Atlas Vector Search Setup below — required before signals 1 and 2 work.

See Seeding the Database below — it populates verified scam clusters so similarity signals work from day one.
```bash
# Terminal 1 — backend
cd server && npm run dev

# Terminal 2 — frontend
cd client && npm run dev
```

## Atlas Vector Search Setup

JobShield requires one mandatory vector search index (and one optional). These must be created manually in the Atlas UI — they cannot be created programmatically on the M0 free tier.
- Go to cloud.mongodb.com → your cluster → Atlas Search tab
- Click Create Search Index → select Atlas Vector Search (not Atlas Full Text Search)
- Select your database → `scamclusters` collection
- Replace the default JSON with:

```json
{
  "fields": [
    {
      "type": "vector",
      "path": "clusterEmbedding",
      "numDimensions": 384,
      "similarity": "cosine"
    }
  ]
}
```

- Name the index exactly `cluster_vector_index`
- Click Create Search Index → wait for status Active (1–2 minutes)

Repeat the same process on the `reports` collection: path `embedding`, index name `report_vector_index`.
Submit the same scam message twice. Your server terminal should show:
```
Vector search returned 5 results
Best match: 69a28b..., score: 0.9987, verified: false
Signal 2 fired (+20)
Merged into cluster 69a28b... (similarity: 0.999, count: 2)
```
## Seeding the Database

Without seed data, signals 1 and 2 will never fire because ScamClusters is empty. The seed script pre-populates it with verified scam clusters drawn from a Kaggle fake job postings dataset.
Step 1 — Download the dataset from Kaggle: https://www.kaggle.com/datasets/shivamb/real-or-fake-fake-jobposting-prediction
Step 2 — Place the CSV at `server/scripts/fake_job_posting.csv`

Step 3 — Run the seed script:

```bash
cd server
npm run seed
```

The script embeds the 860 fake jobs via HuggingFace and inserts them as `verified: true` clusters. It takes 5–10 minutes due to HF API rate limiting (a 200 ms delay between requests). Progress is printed to the terminal.
## Running the Project

```bash
# Backend (http://localhost:5000)
cd server
npm run dev      # nodemon hot-reload
npm start        # no hot-reload

# Frontend (http://localhost:5173)
cd client
npm run dev      # Vite dev server
npm run build    # production build
npm run preview  # preview production build locally

# Database seeding
cd server
npm run seed     # import the Kaggle dataset into ScamClusters
```

## Creating an Admin Account

All accounts register as `role: "user"` by default. To promote an account to admin, use the
MongoDB Atlas UI (Collections → users) or mongosh:
```js
db.users.updateOne(
  { email: "your@email.com" },
  { $set: { role: "admin" } }
)
```

Log out and log back in for the role change to apply. Admin users unlock the /admin/reports and /admin/clusters pages in the frontend, and the admin API endpoints.
The most impactful admin action: verifying a cluster. Setting `verified: true` on a cluster activates Signal 1 (+35 points, minimum score 72) for every future message that semantically matches it. A few minutes of admin review in the early days can protect thousands of future users.
Built with Node.js · React · MongoDB Atlas · HuggingFace Inference API · Chrome Extensions Manifest V3