JobShield

AI-powered fraud detection for students. Paste any suspicious job offer, internship message, or recruitment email and get an instant risk score — backed by semantic embeddings, 11 fraud signals, a self-organizing scam cluster database, and a self-learning scoring engine.


---


The Problem

India reported over 11 lakh (1.1 million) cybercrime cases in 2023, with job fraud ranking among the top three categories. The victims are overwhelmingly students and recent graduates — first-time job seekers with no prior experience recognizing scam patterns.

The typical attack:

  1. Student finds a listing on Internshala, LinkedIn, or receives a WhatsApp message
  2. Recruiter asks for a "registration fee," "security deposit," or "onboarding contribution" of ₹500–₹2000
  3. Student pays via UPI or bank transfer
  4. Recruiter disappears

The gap: Existing tools like ScamAdviser or Google Safe Browsing check URLs and domain reputation — they don't read the actual content of the message. A scammer using a Gmail address and WhatsApp leaves no URL to check.

JobShield was built for exactly this scenario.


What JobShield Does

JobShield analyzes the raw text of any job offer, internship message, or recruitment communication and returns:

  • A 0–100 fraud risk score
  • A human-readable explanation of every signal that fired
  • A classification: Low Risk / Suspicious / High Risk / Likely Scam

There are two core flows:

| Flow | Endpoint | Saves to DB? | Purpose |
|------|----------|--------------|---------|
| Check | POST /api/check | No | Quick ephemeral scan — paste and check |
| Report | POST /api/reports | Yes | Submit a scam — adds to the database, improves the system |

Features at a Glance

| Feature | Description |
|---------|-------------|
| Semantic similarity | 384-dim embeddings catch scam variants and paraphrases — not just exact matches |
| 11 fraud signals | Across 4 categories: semantic, linguistic, identity, and infrastructure |
| Self-organizing clusters | Every report automatically merges into, attaches to, or creates a scam cluster |
| Self-learning weights | Logistic regression trains on admin-verified reports every 6 hours |
| Chrome extension | Auto-extracts text from Gmail and Internshala, shows the result inline |
| NER model | Detects brand impersonation and domain–org mismatches using BERT-NER |
| Admin pipeline | Full review queue, cluster verification, reputation scoring, manual retraining |
| Combination bonus | Multiple weak signals together trigger a compound score boost |
| Floor boosts | Confirmed scam matches always produce "Likely Scam" regardless of other signals |
| Graceful degradation | Every external API call fails gracefully — the pipeline always completes |

Tech Stack

| Layer | Technology |
|-------|------------|
| Frontend | React 18, Vite |
| Backend | Node.js 20, Express, ES Modules |
| Database | MongoDB Atlas (M0 free tier) |
| Vector Search | Atlas Vector Search — ANN cosine similarity |
| Embedding Model | sentence-transformers/all-MiniLM-L6-v2 (384-dim) via HuggingFace |
| NER Model | dslim/bert-base-NER via HuggingFace |
| Inference API | HuggingFace Inference API (router.huggingface.co) |
| Domain Intelligence | WhoisXML API (with MongoDB cache) |
| Authentication | JWT (jsonwebtoken + bcryptjs) |
| Chrome Extension | Manifest V3, content scripts, service worker |
| ML Engine | Logistic regression implemented from scratch in Node.js — no ML libraries |

The 11 Fraud Signals

Signals are divided into four categories. Each carries a default weight — once enough admin-verified data is collected, the logistic regression engine learns better weights from real data.

Category 1 — Semantic Similarity

These two signals are mutually exclusive — only the higher-confidence one fires.

| # | Signal | Default Weight | Trigger |
|---|--------|----------------|---------|
| 1 | Confirmed scam match | 35 | Cosine similarity ≥ 0.95 against a verified cluster |
| 2 | High similarity match | 25 | Cosine similarity 0.85–0.95 against any cluster |

Signal 1 also applies a floor boost — any text triggering it scores a minimum of 72/100 (Likely Scam), regardless of other signals.

Category 2 — Identity & Domain

| # | Signal | Default Weight | Trigger |
|---|--------|----------------|---------|
| 3 | Domain mismatch | 20 | NER detects an ORG name that doesn't match the sender's domain |
| 4 | Young domain | 18 | WHOIS shows the domain was registered < 90 days ago |
| 6 | Big brand impersonation | 15 | NER detects PayPal, Amazon, Google, HDFC, Flipkart, Paytm, etc. |

Category 3 — Linguistic & Communication

| # | Signal | Default Weight | Trigger |
|---|--------|----------------|---------|
| 5 | Payment language | 22 | Regex detects: registration fee, UPI payment, advance deposit, security deposit, etc. |
| 9 | Telegram present | 14 | Regex detects a t.me/ link or @handle (extremely common in Indian job scams) |
| 8 | Free email provider | 12 | Sender email is Gmail, Yahoo, Outlook, Rediffmail, Hotmail, etc. |
| 11 | Urgency detected | 10 × urgencyScore | Keywords: "urgent", "act now", "expires today", "limited seats", etc. |

Category 4 — Infrastructure

| # | Signal | Default Weight | Trigger |
|---|--------|----------------|---------|
| 7 | Suspicious TLD | 12 | Domain ends in .xyz, .top, .click, .tk, .loan, .win, .bid, etc. |
| 10 | Previously reported | 16 | Same domain or cluster pattern reported 3+ times |

Combination Bonus

When 3 or more weak signals fire together, a compound bonus is applied:

| Weak signals firing | Bonus added |
|---------------------|-------------|
| 3 | +15 |
| 4 | +20 |
| 5 | +25 |

Weak signals counted: freeEmailProvider, telegramPresent, urgencyDetected, suspiciousTLD, paymentLanguage.

This reflects that multiple weak signals together are significantly more suspicious than their individual weights suggest. A message from a Gmail address, asking for UPI payment, with a Telegram link, and urgency language is a textbook Indian job scam — even if each signal alone seems minor.
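A minimal sketch of the bonus rule (function and constant names here are illustrative, not taken from the repo):

```javascript
// Weak signals that count toward the compound bonus.
const WEAK_SIGNALS = [
  'freeEmailProvider', 'telegramPresent', 'urgencyDetected',
  'suspiciousTLD', 'paymentLanguage',
];

// Returns the compound bonus for a map of boolean signal flags:
// 3 weak signals → +15, 4 → +20, 5 → +25, fewer than 3 → 0.
function combinationBonus(signals) {
  const fired = WEAK_SIGNALS.filter((name) => signals[name]).length;
  if (fired < 3) return 0;
  return 15 + (fired - 3) * 5;
}
```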


Scoring System

Step 1 — Evaluate all 11 signals
         Sum triggered weights → rawScore

Step 2 — Add combination bonus (if 3+ weak signals fired)

Step 3 — Normalize against theoretical maximum
         maxPossible = sum of ALL weights if every signal fired
         normalizedScore = (rawScore / maxPossible) × 100

Step 4 — Apply floor boosts
         confirmedScamMatch           → minimum score 72  (Likely Scam)
         highSimilarityMatch          → minimum score 55  (High Risk)
         paymentLanguage + telegram   → minimum score 60  (High Risk)
         paymentLanguage + freeEmail  → minimum score 60  (High Risk)

Step 5 — Clamp to 0–100
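The five steps can be condensed into a sketch like the following (a simplified illustration of the documented behaviour; the real logic and names live in scoringService.js):

```javascript
// `signals` maps signal name → boolean; `weights` maps signal name → weight.
// The caller supplies the combination bonus from Step 2.
function computeScore(signals, weights, bonus = 0) {
  const names = Object.keys(weights);
  // Step 1 — sum the weights of triggered signals
  const raw = names.reduce((sum, n) => sum + (signals[n] ? weights[n] : 0), 0);
  // Step 2 — add the combination bonus
  // Step 3 — normalize against the theoretical maximum
  const maxPossible = names.reduce((sum, n) => sum + weights[n], 0);
  let score = ((raw + bonus) / maxPossible) * 100;
  // Step 4 — floor boosts
  if (signals.confirmedScamMatch) score = Math.max(score, 72);
  else if (signals.highSimilarityMatch) score = Math.max(score, 55);
  if (signals.paymentLanguage && (signals.telegramPresent || signals.freeEmailProvider)) {
    score = Math.max(score, 60);
  }
  // Step 5 — clamp to 0–100
  return Math.round(Math.min(100, Math.max(0, score)));
}
```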

Risk Classification

| Score Range | Classification | Meaning |
|-------------|----------------|---------|
| 0–25 | Low Risk | No significant signals — appears legitimate |
| 26–50 | Suspicious | Some signals fired — proceed with caution |
| 51–70 | High Risk | Multiple strong signals — likely fraudulent |
| 71–100 | Likely Scam | High-confidence fraud — do not engage |
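The band boundaries map directly to a small lookup (illustrative sketch, not the repo's code):

```javascript
// Maps a 0–100 risk score to its classification band.
function classify(score) {
  if (score <= 25) return 'Low Risk';
  if (score <= 50) return 'Suspicious';
  if (score <= 70) return 'High Risk';
  return 'Likely Scam';
}
```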

Analysis Pipeline — Step by Step

Every request to /api/check or /api/reports runs the same pipeline in analysisService.js.

Step 1 — Generate Embedding (with cache)

Text is hashed with SHA-256. The hash is looked up in EmbeddingCache (MongoDB, 7-day TTL auto-expiry). On a cache miss, the text is sent to HuggingFace:

Model:  sentence-transformers/all-MiniLM-L6-v2
Output: float[384] — a vector representing the semantic meaning of the text

This is the foundation of the entire similarity system. Two messages that mean the same thing — even with completely different words — will have numerically similar vectors. The embedding is reused in Step 3 for Atlas Vector Search.

Step 2 — Parallel Batch 1 (Promise.all)

Three tasks run simultaneously:

Regex Extraction (regexService.js) — Synchronous, no external calls, instant. Scans text with compiled regex patterns and extracts:

  • Payment language (UPI, registration fee, advance deposit, bank transfer, etc.)
  • Telegram links and @handles
  • Email addresses and their domains (checks against free provider list)
  • URLs and their TLDs (checks against suspicious TLD list)
  • Urgency keywords (normalized to 0–1 score: triggered / 3, capped at 1)
  • Primary domain (first URL domain found, or email domain)
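A condensed sketch of this extraction step (the patterns and keyword list below are illustrative subsets, not the repo's full lists):

```javascript
// Payment language and Telegram checks as simple regexes.
const PAYMENT_RE = /\b(registration fee|security deposit|advance deposit|upi)\b/i;
const TELEGRAM_RE = /\bt\.me\/\w+|@[A-Za-z0-9_]{5,}/;

// Urgency: matched keywords / 3, capped at 1, as described above.
const URGENCY_KEYWORDS = ['urgent', 'act now', 'expires today', 'limited seats'];

function urgencyScore(text) {
  const lower = text.toLowerCase();
  const hits = URGENCY_KEYWORDS.filter((k) => lower.includes(k)).length;
  return Math.min(hits / 3, 1);
}
```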

NER Model (nerService.js) — Calls dslim/bert-base-NER via HuggingFace Inference API. Extracts ORG and PER entity groups. Checks ORGs against a hardcoded list of major brands. Fails gracefully — NER failure skips signals 3 and 6 without crashing the pipeline.

Domain Mismatch (regexService.js) — Uses NER ORG entities and the primary domain extracted by regex. If any detected ORG name does not appear in the sender's domain string, domainMismatch = true. A message claiming to be from "Google" but using a domain like jobs-portal.com triggers this signal.

Step 3 — Parallel Batch 2 (Promise.all)

WHOIS Lookup (domainService.js) — Checks the DomainIntelligence MongoDB collection first. On a cache miss, calls the WHOIS API, stores the result with domain age in days and registrar name. Fails gracefully — WHOIS failure skips signal 4.

Atlas Vector Search (similarityService.js) — Queries the ScamClusters collection using the embedding from Step 1. Returns the 5 nearest clusters by cosine similarity. The closest match is evaluated against the 0.95 and 0.85 thresholds for signals 1 and 2. Also counts total reports matching the same domain for signal 10.
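The similarity metric and the two thresholds can be sketched in a few lines (Atlas computes the ANN search server-side; this is just the underlying math and the documented default thresholds):

```javascript
// Cosine similarity between two equal-length vectors — the metric the
// vector index is configured with.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Which similarity signal (if any) the best match triggers.
function similaritySignal(score) {
  if (score >= 0.95) return 'confirmedScamMatch';
  if (score >= 0.85) return 'highSimilarityMatch';
  return null;
}
```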

Step 4 — Weight Lookup

weightService.js returns current weights from memory. No DB call, no API call. Under 1ms.

  • < 20 labelled reports: Returns calibrated hardcoded defaults
  • 20+ labelled reports: Returns logistic regression weights, retrained every 6 hours
  • Blend ratio: 20% learned / 80% defaults at 20 samples → 80% learned / 20% defaults at 150+ samples
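One way to express that blend (a sketch combining the breakpoints stated here and in the weight-engine section; the repo's exact curve lives in weightService.js):

```javascript
// Fraction of the final weight taken from the learned model.
function learnedShare(samples) {
  if (samples < 20) return 0;    // defaults only until 20 labelled reports
  if (samples < 50) return 0.2;
  if (samples < 150) return 0.5;
  return 0.8;                    // 150+ samples: mostly learned
}

// Blend each default weight with its learned counterpart, rounding to ints.
function blendWeights(defaults, learned, samples) {
  const share = learnedShare(samples);
  const out = {};
  for (const k of Object.keys(defaults)) {
    out[k] = Math.round((1 - share) * defaults[k] + share * (learned[k] ?? defaults[k]));
  }
  return out;
}
```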

Step 5 — Score Calculation

scoringService.js applies weights to triggered signals, computes the combination bonus, normalizes against the theoretical maximum, and applies floor boosts.

Step 6 — Explanation

explanationBuilder.js maps triggered signal flags to an ordered array of plain-English sentences. This is what users see — a clear explanation of exactly why a message scored the way it did.


Clustering & Scam Intelligence Database

Every report submitted via /api/reports is routed through clusterService.js, which compares its embedding against all existing scam clusters and makes one of three decisions.

The Three Decisions

New report embedding arrives
          │
          ▼
Atlas Vector Search → find closest existing cluster
          │
    Compare similarityScore of best match
          │
    ┌─────┴─────────────────────────┬──────────────────────────┐
    │                               │                          │
  ≥ 0.95                       0.85 – 0.95                 < 0.85
    │                               │                          │
    ▼                               ▼                          ▼
  MERGE                          ATTACH                     CREATE
Update centroid               reportCount++            New cluster born
 running average              lastReportedAt           from this embedding
reportCount++                 Centroid UNCHANGED       reportCount = 1
averageRiskScore updated      Prevents drift           verified = false
Signal 1 fires (+35)          Signal 2 fires (+25)     No signal fires
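The three-way branch above reduces to a tiny decision function (thresholds are the documented defaults; names are illustrative):

```javascript
// Decide what to do with a new report given its best cluster match.
function clusterDecision(bestSimilarity) {
  if (bestSimilarity >= 0.95) return 'MERGE';  // fold into the existing cluster
  if (bestSimilarity >= 0.85) return 'ATTACH'; // count it, keep the centroid fixed
  return 'CREATE';                             // a new cluster is born
}
```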

Centroid Running Average

When a report merges (≥ 0.95 similarity), the cluster centroid shifts toward the new embedding:

newCentroid[i] = (oldCentroid[i] × (count - 1) + newEmbedding[i]) / count

Over time the centroid represents the true semantic center of all scams in that cluster, making future similarity searches more accurate. The centroid is intentionally not updated for ATTACH operations (0.85–0.95) — this prevents variant wordings from drifting a cluster away from its original identity.
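The running-average formula above translates directly to code (`count` is the cluster's report count after including the new report):

```javascript
// Shift the centroid toward the new embedding on MERGE (≥ 0.95 similarity).
function updateCentroid(oldCentroid, newEmbedding, count) {
  return oldCentroid.map(
    (c, i) => (c * (count - 1) + newEmbedding[i]) / count,
  );
}
```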

Why This Matters

The first time a new scam template appears, it creates a cluster with verified: false. The second time a similar message arrives, it merges in. By the third or fourth report, an admin can review and set verified: true — which activates Signal 1 (+35, floor boost to 72 minimum) for every future similar message.

The database is self-organizing and self-improving. The 500th report benefits from the intelligence of all 499 before it.


Self-Learning Weight Engine

weightService.js implements logistic regression from scratch in Node.js — no Python, no ML libraries, no GPU, no external dependencies.

How It Works

Admin marks reports as "verified-scam" or "rejected"
                    │
                    ▼
Every 6 hours (or POST /api/admin/retrain):

1. Pull all verified-scam reports  → label 1
   Pull all rejected reports       → label 0

2. For each report, build 11-dimensional feature vector:
   [confirmedScamMatch, highSimilarityMatch, domainMismatch,
    youngDomain, paymentLanguage, bigBrandMentioned,
    suspiciousTLD, freeEmailProvider, telegramPresent,
    previouslyReported, urgencyScore]

3. Run gradient descent (1000 epochs, lr=0.05):
   Minimise binary cross-entropy loss
   Find weights w[] that best separate scam (1) from clean (0)

4. Scale coefficients → integer scoring weights summing to ~170

5. Apply dynamic blend with calibrated defaults:
   < 50 samples    →  20% learned + 80% defaults
   50–150 samples  →  50% learned + 50% defaults
   150+ samples    →  80% learned + 20% defaults

6. Store in memory → all subsequent requests use these weights
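Step 3 of the loop above, gradient descent on binary cross-entropy, fits in a few dozen lines. This is a from-scratch sketch in the same spirit as the repo's engine (hyperparameters follow the text: 1000 epochs, learning rate 0.05; the function names are illustrative):

```javascript
function sigmoid(z) {
  return 1 / (1 + Math.exp(-z));
}

// X: array of feature vectors, y: array of 0/1 labels.
// Minimises binary cross-entropy via batch gradient descent.
function trainLogistic(X, y, epochs = 1000, lr = 0.05) {
  const dims = X[0].length;
  let w = new Array(dims).fill(0);
  let b = 0;
  for (let e = 0; e < epochs; e++) {
    const gw = new Array(dims).fill(0);
    let gb = 0;
    for (let i = 0; i < X.length; i++) {
      // Gradient of the loss: (prediction - label) · features
      const z = b + X[i].reduce((s, x, j) => s + x * w[j], 0);
      const err = sigmoid(z) - y[i];
      for (let j = 0; j < dims; j++) gw[j] += err * X[i][j];
      gb += err;
    }
    w = w.map((wj, j) => wj - (lr * gw[j]) / X.length);
    b -= (lr * gb) / X.length;
  }
  return { w, b };
}
```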

What the Engine Learns

Signals that consistently appear in verified scam reports — but not in rejected (clean) reports — receive higher weights. Signals that appear in both get lower weights. The scoring system adapts to the actual distribution of scams in your dataset, not a theoretical assumption.

Example output after training:

   Learned weights active:
   confirmedScamMatch    : 38  (default 35  ↑+3)
   telegramPresent       : 19  (default 14  ↑+5)
   freeEmailProvider     : 9   (default 12  ↓-3)
   paymentLanguage       : 26  (default 22  ↑+4)

Bootstrapping Non-Scam Training Data

The harder side of building training data is collecting non-scam examples — most users only submit messages they're suspicious of. On server startup, seedCleanExamples() automatically labels any pending reports that scored under 20 with no payment/telegram/brand signals as rejected, bootstrapping the non-scam training set without any manual work.
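The qualifying condition behind that bootstrap can be sketched as a predicate over the reports schema (field names mirror the collection shown later; the function itself is a guess at the shape, not the repo's code):

```javascript
// A pending report counts as a bootstrapped "rejected" (clean) example
// when it scored under 20 with no payment/telegram/brand signals.
function isCleanCandidate(report) {
  const s = report.structuredSignals;
  return (
    report.status === 'pending' &&
    report.riskScore < 20 &&
    !s.paymentLanguage &&
    !s.telegramPresent &&
    !s.bigBrandMentioned
  );
}
```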


Chrome Extension

The Chrome extension brings JobShield to the point of attack — inside Gmail and Internshala — without asking the user to copy and paste anything.

Architecture (Manifest V3)

popup.html / popup.js    ← Login screen + analysis screen
background.js            ← Service worker: handles all API calls
content.js               ← Injected into Gmail + Internshala pages
config.js                ← Single source of truth for API and app URLs
icons/                   ← icon16.png, icon48.png, icon128.png

The Flow

  1. User opens a suspicious email in Gmail or a job listing on Internshala
  2. Clicks the JobShield icon in the Chrome toolbar
  3. Clicks "Auto-extract text from this page" — content.js reads the DOM
  4. Clicks "Analyse" — background.js calls POST /api/check
  5. Risk score and explanation appear in the popup
  6. A floating badge overlays the page itself, showing the result in context

Text Extraction Logic

| Site | Primary Selectors | Fallback |
|------|-------------------|----------|
| Gmail | .a3s.aiL, .a3s, [data-message-id] | [role="textbox"] |
| Internshala | .internship_details, .job-detail-section, #internship_detail, .detail_view | 10+ additional selectors, then a full-page text scraper |
| Generic | main, article, [role="main"], .content | body (first 5000 chars) |

Key Implementation Details

  • CSP compliance: No inline onclick handlers anywhere — all events wired via addEventListener. Required by Manifest V3's strict Content Security Policy.
  • Double injection guard: window.__jobshieldInjected prevents the content script from re-registering listeners if injected multiple times.
  • Graceful injection errors: ensureContentScript() wraps injection in try/catch. The popup always renders even if the page disallows injection.
  • JWT persistence: Stored in chrome.storage.local — survives browser restarts.
  • Timeout safety: All chrome.tabs.sendMessage calls are protected against missing content script with proper error handling.
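The double-injection guard is the most portable of these to sketch; here it is written against a plain object so it runs outside the browser (in the real content script, `globalScope` is `window`):

```javascript
// Register listeners exactly once, even if the script is injected twice.
function registerOnce(globalScope, register) {
  if (globalScope.__jobshieldInjected) return false; // already injected
  globalScope.__jobshieldInjected = true;
  register();
  return true;
}
```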

Loading the Extension Locally

  1. Open Chrome → chrome://extensions
  2. Enable Developer mode (top-right toggle)
  3. Click Load unpacked → select the extension/ folder
  4. JobShield icon appears in the toolbar

After any code change: click the refresh icon on the extension card in chrome://extensions.


Admin Panel & The Training Loop

The admin panel is not just a moderation dashboard — it is the mechanism that trains the entire scoring system.

Admin Capabilities

| Action | Effect on system |
|--------|------------------|
| View all reports | Filter by status, classification, date; sort by any field |
| Mark verified-scam | user.verifiedReports++ · reputationScore +10 · cluster.verified = true |
| Mark rejected | user.rejectedReports++ · reputationScore -5 |
| Verify a cluster | cluster.verified = true — activates Signal 1 (+35) for ALL future similar messages |
| Unverify a cluster | Deactivates Signal 1 for that cluster |
| Force retrain | POST /api/admin/retrain — immediately reruns logistic regression |

The Flywheel

User submits suspicious message
          │
          ▼
Pipeline runs → report saved
High-risk reports auto-flagged (score > 70) → enter admin queue
          │
          ▼
Admin reviews → marks verified-scam or rejected
          │
          ├── Cluster verified
          │     → Signal 1 activates for all future similar messages
          │     → Protects every future user who sends a similar scam
          │
          └── 20+ verified + 20+ rejected reports reached
                → Logistic regression retrains (every 6h or on demand)
                → Weights shift to reflect real data
                → Scoring improves for all future analyses
                → More reports correctly classified
                → Admin efficiency improves
                → Loop continues

Every admin review action makes the entire system more accurate for every future user.

Reputation System

Users accumulate a reputationScore based on report quality:

  • +10 for each report admin-confirms as a real scam
  • -5 for each report admin-rejects as not a scam

High-reputation users emerge as trustworthy community contributors, and their standing surfaces over time in the admin panel.

Database Collections

users

{
  email:            String,   // unique, indexed
  password:         String,   // bcrypt hashed, never returned in responses
  role:             String,   // "user" | "admin"
  reputationScore:  Number,   // starts 0, +10 verified / -5 rejected
  totalReports:     Number,
  verifiedReports:  Number,
  rejectedReports:  Number,
  createdAt:        Date
}

reports

{
  userId:            ObjectId,   // ref: User, indexed
  rawText:           String,
  textHash:          String,     // SHA-256, indexed
  embedding:         [Number],   // 384-dim (excluded from list endpoints)
  structuredSignals: {
    paymentLanguage:    Boolean,
    domainMismatch:     Boolean,
    bigBrandMentioned:  Boolean,
    suspiciousTLD:      Boolean,
    freeEmailProvider:  Boolean,
    telegramPresent:    Boolean,
    previouslyReported: Boolean,
    urgencyScore:       Number,  // used as continuous feature in weight learning
  },
  domain:            String,
  domainAgeDays:     Number,
  registrar:         String,
  similarityScore:   Number,
  riskScore:         Number,
  classification:    String,     // enum: Low Risk | Suspicious | High Risk | Likely Scam
  explanation:       [String],
  clusterId:         ObjectId,   // ref: ScamCluster
  status:            String,     // pending | auto-flagged | verified-scam | rejected
  location:          String,
  paymentMethod:     String,
  createdAt:         Date
}

scamclusters

{
  clusterEmbedding:   [Number],  // 384-dim centroid — running average of merged embeddings
  representativeText: String,    // most recent merged text (truncated to 500 chars)
  reportCount:        Number,    // total reports merged or attached
  averageRiskScore:   Number,
  verified:           Boolean,   // true activates Signal 1 (+35) for future matches
  dominantDomain:     String,
  dominantBrand:      String,
  firstReportedAt:    Date,
  lastReportedAt:     Date
}

embeddingcaches

{
  textHash:  String,    // SHA-256, unique
  embedding: [Number],  // 384-dim
  createdAt: Date       // TTL index: auto-deleted after 7 days
}

domainintelligences

{
  domain:    String,  // unique
  ageDays:   Number,
  registrar: String,
  flagCount: Number,  // increments on each report referencing this domain
  createdAt: Date
}

API Reference

Authentication

| Method | Endpoint | Auth | Body | Response |
|--------|----------|------|------|----------|
| POST | /api/auth/register | None | { email, password } | { token, user } |
| POST | /api/auth/login | None | { email, password } | { token, user } |
| GET | /api/auth/me | JWT | — | { user } |

Analysis

| Method | Endpoint | Auth | Body | Response |
|--------|----------|------|------|----------|
| POST | /api/check | JWT | { text } | { riskScore, classification, explanation[], signals{} } |
| POST | /api/reports | JWT | { text, location?, paymentMethod? } | { reportId, riskScore, classification, explanation[], status } |

Reports

| Method | Endpoint | Auth | Query Params | Description |
|--------|----------|------|--------------|-------------|
| GET | /api/reports | JWT | ?page=1&limit=10 | Own report history (paginated) |
| GET | /api/reports/:id | JWT | — | Single report detail |

Dashboard

| Method | Endpoint | Auth | Description |
|--------|----------|------|-------------|
| GET | /api/dashboard/stats | JWT | Personal stats, recent reports, classification breakdown |

Admin

| Method | Endpoint | Auth | Body / Query | Description |
|--------|----------|------|--------------|-------------|
| GET | /api/admin/reports | JWT + Admin | ?status=&classification=&page=&sortBy=&order= | All reports with filters |
| GET | /api/admin/reports/:id | JWT + Admin | — | Single report detail |
| PATCH | /api/admin/reports/:id | JWT + Admin | { status } | Update status — triggers user reputation adjustment |
| GET | /api/admin/clusters | JWT + Admin | ?verified=true&page= | All clusters |
| PATCH | /api/admin/clusters/:id | JWT + Admin | { verified } | Verify/unverify a cluster |
| POST | /api/admin/retrain | JWT + Admin | — | Force immediate logistic regression retraining |

Example Response — POST /api/check

{
  "riskScore": 85,
  "classification": "Likely Scam",
  "explanation": [
    "Matches a verified scam template (99.4% similarity)",
    "Message requests payment, deposit, or fee upfront",
    "Sender domain doesn't match the organisation named in the message",
    "This pattern has been reported 7 times by other users",
    "Multiple scam signals detected together — elevated risk"
  ],
  "signals": {
    "confirmedScamMatch":  true,
    "highSimilarityMatch": false,
    "similarityScore":     0.994,
    "paymentLanguage":     false,
    "domainMismatch":      true,
    "domainAgeDays":       2573,
    "bigBrandMentioned":   false,
    "suspiciousTLD":       false,
    "freeEmailProvider":   false,
    "telegramPresent":     false,
    "previouslyReported":  true,
    "urgencyDetected":     false,
    "urgencyScore":        0
  },
  "cached": true
}

Environment Variables

server/.env

# MongoDB Atlas
MONGODB_URI=mongodb+srv://<user>:<password>@<cluster>.mongodb.net/<dbname>

# JWT
JWT_SECRET=your_strong_secret_here
JWT_EXPIRES_IN=7d

# HuggingFace — free account at huggingface.co → Settings → Access Tokens
HUGGINGFACE_API_KEY=hf_xxxxxxxxxxxxxxxxxxxx
HF_NER_MODEL=dslim/bert-base-NER

# WHOIS — free account at whoisxmlapi.com (500 queries/month free)
WHOIS_API_KEY=your_whois_api_key

# Similarity thresholds (these are the defaults — omit to use defaults)
SIMILARITY_CONFIRMED_THRESHOLD=0.95
SIMILARITY_HIGH_THRESHOLD=0.85

# Server
PORT=5000
CLIENT_URL=http://localhost:5173

extension/config.js

const JOBSHIELD_CONFIG = {
  API_BASE_URL: 'http://localhost:5000/api',   // → your deployed API URL for production
  APP_URL:      'http://localhost:5173',        // → your deployed app URL for production
}

Installation & Setup

Prerequisites

  • Node.js 18+
  • A MongoDB Atlas account — free M0 tier is sufficient
  • A HuggingFace account — free, for the Inference API key
  • A WhoisXML API account — free tier gives 500 queries/month

1. Clone the repository

git clone https://github.com/yourusername/jobshield.git
cd jobshield

2. Install dependencies

cd server && npm install
cd ../client && npm install

3. Configure environment variables

cd server
cp .env.example .env
# Edit .env and fill in all required values

4. Create Atlas Vector Search indexes

See Atlas Vector Search Setup below — required before signals 1 and 2 work.

5. Seed the database (recommended)

See Seeding the Database below — populates 200 verified scam clusters so similarity signals work from day one.

6. Start both servers

# Terminal 1 — backend
cd server && npm run dev

# Terminal 2 — frontend
cd client && npm run dev

Atlas Vector Search Setup

JobShield requires one mandatory vector search index (and one optional). These must be created manually in the Atlas UI — they cannot be created programmatically on the M0 free tier.

Required — ScamClusters index

  1. Go to cloud.mongodb.com → your cluster → Atlas Search tab
  2. Click Create Search Index → select Atlas Vector Search (not Atlas Full Text Search)
  3. Select your database → scamclusters collection
  4. Replace the default JSON with:
{
  "fields": [
    {
      "type": "vector",
      "path": "clusterEmbedding",
      "numDimensions": 384,
      "similarity": "cosine"
    }
  ]
}
  5. Name the index exactly: cluster_vector_index
  6. Click Create Search Index → wait for status Active (1–2 minutes)

Optional — Reports index

Same process on the reports collection, field embedding, name report_vector_index.

Verify it's working

Submit the same scam message twice. Your server terminal should show:

Vector search returned 5 results
   Best match: 69a28b..., score: 0.9987, verified: false
   Signal 2 fired (+20)
Merged into cluster 69a28b... (similarity: 0.999, count: 2)

Seeding the Database

Without seed data, signals 1 and 2 will never fire because ScamClusters is empty. The seed script pre-populates it with 200 verified scam clusters built from Kaggle's fake job posting dataset.

Step 1 — Download the dataset from Kaggle: https://www.kaggle.com/datasets/shivamb/real-or-fake-fake-jobposting-prediction

Step 2 — Place the CSV at:

server/scripts/fake_job_posting.csv

Step 3 — Run the seed script:

cd server
npm run seed

The script embeds the 860 fake jobs via HuggingFace and inserts them as verified: true clusters. Takes 5–10 minutes due to HF API rate limiting (200ms delay between requests). Progress is printed to the terminal.


Running the Project

# Backend (http://localhost:5000)
cd server
npm run dev      # nodemon hot-reload
npm start        # no hot-reload

# Frontend (http://localhost:5173)
cd client
npm run dev      # Vite dev server
npm run build    # production build
npm run preview  # preview production build locally

# Database seeding
cd server
npm run seed     # import Kaggle spam dataset into ScamClusters

Creating an Admin Account

All accounts register as role: "user" by default. To promote an account to admin:

MongoDB Atlas UI (Collections → users) or mongosh:

db.users.updateOne(
  { email: "your@email.com" },
  { $set: { role: "admin" } }
)

Log out and log back in for the role change to apply. Admin users unlock the /admin/reports and /admin/clusters pages in the frontend, and the admin API endpoints.

The most impactful admin action: Verifying a cluster. Setting verified: true on a cluster activates Signal 1 (+35 points, minimum score 72) for every future message that semantically matches it. A few minutes of admin review in the early days can protect thousands of future users.

Built with Node.js · React · MongoDB Atlas · HuggingFace Inference API · Chrome Extensions Manifest V3
