Skip to content

orassayag/snowflake

Repository files navigation

Snowflake Metadata Sync Service

A Node.js/Express REST API service that synchronizes table metadata from Snowflake data warehouse into MongoDB, providing a cached and queryable metadata layer with pagination support.

Built in 2023-2026. This TypeScript application connects to Snowflake, extracts table and column metadata, stores it in MongoDB, and exposes REST API endpoints with Swagger documentation.

Features

  • 🔄 Snowflake Integration - Connects to Snowflake data warehouse and extracts metadata
  • 📊 Table Metadata Sync - Fetches table names, schemas, columns, and statistics
  • 🗄️ MongoDB Caching - Stores metadata in MongoDB for fast querying
  • 📄 Pagination Support - Retrieve metadata with configurable page sizes
  • 📚 Swagger UI - Interactive API documentation at /api-docs
  • Input Validation - Joi-based request validation
  • 🔐 Error Handling - Centralized error handling with custom error classes
  • 📝 Request Logging - Winston-based logging for all requests
  • 🚀 TypeScript - Full type safety with strict TypeScript configuration

Architecture

graph LR
    A[Client] -->|HTTP Request| B[Express API]
    B -->|Sync Endpoint| C[Snowflake Service]
    C -->|Query Metadata| D[Snowflake DW]
    C -->|Store Results| E[MongoDB]
    B -->|Query Endpoint| E
    E -->|Return Data| B
    B -->|JSON Response| A
    
    F[Swagger UI] -.->|Documentation| B
    G[Winston Logger] -.->|Logs| B
    H[Joi Validation] -.->|Validate| B
Loading

Getting Started

Prerequisites

  • Node.js (v16 or higher)
  • npm or yarn
  • MongoDB (v4.4 or higher)
  • Active Snowflake account with appropriate permissions

Installation

  1. Clone the repository:
git clone https://github.com/orassayag/snowflake-metadata-sync.git
cd snowflake-metadata-sync
  1. Install dependencies:
npm install
  1. Configure your environment:

    • Update MongoDB connection in src/config/constants.config.ts
    • Configure Snowflake credentials (see Security section below)
  2. Start MongoDB (if running locally):

mongod
  1. Run the application:
# Development mode with hot reload
npm run dev

# Production build and start
npm run start

The server will start on http://localhost:8080

Configuration

⚠️ SECURITY WARNING: Before using in production, move all credentials to environment variables.

Edit src/config/constants.config.ts:

const CONSTANTS = {
  SERVER: {
    PORT: '8080',
    EXPRESS_LIMIT: '5mb',
  },
  DATA: {
    MONGODB: {
      CONNECTION_URL: 'mongodb://127.0.0.1:27017/test',
    },
    SNOWFLAKE: {
      ACCESS_URL: 'https://your-account.snowflakecomputing.com',
      USERNAME: 'your-username',
      PASSWORD: 'your-password',
      ACCOUNT: 'YOUR_ACCOUNT_ID',
      DATABASE: 'YOUR_DATABASE',
      ROLE: 'YOUR_ROLE',
      TABLE: 'YOUR_SCHEMA',
    },
  },
};

API Documentation

Interactive Documentation

Access Swagger UI at: http://localhost:8080/api-docs

Endpoints

1. Sync Metadata from Snowflake

POST /api/metadata/integration/sync

Fetches all table metadata from Snowflake and stores in MongoDB.

Response:

{
  "status": "success"
}

What it does:

  • Clears existing metadata in MongoDB
  • Queries Snowflake for all tables in the configured schema
  • Fetches detailed column information for each table
  • Stores complete metadata with relationships

2. Get Tables with Pagination

GET /api/metadata/tables?page=1&limit=10

Query Parameters:

  • page (required): Page number, starting from 1
  • limit (required): Number of records per page

Response:

{
  "data": [
    {
      "_id": "6467f0bfecd9682fd2ddbc36",
      "created_on": "2021-11-10T05:04:44.572Z",
      "name": "CALL_CENTER",
      "database_name": "SNOWFLAKE_SAMPLE_DATA",
      "schema_name": "TPCDS_SF100TCL",
      "kind": "TABLE",
      "cluster_by": "LINEAR( cc_call_center_sk )",
      "rows": 60,
      "bytes": 18944,
      "retention_time": 1,
      "automatic_clustering": "OFF",
      "change_tracking": "OFF",
      "is_external": "N",
      "columns": [
        {
          "name": "CC_STATE",
          "type": "TEXT"
        },
        {
          "name": "CC_CALL_CENTER_SK",
          "type": "NUMBER"
        }
      ]
    }
  ],
  "page": 1,
  "limit": 10,
  "totalRecords": 24
}

Project Structure

snowflake-metadata-sync/
├── src/
│   ├── app.ts                    # Express application entry point
│   ├── config/
│   │   └── constants.config.ts   # Application constants
│   ├── controllers/
│   │   └── metadata.controller.ts # Request handlers
│   ├── custom/
│   │   └── error.custom.ts       # Custom error class
│   ├── helpers/
│   │   └── express.helper.ts     # Express utilities
│   ├── middlewares/
│   │   ├── errors.middleware.ts  # Error handling
│   │   ├── joi.middleware.ts     # Input validation
│   │   └── logs.middleware.ts    # Request logging
│   ├── models/
│   │   └── metadata.model.ts     # Business logic
│   ├── routes/
│   │   ├── index.public.routes.ts
│   │   └── public/
│   │       └── metadata.routes.ts # API routes
│   ├── schemas/
│   │   └── metadata.schema.ts    # Mongoose schema
│   ├── services/
│   │   ├── logger.service.ts     # Winston logger
│   │   ├── mongodb.service.ts    # MongoDB connection
│   │   ├── snowflake.service.ts  # Snowflake integration
│   │   └── swagger.service.ts    # Swagger setup
│   ├── utils/
│   │   ├── apps.utils.ts
│   │   └── files.utils.ts
│   └── validations/
│       └── tables.validation.ts   # Joi schemas
├── config/
│   ├── env.js                     # Environment loader
│   └── env.json                   # Environment config
├── dist/                          # Compiled JavaScript
├── .eslintrc.json                 # ESLint configuration
├── tsconfig.json                  # TypeScript config
├── tsconfig.build.json            # Build config
└── package.json

Data Flow Diagram

sequenceDiagram
    participant Client
    participant API
    participant Snowflake
    participant MongoDB

    Note over Client,MongoDB: Integration Sync Flow
    Client->>API: POST /api/metadata/integration/sync
    API->>MongoDB: Clear existing metadata
    API->>Snowflake: SHOW TABLES IN schema
    Snowflake-->>API: Tables list
    API->>Snowflake: SELECT columns info
    Snowflake-->>API: Columns for each table
    API->>MongoDB: Insert enriched metadata
    MongoDB-->>API: Confirmation
    API-->>Client: {status: "success"}

    Note over Client,MongoDB: Query Tables Flow
    Client->>API: GET /api/metadata/tables?page=1&limit=10
    API->>MongoDB: Find with pagination
    MongoDB-->>API: Results + count
    API-->>Client: {data, page, limit, totalRecords}
Loading

Available Scripts

npm run dev

Runs the app in development mode with hot reload via nodemon.

npm run start

Compiles TypeScript and runs the production build.

npm run lint

Checks code for linting errors using ESLint.

npm run prettier-check

Checks code formatting with Prettier.

npm run prettier-fix

Formats code using Prettier.

npm run kill

Kills all Node.js processes (Windows only).

Development

The project uses:

  • TypeScript - Type-safe JavaScript
  • Express - Web framework
  • Mongoose - MongoDB ODM
  • Snowflake SDK - Snowflake connector
  • Winston - Logging
  • Swagger - API documentation
  • Joi - Input validation
  • ESLint - Code linting
  • Prettier - Code formatting
  • Nodemon - Development hot reload

Code Quality

# Run linter
npm run lint

# Check formatting
npm run prettier-check

# Auto-fix formatting
npm run prettier-fix

TypeScript Configuration

The project uses two TypeScript configurations:

  • tsconfig.json - Development configuration
  • tsconfig.build.json - Production build configuration

Security Considerations

⚠️ CRITICAL: This repository contains example credentials in src/config/constants.config.ts.

Before deploying or committing:

  1. Never commit real credentials - Move all sensitive data to environment variables
  2. Use .env files - Store credentials in .env (add to .gitignore)
  3. Implement authentication - Add JWT or API key authentication to endpoints
  4. Enable CORS properly - Replace origin: '*' with specific allowed origins
  5. Use HTTPS - In production, always use HTTPS
  6. Rate limiting - Add rate limiting middleware
  7. Input validation - Already implemented with Joi
  8. MongoDB authentication - Use authenticated MongoDB connections
  9. Snowflake permissions - Use least-privilege roles
  10. Audit logging - Log all data access for compliance

Error Handling

The application uses centralized error handling:

  • CustomError class for application errors
  • Error middleware catches and formats all errors
  • Consistent JSON error responses
  • Winston logging for error tracking

Contributing

Contributions to this project are released to the public under the project's open source license.

Everyone is welcome to contribute. Contributing doesn't just mean submitting pull requests—there are many different ways to get involved, including answering questions and reporting issues.

See CONTRIBUTING.md for detailed guidelines.

Author

License

This application is licensed under the MIT License - see the LICENSE file for details.

About

A Node.js/Express REST API service that synchronizes table metadata from Snowflake data warehouse into MongoDB, providing a cached and queryable metadata layer with pagination. Built in 2023-2026. This TypeScript application connects to Snowflake, extracts table and column metadata, stores it in MongoDB, exposes REST API endpoints and Swagger.

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors