DataFlow Studio

A no-code ETL (Extract, Transform, Load) framework with a visual pipeline designer that simulates the capabilities of enterprise tools such as PySpark, Apache Airflow, and Apache Flink.


πŸš€ Features

Visual Pipeline Builder

  • Drag & Drop Interface: Intuitive visual pipeline designer
  • Component Toolbox: Pre-built data sources, transformations, and destinations
  • Real-time Canvas: Interactive pipeline visualization with connections
  • YAML Configuration: Advanced configuration through YAML editor

Data Connectors

  • Oracle Database: Enterprise database connectivity
  • MongoDB: NoSQL document database support
  • Apache Hive: Big data warehouse integration
  • PostgreSQL & MySQL: Relational database support
  • Connection Testing: Built-in connectivity validation

Job Monitoring & Execution

  • Real-time Monitoring: Live job status and progress tracking
  • Execution Logs: Detailed logging and error reporting
  • Performance Metrics: Job duration and success rate analytics
  • Status Management: Queue, run, pause, and stop operations

Pipeline Scheduler

  • Cron-based Scheduling: Flexible time-based automation
  • DAG Visualization: Directed Acyclic Graph representation
  • Schedule Management: Enable, disable, and modify schedules
  • Dependency Tracking: Task dependency management
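Dependency tracking implies the scheduler must be able to order tasks and reject cyclic graphs before running them. A minimal sketch of such a check using Kahn's algorithm (the `TaskGraph` shape here is hypothetical, not the project's actual schema):

```typescript
// Hypothetical adjacency-list representation of task dependencies:
// graph[task] lists the tasks that may only run after it completes.
type TaskGraph = Record<string, string[]>;

// Kahn's algorithm: returns a valid execution order, or null if the
// graph contains a cycle (i.e. it is not a DAG).
function topologicalOrder(graph: TaskGraph): string[] | null {
  const inDegree: Record<string, number> = {};
  for (const node of Object.keys(graph)) inDegree[node] ??= 0;
  for (const targets of Object.values(graph)) {
    for (const t of targets) inDegree[t] = (inDegree[t] ?? 0) + 1;
  }
  // Start from tasks with no unmet dependencies.
  const queue = Object.keys(inDegree).filter((n) => inDegree[n] === 0);
  const order: string[] = [];
  while (queue.length > 0) {
    const node = queue.shift()!;
    order.push(node);
    for (const t of graph[node] ?? []) {
      if (--inDegree[t] === 0) queue.push(t);
    }
  }
  // If some task never reached in-degree 0, a cycle exists.
  return order.length === Object.keys(inDegree).length ? order : null;
}
```

For a linear extract → transform → load graph this yields the expected order; for a cyclic graph it returns `null`, which the scheduler can surface as a configuration error.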

Dashboard & Analytics

  • System Overview: Key metrics and statistics
  • Active Jobs: Real-time pipeline execution status
  • Data Processing: Volume and performance tracking
  • Historical Reports: Success rates and trends
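The success-rate and duration metrics above can be derived from completed job records. A hedged sketch (the `JobRecord` shape is illustrative; the real schema lives in `shared/schema.ts`):

```typescript
// Illustrative job record shape, not the project's actual schema.
interface JobRecord {
  status: "success" | "failed" | "running";
  durationMs: number;
}

// Success rate and average duration over completed jobs, as a dashboard
// might display them. Still-running jobs are excluded from both metrics.
function dashboardMetrics(jobs: JobRecord[]) {
  const done = jobs.filter((j) => j.status !== "running");
  const succeeded = done.filter((j) => j.status === "success").length;
  const avgDurationMs = done.length
    ? done.reduce((sum, j) => sum + j.durationMs, 0) / done.length
    : 0;
  return {
    successRate: done.length ? succeeded / done.length : 0,
    avgDurationMs,
  };
}
```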

πŸ› οΈ Technology Stack

  • Frontend: React 18, TypeScript, Tailwind CSS
  • Backend: Node.js, Express, TypeScript
  • Database: In-memory storage (extensible to PostgreSQL)
  • Build Tools: Vite, ESBuild
  • UI Components: Radix UI, Shadcn/ui
  • State Management: TanStack Query

πŸ“¦ Installation

Prerequisites

  • Node.js 20.x or higher
  • npm or yarn package manager

Quick Start

  1. Clone the repository

    git clone https://github.com/your-username/dataflow-studio.git
    cd dataflow-studio
  2. Install dependencies

    npm install
  3. Start the development server

    npm run dev
  4. Access the application

    Open your browser to http://localhost:5000

πŸ—οΈ Project Structure

dataflow-studio/
β”œβ”€β”€ client/                 # React frontend application
β”‚   β”œβ”€β”€ src/
β”‚   β”‚   β”œβ”€β”€ components/     # Reusable UI components
β”‚   β”‚   β”‚   β”œβ”€β”€ connectors/ # Data connector components
β”‚   β”‚   β”‚   β”œβ”€β”€ jobs/       # Job monitoring components
β”‚   β”‚   β”‚   β”œβ”€β”€ layout/     # Layout components
β”‚   β”‚   β”‚   β”œβ”€β”€ pipeline/   # Pipeline builder components
β”‚   β”‚   β”‚   β”œβ”€β”€ scheduler/  # Scheduler components
β”‚   β”‚   β”‚   └── ui/         # Base UI components
β”‚   β”‚   β”œβ”€β”€ hooks/          # Custom React hooks
β”‚   β”‚   β”œβ”€β”€ lib/            # Utility libraries
β”‚   β”‚   └── pages/          # Application pages
β”œβ”€β”€ server/                 # Node.js backend
β”‚   β”œβ”€β”€ index.ts           # Server entry point
β”‚   β”œβ”€β”€ routes.ts          # API routes
β”‚   β”œβ”€β”€ storage.ts         # Data storage layer
β”‚   └── vite.ts            # Vite integration
β”œβ”€β”€ shared/                # Shared types and schemas
β”‚   └── schema.ts          # Database schema definitions
└── docs/                  # Documentation
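The backend keeps state in an in-memory storage layer (`server/storage.ts`), noted above as extensible to PostgreSQL. A stripped-down sketch of how such a layer might look (the `Pipeline` shape and method names here are illustrative, not the project's actual API):

```typescript
import { randomUUID } from "node:crypto";

// Illustrative pipeline shape; the real one is defined in shared/schema.ts.
interface Pipeline {
  id: string;
  name: string;
  yamlConfig: string;
}

// Minimal in-memory storage layer. Swapping this for a PostgreSQL-backed
// implementation only requires keeping the same method signatures.
class MemStorage {
  private pipelines = new Map<string, Pipeline>();

  create(name: string, yamlConfig: string): Pipeline {
    const pipeline: Pipeline = { id: randomUUID(), name, yamlConfig };
    this.pipelines.set(pipeline.id, pipeline);
    return pipeline;
  }

  get(id: string): Pipeline | undefined {
    return this.pipelines.get(id);
  }

  list(): Pipeline[] {
    return [...this.pipelines.values()];
  }
}
```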

🎯 Usage Guide

Creating Your First Pipeline

  1. Navigate to Pipeline Builder

    • Click on "Pipeline Builder" in the sidebar
    • Start with a blank canvas
  2. Add Data Sources

    • Drag Oracle, MongoDB, or Hive components from the toolbox
    • Configure connection parameters
  3. Add Transformations

    • Add Filter, Join, or Aggregate components
    • Define transformation logic in YAML
  4. Add Destinations

    • Configure data warehouse or file export targets
    • Set up output parameters
  5. Save and Execute

    • Save your pipeline configuration
    • Click "Run" to execute immediately or schedule for later
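Before saving, a pipeline assembled from the steps above should be structurally valid. A minimal sketch of what such a check might look like (the `PipelineNode` shape is hypothetical; real canvas components also carry connector-specific configuration):

```typescript
// Hypothetical node shape for a canvas pipeline.
interface PipelineNode {
  id: string;
  kind: "source" | "transform" | "destination";
}

// A pipeline is runnable only if it has at least one source and one
// destination; transformations are optional.
function validationErrors(nodes: PipelineNode[]): string[] {
  const errors: string[] = [];
  if (!nodes.some((n) => n.kind === "source"))
    errors.push("pipeline needs at least one data source");
  if (!nodes.some((n) => n.kind === "destination"))
    errors.push("pipeline needs at least one destination");
  return errors;
}
```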

Setting Up Data Connectors

  1. Go to Connectors Page

    • Click "Add Connector" button
    • Select your database type
  2. Configure Connection

    • Enter host, port, database credentials
    • Test the connection
  3. Use in Pipelines

    • Reference connectors in your pipeline configurations
    • Data sources automatically use configured connections
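Connection testing can begin with a cheap local check before any network round trip. A hedged sketch (field names are illustrative and vary by database type):

```typescript
// Illustrative connector settings; not the project's actual schema.
interface ConnectorConfig {
  type: "oracle" | "mongodb" | "hive" | "postgresql" | "mysql";
  host: string;
  port: number;
  database: string;
}

// Lightweight pre-flight check run before attempting a real connection:
// catches empty hosts and out-of-range ports early.
function preflightCheck(cfg: ConnectorConfig): string[] {
  const errors: string[] = [];
  if (cfg.host.trim() === "") errors.push("host is required");
  if (!Number.isInteger(cfg.port) || cfg.port < 1 || cfg.port > 65535)
    errors.push("port must be between 1 and 65535");
  if (cfg.database.trim() === "") errors.push("database name is required");
  return errors;
}
```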

Monitoring Jobs

  1. Job Monitor Dashboard

    • View all running and completed jobs
    • Monitor progress and performance
  2. Real-time Updates

    • Jobs refresh automatically every 5 seconds
    • View detailed logs and error messages

Scheduling Pipelines

  1. Create Schedule

    • Select a pipeline to schedule
    • Choose from predefined cron patterns
  2. Manage Schedules

    • Enable/disable schedules
    • View next run times and history
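The predefined cron patterns mentioned above can be kept in a simple lookup from UI label to a standard five-field cron expression. A sketch (the labels are hypothetical; only the cron syntax is standard):

```typescript
// Hypothetical mapping from schedule labels to five-field cron
// expressions (minute hour day-of-month month day-of-week).
const CRON_PRESETS: Record<string, string> = {
  "every 5 minutes": "*/5 * * * *",
  "hourly": "0 * * * *",
  "daily at midnight": "0 0 * * *",
  "weekly on Monday": "0 0 * * 1",
};

function cronForPreset(label: string): string {
  const expr = CRON_PRESETS[label];
  if (!expr) throw new Error(`unknown schedule preset: ${label}`);
  return expr;
}
```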

πŸ”§ Configuration

Environment Variables

Create a .env file in the root directory:

NODE_ENV=development
PORT=5000
DATABASE_URL=your_database_connection_string
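On the server side, these variables can be read with defaults matching the values above. A hedged sketch of such a bootstrap helper (the function name and shape are illustrative, not the project's actual code):

```typescript
// Illustrative config loader: reads environment variables with safe
// defaults matching the .env example in the README.
function loadConfig(env: Record<string, string | undefined>) {
  return {
    nodeEnv: env.NODE_ENV ?? "development",
    port: Number(env.PORT ?? 5000),
    // DATABASE_URL is optional while the in-memory store is used.
    databaseUrl: env.DATABASE_URL,
  };
}
```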

YAML Pipeline Configuration

Example pipeline configuration:

sources:
  oracle_orders:
    connection: "prod_oracle"
    query: "SELECT * FROM customers WHERE created_date >= '2024-01-01'"

transformations:
  - name: "customer_cleansing"
    type: "data_quality"
    rules:
      - field: "email"
        validation: "email_format"
      - field: "phone"
        standardize: "e164_format"

targets:
  hive_warehouse:
    table: "analytics.customers_clean"
    mode: "append"
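The YAML example maps naturally onto TypeScript types shared between client and server. A hedged sketch of what the parsed structure might look like (names mirror the example above, but the project's real schema in `shared/schema.ts` may differ):

```typescript
// Types mirroring the example YAML once parsed (e.g. with a YAML library).
interface PipelineConfig {
  sources: Record<string, { connection: string; query: string }>;
  transformations: { name: string; type: string; rules: object[] }[];
  targets: Record<string, { table: string; mode: "append" | "overwrite" }>;
}

// The parsed form of the example configuration above.
const example: PipelineConfig = {
  sources: {
    oracle_orders: {
      connection: "prod_oracle",
      query: "SELECT * FROM customers WHERE created_date >= '2024-01-01'",
    },
  },
  transformations: [
    {
      name: "customer_cleansing",
      type: "data_quality",
      rules: [
        { field: "email", validation: "email_format" },
        { field: "phone", standardize: "e164_format" },
      ],
    },
  ],
  targets: {
    hive_warehouse: { table: "analytics.customers_clean", mode: "append" },
  },
};
```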

🀝 Contributing

We welcome contributions! Please follow these steps:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development Guidelines

  • Follow TypeScript best practices
  • Write comprehensive tests
  • Update documentation for new features
  • Follow the existing code style

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸŽ–οΈ Acknowledgments

  • Inspired by enterprise ETL tools like Talend and Ab Initio
  • Built with modern web technologies
  • Designed for ease of use and scalability

πŸ“ž Support

  • Create an issue for bug reports
  • Start a discussion for feature requests
  • Check the documentation for common questions

DataFlow Studio - Making ETL accessible to everyone, from data engineers to business analysts.
