
CUD2V/llmops_devstack


LLM Operations Development Stack

A secure, localhost-only infrastructure for LLM development and experimentation, featuring automated setup and management of MLflow for experiment tracking and Forgejo for version control and CI/CD. Includes convenient shell integration with enhanced prompts showing git branches, Python environments, and job status.

🎯 Purpose

This repository provides scripts and configuration for managing a local LLM development environment with:

  • MLflow: Experiment tracking, model versioning, and artifact management
  • Forgejo: Self-hosted Git service with CI/CD capabilities
  • Private & Secure: All services run on your own machine, with no dependence on external hosted services
  • Security First: Localhost-only access, secure configurations, and proper file permissions

🏗️ Architecture

The stack creates a self-contained development environment:

┌────────────────────────────────────────────────────────┐
│                LLM DevStack                            │
├────────────────────────────────────────────────────────┤
│  Forgejo (localhost:3000)     MLflow (localhost:5000)  │
│  ├─ Git repositories          ├─ Experiment tracking   │
│  ├─ CI/CD pipelines           ├─ Model registry        │
│  └─ Issue tracking            └─ Artifact storage      │
├────────────────────────────────────────────────────────┤
│              Local File System                         │
│  ├─ SQLite databases                                   │
│  ├─ Git repositories                                   │
│  ├─ ML artifacts                                       │
│  └─ Configuration files                                │
└────────────────────────────────────────────────────────┘

🚀 Quick Start

Prerequisites

  • macOS or Linux
  • Python 3.7+ for MLflow virtual environment
  • curl for downloading Forgejo binary
  • openssl for generating security keys
  • Homebrew (macOS only) - for Forgejo installation

Installation

  1. Clone the repository:

    git clone <repository-url>
    cd llmops_devstack
  2. Configure installation paths:

    cp config.env.example config.env
    # Edit config.env with your preferred installation directories (optional)
  3. Run the setup script:

    ./scripts/setup.sh
  4. Configure shell (optional):

    ./scripts/configure_shell.sh
    source ~/.bashrc    # or: source ~/.zshrc for zsh

    This adds aliases and an enhanced prompt; see Shell Integration under Advanced Usage for details.

  5. Start the services:

    ./scripts/start_services.sh
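  6. Access the services at the URLs printed by start_services.sh (by default http://localhost:3000 for Forgejo and http://localhost:5000 for MLflow).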

📋 Configuration

Default Configuration

The system uses sensible defaults that work out-of-the-box:

  • Installation Directory: $HOME/llmops_services
  • Network Binding: 127.0.0.1 (localhost only)
  • Default Ports: 3000 (Forgejo), 5000 (MLflow)
  • Auto Port Detection: Finds available ports if defaults are busy
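Auto port detection can be pictured as probing upward from the default port until a port can be bound on 127.0.0.1. The Python sketch below illustrates the idea. `find_free_port` is a hypothetical helper and is not part of this repository, so the logic in start_services.sh may differ.

```python
import socket

def find_free_port(host: str = "127.0.0.1", start: int = 3000, attempts: int = 50) -> int:
    """Return the first port in [start, start + attempts) that can be bound on host.

    Hypothetical helper illustrating auto port detection; not the logic in
    start_services.sh.
    """
    end = min(start + attempts, 65536)  # never probe past the highest TCP port
    for port in range(start, end):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port in use (or not permitted): try the next one
            return port  # the probe socket is closed on leaving the with-block
    raise RuntimeError(f"no free port in {start}..{end - 1} on {host}")
```

For example, `find_free_port(start=5000)` returns 5000 if it is free, or otherwise the next bindable port. The probe socket is released before the service starts, so another process could claim the port in the meantime. A real launcher should therefore still handle a bind failure at service start.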

Customization

Edit config.env to customize:

# Base installation directory
BASE_DIR="$HOME/llmops_services"

# Network security
FORGEJO_SERVER_HOST="127.0.0.1"    # Localhost only
MLFLOW_SERVER_HOST="127.0.0.1"     # Localhost only

# Service timeouts
GRACEFUL_SHUTDOWN_TIMEOUT=10        # Seconds
STATUS_CHECK_TIMEOUT=2              # Seconds

# Default ports (auto-detected if busy)
DEFAULT_FORGEJO_PORT=3000
DEFAULT_MLFLOW_PORT=5000
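The scripts read config.env themselves. Your own Python tooling may also need these values, for example BASE_DIR to locate logs. The sketch below parses the file under one assumption: config.env contains only simple KEY=VALUE assignments like those above. `load_config` is a hypothetical helper, not part of the repository. It approximates shell semantics rather than reproducing them; for instance, single-quoted values are still expanded.

```python
import os
import shlex
from string import Template

def load_config(path: str = "config.env") -> dict:
    """Parse KEY=VALUE assignments from a shell-style env file (hypothetical helper).

    Handles quoting, an optional leading `export`, and trailing # comments,
    and expands $VAR / ${VAR} from earlier keys or the process environment.
    This approximates shell semantics; it is not a full shell parser.
    """
    config: dict = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            tokens = shlex.split(line, comments=True)  # strips quotes and # comments
            if tokens[:1] == ["export"]:
                tokens = tokens[1:]
            if len(tokens) != 1 or "=" not in tokens[0]:
                continue  # blank line, comment, or a construct we do not handle
            key, _, value = tokens[0].partition("=")
            # Later keys may reference earlier ones, e.g. "${BASE_DIR}/logs"
            config[key] = Template(value).safe_substitute({**os.environ, **config})
    return config
```

Values come back as strings, so convert them where needed, e.g. `int(load_config()["DEFAULT_MLFLOW_PORT"])`.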

🔧 Usage

Service Management

# Start all services
./scripts/start_services.sh

# Check service status
./scripts/status_services.sh

# Stop all services
./scripts/stop_services.sh
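GRACEFUL_SHUTDOWN_TIMEOUT suggests a stop sequence of "ask politely, then force". The Python sketch below shows that pattern. It assumes each PID file listed under Directory Structure (e.g. .mlflow.pid) holds a single process ID. `stop_service` is hypothetical and is not the code in stop_services.sh.

```python
import contextlib
import os
import signal
import time
from pathlib import Path

def _alive(pid: int) -> bool:
    """Signal 0 performs only the existence/permission check; nothing is sent."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True

def stop_service(pid_file: str, timeout: float = 10.0) -> str:
    """SIGTERM the PID in pid_file, wait up to `timeout` seconds, then SIGKILL.

    Hypothetical helper mirroring GRACEFUL_SHUTDOWN_TIMEOUT; returns
    "not running", "stopped" or "killed".
    """
    path = Path(pid_file)
    if not path.exists():
        return "not running"
    pid = int(path.read_text().strip())
    result = "not running"
    try:
        if _alive(pid):
            os.kill(pid, signal.SIGTERM)  # ask the service to shut down cleanly
            result = "stopped"
            deadline = time.monotonic() + timeout
            while _alive(pid):
                if time.monotonic() >= deadline:
                    os.kill(pid, signal.SIGKILL)  # graceful window expired
                    result = "killed"
                    break
                time.sleep(0.1)
    except ProcessLookupError:
        pass  # the process exited between the liveness check and the signal
    with contextlib.suppress(FileNotFoundError):
        path.unlink()  # the PID file is stale either way now
    return result
```

For example, `stop_service(".mlflow.pid", timeout=10)` from the project root. Removing the PID file in every case keeps a stale file from an unclean shutdown from confusing later status checks.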

Forgejo Setup

Initial Setup (first time only):

  1. Start services: ./scripts/start_services.sh
  2. Visit Forgejo web interface (URL will be displayed)
  3. Complete the installation wizard:
    • Database Type: Select SQLite3 (file-based, no server required)
    • Leave other database settings as default
    • Scroll down to the bottom of the configuration page
    • Administrator Account: Fill out username and password for your admin user
    • Click "Install Forgejo" to complete setup
  4. Registration is disabled by default for security

Environment Activation

# Activate the development environment
# (BASE_DIR defaults to $HOME/llmops_services; see config.env)
source "$BASE_DIR/activate.sh"

# Now you can use MLflow CLI directly
mlflow --help

📁 Directory Structure

$HOME/llmops_services/
├── forgejo/
│   ├── bin/forgejo                 # Forgejo binary
│   ├── data/gitea/                 # Database and repositories
│   └── logs/                       # Application logs
├── mlflow/
│   ├── tracking/                   # MLflow tracking database
│   ├── artifacts/                  # Model artifacts
│   ├── logs/                       # Application logs
│   └── mlflow.db                   # SQLite database
├── venv/                           # Python virtual environment
├── logs/                           # Service startup logs
└── activate.sh                     # Environment activation script

# Runtime files in project root:
├── .forgejo.pid                    # Process ID (for shutdown)
├── .forgejo.port                   # Port number (for access)
├── .mlflow.pid                     # Process ID (for shutdown)
└── .mlflow.port                    # Port number (for access)
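The .port and .pid files let you check a service without hard-coding its port. The Python sketch below is a hypothetical `service_status` helper. It assumes the file names shown above and a 2-second timeout matching the default STATUS_CHECK_TIMEOUT, and it reports whether anything accepts TCP connections on the recorded port. It does not verify that the listener is actually Forgejo or MLflow.

```python
import socket
from pathlib import Path
from typing import Optional

def _read_int(path: Path) -> Optional[int]:
    try:
        return int(path.read_text().strip())
    except (FileNotFoundError, ValueError):
        return None

def service_status(name: str, project_root: str = ".", host: str = "127.0.0.1",
                   timeout: float = 2.0) -> dict:
    """Report a service's recorded PID/port and whether the port accepts connections.

    Hypothetical helper; assumes .<name>.pid and .<name>.port files in the
    project root. `timeout` plays the role of STATUS_CHECK_TIMEOUT.
    """
    root = Path(project_root)
    pid = _read_int(root / f".{name}.pid")
    port = _read_int(root / f".{name}.port")
    listening = False
    if port is not None:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                listening = True
        except OSError:  # refused, timed out, or unreachable
            pass
    return {
        "name": name,
        "pid": pid,
        "port": port,
        "listening": listening,
        "url": f"http://{host}:{port}" if listening else None,
    }
```

Run from the project root, `service_status("mlflow")["url"]` gives the address to pass to `mlflow.set_tracking_uri`, even when auto port detection moved MLflow off 5000.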

🛠️ Advanced Usage

Shell Integration

Automated Setup (Recommended):

# Configure shell with aliases and enhanced prompt
./scripts/configure_shell.sh

# Apply changes
source ~/.bashrc

Manual Setup:

# Add aliases to ~/.bashrc or ~/.zshrc
alias llmops-start="/path/to/llmops_devstack/scripts/start_services.sh && source ~/llmops_services/.mlflow_env"
alias llmops-stop="/path/to/llmops_devstack/scripts/stop_services.sh"
alias llmops-status="/path/to/llmops_devstack/scripts/status_services.sh"

The configure_shell.sh script automatically:

  • Detects your shell (bash/zsh)
  • Creates backup of existing configuration
  • Adds LLM DevStack aliases with absolute paths
  • Configures enhanced prompt with git branch, Python environment, and SLURM job info
  • Safely updates existing configuration if run again
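The prompt information comes from standard sources:

  • the VIRTUAL_ENV variable set by a venv's activate script
  • the current git branch
  • the SLURM_JOB_ID variable that SLURM sets inside a running job

The Python sketch below shows one way to assemble these pieces. configure_shell.sh builds its prompt in shell, not Python. The helper names `git_branch` and `prompt_prefix` and the output format are hypothetical.

```python
import os
import subprocess
from typing import Mapping, Optional

def git_branch(cwd: str = ".") -> Optional[str]:
    """Current git branch, or None outside a repository or if git is unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--abbrev-ref", "HEAD"],
            cwd=cwd, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip() or None

def prompt_prefix(env: Mapping[str, str], branch: Optional[str]) -> str:
    """Build a prefix such as "(venv) [main] [job 4242]" (hypothetical format)."""
    parts = []
    if env.get("VIRTUAL_ENV"):  # set by a venv's activate script
        parts.append(f"({os.path.basename(env['VIRTUAL_ENV'].rstrip('/'))})")
    if branch:
        parts.append(f"[{branch}]")
    if env.get("SLURM_JOB_ID"):  # set by SLURM inside a running job
        parts.append(f"[job {env['SLURM_JOB_ID']}]")
    return " ".join(parts)
```

For example, `prompt_prefix(os.environ, git_branch())` inside an activated venv on a SLURM node yields something like "(venv) [main] [job 4242]". Segments that do not apply are simply left out.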

MLflow Integration

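import os

import mlflow

# Point the client at the local tracking server. Auto port detection may have
# chosen a port other than 5000 (see .mlflow.port), so allow an override via
# the MLFLOW_TRACKING_URI environment variable.
mlflow.set_tracking_uri(os.environ.get("MLFLOW_TRACKING_URI", "http://localhost:5000"))

# Track experiments
with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_metric("accuracy", 0.95)
    mlflow.log_artifact("model.pkl")  # must be an existing local file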

Forgejo Integration

# Clone repositories
git clone http://localhost:3000/username/repository.git

# Set up CI/CD by adding workflow files under .forgejo/workflows/
# Push changes to trigger builds (a Forgejo Actions runner must be registered to execute them)

🔍 Troubleshooting

Common Issues

Services won't start:

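# Check whether something is already listening on the default ports
# (on macOS 12+, port 5000 is often held by the AirPlay Receiver)
lsof -nP -iTCP:3000 -sTCP:LISTEN
lsof -nP -iTCP:5000 -sTCP:LISTEN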

# Check logs
tail -f $BASE_DIR/logs/forgejo.log
tail -f $BASE_DIR/logs/mlflow.log

Permission errors:

# Fix file permissions
chmod 700 $BASE_DIR/forgejo/data
chmod 700 $BASE_DIR/mlflow

Can't access services:

# Verify services are running
./scripts/status_services.sh

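# Check that the processes are running
ps aux | grep '[f]orgejo'
ps aux | grep '[m]lflow'

# Check network binding: listeners should show 127.0.0.1, not *
# (adjust the ports if auto-detection chose different ones)
lsof -nP -iTCP -sTCP:LISTEN | grep -E ':(3000|5000) '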

Log Locations

  • Service startup logs: $BASE_DIR/logs/
  • Forgejo logs: $BASE_DIR/forgejo/logs/
  • MLflow logs: $BASE_DIR/mlflow/logs/

📄 License

MIT License - see LICENSE file for details.

Data Backup

Create Backup:

# Backup all data with timestamped archive
./scripts/backup_data.sh

# Backups are stored in $BASE_DIR/backups/YYYYMMDD_HHMMSS/
# Includes manifest file with detailed inventory
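Conceptually, a timestamped backup copies the data directories into $BASE_DIR/backups/YYYYMMDD_HHMMSS/ and writes an inventory alongside them. The Python sketch below assumes the layout implied by the Restore from Backup commands: mlflow/ is copied as-is, and forgejo/data/ is copied to forgejo/. The `backup` helper and the MANIFEST.json name and format are hypothetical, not those of backup_data.sh.

```python
import json
import shutil
from datetime import datetime
from pathlib import Path

# Source directory under BASE_DIR -> directory name inside the backup
# (hypothetical layout, chosen to match the Restore from Backup commands)
SOURCES = {"mlflow": "mlflow", "forgejo/data": "forgejo"}

def backup(base_dir: str) -> Path:
    """Copy service data into BASE_DIR/backups/YYYYMMDD_HHMMSS/ and write a manifest."""
    base = Path(base_dir)
    dest = base / "backups" / datetime.now().strftime("%Y%m%d_%H%M%S")
    dest.mkdir(parents=True)  # fails if a backup with this timestamp already exists
    manifest = {"created": dest.name, "files": []}
    for src_rel, name in SOURCES.items():
        src = base / src_rel
        if not src.is_dir():
            continue  # nothing to back up for this service yet
        shutil.copytree(src, dest / name)
        for f in sorted(p for p in (dest / name).rglob("*") if p.is_file()):
            manifest["files"].append(
                {"path": f.relative_to(dest).as_posix(), "bytes": f.stat().st_size}
            )
    (dest / "MANIFEST.json").write_text(json.dumps(manifest, indent=2))
    return dest
```

Copying a SQLite file while a server is writing to it can capture an inconsistent snapshot. Stop the services first, or copy the databases with the sqlite3 module's `Connection.backup` method.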

Restore from Backup:

# Stop services first
./scripts/stop_services.sh

# Copy data back from a backup directory (replace 20240101_120000 with your backup's timestamp)
cp -r $BASE_DIR/backups/20240101_120000/mlflow/* $BASE_DIR/mlflow/
cp -r $BASE_DIR/backups/20240101_120000/forgejo/* $BASE_DIR/forgejo/data/

# Restart services
./scripts/start_services.sh

Process Management

Clean Up Orphaned Processes:

# Interactive cleanup of MLflow processes
./scripts/cleanup_mlflow.sh

# Useful if services weren't stopped properly

📚 Examples

The examples/ directory contains sample scripts demonstrating integration with HPC environments and common ML workflows:

  • gpu_check.sh - SLURM job script for GPU availability checking with MLflow logging
  • See examples/README.md for detailed usage instructions

🔧 Script Reference

Script               Purpose                                       Usage
setup.sh             Initial installation and configuration        ./scripts/setup.sh
start_services.sh    Start MLflow and Forgejo services             ./scripts/start_services.sh
stop_services.sh     Gracefully stop all services                  ./scripts/stop_services.sh
status_services.sh   Check service status and URLs                 ./scripts/status_services.sh
configure_shell.sh   Configure shell aliases and enhanced prompt   ./scripts/configure_shell.sh
backup_data.sh       Create timestamped backup of all data         ./scripts/backup_data.sh
cleanup_mlflow.sh    Clean up orphaned MLflow processes            ./scripts/cleanup_mlflow.sh
