vLLM Manager

Multi-Instance vLLM Cluster Management & Log Aggregation

English | 中文

📖 About

vLLM Manager provides multi-instance vLLM cluster management, automatic log collection, and load balancing.

Start vLLM: Uses official CLI (python -m vllm.entrypoints.openai.api_server)
Send Requests: Uses official OpenAI SDK (from openai import OpenAI)
Cluster Management: Auto start/stop, health checks, failover
Log Collection: All instance logs automatically saved to files

✨ Features

🎯 Multi-Instance Management: Start/stop multiple vLLM instances with one command
📝 Automatic Logging: Log files named by model and port for easy identification
🔄 Failover: Auto-retry on other instances when request fails
❤️ Health Monitoring: Continuous instance health checks
🔧 OpenAI SDK: Returns standard OpenAI client, seamless integration
⚖️ Load Balancing: Round-robin request distribution

🛠️ Tech Stack

Python 3.8+
vLLM - LLM inference engine
OpenAI SDK - API client
Requests - HTTP client

📦 Installation

# 1. Install vLLM
pip install vllm

# 2. Install dependencies
pip install -r requirements.txt

# Or install individually
pip install openai requests

🚀 Quick Start

Basic Usage

from vllm_manager import VLLMCluster, VLLMInstance

# 1. Create cluster
cluster = VLLMCluster(log_dir="./vllm_logs")

# 2. Add instances
cluster.add_instance(VLLMInstance(
    name="server1",
    model="facebook/opt-125m",
    port=8000,
    gpu_memory_utilization=0.5,
))

# 3. Start all instances
cluster.start_all()

# 4. Get OpenAI client
client = cluster.get_openai_client()

# 5. Send requests (auto load-balanced)
response = client.completions.create(
    model="facebook/opt-125m",
    prompt="San Francisco is a",
)
print(response)

# 6. Stop cluster
cluster.stop_all()

Multi-Model Example

from vllm_manager import VLLMCluster, VLLMInstance

cluster = VLLMCluster()

# Add instances with different models
cluster.add_instance(VLLMInstance(
    name="qwen-server",
    model="Qwen/Qwen2.5-1.5B-Instruct",
    port=8000,
))

cluster.add_instance(VLLMInstance(
    name="llama-server",
    model="meta-llama/Llama-2-7b-chat",
    port=8001,
))

cluster.start_all()

# View model name for each instance
for instance in cluster.instances.values():
    print(f"{instance.name}: {instance.served_model_name}")
# qwen-server: Qwen2.5-1.5B-Instruct
# llama-server: Llama-2-7b-chat

# Log files automatically include model name
# vllm_Qwen2.5-1.5B-Instruct_8000_20260227_101234.log
# vllm_Llama-2-7b-chat_8001_20260227_101235.log

📖 API Reference

VLLMInstance

VLLMInstance(
    name: str,                    # Instance name
    model: str,                   # Model name/path
    port: int = 8000,             # Port
    host: str = "0.0.0.0",        # Host
    log_dir: Optional[Path] = None,
    
    # vLLM parameters (inherited from AsyncEngineArgs)
    gpu_memory_utilization: float = 0.9,
    tensor_parallel_size: int = 1,
    pipeline_parallel_size: int = 1,
    max_model_len: Optional[int] = None,
    quantization: Optional[str] = None,
    dtype: str = "auto",
    # ... supports all AsyncEngineArgs parameters
)

# Properties
instance.served_model_name  # Model name (last path component)
instance.base_url           # http://host:port
instance.api_url            # http://host:port/v1
instance.log_file           # Log file path

VLLMCluster

cluster = VLLMCluster(log_dir="./vllm_logs")
cluster.add_instance(instance: VLLMInstance)
cluster.start_all()
cluster.stop_all()
cluster.health_check()
client = cluster.get_openai_client()

📝 Log Management

Log File Naming

Log files are named by model name + port + timestamp for easy identification:

./vllm_logs/
├── vllm_manager_20260227_101234.log          # Manager logs
├── vllm_Qwen2.5-1.5B-Instruct_8000_101235.log  # Qwen model
└── vllm_Llama-2-7b-chat_8001_101236.log        # Llama model

View Logs

from vllm_manager import LogAggregator

aggregator = LogAggregator(log_dir="./vllm_logs")

# Get all logs
logs = aggregator.get_all_logs(limit=100)
for log in logs:
    print(f"[{log.timestamp}] {log.instance}: {log.message}")

# Export to JSON
aggregator.export_json("logs.json")

❓ FAQ

Q: Why use vLLM Manager?

A: When you need to run multiple vLLM instances (different models, different GPUs), vLLM Manager provides unified cluster management and log collection.

Q: Which vLLM parameters are supported?

A: All AsyncEngineArgs parameters are supported, since VLLMInstance inherits from AsyncEngineArgs.

Q: How are log files named?

A: Format is vllm_{model_name}_{port}_{timestamp}.log, where model_name is the last component of the model path (e.g., Qwen2.5-1.5B-Instruct).

Q: How do I check which model each instance is running?

A: Use the instance.served_model_name property:

for instance in cluster.instances.values():
    print(f"{instance.name}: {instance.served_model_name}")

🤝 Contributing

Issues and Pull Requests are welcome!

# 1. Fork the repo
# 2. Create your branch (git checkout -b feature/AmazingFeature)
# 3. Commit your changes (git commit -m 'Add some AmazingFeature')
# 4. Push to the branch (git push origin feature/AmazingFeature)
# 5. Open a Pull Request

📄 License

MIT License - See LICENSE file for details.

📬 Contact

Project URL: https://github.com/AiKiAi-stack/vllm_startup
Issues: https://github.com/AiKiAi-stack/vllm_startup/issues
Author: AiKiAi-stack

🙏 Acknowledgements

vLLM - LLM inference engine
OpenAI Python SDK - API client

Name		Name	Last commit message	Last commit date
Latest commit History 25 Commits
.github/workflows		.github/workflows
src/vllm_manager		src/vllm_manager
tests		tests
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.md		README.md
README.zh.md		README.zh.md
ROADMAP.md		ROADMAP.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

vLLM Manager

📖 About

✨ Features

🛠️ Tech Stack

📦 Installation

🚀 Quick Start

Basic Usage

Multi-Model Example

📖 API Reference

VLLMInstance

VLLMCluster

📝 Log Management

Log File Naming

View Logs

❓ FAQ

Q: Why use vLLM Manager?

Q: Which vLLM parameters are supported?

Q: How are log files named?

Q: How do I check which model each instance is running?

🤝 Contributing

📄 License

📬 Contact

🙏 Acknowledgements

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

vLLM Manager

📖 About

✨ Features

🛠️ Tech Stack

📦 Installation

🚀 Quick Start

Basic Usage

Multi-Model Example

📖 API Reference

VLLMInstance

VLLMCluster

📝 Log Management

Log File Naming

View Logs

❓ FAQ

Q: Why use vLLM Manager?

Q: Which vLLM parameters are supported?

Q: How are log files named?

Q: How do I check which model each instance is running?

🤝 Contributing

📄 License

📬 Contact

🙏 Acknowledgements

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages