---
layout: llm-serve
title: llm-serve
keywords: overview,introduction
---
A simple demonstration of setting up minimal infrastructure to serve an LLM (Gemma 7B) on user-owned infrastructure using Google Cloud and Ray primitives.
In the most basic scenario, you can use:

```sh
# Authenticate and point gcloud at the project and cluster.
gcloud auth login
gcloud config set project llm-serve-112  # use your own project ID
gcloud auth application-default login
gcloud container clusters get-credentials testing-serving-gcp-us-central1 --region us-central1

# Provision the infrastructure with Terraform.
terraform init
terraform plan
terraform apply
```
```sh
# Port-forward all services (they must be created and online).
./tools/port-forward.py

# Inspect the RayCluster (`k` is an alias for `kubectl`).
k get raycluster -n ray
k describe raycluster -n ray
```
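For the same check programmatically, here is a minimal sketch using the official Kubernetes Python client; it assumes the KubeRay CRDs are installed under the `ray.io` API group (version `v1` in recent KubeRay releases; older ones used `v1alpha1`) and that your kubeconfig points at the cluster:

```python
# Sketch: list RayClusters in the `ray` namespace via the Kubernetes API.
from kubernetes import client, config

config.load_kube_config()  # uses the credentials fetched by gcloud above
api = client.CustomObjectsApi()
clusters = api.list_namespaced_custom_object(
    group="ray.io", version="v1", namespace="ray", plural="rayclusters"
)
for item in clusters["items"]:
    print(item["metadata"]["name"])
```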
```sh
# Get into the head node:
export HEAD_POD=$(kubectl get pods --selector=ray.io/node-type=head -n ray -o custom-columns=POD:metadata.name --no-headers)
k describe -n ray pod $HEAD_POD
k exec -n ray -it $HEAD_POD -- ray status
k exec -n ray -it $HEAD_POD -- ray list actors
```
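The same checks are available from Ray's Python state API. This is a minimal sketch, assuming a recent Ray 2.x release, meant to run on the head pod (or anywhere that can reach the cluster):

```python
# Sketch: programmatic equivalents of `ray status` and `ray list actors`.
import ray
from ray.util.state import list_actors

ray.init(address="auto")        # attach to the running cluster
print(ray.cluster_resources())  # rough equivalent of `ray status`
for actor in list_actors():     # rough equivalent of `ray list actors`
    print(actor.class_name, actor.state)
```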
```sh
# Open a shell on the head node:
k exec -n ray -it $HEAD_POD -- /bin/bash
# then, inside the pod:
# => serve run model:entrypoint
```
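`serve run model:entrypoint` expects a `model.py` module that exposes a bound Serve application named `entrypoint`. The real module is not shown in this walkthrough; the following is only a hypothetical sketch with a placeholder in place of the actual Gemma model:

```python
# model.py -- hypothetical sketch of the app behind `serve run model:entrypoint`.
from ray import serve
from starlette.requests import Request


@serve.deployment(num_replicas=1)
class ModelServer:
    def __init__(self):
        # The real service would load Gemma weights here; an echo
        # function keeps the sketch self-contained.
        self.model = lambda text: f"echo: {text}"

    async def __call__(self, request: Request) -> dict:
        payload = await request.json()
        return {"generated": self.model(payload["text"])}


# `serve run model:entrypoint` imports this module and serves this bound app.
entrypoint = ModelServer.bind()
```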
```sh
# Show the RayService:
k describe rayservice google-recurrentgemma-2b-it-model-server -n ray
```
```sh
# List the latest serve logs:
k exec -it $HEAD_POD -n ray -- ls -lah /tmp/ray/session_latest/logs/serve

# Get the status of the serving application:
kubectl exec -n ray -it $HEAD_POD -- serve status
```
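The status check can also be done in Python. A minimal sketch, assuming a recent Ray release where `serve.status()` is available, run from the head pod:

```python
# Sketch: inspect Serve application health via the Python API.
import ray
from ray import serve

ray.init(address="auto")  # attach to the running cluster
status = serve.status()   # same information as `serve status`
for name, app in status.applications.items():
    print(name, app.status)
```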
```sh
# Show the model's ConfigMap in the ray namespace:
k describe cm google-recurrentgemma-2b-it-model-server -n ray
```
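To read the ConfigMap's contents rather than just describing it, here is a sketch with the Kubernetes Python client (the name and namespace are taken from the command above):

```python
# Sketch: read the model server's ConfigMap and list the files it carries.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()
cm = v1.read_namespaced_config_map(
    name="google-recurrentgemma-2b-it-model-server", namespace="ray"
)
print(list(cm.data))  # keys are file names, values are their contents
```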
With the port-forward from earlier in place, query the model endpoint:

```sh
curl -X POST http://localhost:8000 \
  -H "Content-Type: application/json" \
  -d '{"text": "I am going to say hello, you should return: (one word only)"}'
```
| Resource | Google Name/Location |
|---|---|
| Project Home | llm-serve-423918 |