Skip to content

Latest commit

 

History

History
142 lines (129 loc) · 4.54 KB

File metadata and controls

142 lines (129 loc) · 4.54 KB

Experimental Setup

Hosted Model Configurations

Model GPU Type # GPUs Precision Deployment Inference Params
gpt-oss-20b NVIDIA A100 2 BF16 vLLM default vLLM
gpt-oss-120b NVIDIA H100 2 BF16 vLLM max-logprobs=10
qwen3.5-27b NVIDIA H100 2 BF16 vLLM max-model-len=32768, max-num-seqs=8
parakeet-ctc-1.1b NVIDIA H100 1
voxtral-mini-3b NVIDIA H100 1 BF16 vLLM max-logprobs=10, max-model-len=32768
whisper-large-v3 NVIDIA A100 1 FP16 vLLM max-logprobs=10, do_sample=false + defaults
magpie NVIDIA H100 1
kokoro Tesla V100-SXM2-32GB 1
chatterbox turbo NVIDIA A100 1 FP32 chatterbox-tts-api chunk_size=200, strategy=sentence, quality=balanced
ultravox-v0_7-glm-4_6 NVIDIA H100-80GB-HBM3 16 BF16 vLLM max-logprobs=10, max-model-len=16000

LLM Configuration

LLM setups in .env for evaluated models and judge models:

EVA_MODEL_LIST='[
  {
    "model_name": "gpt-5-mini",
    "litellm_params": {
      "model": "openai/gpt-5-mini",
      "api_key": "",
      "max_parallel_requests": 50,
      "reasoning_effort": "minimal"
    },
    "model_info": {"base_model": "gpt-5-mini"}
  },
  {
    "model_name": "gpt-oss-20b",
    "litellm_params": {
      "model": "openai/<vllm served model name>",
      "api_key": "",
      "api_base": "",
      "reasoning_effort": "low"
    }
  },
  {
    "model_name": "gpt-oss-120b",
    "litellm_params": {
      "model": "openai/<vllm served model name>",
      "api_key": "",
      "api_base": "",
      "reasoning_effort": "low"
    }
  },
  {
    "model_name": "qwen35-27B",
    "litellm_params": {
      "model": "openai/<vllm served model name>",
      "api_key": "",
      "api_base": "",
      "temperature": 1.0,
      "top_p": 0.95,
      "top_k": 20,
      "min_p": 0.0,
      "presence_penalty": 1.5,
      "repetition_penalty": 1.0,
      "extra_body": {
        "chat_template_kwargs": {
          "enable_thinking": false
        }
      }
    }
  },
  {
    "model_name": "sonnet-4-6",
    "litellm_params": {
      "model": "bedrock/us.anthropic.claude-sonnet-4-6",
      "aws_access_key_id": "",
      "aws_secret_access_key": "",
      "reasoning_effort": "low"
    }
  },
  {
    "model_name": "gpt-5.2",
    "litellm_params": {
      "model": "openai/gpt-5.2",
      "api_key": ""
    },
    "model_info": {"base_model": "gpt-5.2"}
  },
  {
    "model_name": "gemini-3.1-pro-preview",
    "litellm_params": {
      "model": "vertex_ai/gemini-3.1-pro-preview",
      "vertex_project": "",
      "vertex_location": "",
      "vertex_credentials": "os.environ/GOOGLE_APPLICATION_CREDENTIALS",
      "reasoning_effort": "low"
    }
  },
  {
    "model_name": "us.anthropic.claude-opus-4-6-v1",
    "litellm_params": {
      "model": "bedrock/us.anthropic.claude-opus-4-6-v1",
      "aws_access_key_id": "",
      "aws_secret_access_key": ""
    }
  }
]'

ElevenLabs User Simulator

The user simulator is an ElevenLabs Agent with the following configuration:

Parameter Value
LLM GPT-4.1
Voice (female) Natalee Champlin
Voice (male) Eric
Input audio μ-law telephony, 8000 Hz
Turn detection silence 15ms
Max conversation duration 600s
Interruptions Disabled
First message None (agent speaks first)
Default personality Disabled
Tools end_call (user ends the call once task is complete or conversation cannot be advanced)

The simulator is prompted with a specific user goal and is instructed to stay on task, communicate all required named entities clearly, and terminate when the goal is accomplished or the task is clearly unlikely to succeed.

The ElevenLabs agent also has the end_call tool enabled which it allows it to end the call. The description of the end_call tool, which is provided to the agent, is shown below.

Use this to end the phone call and hang up.

Call this function when any ONE of the following is true:
1. The agent has confirmed your request is resolved and you have said goodbye
2. The agent says they are transferring you to a live agent
3. The agent has been unable to make progress for at least 5 consecutive turns
4. The agent says goodbye or indicates the conversation is over
5. The agent indicates that the remainder of your request cannot be fulfilled.
6. If the assistant says something along the lines of "I'm sorry I encountered an error processing your request."

Before calling this tool, always say a brief goodbye first.