[Bug]: Multiple system messages break models with strict chat templates (e.g. Qwen3.5) #2894

@stefanritterhoff

Bug Description

Forge sends two separate system role messages in the chat completion request — one for the static agent prompt and one for the non-static agent template. This is produced by set_system_messages() in crates/forge_domain/src/context.rs, which inserts each entry from the Vec<S> as a separate ContextMessage::system() at the front of the message array.

Many model chat templates (including Qwen3.5's, and potentially others) require that a system message appear only as the first message. The Qwen3.5 Jinja template calls raise_exception('System message must be at the beginning.') when a system message appears at any position other than messages[0]. Because Forge inserts two system messages, the one at messages[1] triggers this error, and the entire request fails with a 500 from llama.cpp's server.

This affects any provider/model with a chat template that restricts system messages to position 0.
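The position rule described above can be sketched as a small check over message roles. This is a minimal illustration, not the template's actual logic and not Forge code: the `Role` enum and the `first_misplaced_system` function are hypothetical stand-ins introduced only for this example.

```rust
/// Message roles as they appear in an OpenAI-style `messages` array.
/// Simplified stand-in for illustration, not Forge's own `Role` type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    System,
    User,
    Assistant,
}

/// Returns the index of the first system message found after position 0,
/// or `None` if the sequence keeps system messages at the beginning only.
/// (Hypothetical helper that mirrors the rule quoted from the error message.)
pub fn first_misplaced_system(roles: &[Role]) -> Option<usize> {
    roles
        .iter()
        .enumerate()
        .skip(1) // index 0 is the only position where a system message is allowed
        .find(|(_, role)| **role == Role::System)
        .map(|(index, _)| index)
}
```

For the request Forge currently sends (system, system, user, ...), this check would report index 1, which matches the position that triggers the template error.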

Error output:

ERROR: POST http://localhost:8080/v1/chat/completions

Caused by:
    0: 500 Internal Server Error Reason: {"error":{"code":500,"message":"\n------------\nWhile executing CallExpression at line 85, column 32 in source:\n...first %}↵            {{- raise_exception('System message must be at the beginnin...\n                                           ^\nError: Jinja Exception: System message must be at the beginning.","type":"server_error"}}

Request body Forge sends:

{
  "messages": [
    { "role": "system", "content": "<static agent prompt>" },
    { "role": "system", "content": "<non-static agent template>" },
    { "role": "user", "content": "..." },
    ...
  ]
}

Root cause:

Context::set_system_messages() in crates/forge_domain/src/context.rs:

pub fn set_system_messages<S: Into<String>>(mut self, content: Vec<S>) -> Self {
    if self.messages.is_empty() {
        for message in content {
            self.messages.push(ContextMessage::system(message.into()).into());
        }
        self
    } else {
        self.messages.retain(|m| !m.has_role(Role::System));
        for message in content.into_iter().rev() {
            self.messages.insert(0, ContextMessage::system(message.into()).into());
        }
        self
    }
}

This is called from crates/forge_app/src/system_prompt.rs with a Vec of two strings (static_block, non_static_block), resulting in two separate system messages.

Suggested fix:

Join all system message content into a single message:

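pub fn set_system_messages<S: Into<String>>(mut self, content: Vec<S>) -> Self {
    // Merge every system prompt block into one string, separated by a blank line.
    let combined: String = content
        .into_iter()
        .map(|s| s.into())
        .collect::<Vec<String>>()
        .join("\n\n");

    // Nothing to set: leave the existing messages untouched.
    if combined.is_empty() {
        return self;
    }

    // Drop any previous system messages, then put the merged one at index 0.
    self.messages.retain(|m| !m.has_role(Role::System));
    self.messages.insert(0, ContextMessage::system(combined).into());
    self
}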

All existing tests in forge_domain pass with this change.
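To see the effect of the suggested fix in isolation, here is a self-contained sketch that uses simplified stand-in types. `Role`, `Message`, `Context`, `add_user` and `system_count` below are hypothetical and exist only for this example; they are not Forge's real `Context` or `ContextMessage` types. Only the body of `set_system_messages` follows the suggestion above.

```rust
/// Simplified stand-in for the message role (assumption, not Forge's type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    System,
    User,
}

/// Simplified stand-in for a chat message (assumption, not `ContextMessage`).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Message {
    pub role: Role,
    pub content: String,
}

/// Simplified stand-in for the conversation context.
#[derive(Debug, Default)]
pub struct Context {
    pub messages: Vec<Message>,
}

impl Context {
    /// Hypothetical helper: appends a user message.
    pub fn add_user(mut self, content: &str) -> Self {
        self.messages.push(Message { role: Role::User, content: content.to_string() });
        self
    }

    /// Same logic as the suggested fix: join all blocks into one system message.
    pub fn set_system_messages<S: Into<String>>(mut self, content: Vec<S>) -> Self {
        let combined: String = content
            .into_iter()
            .map(|s| s.into())
            .collect::<Vec<String>>()
            .join("\n\n");

        // Empty input returns early and keeps whatever is already in the context.
        if combined.is_empty() {
            return self;
        }

        // Remove earlier system messages, then insert the merged one at index 0.
        self.messages.retain(|m| m.role != Role::System);
        self.messages.insert(0, Message { role: Role::System, content: combined });
        self
    }

    /// Hypothetical helper: counts system messages in the context.
    pub fn system_count(&self) -> usize {
        self.messages.iter().filter(|m| m.role == Role::System).count()
    }
}
```

Calling `Context::default().add_user("hello").set_system_messages(vec!["static", "dynamic"])` in this sketch yields exactly one system message at index 0 with content "static\n\ndynamic", followed by the user message. One behavioral difference is worth checking during review: with an empty `content`, this version returns early and leaves existing system messages in place, whereas the current implementation removes them whenever the context already contains messages.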

Environment:

  • Provider: llama.cpp (OpenAI-compatible API, response_type = "OpenAI")
  • Models affected: Qwen3.5 family, and any other model with a chat template that restricts system messages to position 0

Steps to Reproduce

  1. Start a llama.cpp server with any Qwen3.5 model:

    llama-server -m Qwen3.5-0.8B-IQ4_NL.gguf --port 8080
    
  2. Configure Forge with a custom OpenAI-compatible provider (e.g. ~/forge/provider.json):

    [{
      "id": "llama_cpp",
      "url": "http://localhost:8080/v1/chat/completions",
      "response_type": "OpenAI",
      "models": [{ "id": "Qwen3.5-0.8B-IQ4_NL", "tools_supported": true }]
    }]
  3. Set the model in .forge.toml:

    [session]
    model = "Qwen3.5-0.8B-IQ4_NL"
    provider_id = "llama_cpp"
  4. Run forge and send any message:

    forge
    > hello
    
  5. Observe the 500 error from the server about "System message must be at the beginning."

Expected Behavior

Forge should send a single system message containing both the static and non-static prompt content joined together. The request body should look like:

{
  "messages": [
    { "role": "system", "content": "<static agent prompt>\n\n<non-static agent template>" },
    { "role": "user", "content": "..." },
    ...
  ]
}

This keeps the request compatible with chat templates that accept a system message only at position 0, such as Qwen3.5's.

Actual Behavior

Forge sends two separate system messages, causing the request to fail:

{
  "messages": [
    { "role": "system", "content": "<static agent prompt>" },
    { "role": "system", "content": "<non-static agent template>" },
    { "role": "user", "content": "..." },
    ...
  ]
}

The second system message at index 1 triggers the chat template validation, resulting in:

ERROR: POST http://localhost:8080/v1/chat/completions

Caused by:
    0: 500 Internal Server Error Reason: {"error":{"code":500,"message":"Error: Jinja Exception: System message must be at the beginning.","type":"server_error"}}

Forge Version

2.8.0

Operating System & Version

No response

AI Provider

Other

Model

Any Qwen3.5 model

Installation Method

Other

Configuration

Labels

type: bug