llama-server context-shift #259

@merveenoyan

When running ml-intern through the CLI against llama-server, compaction does not happen automatically. llama-server does have an option called --context-shift, which is disabled by default and runs on the server side. Here's the command I used:

llama-server --model [path-to-gguf] --host 0.0.0.0 --port 8081 --n-gpu-layers 99 --ctx-size 262144 --context-shift

It would be nice to enable this for all local servers, if we can do it on our end. @lewtun
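
For illustration, here is a minimal sketch of what "enabling it on our end" could look like if a wrapper is responsible for launching the server. This is an assumption: the issue does not say whether ml-intern starts llama-server itself, and `build_llama_server_cmd` is a hypothetical helper name, not an existing function. It only uses the flags that appear in the command above, with the same values as defaults.

```python
import shlex


def build_llama_server_cmd(
    model_path,
    host="0.0.0.0",
    port=8081,
    n_gpu_layers=99,
    ctx_size=262144,
    context_shift=True,
):
    """Build the argv list for llama-server (hypothetical helper).

    Defaults mirror the command from this issue. --context-shift is
    appended only when requested, since the server leaves it off by default.
    """
    cmd = [
        "llama-server",
        "--model", str(model_path),
        "--host", host,
        "--port", str(port),
        "--n-gpu-layers", str(n_gpu_layers),
        "--ctx-size", str(ctx_size),
    ]
    if context_shift:
        cmd.append("--context-shift")
    return cmd


if __name__ == "__main__":
    # Print the command as a shell-quoted string for inspection;
    # pass the list to subprocess.Popen to actually start the server.
    print(shlex.join(build_llama_server_cmd("model.gguf")))
```

If the server is started by the user rather than by the tool, this would not apply, and the flag would have to be documented or handled differently.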
