When running ml-intern through the CLI against llama-server, compaction doesn't happen automatically. Instead, llama-server has a flag called --context-shift, which is disabled by default and runs on the server side. Here's the command I used:
llama-server --model [path-to-gguf] --host 0.0.0.0 --port 8081 --n-gpu-layers 99 --ctx-size 262144 --context-shift
It would be nice to enable this for all local servers if we can do it on our end, @lewtun
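
If handling this on our end means treating local servers differently, the first step is deciding whether the configured endpoint is local at all. Here is a rough sketch of that check. `is_local_endpoint` is a hypothetical helper name, not something that exists in ml-intern, and treating loopback, private, and unspecified (0.0.0.0) addresses as "local" is my own assumption:

```python
import ipaddress
from urllib.parse import urlparse


def is_local_endpoint(base_url: str) -> bool:
    """Guess whether base_url points at a locally hosted server.

    Hypothetical helper: "local" here means localhost, a loopback address,
    a private-network address, or the unspecified address 0.0.0.0.
    """
    host = urlparse(base_url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal (e.g. a public DNS name), so assume remote.
        return False
    return addr.is_loopback or addr.is_private or addr.is_unspecified
```

With a check like this, the CLI could decide when to apply its own handling for local servers, or at least warn that the server should be started with --context-shift.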