Repository for running LLMs via Docker, especially on machines without access to the Internet (for example, a remote server reachable only over the local network or via a direct USB drive).
Install Docker on the system without internet access: for example, follow the instructions on the official website.
Go to the root of this repository and run:

```
docker build --progress=plain -t fips-llm .
```

You may have to use `sudo docker` for this to succeed. Be warned that this will use a huge amount of traffic.
You need to rerun this only if you change one of the following:
- Dockerfile
- environment.yaml
- requirements.txt
On a machine with access to the internet:
```
docker run --rm -v ./src/code:/app/run -v ./src/model_cache:/app/model_cache -v ./src/huggingface:/root/.cache/huggingface -p 8080:8080 --gpus all fips-llm python -u -c 'from ai import Models; print(Models.process(model_name='\''Qwen/Qwen3-0.6B'\'', max_new_tokens=32768, prompt='\''Answer shortly: what is 2+2*2?'\''))'
```

On a machine without the internet:

```
docker run --rm -v ./src/code:/app/run -v ./src/model_cache:/app/model_cache -v ./src/huggingface:/root/.cache/huggingface -p 8080:8080 --gpus all fips-llm /bin/bash -c "HF_HUB_OFFLINE=1 python -u -c 'from ai import Models; print(Models.process(model_name='\''Qwen/Qwen3-0.6B'\'', max_new_tokens=32768, prompt='\''Answer shortly: what is 2+2*2?'\''))'"
```

To run just a web server:

```
docker run --rm -v ./src/code:/app/run -v ./src/model_cache:/app/model_cache -v ./src/huggingface:/root/.cache/huggingface -p 8080:8080 --gpus all fips-llm /bin/bash -c "HF_HUB_OFFLINE=1 uvicorn main:app --host 0.0.0.0 --port 8080"
```

On Windows you might need to change ./src to .\src and ./src/huggingface to .\src\huggingface, like this:

```
docker run --rm -v .\src\code:/app/run -v .\src\model_cache:/app/model_cache -v .\src\huggingface:/root/.cache/huggingface -p 8080:8080 --gpus all fips-llm python -u -c 'from ai import Models; print(Models.process(model_name="""Qwen/Qwen3-0.6B""", max_new_tokens=32768, prompt="""Answer shortly: what is 2+2*2?"""))'
```

If run without a specific command, the container launches a web server for interactive use. Go to http://localhost:8080
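Once the web-server container is running, you can check from the host that it answers; a sketch (the `wait_for_server` helper is my own naming, and the port matches the `-p 8080:8080` mapping above):

```shell
#!/bin/sh
# wait_for_server URL [SECONDS]: poll once per second until the web
# server answers, or give up after the timeout (default 30 s)
wait_for_server() {
    url=$1
    timeout=${2:-30}
    i=0
    while [ "$i" -lt "$timeout" ]; do
        if curl -s -o /dev/null "$url"; then
            echo "server is up: $url"
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    echo "server did not come up within ${timeout}s" >&2
    return 1
}

# example: wait_for_server http://localhost:8080 60
```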
You need to rerun this only if you changed main.py and that change downloads something into one of the volumes (like src/model_cache/ for models or src/huggingface/ for the transformers cache).
If you ran docker build this iteration, save the docker image:

```
docker save -o fips-llm.tar fips-llm
```

Transfer the fips-llm.tar file and then load it on the target machine:

```
docker load -i fips-llm.tar
```

Only transfer this folder if the volumes were changed. You don't need to rebuild the docker image each time the volumes change.
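When moving fips-llm.tar over a USB drive or a flaky network, it can be worth confirming the copy arrived intact before loading it; a sketch using sha256sum (the `make_checksum`/`verify_copy` helpers are my own naming):

```shell
#!/bin/sh
# make_checksum FILE: record a checksum next to FILE on the source machine
make_checksum() {
    sha256sum "$1" > "$1.sha256"
}

# verify_copy FILE: on the target machine, confirm FILE matches the record
verify_copy() {
    sha256sum -c "$1.sha256"
}

# typical flow (docker parts as in the commands above):
#   docker save -o fips-llm.tar fips-llm
#   make_checksum fips-llm.tar
#   ...copy fips-llm.tar and fips-llm.tar.sha256 to the target machine...
#   verify_copy fips-llm.tar && docker load -i fips-llm.tar
```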
Only transfer this if main.py was changed. You don't need to rebuild the docker image each time main.py changes.
You could just package and deliver the whole src/ folder, but it will be very large, so choose wisely.
Use https://transfer.it/start for transferring huge files and folders between computers. BEWARE that this doesn't preserve symlinks
To preserve symlinks, archive the folder instead:

```
tar --preserve-permissions -czvf src.tar.gz src/
```

and extract it later on the remote machine:

```
tar -xzvf src.tar.gz
```
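A quick way to convince yourself that symlinks survive the archive round trip (the `demo_src`/`extracted` names are made up for this demo; the tar flags match the commands above):

```shell
#!/bin/sh
# work in a scratch directory so nothing in the repo is touched
cd "$(mktemp -d)"

# build a toy folder containing one real file and one symlink to it
mkdir demo_src
echo 'payload' > demo_src/real.txt
ln -s real.txt demo_src/link.txt

# archive with the same flags as above, then extract to a fresh location
tar --preserve-permissions -czf demo_src.tar.gz demo_src/
mkdir extracted
tar -xzf demo_src.tar.gz -C extracted

# the extracted entry must still be a symlink, not a copied file
[ -L extracted/demo_src/link.txt ] && echo "symlink preserved"
```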