[BUG] I pulled the Docker image, but running it fails with errors that suggest the image does not support AMD GPUs #68

@sunpian1

Description

susie.sun@yz-amd1:~$ docker run -it rocm/deepspeed:rocm5.7_ubuntu20.04_py3.9_pytorch_2.0.1_DeepSpeed /bin/bash
root@c50e90963e1a:/var/lib/jenkins# deepspeed --num_gpus 1 deploy.py
[2023-12-14 01:52:04,385] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-12-14 01:52:05,180] [WARNING] [runner.py:203:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
Traceback (most recent call last):
  File "/opt/conda/envs/py_3.9/bin/deepspeed", line 6, in <module>
    main()
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/deepspeed/launcher/runner.py", line 422, in main
    raise RuntimeError("Unable to proceed, no GPU resources available")
RuntimeError: Unable to proceed, no GPU resources available

Our GPU is an AMD Radeon™ RX 7900 XTX.
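One thing worth ruling out before concluding that the image lacks AMD support: the `docker run` command above does not pass any GPU device nodes into the container. ROCm's Docker instructions pass the host's `/dev/kfd` and `/dev/dri` (typically `docker run --device=/dev/kfd --device=/dev/dri --group-add video ...`). Without them, PyTorch inside the container would likely see zero GPUs, and the DeepSpeed launcher raises exactly this "no GPU resources available" error when it finds no devices. (The `cuda` accelerator in the first log line is expected on ROCm builds of PyTorch, which expose AMD GPUs through the `torch.cuda` API.) The sketch below, meant to be run inside the container, only checks whether those device nodes exist. The helper name `missing_rocm_device_nodes` is hypothetical, and a passing check does not confirm that ROCm 5.7 in this image supports this particular card.

```python
import os

# Device nodes that ROCm containers conventionally need from the host.
# Assumption based on ROCm's Docker instructions, not on anything in this issue.
REQUIRED_NODES = ("kfd", "dri")


def missing_rocm_device_nodes(dev_root="/dev"):
    """Return the names of ROCm device nodes that are absent under dev_root.

    Hypothetical helper: it only checks that the paths exist, not that the
    GPU is usable or supported by the installed ROCm version.
    """
    return [name for name in REQUIRED_NODES
            if not os.path.exists(os.path.join(dev_root, name))]


if __name__ == "__main__":
    missing = missing_rocm_device_nodes()
    if missing:
        # Report which nodes were not passed through by `docker run`.
        print("Missing device nodes: " + ", ".join("/dev/" + n for n in missing))
        print("Try restarting with: --device=/dev/kfd --device=/dev/dri --group-add video")
    else:
        print("/dev/kfd and /dev/dri are present")
```

If nodes are missing, restarting the container with those flags is a cheap test before treating this as an image-level incompatibility.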

Labels: bug (Something isn't working)
