diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md index 42f89778..0b0ed5b2 100644 --- a/docs/troubleshooting.md +++ b/docs/troubleshooting.md @@ -18,7 +18,7 @@ To collect logs from the AMD GPU Operator: kubectl logs -n kube-amd-gpu ``` -## Potential Issues with default ``DeviceConfig`` +## Potential Issues with ``DeviceConfig`` * Please refer to {ref}`typical-deployment-scenarios` for more information and get corresponding ```helm install``` commands and configs that fits your specific use case. @@ -32,6 +32,31 @@ kubectl logs -n kube-amd-gpu kubectl edit deviceconfigs -n kube-amd-gpu default ``` +* Verify that the DeviceConfig has been applied successfully across all nodes by checking its status. Any configuration issues (such as field validation errors) will be reported in the status section with the `OperatorReady` condition set to `False`. Use the following command to view the status: + +```bash +kubectl get deviceconfigs -n kube-amd-gpu default -o yaml +``` + +```yaml +status: + conditions: + - lastTransitionTime: "2026-03-10T09:56:53Z" + message: "" + reason: OperatorReady + status: "True" + type: Ready + devicePlugin: + availableNumber: 1 + desiredNumber: 1 + nodesMatchingSelectorNumber: 1 + metricsExporter: + availableNumber: 1 + desiredNumber: 1 + nodesMatchingSelectorNumber: 1 + observedGeneration: 1 +``` + ## Debugging Driver Installation If the AMD GPU driver build fails: