From 36f63da91d20e855b792bf71d8aa7935a45d0dec Mon Sep 17 00:00:00 2001 From: Uday Bhaskar Date: Thu, 12 Mar 2026 22:48:57 +0530 Subject: [PATCH] update troubleshooting documentation section (#1190) (cherry picked from commit 27104e1c51169f216e694179470f27f03bfff5ef) --- docs/troubleshooting.md | 27 ++++++++++++++++++++++++++- 1 file changed, 26 insertions(+), 1 deletion(-) diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md index 42f897783..0b0ed5b23 100644 --- a/docs/troubleshooting.md +++ b/docs/troubleshooting.md @@ -18,7 +18,7 @@ To collect logs from the AMD GPU Operator: kubectl logs -n kube-amd-gpu ``` -## Potential Issues with default ``DeviceConfig`` +## Potential Issues with ``DeviceConfig`` * Please refer to {ref}`typical-deployment-scenarios` for more information and get corresponding ```helm install``` commands and configs that fits your specific use case. @@ -32,6 +32,31 @@ kubectl logs -n kube-amd-gpu kubectl edit deviceconfigs -n kube-amd-gpu default ``` +* Verify that the DeviceConfig has been applied successfully across all nodes by checking its status. Any configuration issues (such as field validation errors) will be reported in the status section with the `OperatorReady` condition set to `False`. Use the following command to view the status: + +```bash +kubectl get deviceconfigs -n kube-amd-gpu default -o yaml +``` + +```yaml +status: + conditions: + - lastTransitionTime: "2026-03-10T09:56:53Z" + message: "" + reason: OperatorReady + status: "True" + type: Ready + devicePlugin: + availableNumber: 1 + desiredNumber: 1 + nodesMatchingSelectorNumber: 1 + metricsExporter: + availableNumber: 1 + desiredNumber: 1 + nodesMatchingSelectorNumber: 1 + observedGeneration: 1 +``` + ## Debugging Driver Installation If the AMD GPU driver build fails: