2 changes: 1 addition & 1 deletion alerts/cluster-etcd-operator/etcdHighFsyncDurations.md
Original file line number Diff line number Diff line change
@@ -39,7 +39,7 @@ You can find more performance troubleshooting tips in

In the OpenShift dashboard console under Observe section, select the etcd
dashboard. There are both leader elections as well as Disk Sync Duration
-dashboards which will assit with further issues.
+dashboards which will assist with further issues.

## Mitigation

@@ -63,7 +63,7 @@ credentials.

### TLS Certificate Update

-If the issue stems from incorrect or expired certicates, update the associated
+If the issue stems from incorrect or expired certificates, update the associated
OpenShift `Secret` or `ConfigMap` with the correct and valid certificates.

## Notes
@@ -10,7 +10,7 @@ with `openshift-` or `kube-`.
## Impact

Significant inode usage by a system component is likely to prevent the
-component from functioning normally. Signficant inode usage can also lead to a
+component from functioning normally. Significant inode usage can also lead to a
partial or full cluster outage.

## Diagnosis
67 changes: 0 additions & 67 deletions alerts/cluster-monitoring-operator/NodeClockNotSynchronising.md

This file was deleted.

67 changes: 67 additions & 0 deletions alerts/cluster-monitoring-operator/NodeClockNotSynchronizing.md
@@ -0,0 +1,67 @@
# NodeClockNotSynchronizing

## Meaning

The `NodeClockNotSynchronizing` alert triggers when a node is affected by
issues with the NTP server for that node. For example, this alert might trigger
when certificates are rotated for the API Server on a node, and the
certificates fail validation because of an invalid time.


## Impact

This alert is critical. It indicates an issue that can lead to the API Server
Operator becoming degraded or unavailable. If the API Server Operator becomes
degraded or unavailable, this issue can negatively affect other Operators, such
as the Cluster Monitoring Operator.

## Diagnosis

To diagnose the underlying issue, start a debug pod on the affected node and
check the `chronyd` service:

```shell
oc -n default debug node/<affected_node_name>
chroot /host
systemctl status chronyd
```

## Mitigation

1. If the `chronyd` service is failing or stopped, start it:

```shell
systemctl start chronyd
```

   If the `chronyd` service is already running, restart it:

```shell
systemctl restart chronyd
```

If `chronyd` starts or restarts successfully, the service adjusts the clock
and displays something similar to the following example output:

```console
Oct 18 19:39:36 ip-100-67-47-86 chronyd[2055318]: System clock wrong by 16422.107473 seconds, adjustment started
Oct 19 00:13:18 ip-100-67-47-86 chronyd[2055318]: System clock was stepped by 16422.107473 seconds
```

2. Verify that the `chronyd` service is running:

```shell
systemctl status chronyd
```

3. Verify using PromQL:

```console
min_over_time(node_timex_sync_status[5m])
node_timex_maxerror_seconds
```

   `node_timex_sync_status` returns `1` if NTP is working properly, or `0` if
   NTP is not working properly. `node_timex_maxerror_seconds` indicates the
   estimated maximum clock error in seconds.

The alert triggers when the value for
`min_over_time(node_timex_sync_status[5m])` equals `0` and the value for
`node_timex_maxerror_seconds` is greater than or equal to `16`.
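
The firing condition above can be sketched as a small boolean helper (a
hypothetical illustration only; the actual evaluation happens inside
Prometheus, not in any runbook tooling):

```python
def clock_alert_firing(sync_status_min_5m: float,
                       maxerror_seconds: float) -> bool:
    """Mirror the documented firing condition.

    sync_status_min_5m -- value of min_over_time(node_timex_sync_status[5m])
    maxerror_seconds   -- value of node_timex_maxerror_seconds
    """
    # Fires only when sync was lost for the entire 5-minute window
    # and the estimated clock error is at least 16 seconds.
    return sync_status_min_5m == 0 and maxerror_seconds >= 16
```

For example, a node that was unsynchronized for the whole window with a
17-second error fires the alert, while a node that stayed synchronized at any
point in the window does not.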
2 changes: 1 addition & 1 deletion alerts/cluster-network-operator/NorthboundStaleAlert.md
@@ -26,7 +26,7 @@ hierarchy](./hierarchy/alerts-hierarchy.svg)

Investigate the health of the affected ovnkube-controller or northbound database
processes that run in the `ovnkube-controller` and `nbdb` containers
-repectively.
+respectively.

For OCP clusters at versions 4.13 or earlier, the containers run in
ovnkube-master pods:
@@ -61,13 +61,13 @@ The result should be Status:active

Mitigation will depend on what was found in the diagnosis section.

-As a general fix, you can try exiting the affected ovn-northd procesess with
+As a general fix, you can try exiting the affected ovn-northd processes with
```shell
ovn-appctl -t ovn-northd exit
```
which should cause the container running northd to restart. If this does not
-work you can try restarting the pods where the affected ovn-northd procesess are
+work you can try restarting the pods where the affected ovn-northd processes are
running.

-Contact the incident response team in your organisation if fixing the issue is
+Contact the incident response team in your organization if fixing the issue is
not apparent.
2 changes: 1 addition & 1 deletion alerts/cluster-network-operator/SouthboundStaleAlert.md
@@ -25,7 +25,7 @@ hierarchy](./hierarchy/alerts-hierarchy.svg)
## Diagnosis

Investigate the health of the affected northd or southbound database processes
-that run in the `northd` and `sbdb` containers repectively.
+that run in the `northd` and `sbdb` containers respectively.

For OCP clusters at versions 4.13 or earlier, the containers run in
ovnkube-master pods:
@@ -11,7 +11,7 @@ threshold for 1 hour, the alert will fire.
## Impact
The memory usage per instance within control
plane nodes influences the stability
-and responsiveness of the cluster, most noticably in the etcd and
+and responsiveness of the cluster, most noticeably in the etcd and
Kubernetes API server pods. Moreover, OOM kill can occur
with excessive memory usage, which negatively
influences the pod scheduling. Etcd also relies on a certain number of
@@ -31,12 +31,12 @@ pod logs for the cluster.

For the following command, replace the $DAEMONPOD variable
with the name of your own machine-config-daemon-* pod
-That is scheduled on the node expriencing the error.
+that is scheduled on the node experiencing the error.

```console
oc logs -f -n openshift-machine-config-operator $DAEMONPOD -c machine-config-daemon
```
-When a pivot is occuring the following will be logged.
+When a pivot is occurring, the following will be logged.

```console
I1126 17:15:38.991090 3069 rpm-ostree.go:243] Executing rebase to quay.io/my-registry/custom-image@blah
@@ -67,7 +67,7 @@ stated reason it gives for not being able to pivot. The following are
common reasons a pivot can fail.

- The rpm-ostree service is unable to
-pull the image from quay succesfully.
+pull the image from quay successfully.
- There are issues with the rpm-ostree service itself such as
being unable to start, or unable to build the OsImage folder,
unable to pivot from the current configuration.
@@ -10,7 +10,7 @@ will fire.

## Impact

-If the MCD is unable to succesfully reboot the node,
+If the MCD is unable to successfully reboot the node,
any pending MachineConfig changes that would
require a reboot would not be propagated,
and the MachineConfig cluster operator would degrade.
@@ -71,7 +71,7 @@ update.go:2641] failed to run reboot: exec: "systemd-run": executable file not f

This error indicates that the `systemd-run` file cannot be
found in the /usr/bin/systemd-run $PATH and so the node
-cannot reboot succesfully.
+cannot reboot successfully.

The error message will change depending on what is
preventing the reboot.
@@ -19,7 +19,7 @@ The system daemons needs this memory in order to
run and satisfy system processes. If other workloads
start to use this memory then system daemons
can be impacted. This alert
-firing does not nessarily mean the node is
+firing does not necessarily mean the node is
resource exhausted at the moment.

## Diagnosis
@@ -53,7 +53,7 @@ to get the 95th percentile.
portion of the system's memory occupied by
a process that is held in the main memory)

-If this value is greather then the 95th
+If this value is greater than the 95th
percentile of the allocatable memory for
the node then the alert will go into pending.
After 15 minutes in this state the alert
@@ -120,7 +120,7 @@ useful for troubleshooting:

- You can use the `top` command on
the host to get a dynamic update of
-the largest memory consuming proccesses.
+the largest memory consuming processes.
For instance, to get the top 100 memory
consuming processes on a node.

@@ -137,7 +137,7 @@ statistics of the node.
- Each node also contains a file called
`/proc/meminfo`. This file provides a usage
report about memory on the system. You can
-learn how to interperet the fields [here](https://access.redhat.com/solutions/406773).
+learn how to interpret the fields [here](https://access.redhat.com/solutions/406773).

- For kubelet-level commands you can get
the memory usage of individual pods by
@@ -12,7 +12,7 @@ Storage cluster will become read-only at 85%.

## Diagnosis

-Using the Openshift console, go to Storage-Data Fountation-Storage systems.
+Using the OpenShift console, go to Storage-Data Foundation-Storage systems.
A list of the available storage systems with basic information about raw
capacity and used capacity will be visible.
The command `ceph health` also provides information about cluster storage
@@ -11,7 +11,7 @@ Storage cluster will become read-only at 85%.

## Diagnosis

-Using the Openshift console, go to Storage-Data Fountation-Storage systems.
+Using the OpenShift console, go to Storage-Data Foundation-Storage systems.
A list of the available storage systems with basic information about raw
capacity and used capacity will be visible.
The command `ceph health` also provides information about cluster storage
@@ -13,7 +13,7 @@ Storage cluster will become read-only at 85%.

## Diagnosis

-Using the Openshift console, go to Storage-Data Fountation-Storage systems.
+Using the OpenShift console, go to Storage-Data Foundation-Storage systems.
A list of the available storage systems with basic information about raw
capacity and used capacity will be visible.
The command `ceph health` also provides information about cluster storage
@@ -37,7 +37,7 @@ oc patch -n openshift-storage storagecluster ocs-storagecluster \
```
Above is a sample patch command; users need to review their current CPU
configurations and increase them accordingly.
-PS: It is always adviced to add another MDS pod (that is to scale
+PS: It is always advised to add another MDS pod (that is to scale
Horizontally) once we have reached the max resource limit. Please see
[HorizontalScaling](CephMdsCPUUsageHighNeedsHorizontalScaling.md)
documentation for more details.
@@ -21,7 +21,7 @@ the cache limit set in `mds_cache_memory_limit`.
The MDS tries to stay under a reservation of the `mds_cache_memory_limit` by
trimming unused metadata in its cache and recalling cached items in the client
caches. It is possible for the MDS to exceed this limit due to slow recall from
-clients as result of multiple clients accesing the files.
+clients as a result of multiple clients accessing the files.

Read more about ceph MDS cache configuration [here](https://docs.ceph.com/en/latest/cephfs/cache-configuration/?highlight=mds%20cache%20configuration#mds-cache-configuration)

@@ -14,11 +14,11 @@ be fixed as soon as possible.
## Diagnosis

Make sure we have enough RAM provisioned for MDS Cache. Default is 4GB, but
-recomended is minimum 8GB.
+the recommended minimum is 8GB.

## Mitigation

-It is highly recomended to distribute MDS daemons across at least two nodes in
+It is highly recommended to distribute MDS daemons across at least two nodes in
the cluster. Otherwise, a hardware failure on a single node may result in the
file system becoming unavailable.

@@ -11,7 +11,7 @@ are only 3 monitors.

This is an "info" level alert, and therefore just a suggestion.
The alert is just suggesting to increase the number of ceph monitors, to be
-more resistent to failures.
+more resistant to failures.
It can be silenced without any impact on the cluster functionality or
performance.
If the number of monitors is increased to 5, the cluster will be more robust.
@@ -10,7 +10,7 @@ One threshold that can trigger this warning condition is the
## Impact

Due to the configured quota, the pool will become read-only when the quota is
-exhausted completelly
+exhausted completely.

## Diagnosis

@@ -10,7 +10,7 @@ One threshold that can trigger this warning condition is the
## Impact

Due to the configured quota, the pool will become read-only when the quota is
-exhausted completelly
+exhausted completely.

## Diagnosis

@@ -17,7 +17,7 @@ Connection with external key management service is not working.

## Mitigation

-Review configuration values in the ´ocs-kms-connection-details´ confimap.
+Review configuration values in the `ocs-kms-connection-details` configmap.

Verify the connectivity with the external KMS by checking
[network connectivity](helpers/networkConnectivity.md)