
CASMTRIAGE-9128: Update IUF docs for procedure to follow after each batch of worker nodes rollout w.r.t iSCSI #6552

Open: ravikanth-nalla-hpe wants to merge 5 commits into `release/1.7` from `CASMTRIAGE-9128-iscsi-sbps`

@ravikanth-nalla-hpe (Contributor)

Description

During the upgrade from CSM 25.9.0 (1.7.0) to CSM 26.3.0 (1.7.1) on the Creek system:

After the management rollouts of all worker nodes completed, the compute nodes' `dmesg` logs showed SQUASHFS errors and "LUN assignments on this target have changed" messages, and the worker nodes logged "Detected NON_EXISTENT_LUN Access" messages. The compute nodes appeared to be frozen because of the flood of messages, and commands were failing.

Resolution

Update the IUF management rollouts section (for workers) to check whether the issue can occur and to apply the CASMTRIAGE-9129 procedure as a preventive action against it (compute nodes freezing or becoming unresponsive).

Relates to:
CASMTRIAGE-9122 - Parent JIRA
CASMTRIAGE-9128 - Current PR ref JIRA
CASMTRIAGE-9129 - iSCSI SBPS procedure

ravikanth-nalla-hpe and others added 5 commits April 23, 2026 09:37
…atch of worker nodes rollout w.r.t iSCSI SBPS

- initial placeholder commit
Update management_rollout.md for iSCSI SBPS

Signed-off-by: ravikanth-nalla-hpe <140072234+ravikanth-nalla-hpe@users.noreply.github.com>
Signed-off-by: ravikanth-nalla-hpe <140072234+ravikanth-nalla-hpe@users.noreply.github.com>
Signed-off-by: ravikanth-nalla-hpe <140072234+ravikanth-nalla-hpe@users.noreply.github.com>
Signed-off-by: ravikanth-nalla-hpe <140072234+ravikanth-nalla-hpe@users.noreply.github.com>

1. Invoke `iuf run` with `-r` to execute the [`management-nodes-rollout`](../stages/management_nodes_rollout.md) stage on `ncn-m001`. This will rebuild `ncn-m001` with the new CFS configuration and image built in
previous steps of the workflow.
1. Invoke `iuf run` with `-r` to execute the [`management-nodes-rollout`](../stages/management_nodes_rollout.md) stage on `ncn-m001`. This will rebuild `ncn-m001` with the new CFS configuration and image built in previous steps of the workflow.

There is some whitespace realignment here. Can you please check?


1. (`ncn-nid#`) Verify whether the following messages are observed on any compute or UAN node.

```bash

Can this be a script that runs across all compute nodes and UANs?

TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x00000001 from iqn.2023-06.csm.iscsi:x 1000c1s1b1n0
```

If above respective messages are encountered on the canary worker and any compute/UAN nodes, ensure the following procedure is followed before continuing.

Should it be seen on both worker nodes and compute nodes?

Rewording is needed, depending on the answer to the question above.


```bash
dmesg | grep "Detected NON_EXISTENT_LUN Access"
```

This message is seen initially during worker node boot, so we can't rely on it unless it persists, right?
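
One way to tell a boot-time burst from a persisting condition is to sample the number of matching kernel messages twice and compare the counts. The following is a minimal sketch only, assuming the `dmesg` text is supplied on standard input. `count_lun_errors` and `lun_errors_growing` are hypothetical helper names that are not part of CSM or IUF, and the 60-second interval in the usage comment is arbitrary.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of CSM or IUF.

# Count "Detected NON_EXISTENT_LUN Access" lines in kernel log text on stdin.
# grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
count_lun_errors() {
    grep -c "Detected NON_EXISTENT_LUN Access" || true
}

# Succeed (exit 0) only when the second sample holds more matches than the
# first, i.e. the messages are still being logged after the first sample.
lun_errors_growing() {
    local before="$1" after="$2"
    [ "$after" -gt "$before" ]
}

# Example use on a worker node (interval chosen arbitrarily):
#   before=$(dmesg | count_lun_errors)
#   sleep 60
#   after=$(dmesg | count_lun_errors)
#   lun_errors_growing "$before" "$after" && echo "messages persist"
```

Because the kernel ring buffer can wrap, a count that stays flat or drops is not proof that the condition has cleared; the sketch only detects growth between two samples.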

```
sd 2:0:0:12: LUN assignments on this target have changed. The Linux SCSI layer does not automatically remap LUN assignments.
sd 2:0:0:7: LUN assignments on this target have changed. The Linux SCSI layer does not automatically remap LUN assignments.
sd 2:0:0:9: LUN assignments on this target have changed. The Linux SCSI layer does not automatically remap LUN assignments.
```

This is not right. These messages are seen on iSCSI initiator nodes, not on the iSCSI target. They are also seen on any asynchronous scan or 'iscsiadm rescan' command, so we can't rely on them. We need to check the `multipath -ll` command output on the initiator nodes and see whether the paths are lost, right? Please mention the symptom seen in CASMTRIAGE-9122. Alternatively, this can be documented as a prerequisite step to avoid this issue as well as the other iSCSI issues we have seen during upgrades.
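
The `multipath -ll` check suggested in this comment could be sketched as follows. This is an illustration under assumptions, not the documented procedure: `list_bad_paths` is a hypothetical helper, it reads `multipath -ll` text on standard input, and it assumes the common layout in which each path line carries an `H:C:T:L` address followed by the device name and its states. The exact layout can differ between multipath-tools versions.

```bash
#!/usr/bin/env bash
# Hypothetical helper for illustration; not part of CSM, IUF, or multipath-tools.

# Read `multipath -ll` text on stdin and print only the path lines
# (those containing an H:C:T:L address) whose state includes
# "failed", "faulty", or "offline". Prints nothing when all paths look healthy.
list_bad_paths() {
    grep -E '[0-9]+:[0-9]+:[0-9]+:[0-9]+ ' \
        | grep -Ew 'failed|faulty|offline' || true
}

# Example use on an initiator (compute or UAN) node:
#   multipath -ll | list_bad_paths
```

A path that has disappeared entirely is not listed at all, so comparing the number of paths per map against the expected count would be a complementary check.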

TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x00000001 from iqn.2023-06.csm.iscsi:x 1000c1s1b1n0
TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x00000001 from iqn.2023-06.csm.iscsi:x 1000c1s1b1n0
TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x00000001 from iqn.2023-06.csm.iscsi:x 1000c1s1b1n0
TARGET_CORE[iSCSI]: Detected NON_EXISTENT_LUN Access for 0x00000001 from iqn.2023-06.csm.iscsi:x 1000c1s1b1n0

Same as my first comment: we can't rely on this. Please mention this as a prerequisite step during rollouts.

@aasha-hpe (Contributor) left a comment

Please check the comments I made. Also, please address the Markdown spell-check errors.

@github-actions (Contributor)

This pull-request has not had activity in over 20 days and is being marked as stale.

@github-actions (bot) added the `Stale` label ("Hasn't had activity in over 30 days") on May 14, 2026