Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions HACKING.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,11 @@

2. Create images as documented below

## Overriding diskmaker image or operator image directly after deployment

1. Delete the subscription object so as OLM is no longer managing the operator.
2. Edit the CSV object to override diskmaker image.
Comment thread
gnufied marked this conversation as resolved.

## Running the operator locally

1. Install LSO via OLM/OperatorHub in GUI
Expand Down
9 changes: 6 additions & 3 deletions api/v1/localvolumedevicelink_types.go
Original file line number Diff line number Diff line change
Expand Up @@ -79,13 +79,12 @@ type LocalVolumeDeviceLinkStatus struct {
// preferredLinkTarget is the /dev/disk/by-id symlink for the device that the local storage
// operator evaluated as the most stable and the least error prone. The local storage operator
// recommends using this symlink, after a careful review by the cluster admin.
// +required
// +kubebuilder:validation:MinLength=1
// +optional
// +kubebuilder:validation:MaxLength=4096
PreferredLinkTarget string `json:"preferredLinkTarget,omitempty"`
Comment on lines +82 to 84
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be a pointer, if it's allowed to be empty?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to - https://github.com/openshift/enhancements/blob/master/dev-guide/api-conventions.md

"In custom resource based APIs specifically, we advise to avoid making fields pointers unless there is an absolute need to do so. An absolute need being the need to distinguish between the zero value and an unset value", it should be fine to keep it optional and still a string type i think.

// validLinkTargets is the list of /dev/disk/by-id symlinks for the device that the local
// storage operator considers as valid. The list may contain at most 256 entries.
// +required
// +optional
// +listType=set
// +kubebuilder:validation:MaxItems=256
ValidLinkTargets []string `json:"validLinkTargets,omitempty"`
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to relax these validations, because I realized that in some environments, we may not get any preferredLink targers in /dev/disk/by-id at all... So rather than not creating any objects in those environments, we still create LVDL object

Expand Down Expand Up @@ -113,6 +112,10 @@ type LocalVolumeDeviceLinkList struct {
Items []LocalVolumeDeviceLink `json:"items"`
}

func init() {
SchemeBuilder.Register(&LocalVolumeDeviceLink{}, &LocalVolumeDeviceLinkList{})
}

// DeviceLinkPolicy defines how symlinks for given volumes should be treated
// by the LSO. Valid values are - None, CurrentLinkTarget and PreferredLinkTarget
// +kubebuilder:validation:Enum=None;CurrentLinkTarget;PreferredLinkTarget
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -225,6 +225,8 @@ spec:
- localvolumediscoveries/status
- localvolumediscoveryresults
- localvolumediscoveryresults/status
- localvolumedevicelinks
- localvolumedevicelinks/status
verbs:
- get
- list
Expand Down Expand Up @@ -313,6 +315,8 @@ spec:
- localvolumediscoveries/status
- localvolumediscoveryresults
- localvolumediscoveryresults/status
- localvolumedevicelinks
- localvolumedevicelinks/status
verbs:
- get
- list
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -131,7 +131,6 @@ spec:
operator evaluated as the most stable and the least error prone. The local storage operator
recommends using this symlink, after a careful review by the cluster admin.
maxLength: 4096
minLength: 1
type: string
validLinkTargets:
description: |-
Expand All @@ -144,8 +143,6 @@ spec:
x-kubernetes-list-type: set
required:
- currentLinkTarget
- preferredLinkTarget
- validLinkTargets
type: object
required:
- spec
Expand Down
83 changes: 57 additions & 26 deletions pkg/common/provisioner_utils.go
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ import (
"path/filepath"

localv1 "github.com/openshift/local-storage-operator/api/v1"
"github.com/openshift/local-storage-operator/pkg/internal"
corev1 "k8s.io/api/core/v1"
storagev1 "k8s.io/api/storage/v1"
"k8s.io/apimachinery/pkg/api/meta"
Expand Down Expand Up @@ -42,19 +43,35 @@ func GenerateMountMap(runtimeConfig *provCommon.RuntimeConfig) (sets.String, err
return mountPointMap, nil
}

// CreateLocalPVArgs holds the arguments for CreateLocalPV.
type CreateLocalPVArgs struct {
LocalVolumeLikeObject runtime.Object
RuntimeConfig *provCommon.RuntimeConfig
StorageClass storagev1.StorageClass
MountPointMap sets.String
Client client.Client
SymLinkPath string
IDExists bool
ExtraLabelsForPV map[string]string

// CurrentSymlink points to source to which SymLinkPath points to.
CurrentSymlink string
// BlockDevice is the block device backing this PV.
BlockDevice internal.BlockDevice
}

// CreateLocalPV is used to create a local PV against a symlink
// after passing the same validations against that symlink that local-static-provisioner uses
func CreateLocalPV(
obj runtime.Object,
runtimeConfig *provCommon.RuntimeConfig,
storageClass storagev1.StorageClass,
mountPointMap sets.String,
client client.Client,
symLinkPath string,
deviceName string,
idExists bool,
extraLabelsForPV map[string]string,
) error {
func CreateLocalPV(ctx context.Context, args CreateLocalPVArgs) error {
obj := args.LocalVolumeLikeObject
runtimeConfig := args.RuntimeConfig
storageClass := args.StorageClass
mountPointMap := args.MountPointMap
client := args.Client
symLinkPath := args.SymLinkPath
deviceName := args.BlockDevice.KName
idExists := args.IDExists
extraLabelsForPV := args.ExtraLabelsForPV
nodeLabels := runtimeConfig.Node.GetLabels()
hostname, found := nodeLabels[corev1.LabelHostname]
if !found {
Expand Down Expand Up @@ -91,9 +108,28 @@ func CreateLocalPV(
return fmt.Errorf("could not read the device's volume mode from the node: %w", err)
}

accessor, err := meta.Accessor(obj)
if err != nil {
return fmt.Errorf("could not get object metadata accessor from obj: %+v", obj)
}
name := accessor.GetName()
namespace := accessor.GetNamespace()
kind := obj.GetObjectKind().GroupVersionKind().Kind
if len(name) == 0 || len(namespace) == 0 || len(kind) == 0 {
return fmt.Errorf("name: %q, namespace: %q, or kind: %q is empty for obj: %+v", name, namespace, kind, obj)
}

deviceHandler := internal.NewDeviceLinkHandler(args.CurrentSymlink, client)

_, err = deviceHandler.Create(ctx, pvName, runtimeConfig.Namespace, obj)
Comment thread
gnufied marked this conversation as resolved.
if err != nil {
return fmt.Errorf("error creating localvolumedevicelink object: %w", err)
}
Comment on lines +122 to +127
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Create the LVDL only after the remaining PV validations pass.

Lines 124-127 persist the LocalVolumeDeviceLink before the last mountpoint/capacity checks and before the PV write. Any later return leaves an LVDL whose persistentVolumeName never exists as a PV, so retries can accumulate stale device-link objects.

Patch sketch
 	deviceHandler := internal.NewDeviceLinkHandler(args.CurrentSymlink, client)
-
-	_, err = deviceHandler.Create(ctx, pvName, runtimeConfig.Namespace, obj)
-	if err != nil {
-		return fmt.Errorf("error creating localvolumedevicelink object: %w", err)
-	}
 
 	// Do not attempt to create or update existing PV's that have been released
 	existingPV := &corev1.PersistentVolume{ObjectMeta: metav1.ObjectMeta{Name: pvName}}
 	err = client.Get(ctx, types.NamespacedName{Name: pvName}, existingPV)
@@
 	newPV := provCommon.CreateLocalPVSpec(localPVConfig)
 
 	// Add finalizer for diskmaker symlink cleanup
 	controllerutil.AddFinalizer(newPV, LSOSymlinkDeleterFinalizer)
+
+	_, err = deviceHandler.Create(ctx, pvName, runtimeConfig.Namespace, obj)
+	if err != nil {
+		return fmt.Errorf("error creating localvolumedevicelink object: %w", err)
+	}
 
 	klog.InfoS("creating PV", "pvName", pvName)

Also applies to: 129-165, 255-257

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/common/provisioner_utils.go` around lines 122 - 127, The code currently
calls deviceHandler.Create(...) to persist a LocalVolumeDeviceLink (LVDL) too
early (deviceHandler.Create, variable deviceHandler), before performing final
mountpoint/capacity validations and before writing the PersistentVolume (PV);
move the Create call so it executes only after all remaining PV validation
checks succeed and after the PV write completes to avoid leaving stale LVDL
objects referencing non-existent PVs. Locate every early Create usage (the shown
deviceHandler.Create call and the other similar Create calls later in the file
around the referenced blocks) and refactor so that: perform mountpoint and
capacity checks, write the PV, and only then call deviceHandler.Create(ctx,
pvName, runtimeConfig.Namespace, obj); ensure error handling remains intact and
that rollback/cleanup is considered if PV write fails.


// Do not attempt to create or update existing PV's that have been released
existingPV := &corev1.PersistentVolume{ObjectMeta: metav1.ObjectMeta{Name: pvName}}
err = client.Get(context.TODO(), types.NamespacedName{Name: pvName}, existingPV)
err = client.Get(ctx, types.NamespacedName{Name: pvName}, existingPV)

if err == nil && existingPV.Status.Phase == corev1.VolumeReleased {
klog.InfoS("PV is still being cleaned, not going to recreate it", "pvName", pvName, "disk", deviceName)
// Caller should try again soon
Expand Down Expand Up @@ -128,19 +164,6 @@ func CreateLocalPV(
return fmt.Errorf("path %q has unexpected volume type %q", symLinkPath, actualVolumeMode)
}

accessor, err := meta.Accessor(obj)
if err != nil {

return fmt.Errorf("could not get object metadata accessor from obj: %+v", obj)
}

name := accessor.GetName()
namespace := accessor.GetNamespace()
kind := obj.GetObjectKind().GroupVersionKind().Kind
if len(name) == 0 || len(namespace) == 0 || len(kind) == 0 {
return fmt.Errorf("name: %q, namespace: %q, or kind: %q is empty for obj: %+v", name, namespace, kind, obj)
}

labels := map[string]string{
corev1.LabelHostname: hostname,
PVOwnerKindLabel: kind,
Expand Down Expand Up @@ -229,8 +252,16 @@ func CreateLocalPV(
if opRes != controllerutil.OperationResultNone {
klog.InfoS("PV changed", "pvName", pvName, "status", opRes)
}
if err != nil {
return fmt.Errorf("error creating pv %s: %w", pvName, err)
}

_, err = deviceHandler.ApplyStatus(ctx, pvName, runtimeConfig.Namespace, args.BlockDevice, obj)
if err != nil {
return fmt.Errorf("error updating localvolumedevicelink object: %w", err)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to be logical to do r.eventSync.Report(r.localVolume, newDiskEvent(diskmaker.ErrorCreatingLVDL, ...) before returning here. Is this a bug or you omitted such Report() intentionally here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The event already gets reported up the chain from place where CreateLocalPV gets called. reporting another event here will simply cause duplicated events.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gnufied , sorry, I cannot find where it gets reported up the chain from place where CreateLocalPV gets called. Please give me file name and line of code.

I also made some debugging adding artificial error in the end of CreateLocalPV():

	// _, err = deviceHandler.ApplyStatus(ctx, pvName, runtimeConfig.Namespace, args.BlockDevice, obj)
	err = fmt.Errorf("ZZZZZ: MY ARTIFICIAL ERRROR: error updating localvolumedevicelink ZZZZZ")
	if err != nil {
		return fmt.Errorf("error updating localvolumedevicelink object: %w", err)
	}

The result is expected:

E0326 05:36:58.837106  469396 reconcile.go:486] "could not create local PV" err="error updating localvolumedevicelink object: ZZZZZ: MY ARTIFICIAL ERRROR: error updating localvolumedevicelink ZZZZZ" deviceNameLocation={"DiskNamePath":"/dev/loop1","UserProvidedPath":"/dev/loop1","DiskID":"","BlockDevice":{"name":"loop1","kname":"loop1","type":"loop","fsType":"","size":"1073741824","rota":"0","ro":"0","rm":"0"},"ForceWipe":false}

And LVDL object is crippled (no Status part at all):

$ oc get -n openshift-local-storage LocalVolumeDeviceLink local-pv-ff140cb4 -o yaml
apiVersion: local.storage.openshift.io/v1
kind: LocalVolumeDeviceLink
metadata:
  creationTimestamp: "2026-03-26T05:36:58Z"
  generation: 1
  name: local-pv-ff140cb4
  namespace: openshift-local-storage
  ownerReferences:
  - apiVersion: local.storage.openshift.io/v1
    controller: true
    kind: LocalVolume
    name: local-disks1
    uid: d0d4ce6a-e8b2-4f86-b3f3-f367fac84e1e
  resourceVersion: "360213"
  uid: 574c3ed3-302d-4ffa-a9f1-098d005a063d
spec:
  persistentVolumeName: local-pv-ff140cb4
  policy: None

But processRejectedDevicesForDeviceLinks() was never called and I cannot see any other places where we report an event for diskmaker.ErrorCreatingLVDL from LV (not LVS) code-paths. Please give me some details backing up your words about "reporting another event here will simply cause duplicated events".

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

processRejectedDevicesForDeviceLinks is only called for devices which were already mounted and consumed by previously created LSO PV. It won't be called for devices which has a backing PV, but it isn't used.

In - Reconcile function of lvset reconciler, where we call provisionPV:

		err = r.provisionPV(ctx, lvset, blockDevice, *storageClass, mountPointMap, symlinkSourcePath, symlinkPath, idExists)
		if err == common.ErrTryAgain {
			requeueTime = fastRequeueTime
		} else if err != nil {
			msg := fmt.Sprintf("provisioning failed for %s: %v", blockDevice.Name, err)
			r.eventReporter.Report(lvset, newDiskEvent(diskmaker.ErrorProvisioningDisk, msg, blockDevice.KName, corev1.EventTypeWarning))
			klog.Error(msg)
			continue
		}

This will log any errors coming from provisionPV which in turn calls CreateLocalPV to be reported as event.

It looks like for lv/reconciler.go though, we are simply logging the error and not regenerating an event. I could fix it, but a lot of those code has changed in #598 which actually implements device relinking.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

return err
return nil
}

// GeneratePVName is used to generate a PV name based on the filename, node, and storageclass
Expand Down
16 changes: 7 additions & 9 deletions pkg/controllers/localvolume/localvolume_controller.go
Original file line number Diff line number Diff line change
Expand Up @@ -142,15 +142,13 @@ func (r *LocalVolumeReconciler) syncLocalVolumeProvider(ctx context.Context, ins
return r.addFailureCondition(instance, o, err)
}

if diskMakerDS != nil {
children = append(children, operatorv1.GenerationStatus{
Group: appsv1.GroupName,
Resource: "DaemonSet",
Namespace: diskMakerDS.Namespace,
Name: diskMakerDS.Name,
LastGeneration: diskMakerDS.Generation,
})
}
children = append(children, operatorv1.GenerationStatus{
Group: appsv1.GroupName,
Resource: "DaemonSet",
Namespace: diskMakerDS.Namespace,
Name: diskMakerDS.Name,
LastGeneration: diskMakerDS.Generation,
})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this change relevant to the goal of this PR? Assuming we get rid of this change, would anything become broken?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not. I will fix

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have fixed this in #598


o.Status.Generations = children
o.Status.State = operatorv1.Managed
Expand Down
Loading