Skip to content

"Adopt" already held lock based on exact payload match? #16

@papandreou

Description

@papandreou

Hi!

I'm working on an app where a class of background jobs uses ioredis-lock to control access to a shared resource. There's one particular case where such a job can fail ungracefully due to the worker node being shut down, which means that it doesn't get to release the lock. The next thing that happens is that job queue detects that the job has stalled, and then proceeds to retry it, and so it fails to take the lock because the lock held by the old instance of the same job hasn't expired yet at that point.

I've come up with a hack that sets _key and _locked, then uses Lock#release to release the existing lock. This will only succeed when the job was taken by a previous instance of the same job, as determined by the job id in the payload. It looks roughly like this:

async function takeLock(redisKey, jobId, lockOptions) {
  const lock = ioredisLock.createLock(clientFactory(), lockOptions);
  // Use the job id as the payload of the lock so that we can re-acquire it.
  // Found the trick here:
  // https://github.com/makeomatic/ioredis-lock/issues/3#issuecomment-248290852
  lock._id = jobId;

  try {
    await lock.acquire(key);
  } catch (err) {
    // We failed to take the lock. This might be due to a previous instance of the same
    // job being classified as stalled and then being re-added to the waiting queue.
    // In that case we want to take over the existing lock.
    // Fool ioredis-lock into thinking that it's currently holding the lock, so we
    // can use the release method to break it iff the job id matches that of the
    // current lock payload:
    lock._locked = true;
    lock._key = redisKey;
    try {
      await lock.release();
    } catch (err) {
      // Failed to release the existing lock. This is most likely due to the payload not
      // matching, so we aren't battling a stalled job. Give up.
      return false;
    }
    try {
      await lock.acquire(redisKey);
      // Successfully re-acquired the lock from the stale job
    } catch (err) {
      // Failed to re-acquire the lock, assume that another job beat us to it and give up:
      return false;
    }
  }
  return lock; // Successfully took the lock
}

This should be free of race conditions, but I don't feel great about poking around in the internals like this. Would you consider "adoption" of an existing lock as an officially supported feature?

Best regards,
Andreas

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions