Look into ways to help minimise rate limiting #262

@Varantha

Description

I'm currently running the operator with only 7 OnePasswordItem CRs using a service account token, and I'm consistently hitting the rate limit. I am actively updating things in my cluster, which could be triggering more redeploys than usual, but I'm confused that it reports I'm hitting 1000 reads per hour with so few secrets.

Possible Improvements

I took a look at what could be causing so many API calls, and I think the following could be addressed to lower the number of calls:

1. Reuse the item-path annotation

After the first successful fetch, the operator writes the operator.1password.io/item-path annotation on the Secret with the resolved path, vaults/<uuid>/items/<uuid>. On subsequent polls, getPathFromOnePasswordItem() ignores this annotation and reads the original path from the OnePasswordItem CR's .spec.itemPath. It then has to repeat the whole ListVaults -> ListSecrets -> GetItemByID flow (3 API calls where 1 would have been enough).
If the lookup by the annotated ID fails, it can fall back to the whole flow in case the path now refers to a new secret.
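
A rough Go sketch of the lookup order described above. The annotation key is the one the operator already writes; pickItemPath and parseResolvedPath are hypothetical helpers invented for illustration, not functions from the operator, and the example paths are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// ItemPathAnnotation is the annotation key mentioned in this issue.
const ItemPathAnnotation = "operator.1password.io/item-path"

// parseResolvedPath (hypothetical) splits "vaults/<id>/items/<id>" into its
// two IDs. It reports ok=false for anything that does not have that shape.
func parseResolvedPath(p string) (vaultID, itemID string, ok bool) {
	parts := strings.Split(p, "/")
	if len(parts) != 4 || parts[0] != "vaults" || parts[2] != "items" {
		return "", "", false
	}
	if parts[1] == "" || parts[3] == "" {
		return "", "", false
	}
	return parts[1], parts[3], true
}

// pickItemPath (hypothetical) prefers the resolved path stored on the Secret
// and falls back to the CR's spec.itemPath when the annotation is missing or
// malformed. The second result says whether the annotation was used.
func pickItemPath(secretAnnotations map[string]string, specItemPath string) (string, bool) {
	if p, found := secretAnnotations[ItemPathAnnotation]; found {
		if _, _, ok := parseResolvedPath(p); ok {
			return p, true
		}
	}
	return specItemPath, false
}

func main() {
	annotated := map[string]string{ItemPathAnnotation: "vaults/abc123/items/def456"}

	path, fromAnnotation := pickItemPath(annotated, "vaults/Homelab/items/grafana-admin")
	fmt.Println(path, fromAnnotation)

	// No annotation yet (first fetch): use the path from the CR.
	path, fromAnnotation = pickItemPath(nil, "vaults/Homelab/items/grafana-admin")
	fmt.Println(path, fromAnnotation)
}
```

The shape check only guards against a malformed annotation. A real change would also need to discard the annotation when .spec.itemPath is edited, otherwise the operator would keep reading the old item.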

2. Cache API Calls

It would be good to cache API calls in some way. On an initial run in particular, the results of ListVaults and ListSecrets won't differ much from one secret to the next if the operator is loading 20 secrets for the first time. If they all live in the same vault, caching would drop the number of API calls needed to fetch those secrets' values from 60 to 22 (20 ListVaults + 20 ListSecrets + 20 GetItemByID -> 1 ListVaults + 1 ListSecrets + 20 GetItemByID), which is a large decrease. The cache could also be shared between the Poller and the Controller to minimise overlapping calls there.

3. No Global Rate Limit Awareness

Similarly, when the operator is being rate limited, the logs suggest that it still tries each secret separately, and each one is then re-queued for 15 minutes later. Say we have 20 secrets and we are being rate limited: that's 20 failed calls now and another 20 in 15 minutes.
If there were instead a shared, central isRateLimited check, the operator could try 1 call at the start and then 1 call after 15 minutes. If that one call succeeds, it can start to clear out the queue rather than retrying everything.

Workarounds

  • I understand that one suggested improvement is to use UUIDs instead of the fully named path as it appears in 1Password, but I would prefer to be able to tell the secret's name at a glance, as that makes things easier to manage.
  • Increasing the polling interval would also help (which is what I'll try in the meantime to work around the issue).

Environment

  • Operator version: 1.12.0
  • Helm chart: connect-2.4.1
  • POLLING_INTERVAL: 600 (default)
