[Bug]: CrossCameraReID._fetch_candidates() uses Redis KEYS - blocks server under multi-camera load #95

@Ryzen-Starbit

Description

Affected Component

Tracking (ByteTrack / DeepSORT — services/tracking/)

Bug Description

While going through cross_camera_reid.py I noticed that _fetch_candidates() at line 231 uses self._r.keys("embed:*") to retrieve stored embeddings.
The code itself carries a comment acknowledging this: "# KEYS is fine for small deployments; use SCAN for production scale".

Redis KEYS is a blocking O(N) command: Redis executes commands on a single thread, so the server cannot serve any other client while KEYS walks the whole keyspace. In this project, which processes multiple camera feeds, every BORN event therefore stalls all other Redis operations (memory writes, track events, the Kafka producer) for the full duration of the scan. The stall gets longer as more identities are tracked, since more embed: keys exist.

Steps to Reproduce

  1. Open services/tracking/cross_camera_reid.py
  2. Find line 231: all_keys = self._r.keys(pattern)
  3. Note the comment directly above: "# KEYS is fine for small deployments; use SCAN for production scale"
  4. Under multi-camera load, every BORN event triggers a full blocking scan of the keyspace (KEYS visits every key in the database, not only those matching embed:*)

Expected Behavior

_fetch_candidates() should use Redis SCAN (cursor-based, incremental) so that each call does only a small, bounded amount of work and the server stays responsive during embedding retrieval across multiple simultaneous camera feeds.

Actual Behavior

self._r.keys("embed:*") performs a blocking O(N) scan on every BORN event. All concurrent Redis operations are blocked for the duration of the scan, and the delay grows with the number of tracked identities.

Python Version

3.13

Operating System

Windows 11

Inference Device

None

Error Log / Traceback

Screenshots or Recordings

No response

Additional Context

Proposed fix - replace KEYS with SCAN in _fetch_candidates():

Current code (line 231):

    all_keys = self._r.keys(pattern)

Fix:

    all_keys = []
    cursor = 0
    while True:
        # scan() returns (next_cursor, batch); a next_cursor of 0 means the iteration is complete
        cursor, batch = self._r.scan(cursor, match=pattern, count=100)
        all_keys.extend(batch)
        if cursor == 0:
            break

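SCAN is cursor-based and incremental: each call does a small, bounded amount of work, so Redis can handle other operations between batches. count=100 is a hint for how much work each call does, trading throughput against latency per iteration. Note that SCAN may return the same key more than once, so callers that need unique keys should deduplicate.
This is a known TODO in the code itself.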
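As a rough sketch of the fix above with duplicate handling added, the loop could be pulled into a small helper. The names scan_keys and FakeRedis are hypothetical and not part of the project; FakeRedis is only an in-memory stand-in that mimics the (next_cursor, batch) return shape of redis-py's scan() so the example runs without a server. With a real client, self._r would be passed instead.

```python
import fnmatch


def scan_keys(client, pattern, count=100):
    """Collect keys matching `pattern` via SCAN, dropping duplicates.

    `client` is assumed to expose scan(cursor, match=..., count=...)
    returning (next_cursor, batch), as redis-py does.
    """
    seen = set()
    keys = []
    cursor = 0
    while True:
        cursor, batch = client.scan(cursor, match=pattern, count=count)
        for key in batch:
            # SCAN may yield the same key more than once during an iteration
            if key not in seen:
                seen.add(key)
                keys.append(key)
        if cursor == 0:  # a returned cursor of 0 ends the iteration
            break
    return keys


class FakeRedis:
    """Hypothetical in-memory stand-in for a Redis client (illustration only)."""

    def __init__(self, keys):
        self._keys = list(keys)

    def scan(self, cursor=0, match=None, count=10):
        # Examine `count` keys per call, then filter by the glob pattern
        chunk = self._keys[cursor:cursor + count]
        next_cursor = cursor + count
        if next_cursor >= len(self._keys):
            next_cursor = 0
        matched = [k for k in chunk
                   if match is None or fnmatch.fnmatchcase(k, match)]
        return next_cursor, matched


fake = FakeRedis(["embed:1", "track:1", "embed:2", "embed:1"])
unique_keys = scan_keys(fake, "embed:*", count=2)
```

redis-py also provides scan_iter(match=..., count=...), which wraps the same cursor loop in a generator; either form should work here, though the explicit loop makes the deduplication step easier to see.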

Files affected:

  • services/tracking/cross_camera_reid.py (_fetch_candidates method, line 231)

I'd like to open a PR for this fix if the maintainer agrees.

Checklist

  • I have searched existing issues and this is not a duplicate.
  • I have tested with the latest version of the main branch.

Metadata

Assignees

Labels

bug: Something isn't working

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests