The Relevancy Task is being a bad boy right now, routinely erroring with integrity errors. Here is the most recent one:
sqlalchemy.exc.IntegrityError: (sqlalchemy.dialects.postgresql.asyncpg.IntegrityError) <class 'asyncpg.exceptions.UniqueViolationError'>: duplicate key value violates unique constraint "url_task_error_pkey"
DETAIL: Key (url_id, task_type)=(4142, Relevancy) already exists.
[SQL: INSERT INTO url_task_error (task_type, error, url_id, task_id) VALUES ($1::task_type, $2::VARCHAR, $3::INTEGER, $4::INTEGER), ($5::task_type, $6::VARCHAR, $7::INTEGER, $8::INTEGER), ($9::task_type, $10::VARCHAR, $11::INTEGER, $12::INTEGER), ($13::tas ... 66696 characters truncated ... $4000::INTEGER) RETURNING url_task_error.created_at, url_task_error.url_id, url_task_error.task_type]
[parameters: ('Relevancy', "400, message='Bad Request', url='https://erjp42mzm4k2tyn1.us-east-1.aws.endpoints.huggingface.cloud'", 4142, 30049, 'Relevancy', 'Server disconnected', 4150, 30049, 'Relevancy', "400, message='Bad Request', url='https://erjp42mzm4k2tyn1.us-east-1.aws.endpoints.huggingface.cloud'", 4171, 30049, 'Relevancy', 'Server disconnected', 4215, 30049, 'Relevancy', "400, message='Bad Request', url='https://erjp42mzm4k2tyn1.us-east-1.aws.endpoints.huggingface.cloud'", 4238, 30049, 'Relevancy', "400, message='Bad Request', url='https://erjp42mzm4k2tyn1.us-east-1.aws.endpoints.huggingface.cloud'", 4708, 30049, 'Relevancy', "400, message='Bad Request', url='https://erjp42mzm4k2tyn1.us-east-1.aws.endpoints.huggingface.cloud'", 4729, 30049, 'Relevancy', "400, message='Bad Request', url='https://erjp42mzm4k2tyn1.us-east-1.aws.endpoints.huggingface.cloud'", 4766, 30049, 'Relevancy', "400, message='Bad Request', url='https://erjp42mzm4k2tyn1.us-east-1.aws.endpoints.huggingface.cloud'", 4792, 30049, 'Relevancy', "400, message='Bad Request', url='https://erjp42mzm4k2tyn1.us-east-1.aws.endpoints.huggingface.cloud'", 7201, 30049, 'Relevancy', "400, message='Bad Request', url='https://erjp42mzm4k2tyn1.us-east-1.aws.endpoints.huggingface.cloud'", 7202, 30049, 'Relevancy', "400, message='Bad Request', url='https://erjp42mzm4k2tyn1.us-east-1.aws.endpoints.huggingface.cloud'", 7208, 30049, 'Relevancy', "400, message='Bad Request', url='https://erjp42mzm4k2tyn1.us-east-1.aws.endpoints.huggingface.cloud'" ... 3900 parameters truncated ... 
12519, 30049, 'Relevancy', 'Server disconnected', 12520, 30049, 'Relevancy', "400, message='Bad Request', url='https://erjp42mzm4k2tyn1.us-east-1.aws.endpoints.huggingface.cloud'", 12521, 30049, 'Relevancy', 'Server disconnected', 12522, 30049, 'Relevancy', "400, message='Bad Request', url='https://erjp42mzm4k2tyn1.us-east-1.aws.endpoints.huggingface.cloud'", 12523, 30049, 'Relevancy', 'Server disconnected', 12524, 30049, 'Relevancy', "400, message='Bad Request', url='https://erjp42mzm4k2tyn1.us-east-1.aws.endpoints.huggingface.cloud'", 12525, 30049, 'Relevancy', 'Server disconnected', 12526, 30049, 'Relevancy', "400, message='Bad Request', url='https://erjp42mzm4k2tyn1.us-east-1.aws.endpoints.huggingface.cloud'", 12527, 30049, 'Relevancy', 'Server disconnected', 12528, 30049, 'Relevancy', "400, message='Bad Request', url='https://erjp42mzm4k2tyn1.us-east-1.aws.endpoints.huggingface.cloud'", 12529, 30049, 'Relevancy', 'Server disconnected', 12530, 30049, 'Relevancy', "400, message='Bad Request', url='https://erjp42mzm4k2tyn1.us-east-1.aws.endpoints.huggingface.cloud'", 12531, 30049)]
(Background on this error at: https://sqlalche.me/e/20/gkpj)
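For reference, the constraint named in the traceback, url_task_error_pkey, covers (url_id, task_type), so the bulk insert fails as soon as any row in the batch repeats a key, or matches a key already in the table. Below is a minimal sketch of one possible guard: filtering the batch on that key before it is written. The TaskError tuple and the dedupe_errors helper are hypothetical names made up for illustration, not code from this repo, and the sample rows are abridged. PostgreSQL's INSERT ... ON CONFLICT DO NOTHING would be an alternative that handles this at the database level.

```python
from __future__ import annotations

from typing import Iterable, NamedTuple


class TaskError(NamedTuple):
    """Hypothetical stand-in for one row of url_task_error."""
    task_type: str
    error: str
    url_id: int
    task_id: int


def dedupe_errors(
    rows: Iterable[TaskError],
    existing_keys: Iterable[tuple[int, str]] = (),
) -> list[TaskError]:
    """Keep the first row per (url_id, task_type); skip keys already stored."""
    seen = set(existing_keys)
    unique: list[TaskError] = []
    for row in rows:
        key = (row.url_id, row.task_type)
        if key in seen:
            continue  # would violate the primary key, so drop it
        seen.add(key)
        unique.append(row)
    return unique


# Illustrative batch; the third row repeating url_id 4142 is an assumption.
rows = [
    TaskError("Relevancy", "400, message='Bad Request'", 4142, 30049),
    TaskError("Relevancy", "Server disconnected", 4150, 30049),
    TaskError("Relevancy", "Server disconnected", 4142, 30049),
]
to_insert = dedupe_errors(rows)
```

If the clash is with rows left over from an earlier run rather than within the batch, the keys already in the table would need to be passed as existing_keys (or the insert would need a conflict clause); this sketch does not determine which case applies here.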
So there are a few things going on here:
For the moment, I've disabled this task, via the URL_AUTO_RELEVANCE_TASK_FLAG environment variable. That should also help us zero in on whether it's the source of the memory leak. [1]

Footnotes

[1] My suspicion is that it's a plausible candidate, as it's a third party library that we don't know the innards of, and might not be optimized for this sort of start-stop functionality. We can also look into whether upgrading the library would help.