Context
From PR #1121 review — @mihow suggested simplifying the logic that determines if a Job is in the FAILURE state:
I am thinking we should simplify the logic determining if a Job is in the FAILURE state. Let's just show the counts. Really we need a new state like "COMPLETED" instead of Celery's SUCCESS & FAILURE states. "Completed with errors". Then we can remove a number of checks related to the stage status & overall status.
Problem
Currently jobs use Celery's SUCCESS and FAILURE states, but real-world ML processing jobs often finish with some images failing (bad crops, missing files, timeouts) while the majority succeed. The current approach uses a failure ratio threshold to decide between SUCCESS and FAILURE, which requires threading a complete_state parameter through the progress stages and adds complexity.
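To make the problem concrete, here is a rough sketch of what a ratio-based decision looks like. It is not the code in ami/jobs/tasks.py: the function name, the 0.5 threshold, and the meaning of `processed` (all images attempted, including failures) are assumptions made for illustration.

```python
# Hypothetical illustration only. The names and the 0.5 threshold are
# assumptions, not taken from ami/jobs/tasks.py.
FAILURE_RATIO_THRESHOLD = 0.5


def state_from_failure_ratio(processed: int, failed: int) -> str:
    """Pick a Celery-style terminal state from the share of failed images."""
    # Treat an empty job as having no failures to avoid dividing by zero.
    ratio = failed / processed if processed else 0.0
    return "FAILURE" if ratio > FAILURE_RATIO_THRESHOLD else "SUCCESS"


# Two jobs with nearly identical outcomes land in opposite states.
print(state_from_failure_ratio(processed=100, failed=50))  # SUCCESS
print(state_from_failure_ratio(processed=100, failed=51))  # FAILURE
```

Any threshold produces a cliff like this one, which is part of why showing the raw counts may be easier to reason about.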
Proposal
Add a COMPLETED (or COMPLETED_WITH_ERRORS) state to the Job status choices. A job that finishes processing all images would be COMPLETED regardless of individual failures. The UI would show the actual counts (processed, failed, detections, classifications) and let the user judge the outcome.
This would allow removing:
- The failure ratio threshold logic in ami/jobs/tasks.py
- The complete_state parameter threading through _update_job_progress
- Various checks related to per-stage status determining overall status
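A minimal sketch of the proposed behavior, assuming a plain Python enum in place of the real status choices in ami/jobs/models.py. `JobState`, `JobSummary`, and `finish_job` are hypothetical names; the four counts are the ones listed in the proposal.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class JobState(str, Enum):
    # Illustrative subset of states; COMPLETED is the proposed addition.
    STARTED = "STARTED"
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"
    COMPLETED = "COMPLETED"


@dataclass
class JobSummary:
    processed: int
    failed: int
    detections: int
    classifications: int


def finish_job(summary: JobSummary) -> tuple[JobState, str]:
    """Return the terminal state and a count line for the UI.

    There is no ratio threshold: a job that got through all of its images
    is COMPLETED, and the counts carry the detail.
    """
    label = (
        f"{summary.processed} processed, {summary.failed} failed, "
        f"{summary.detections} detections, "
        f"{summary.classifications} classifications"
    )
    return JobState.COMPLETED, label


state, label = finish_job(
    JobSummary(processed=100, failed=7, detections=340, classifications=340)
)
print(state.value, "-", label)
```

With the state no longer derived from per-image failures, nothing like complete_state needs to be passed between progress stages in this sketch.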
Related
- ami/jobs/tasks.py: _update_job_progress, failure ratio logic
- ami/jobs/models.py: Job model, status field choices