
feat: implement task pause/resume functionality #357

Open
soumojit-D48 wants to merge 1 commit into GetBindu:main from soumojit-D48:feat/implement-task-pause-resume


soumojit-D48 commented Mar 13, 2026

Summary

  • Problem: Long-running AI agent tasks could not be temporarily stopped to free resources and later resumed from where they left off
  • Why it matters: Users need the ability to pause resource-intensive tasks without losing progress and resume them when ready
  • What changed: Implemented pause/resume handlers in worker base, added checkpoint save/restore, added suspended/resumed task states, implemented pause_task/resume_task in scheduler
  • What did NOT change: Task execution logic, storage interface, protocol types (except adding new states)

Change Type (select all that apply)

  • Feature
  • Bug fix
  • Refactor
  • Documentation
  • Security hardening
  • Tests
  • Chore/infra

Scope (select all touched areas)

  • Server / API endpoints
  • Extensions (DID, x402, etc.)
  • Storage backends
  • Scheduler backends
  • Observability / monitoring
  • Authentication / authorization
  • CLI / utilities
  • Tests
  • Documentation
  • CI/CD / infra

Linked Issue/PR

User-Visible / Behavior Changes

  • Tasks can now be paused (state: suspended) and resumed (state: resumed)
  • Checkpoint data is saved when pausing to preserve task context
  • Only working tasks can be paused; only suspended tasks can be resumed
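The transition rules above form a small state machine. A minimal sketch, assuming states are plain strings; `TRANSITIONS`, `InvalidTransition`, and `apply_transition` are hypothetical names, not code from this PR:

```python
# Hypothetical sketch of the pause/resume transition rules; not the PR's code.
TRANSITIONS: dict[str, tuple[str, str]] = {
    # action: (required current state, resulting state)
    "pause": ("working", "suspended"),
    "resume": ("suspended", "resumed"),
}


class InvalidTransition(Exception):
    """Raised when an action is not allowed from the task's current state."""


def apply_transition(current_state: str, action: str) -> str:
    """Return the new state, or raise InvalidTransition if the rule is violated."""
    required, target = TRANSITIONS[action]
    if current_state != required:
        raise InvalidTransition(
            f"cannot {action} a task in state {current_state!r}; expected {required!r}"
        )
    return target
```

As written, these rules also mean a task in `resumed` cannot be paused again until it returns to `working`.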

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/credentials handling changed? (No)
  • New/changed network calls? (No)
  • Database schema/migration changes? (No)
  • Authentication/authorization changes? (No)
  • If any Yes, explain risk + mitigation: N/A

Verification

Environment

  • OS: Windows/macOS/Linux
  • Python version: 3.x
  • Storage backend: Any (base interface unchanged)
  • Scheduler backend: memory/redis

Steps to Test

  1. Start a long-running task
  2. Call pause_task with task_id
  3. Verify task state changes to "suspended"
  4. Call resume_task with task_id
  5. Verify task state changes to "resumed"

Expected Behavior

  • Paused task should save checkpoint and enter suspended state
  • Resumed task should restore checkpoint and enter resumed state

Actual Behavior

Evidence (attach at least one)

  • Failing test before + passing after
  • Test output / logs
  • Screenshot / recording
  • Performance metrics (if relevant)

Human Verification (required)

What you personally verified (not just CI):

  • Verified scenarios: Code review of implementation logic
  • Edge cases checked: Invalid state transitions handled (pause completed/canceled/failed tasks, resume non-suspended tasks)
  • What you did NOT verify: Runtime testing with actual task execution

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Database migration needed? (No)
  • If yes, exact upgrade steps: N/A

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly: Revert commit
  • Files/config to restore: N/A
  • Known bad symptoms reviewers should watch for: Tasks stuck in suspended state

Risks and Mitigations

  • Risk: Task checkpoint data could grow large in storage
    • Mitigation: Only save essential metadata, not full task state
  • Risk: Resume could fail if checkpoint data is corrupted
    • Mitigation: Handle missing checkpoint gracefully, log warnings
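The two mitigations above (persist only essential metadata; treat a missing or unreadable checkpoint as non-fatal and log a warning) could look roughly like this. A minimal sketch, assuming checkpoints are JSON strings in a dict-like store; `save_checkpoint`, `restore_checkpoint`, and the `ESSENTIAL_KEYS` whitelist are hypothetical, not the PR's implementation:

```python
import json
import logging
from collections.abc import MutableMapping
from typing import Any, Optional

logger = logging.getLogger("checkpoint")

# Hypothetical whitelist: only these fields are persisted, keeping checkpoints small.
ESSENTIAL_KEYS = ("task_id", "context_id", "step", "paused_at")


def save_checkpoint(store: MutableMapping[str, str], task: dict[str, Any]) -> None:
    """Persist only the essential metadata of a task, serialized as JSON."""
    snapshot = {key: task[key] for key in ESSENTIAL_KEYS if key in task}
    store[task["task_id"]] = json.dumps(snapshot)


def restore_checkpoint(
    store: MutableMapping[str, str], task_id: str
) -> Optional[dict[str, Any]]:
    """Return saved metadata, or None (with a warning) if missing or corrupt."""
    raw = store.get(task_id)
    if raw is None:
        logger.warning("no checkpoint for task %s; resuming without one", task_id)
        return None
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        logger.warning("corrupt checkpoint for task %s; ignoring it", task_id)
        return None
```

Returning `None` instead of raising lets the resume path decide whether to restart from scratch, which matches the "handle gracefully, log warnings" mitigation.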

Checklist

  • [x] Tests pass (uv run pytest)
  • Pre-commit hooks pass (uv run pre-commit run --all-files)
  • Documentation updated (if needed)
  • Security impact assessed
  • Human verification completed
  • Backward compatibility considered

soumojit-D48 (Author)

Hi @raahulrahl, could you check this PR and let me know whether it's helpful? Thanks!

Paraschamoli (Member)

Hey @soumojit-D48, I tried testing the pause/resume feature locally. When I send a request with method: "tasks/pause", the server returns an error saying the method isn't recognized.

It looks like the worker and scheduler base were updated, but I couldn’t find where tasks/pause and tasks/resume are exposed in the RPC/API layer. Because of that, I’m not able to trigger the pause operation through the API.

Am I missing something in the setup, or do those handlers still need to be added?
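The reported error is consistent with how a JSON-RPC 2.0 server behaves when no handler is registered for a method name: it replies with error code -32601 ("Method not found"). The hypothetical dispatcher below (not Bindu's actual router; `dispatch` and the stub `pause_task` are illustrative) shows why updating the worker and scheduler alone is not enough: the method name must also be registered at the RPC layer.

```python
import asyncio
from typing import Any, Awaitable, Callable

Handler = Callable[[dict[str, Any]], Awaitable[Any]]


async def dispatch(handlers: dict[str, Handler], request: dict[str, Any]) -> dict[str, Any]:
    """Route a JSON-RPC 2.0 request to its handler, or return a -32601 error."""
    handler = handlers.get(request.get("method", ""))
    if handler is None:
        return {
            "jsonrpc": "2.0",
            "id": request.get("id"),
            "error": {"code": -32601, "message": "Method not found"},
        }
    result = await handler(request.get("params", {}))
    return {"jsonrpc": "2.0", "id": request.get("id"), "result": result}


async def pause_task(params: dict[str, Any]) -> dict[str, Any]:
    # Hypothetical stand-in for a real pause handler.
    return {"taskId": params["taskId"], "state": "suspended"}


if __name__ == "__main__":
    req = {"jsonrpc": "2.0", "id": 1, "method": "tasks/pause", "params": {"taskId": "abc"}}
    print(asyncio.run(dispatch({}, req)))  # not registered: -32601 error
    print(asyncio.run(dispatch({"tasks/pause": pause_task}, req)))  # registered: result
```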

pkonal23 added a commit to pkonal23/Bindu that referenced this pull request Apr 16, 2026
## Summary

Implements the Task Pause/Resume feature that was marked as incomplete
in PR GetBindu#357. The implementation adds proper state management for pausing
and resuming long-running tasks.

## What Changed

### 1. Error Types (types.py)
- Added TaskNotPausableError (-32007)
- Added TaskNotResumableError (-32008)

### 2. Request/Response Types (types.py)
- Added PauseTaskRequest/PauseTaskResponse
- Added ResumeTaskRequest/ResumeTaskResponse
- CRITICAL: Added these to A2ARequest/A2AResponse discriminated unions

### 3. Settings (settings.py)
- Added tasks/pause and tasks/resume to method_handlers
- Added "suspended" and "resumed" to non_terminal_states

### 4. TaskManager (task_manager.py)
- Added pause_task() and resume_task() router methods

### 5. TaskHandlers (task_handlers.py)
- Implemented pause_task() with state validation (only "working" state)
- Implemented resume_task() with state validation (only "suspended" state)
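The handler validation, combined with the error codes from section 1, might be structured as follows. A minimal sketch with an in-memory task store; the function names mirror the commit message, but the store layout, error messages, and response shapes are assumptions:

```python
from typing import Any

# Error codes as listed in the commit message.
TASK_NOT_PAUSABLE = -32007   # TaskNotPausableError
TASK_NOT_RESUMABLE = -32008  # TaskNotResumableError

Task = dict[str, Any]


def _error(request_id: Any, code: int, message: str) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 error response."""
    return {"jsonrpc": "2.0", "id": request_id, "error": {"code": code, "message": message}}


def _ok(request_id: Any, task: Task) -> dict[str, Any]:
    return {"jsonrpc": "2.0", "id": request_id, "result": task}


def pause_task(tasks: dict[str, Task], request: dict[str, Any]) -> dict[str, Any]:
    """Suspend a task, but only if it is currently 'working'."""
    # Unknown task IDs raise KeyError here; a real handler would return a not-found error.
    task = tasks[request["params"]["taskId"]]
    if task["state"] != "working":
        return _error(request["id"], TASK_NOT_PAUSABLE,
                      f"Task cannot be paused in state {task['state']!r}")
    task["state"] = "suspended"
    return _ok(request["id"], task)


def resume_task(tasks: dict[str, Task], request: dict[str, Any]) -> dict[str, Any]:
    """Mark a task 'resumed', but only if it is currently 'suspended'."""
    task = tasks[request["params"]["taskId"]]
    if task["state"] != "suspended":
        return _error(request["id"], TASK_NOT_RESUMABLE,
                      f"Task cannot be resumed in state {task['state']!r}")
    task["state"] = "resumed"  # re-queuing the task for execution is outside this sketch
    return _ok(request["id"], task)
```

The four test cases listed under Testing map onto this sketch: pausing a working task succeeds, pausing a completed task yields -32007, resuming a suspended task succeeds, and resuming a working task yields -32008.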

### 6. Worker Handlers (workers/base.py)
- Implemented _handle_pause() - updates state to "suspended"
- Implemented _handle_resume() - updates state to "resumed" and re-queues task
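Section 6 splits the work so that pause only changes the state, while resume changes the state and puts the task back on the queue. A minimal asyncio sketch, assuming a single in-process queue of `(operation, task_id)` tuples; `MiniWorker` and its methods are hypothetical stand-ins for the real `_handle_pause`/`_handle_resume`:

```python
import asyncio


class MiniWorker:
    """Hypothetical worker that processes ("run" | "pause" | "resume", task_id) ops."""

    def __init__(self) -> None:
        self.queue: asyncio.Queue[tuple[str, str]] = asyncio.Queue()
        self.states: dict[str, str] = {}
        self.runs: list[str] = []  # records every task_id that was (re)run

    async def _handle_run(self, task_id: str) -> None:
        self.states[task_id] = "working"
        self.runs.append(task_id)

    async def _handle_pause(self, task_id: str) -> None:
        self.states[task_id] = "suspended"  # state change only; nothing is re-queued

    async def _handle_resume(self, task_id: str) -> None:
        self.states[task_id] = "resumed"
        await self.queue.put(("run", task_id))  # re-queue so the task runs again

    async def drain(self) -> None:
        """Process queued ops until the queue is empty."""
        handlers = {
            "run": self._handle_run,
            "pause": self._handle_pause,
            "resume": self._handle_resume,
        }
        while not self.queue.empty():
            op, task_id = await self.queue.get()
            await handlers[op](task_id)
```

Draining after a resume runs the task again, so it passes through `resumed` and back to `working`.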

## Testing

Created test script (test_pause_resume.py) and slow echo agent
(examples/beginner/slow_echo_agent.py) for testing.

### Critical Finding for Testing
The agent handler MUST use asyncio.sleep() instead of time.sleep():
- time.sleep() BLOCKS the event loop, preventing pause/resume
- asyncio.sleep() YIELDS control, allowing pause/resume to work
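The blocking-versus-yielding difference can be reproduced without Bindu. In this sketch a hypothetical pause request arrives while a handler is mid-task, and the handler checks an `asyncio.Event` between steps. With `asyncio.sleep` the pause coroutine gets a chance to run and the handler stops early; with `time.sleep` the event loop cannot run the pause coroutine until the handler has finished every step. The cooperative `paused` flag is an illustration only, not how Bindu's worker interrupts tasks:

```python
import asyncio
import time


async def agent_handler(paused: asyncio.Event, steps: int, blocking: bool) -> int:
    """Do `steps` units of work, stopping early if `paused` is set; return steps done."""
    done = 0
    for _ in range(steps):
        if paused.is_set():
            break
        if blocking:
            time.sleep(0.01)  # blocks the whole event loop; nothing else can run
        else:
            await asyncio.sleep(0.01)  # yields to the loop so other coroutines run
        done += 1
    return done


async def run(blocking: bool) -> int:
    paused = asyncio.Event()

    async def request_pause() -> None:
        await asyncio.sleep(0.025)  # a pause request arrives about 25 ms in
        paused.set()

    pauser = asyncio.create_task(request_pause())
    done = await agent_handler(paused, steps=10, blocking=blocking)
    await pauser
    return done
```

`asyncio.run(run(blocking=True))` always completes all 10 steps, while `asyncio.run(run(blocking=False))` stops after only a few.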

All 4 test cases pass:
✅ Pause working task → suspended
✅ Pause completed task → TaskNotPausableError
✅ Resume suspended task → resumed (re-queued)
✅ Resume working task → TaskNotResumableError

## Validation Rules

- Pause: only allowed in "working" state
- Resume: only allowed in "suspended" state

## API Usage

// Pause a task
{"method": "tasks/pause", "params": {"taskId": "uuid"}}

// Resume a task
{"method": "tasks/resume", "params": {"taskId": "uuid"}}
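The API usage above shows only the `method` and `params` fields. JSON-RPC 2.0 requests also carry `"jsonrpc": "2.0"` and an `id`; that Bindu expects both in this exact shape is an assumption here. A minimal standard-library client sketch; `build_request` and `send` are hypothetical helpers, and the endpoint URL is whatever your agent is served on:

```python
import json
import urllib.request
import uuid
from typing import Any


def build_request(method: str, task_id: str) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 request for tasks/pause or tasks/resume."""
    if method not in ("tasks/pause", "tasks/resume"):
        raise ValueError(f"unsupported method: {method}")
    return {
        "jsonrpc": "2.0",
        "id": str(uuid.uuid4()),
        "method": method,
        "params": {"taskId": task_id},
    }


def send(url: str, payload: dict[str, Any]) -> dict[str, Any]:
    """POST the payload as JSON and return the decoded JSON-RPC response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `send(agent_url, build_request("tasks/pause", task_id))` should return either a result or an error object such as TaskNotPausableError (-32007); if the agent requires authentication, headers would need to be added.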

## Files Modified

- bindu/common/protocol/types.py
- bindu/settings.py
- bindu/server/task_manager.py
- bindu/server/handlers/task_handlers.py
- bindu/server/workers/base.py

## Related Issues

- Closes GetBindu#383 (the original bug report about unimplemented pause/resume)
- Related to GetBindu#356 (feature request) and GetBindu#357 (attempted implementation)

Co-Authored-By: Claude Opus 4.6 <noreply@openclaude.dev>

Development

Successfully merging this pull request may close these issues.

[Feature]: Task Pause/Resume

2 participants