Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds support for exception properties and inner failure details to the FailureDetails class, allowing richer error information to be captured and propagated through the Durable Task framework. The PR also includes a protobuf file sync from the upstream repository, which introduces many additional protocol buffer changes beyond the core feature.
Changes:
- Added
propertiesfield toTaskFailureDetailsprotobuf message to store exception metadata - Added
innerFailurefield support to capture exception cause chains - Implemented bidirectional conversion between protobuf
Valuetypes and Java objects - Added comprehensive unit tests covering properties, inner failures, and round-trip serialization
- Updated
DurableTaskGrpcWorkerto use the newFailureDetailsconstructor for consistent exception handling - Synced protobuf definitions from upstream repository (commit hash 1caadbd7ecfdf5f2309acbeac28a3e36d16aa156)
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/durabletask-protobuf/protos/orchestrator_service.proto | Added properties field to TaskFailureDetails; synced many other protobuf changes from upstream |
| internal/durabletask-protobuf/PROTO_SOURCE_COMMIT_HASH | Updated to track new protobuf source commit |
| client/src/main/java/com/microsoft/durabletask/FailureDetails.java | Added properties and innerFailure fields with conversion methods; added toString() override |
| client/src/test/java/com/microsoft/durabletask/FailureDetailsTest.java | Comprehensive new test file covering all aspects of the new functionality |
| client/src/main/java/com/microsoft/durabletask/DurableTaskGrpcWorker.java | Refactored to use FailureDetails constructor instead of building proto directly |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| switch (value.getKindCase()) { | ||
| case NULL_VALUE: | ||
| return null; | ||
| case NUMBER_VALUE: | ||
| return value.getNumberValue(); | ||
| case STRING_VALUE: | ||
| return value.getStringValue(); | ||
| case BOOL_VALUE: | ||
| return value.getBoolValue(); | ||
| default: | ||
| return value.toString(); | ||
| } |
There was a problem hiding this comment.
The convertProtoValue method doesn't handle LIST_VALUE and STRUCT_VALUE cases from google.protobuf.Value. While these cases may not be used currently, the default case silently converts them to string using toString(), which could lead to unexpected behavior if these types are used in the future. Consider explicitly handling these cases or documenting why they're not supported.
| @@ -1 +1 @@ | |||
| fbe5bb20835678099fc51a44993ed9b045dee5a6 No newline at end of file | |||
| 1caadbd7ecfdf5f2309acbeac28a3e36d16aa156 No newline at end of file | |||
There was a problem hiding this comment.
The PR description contains placeholder text "resolves #issue_for_this_pr" instead of an actual issue number. The checklist also indicates that changes should be added to CHANGELOG.md, but no changes to CHANGELOG.md are included in this PR. Please update the PR description with the actual issue number and add an entry to the "Unreleased" section of CHANGELOG.md documenting this feature addition.
| @@ -95,6 +96,7 @@ message TaskScheduledEvent { | |||
| google.protobuf.StringValue version = 2; | |||
| google.protobuf.StringValue input = 3; | |||
| TraceContext parentTraceContext = 4; | |||
| map<string, string> tags = 5; | |||
| } | |||
|
|
|||
| message TaskCompletedEvent { | |||
| @@ -113,6 +115,7 @@ message SubOrchestrationInstanceCreatedEvent { | |||
| google.protobuf.StringValue version = 3; | |||
| google.protobuf.StringValue input = 4; | |||
| TraceContext parentTraceContext = 5; | |||
| map<string, string> tags = 6; | |||
| } | |||
|
|
|||
| message SubOrchestrationInstanceCompletedEvent { | |||
| @@ -192,7 +195,7 @@ message EntityOperationCalledEvent { | |||
| } | |||
|
|
|||
| message EntityLockRequestedEvent { | |||
| string criticalSectionId = 1; | |||
| string criticalSectionId = 1; | |||
| repeated string lockSet = 2; | |||
| int32 position = 3; | |||
| google.protobuf.StringValue parentInstanceId = 4; // used only within messages, null in histories | |||
| @@ -217,7 +220,19 @@ message EntityUnlockSentEvent { | |||
| message EntityLockGrantedEvent { | |||
| string criticalSectionId = 1; | |||
| } | |||
|
|
|||
|
|
|||
| message ExecutionRewoundEvent { | |||
| google.protobuf.StringValue reason = 1; | |||
| google.protobuf.StringValue parentExecutionId = 2; // used only for rewinding suborchestrations, null otherwise | |||
| google.protobuf.StringValue instanceId = 3; // used only for rewinding suborchestrations, null otherwise | |||
| TraceContext parentTraceContext = 4; // used only for rewinding suborchestrations, null otherwise | |||
| google.protobuf.StringValue name = 5; // used by DTS backend only | |||
| google.protobuf.StringValue version = 6; // used by DTS backend only | |||
| google.protobuf.StringValue input = 7; // used by DTS backend only | |||
| ParentInstanceInfo parentInstance = 8; // used by DTS backend only | |||
| map<string, string> tags = 9; // used by DTS backend only | |||
| } | |||
|
|
|||
| message HistoryEvent { | |||
| int32 eventId = 1; | |||
| google.protobuf.Timestamp timestamp = 2; | |||
| @@ -244,25 +259,30 @@ message HistoryEvent { | |||
| ExecutionResumedEvent executionResumed = 22; | |||
| EntityOperationSignaledEvent entityOperationSignaled = 23; | |||
| EntityOperationCalledEvent entityOperationCalled = 24; | |||
| EntityOperationCompletedEvent entityOperationCompleted = 25; | |||
| EntityOperationFailedEvent entityOperationFailed = 26; | |||
| EntityOperationCompletedEvent entityOperationCompleted = 25; | |||
| EntityOperationFailedEvent entityOperationFailed = 26; | |||
| EntityLockRequestedEvent entityLockRequested = 27; | |||
| EntityLockGrantedEvent entityLockGranted = 28; | |||
| EntityUnlockSentEvent entityUnlockSent = 29; | |||
| ExecutionRewoundEvent executionRewound = 30; | |||
| } | |||
| } | |||
|
|
|||
| message ScheduleTaskAction { | |||
| string name = 1; | |||
| google.protobuf.StringValue version = 2; | |||
| google.protobuf.StringValue input = 3; | |||
| map<string, string> tags = 4; | |||
| TraceContext parentTraceContext = 5; | |||
| } | |||
|
|
|||
| message CreateSubOrchestrationAction { | |||
| string instanceId = 1; | |||
| string name = 2; | |||
| google.protobuf.StringValue version = 3; | |||
| google.protobuf.StringValue input = 4; | |||
| TraceContext parentTraceContext = 5; | |||
| map<string, string> tags = 6; | |||
| } | |||
|
|
|||
| message CreateTimerAction { | |||
| @@ -282,6 +302,7 @@ message CompleteOrchestrationAction { | |||
| google.protobuf.StringValue newVersion = 4; | |||
| repeated HistoryEvent carryoverEvents = 5; | |||
| TaskFailureDetails failureDetails = 6; | |||
| map<string, string> tags = 7; | |||
| } | |||
|
|
|||
| message TerminateOrchestrationAction { | |||
| @@ -312,6 +333,11 @@ message OrchestratorAction { | |||
| } | |||
| } | |||
|
|
|||
| message OrchestrationTraceContext { | |||
| google.protobuf.StringValue spanID = 1; | |||
| google.protobuf.Timestamp spanStartTime = 2; | |||
| } | |||
|
|
|||
| message OrchestratorRequest { | |||
| string instanceId = 1; | |||
| google.protobuf.StringValue executionId = 2; | |||
| @@ -320,6 +346,8 @@ message OrchestratorRequest { | |||
| OrchestratorEntityParameters entityParameters = 5; | |||
| bool requiresHistoryStreaming = 6; | |||
| map<string, google.protobuf.Value> properties = 7; | |||
|
|
|||
| OrchestrationTraceContext orchestrationTraceContext = 8; | |||
| } | |||
|
|
|||
| message OrchestratorResponse { | |||
| @@ -331,6 +359,17 @@ message OrchestratorResponse { | |||
| // The number of work item events that were processed by the orchestrator. | |||
| // This field is optional. If not set, the service should assume that the orchestrator processed all events. | |||
| google.protobuf.Int32Value numEventsProcessed = 5; | |||
| OrchestrationTraceContext orchestrationTraceContext = 6; | |||
|
|
|||
| // Whether or not a history is required to complete the original OrchestratorRequest and none was provided. | |||
| bool requiresHistory = 7; | |||
|
|
|||
| // True if this is a partial (chunked) completion. The backend must keep the work item open until the final chunk (isPartial=false). | |||
| bool isPartial = 8; | |||
|
|
|||
| // Zero-based position of the current chunk within a chunked completion sequence. | |||
| // This field is omitted for non-chunked completions. | |||
| google.protobuf.Int32Value chunkIndex = 9; | |||
| } | |||
|
|
|||
| message CreateInstanceRequest { | |||
| @@ -343,6 +382,7 @@ message CreateInstanceRequest { | |||
| google.protobuf.StringValue executionId = 7; | |||
| map<string, string> tags = 8; | |||
| TraceContext parentTraceContext = 9; | |||
| google.protobuf.Timestamp requestTime = 10; | |||
| } | |||
|
|
|||
| message OrchestrationIdReusePolicy { | |||
| @@ -449,12 +489,28 @@ message QueryInstancesResponse { | |||
| google.protobuf.StringValue continuationToken = 2; | |||
| } | |||
|
|
|||
| message ListInstanceIdsRequest { | |||
| repeated OrchestrationStatus runtimeStatus = 1; | |||
| google.protobuf.Timestamp completedTimeFrom = 2; | |||
| google.protobuf.Timestamp completedTimeTo = 3; | |||
| int32 pageSize = 4; | |||
| google.protobuf.StringValue lastInstanceKey = 5; | |||
| } | |||
|
|
|||
| message ListInstanceIdsResponse { | |||
| repeated string instanceIds = 1; | |||
| google.protobuf.StringValue lastInstanceKey = 2; | |||
| } | |||
|
|
|||
| message PurgeInstancesRequest { | |||
| oneof request { | |||
| string instanceId = 1; | |||
| PurgeInstanceFilter purgeInstanceFilter = 2; | |||
| InstanceBatch instanceBatch = 4; | |||
| } | |||
| bool recursive = 3; | |||
| // used in the case when an instanceId is specified to determine if the purge request is for an orchestration (as opposed to an entity) | |||
| bool isOrchestration = 5; | |||
| } | |||
|
|
|||
| message PurgeInstanceFilter { | |||
| @@ -468,6 +524,15 @@ message PurgeInstancesResponse { | |||
| google.protobuf.BoolValue isComplete = 2; | |||
| } | |||
|
|
|||
| message RestartInstanceRequest { | |||
| string instanceId = 1; | |||
| bool restartWithNewInstanceId = 2; | |||
| } | |||
|
|
|||
| message RestartInstanceResponse { | |||
| string instanceId = 1; | |||
| } | |||
|
|
|||
| message CreateTaskHubRequest { | |||
| bool recreateIfExists = 1; | |||
| } | |||
| @@ -490,10 +555,12 @@ message SignalEntityRequest { | |||
| google.protobuf.StringValue input = 3; | |||
| string requestId = 4; | |||
| google.protobuf.Timestamp scheduledTime = 5; | |||
| TraceContext parentTraceContext = 6; | |||
| google.protobuf.Timestamp requestTime = 7; | |||
| } | |||
|
|
|||
| message SignalEntityResponse { | |||
| // no payload | |||
| // no payload | |||
| } | |||
|
|
|||
| message GetEntityRequest { | |||
| @@ -553,6 +620,7 @@ message EntityBatchRequest { | |||
| string instanceId = 1; | |||
| google.protobuf.StringValue entityState = 2; | |||
| repeated OperationRequest operations = 3; | |||
| map<string, google.protobuf.Value> properties = 4; | |||
| } | |||
|
|
|||
| message EntityBatchResult { | |||
| @@ -562,6 +630,8 @@ message EntityBatchResult { | |||
| TaskFailureDetails failureDetails = 4; | |||
| string completionToken = 5; | |||
| repeated OperationInfo operationInfos = 6; // used only with DTS | |||
| // Whether or not an entity state is required to complete the original EntityBatchRequest and none was provided. | |||
| bool requiresState = 7; | |||
| } | |||
|
|
|||
| message EntityRequest { | |||
| @@ -575,6 +645,7 @@ message OperationRequest { | |||
| string operation = 1; | |||
| string requestId = 2; | |||
| google.protobuf.StringValue input = 3; | |||
| TraceContext traceContext = 4; | |||
| } | |||
|
|
|||
| message OperationResult { | |||
| @@ -591,10 +662,14 @@ message OperationInfo { | |||
|
|
|||
| message OperationResultSuccess { | |||
| google.protobuf.StringValue result = 1; | |||
| google.protobuf.Timestamp startTimeUtc = 2; | |||
| google.protobuf.Timestamp endTimeUtc = 3; | |||
| } | |||
|
|
|||
| message OperationResultFailure { | |||
| TaskFailureDetails failureDetails = 1; | |||
| google.protobuf.Timestamp startTimeUtc = 2; | |||
| google.protobuf.Timestamp endTimeUtc = 3; | |||
| } | |||
|
|
|||
| message OperationAction { | |||
| @@ -610,6 +685,8 @@ message SendSignalAction { | |||
| string name = 2; | |||
| google.protobuf.StringValue input = 3; | |||
| google.protobuf.Timestamp scheduledTime = 4; | |||
| google.protobuf.Timestamp requestTime = 5; | |||
| TraceContext parentTraceContext = 6; | |||
| } | |||
|
|
|||
| message StartNewOrchestrationAction { | |||
| @@ -618,6 +695,8 @@ message StartNewOrchestrationAction { | |||
| google.protobuf.StringValue version = 3; | |||
| google.protobuf.StringValue input = 4; | |||
| google.protobuf.Timestamp scheduledTime = 5; | |||
| google.protobuf.Timestamp requestTime = 6; | |||
| TraceContext parentTraceContext = 7; | |||
| } | |||
|
|
|||
| message AbandonActivityTaskRequest { | |||
| @@ -644,6 +723,17 @@ message AbandonEntityTaskResponse { | |||
| // Empty. | |||
| } | |||
|
|
|||
| message SkipGracefulOrchestrationTerminationsRequest { | |||
| InstanceBatch instanceBatch = 1; | |||
| google.protobuf.StringValue reason = 2; | |||
| } | |||
|
|
|||
| message SkipGracefulOrchestrationTerminationsResponse { | |||
| // Those instances which could not be terminated because they had locked entities at the time of this termination call, | |||
| // are already in a terminal state (completed, failed, terminated, etc.), are not orchestrations, or do not exist (i.e. have been purged) | |||
| repeated string unterminatedInstanceIds = 1; | |||
| } | |||
|
|
|||
| service TaskHubSidecarService { | |||
| // Sends a hello request to the sidecar service. | |||
| rpc Hello(google.protobuf.Empty) returns (google.protobuf.Empty); | |||
| @@ -657,18 +747,21 @@ service TaskHubSidecarService { | |||
| // Rewinds an orchestration instance to last known good state and replays from there. | |||
| rpc RewindInstance(RewindInstanceRequest) returns (RewindInstanceResponse); | |||
|
|
|||
| // Restarts an orchestration instance. | |||
| rpc RestartInstance(RestartInstanceRequest) returns (RestartInstanceResponse); | |||
|
|
|||
| // Waits for an orchestration instance to reach a running or completion state. | |||
| rpc WaitForInstanceStart(GetInstanceRequest) returns (GetInstanceResponse); | |||
|
|
|||
| // Waits for an orchestration instance to reach a completion state (completed, failed, terminated, etc.). | |||
| rpc WaitForInstanceCompletion(GetInstanceRequest) returns (GetInstanceResponse); | |||
|
|
|||
| // Raises an event to a running orchestration instance. | |||
| rpc RaiseEvent(RaiseEventRequest) returns (RaiseEventResponse); | |||
|
|
|||
| // Terminates a running orchestration instance. | |||
| rpc TerminateInstance(TerminateRequest) returns (TerminateResponse); | |||
|
|
|||
| // Suspends a running orchestration instance. | |||
| rpc SuspendInstance(SuspendRequest) returns (SuspendResponse); | |||
|
|
|||
| @@ -678,6 +771,9 @@ service TaskHubSidecarService { | |||
| // rpc DeleteInstance(DeleteInstanceRequest) returns (DeleteInstanceResponse); | |||
|
|
|||
| rpc QueryInstances(QueryInstancesRequest) returns (QueryInstancesResponse); | |||
|
|
|||
| rpc ListInstanceIds(ListInstanceIdsRequest) returns (ListInstanceIdsResponse); | |||
|
|
|||
| rpc PurgeInstances(PurgeInstancesRequest) returns (PurgeInstancesResponse); | |||
|
|
|||
| rpc GetWorkItems(GetWorkItemsRequest) returns (stream WorkItem); | |||
| @@ -714,6 +810,10 @@ service TaskHubSidecarService { | |||
|
|
|||
| // Abandon an entity work item | |||
| rpc AbandonTaskEntityWorkItem(AbandonEntityTaskRequest) returns (AbandonEntityTaskResponse); | |||
|
|
|||
| // "Skip" graceful termination of orchestrations by immediately changing their status in storage to "terminated". | |||
| // Note that a maximum of 500 orchestrations can be terminated at a time using this method. | |||
| rpc SkipGracefulOrchestrationTerminations(SkipGracefulOrchestrationTerminationsRequest) returns (SkipGracefulOrchestrationTerminationsResponse); | |||
| } | |||
|
|
|||
| message GetWorkItemsRequest { | |||
| @@ -722,6 +822,7 @@ message GetWorkItemsRequest { | |||
| int32 maxConcurrentEntityWorkItems = 3; | |||
|
|
|||
| repeated WorkerCapability capabilities = 10; | |||
| WorkItemFilters workItemFilters = 11; | |||
| } | |||
|
|
|||
| enum WorkerCapability { | |||
| @@ -732,6 +833,36 @@ enum WorkerCapability { | |||
| // When set, the service may return work items without any history events as an optimization. | |||
| // It is strongly recommended that all SDKs support this capability. | |||
| WORKER_CAPABILITY_HISTORY_STREAMING = 1; | |||
|
|
|||
| // Indicates that the worker supports scheduled tasks. | |||
| // The service may send schedule-triggered orchestration work items, | |||
| // and the worker must handle them, including the scheduledTime field. | |||
| WORKER_CAPABILITY_SCHEDULED_TASKS = 2; | |||
|
|
|||
| // Signals that the worker can handle large payloads stored externally (e.g., Blob Storage). | |||
| // Work items may contain URI references instead of inline data, and the worker must fetch them. | |||
| // This avoids message size limits and reduces network overhead. | |||
| WORKER_CAPABILITY_LARGE_PAYLOADS = 3; | |||
| } | |||
|
|
|||
| message WorkItemFilters { | |||
| repeated OrchestrationFilter orchestrations = 1; | |||
| repeated ActivityFilter activities = 2; | |||
| repeated EntityFilter entities = 3; | |||
| } | |||
|
|
|||
| message OrchestrationFilter { | |||
| string name = 1; | |||
| repeated string versions = 2; | |||
| } | |||
|
|
|||
| message ActivityFilter { | |||
| string name = 1; | |||
| repeated string versions = 2; | |||
| } | |||
|
|
|||
| message EntityFilter { | |||
| string name = 1; | |||
| } | |||
|
|
|||
| message WorkItem { | |||
| @@ -750,7 +881,7 @@ message CompleteTaskResponse { | |||
| } | |||
|
|
|||
| message HealthPing { | |||
| // No payload | |||
| // No payload | |||
| } | |||
|
|
|||
| message StreamInstanceHistoryRequest { | |||
| @@ -764,3 +895,8 @@ message StreamInstanceHistoryRequest { | |||
| message HistoryChunk { | |||
| repeated HistoryEvent events = 1; | |||
| } | |||
|
|
|||
| message InstanceBatch { | |||
| // A maximum of 500 instance IDs can be provided in this list. | |||
| repeated string instanceIds = 1; | |||
| } | |||
There was a problem hiding this comment.
The protobuf file contains numerous changes unrelated to the PR title "Include Exception Properties at FailureDetails", including new messages (ExecutionRewoundEvent, OrchestrationTraceContext, ListInstanceIdsRequest/Response, RestartInstanceRequest/Response, SkipGracefulOrchestrationTerminationsRequest/Response, InstanceBatch, WorkItemFilters, OrchestrationFilter, ActivityFilter, EntityFilter), new fields in multiple existing messages, new RPC methods, and new worker capabilities. This appears to be a protobuf sync from an upstream repository rather than changes specific to this PR. While the PROTO_SOURCE_COMMIT_HASH file update confirms this is intentional, the PR title and description should clarify that this is a protobuf sync that happens to include the FailureDetails.properties field, or these should be separate PRs to keep changes focused and reviewable.
| public boolean isCausedBy(Class<? extends Exception> exceptionClass) { | ||
| String actualClassName = this.getErrorType(); | ||
| try { | ||
| // Try using reflection to load the failure's class type and see if it's a subtype of the specified | ||
| // exception. For example, this should always succeed if exceptionClass is System.Exception. | ||
| Class<?> actualExceptionClass = Class.forName(actualClassName); | ||
| return exceptionClass.isAssignableFrom(actualExceptionClass); | ||
| } catch (ClassNotFoundException ex) { | ||
| // Can't load the class and thus can't tell if it's related | ||
| return false; | ||
| } | ||
| } |
There was a problem hiding this comment.
The new innerFailure field is not tested with the isCausedBy method. It's unclear whether isCausedBy should only check the current failure or should traverse the innerFailure chain (similar to how one might need to check both an exception and its causes). Consider adding tests to clarify and document this behavior, or update the isCausedBy method to traverse the chain if that's the intended behavior.
Issue describing the changes in this PR
resolves #issue_for_this_pr
Pull request checklist
CHANGELOG.mdAdditional information
Additional PR information