Skip to content

Comments

Include Exception Properties at FailureDetails#263

Open
nytian wants to merge 2 commits intomainfrom
nytian/failuredetails
Open

Include Exception Properties at FailureDetails#263
nytian wants to merge 2 commits intomainfrom
nytian/failuredetails

Conversation

@nytian
Copy link

@nytian nytian commented Feb 23, 2026

Issue describing the changes in this PR

resolves #issue_for_this_pr

Pull request checklist

  • My changes do not require documentation changes
    • Otherwise: Documentation issue linked to PR
  • My changes are added to the CHANGELOG.md
  • I have added all required tests (Unit tests, E2E tests)

Additional information

Additional PR information

@nytian nytian requested a review from a team as a code owner February 23, 2026 18:08
Copilot AI review requested due to automatic review settings February 23, 2026 18:08
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds support for exception properties and inner failure details to the FailureDetails class, allowing richer error information to be captured and propagated through the Durable Task framework. The PR also includes a protobuf file sync from the upstream repository, which introduces many additional protocol buffer changes beyond the core feature.

Changes:

  • Added properties field to TaskFailureDetails protobuf message to store exception metadata
  • Added innerFailure field support to capture exception cause chains
  • Implemented bidirectional conversion between protobuf Value types and Java objects
  • Added comprehensive unit tests covering properties, inner failures, and round-trip serialization
  • Updated DurableTaskGrpcWorker to use the new FailureDetails constructor for consistent exception handling
  • Synced protobuf definitions from upstream repository (commit hash 1caadbd7ecfdf5f2309acbeac28a3e36d16aa156)

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
internal/durabletask-protobuf/protos/orchestrator_service.proto Added properties field to TaskFailureDetails; synced many other protobuf changes from upstream
internal/durabletask-protobuf/PROTO_SOURCE_COMMIT_HASH Updated to track new protobuf source commit
client/src/main/java/com/microsoft/durabletask/FailureDetails.java Added properties and innerFailure fields with conversion methods; added toString() override
client/src/test/java/com/microsoft/durabletask/FailureDetailsTest.java Comprehensive new test file covering all aspects of the new functionality
client/src/main/java/com/microsoft/durabletask/DurableTaskGrpcWorker.java Refactored to use FailureDetails constructor instead of building proto directly

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +229 to +240
switch (value.getKindCase()) {
case NULL_VALUE:
return null;
case NUMBER_VALUE:
return value.getNumberValue();
case STRING_VALUE:
return value.getStringValue();
case BOOL_VALUE:
return value.getBoolValue();
default:
return value.toString();
}
Copy link

Copilot AI Feb 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The convertProtoValue method doesn't handle LIST_VALUE and STRUCT_VALUE cases from google.protobuf.Value. While these cases may not be used currently, the default case silently converts them to string using toString(), which could lead to unexpected behavior if these types are used in the future. Consider explicitly handling these cases or documenting why they're not supported.

Copilot uses AI. Check for mistakes.
@@ -1 +1 @@
fbe5bb20835678099fc51a44993ed9b045dee5a6 No newline at end of file
1caadbd7ecfdf5f2309acbeac28a3e36d16aa156 No newline at end of file
Copy link

Copilot AI Feb 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description contains placeholder text "resolves #issue_for_this_pr" instead of an actual issue number. The checklist also indicates that changes should be added to CHANGELOG.md, but no changes to CHANGELOG.md are included in this PR. Please update the PR description with the actual issue number and add an entry to the "Unreleased" section of CHANGELOG.md documenting this feature addition.

Copilot uses AI. Check for mistakes.
Comment on lines 44 to 902
@@ -95,6 +96,7 @@ message TaskScheduledEvent {
google.protobuf.StringValue version = 2;
google.protobuf.StringValue input = 3;
TraceContext parentTraceContext = 4;
map<string, string> tags = 5;
}

message TaskCompletedEvent {
@@ -113,6 +115,7 @@ message SubOrchestrationInstanceCreatedEvent {
google.protobuf.StringValue version = 3;
google.protobuf.StringValue input = 4;
TraceContext parentTraceContext = 5;
map<string, string> tags = 6;
}

message SubOrchestrationInstanceCompletedEvent {
@@ -192,7 +195,7 @@ message EntityOperationCalledEvent {
}

message EntityLockRequestedEvent {
string criticalSectionId = 1;
string criticalSectionId = 1;
repeated string lockSet = 2;
int32 position = 3;
google.protobuf.StringValue parentInstanceId = 4; // used only within messages, null in histories
@@ -217,7 +220,19 @@ message EntityUnlockSentEvent {
message EntityLockGrantedEvent {
string criticalSectionId = 1;
}


message ExecutionRewoundEvent {
google.protobuf.StringValue reason = 1;
google.protobuf.StringValue parentExecutionId = 2; // used only for rewinding suborchestrations, null otherwise
google.protobuf.StringValue instanceId = 3; // used only for rewinding suborchestrations, null otherwise
TraceContext parentTraceContext = 4; // used only for rewinding suborchestrations, null otherwise
google.protobuf.StringValue name = 5; // used by DTS backend only
google.protobuf.StringValue version = 6; // used by DTS backend only
google.protobuf.StringValue input = 7; // used by DTS backend only
ParentInstanceInfo parentInstance = 8; // used by DTS backend only
map<string, string> tags = 9; // used by DTS backend only
}

message HistoryEvent {
int32 eventId = 1;
google.protobuf.Timestamp timestamp = 2;
@@ -244,25 +259,30 @@ message HistoryEvent {
ExecutionResumedEvent executionResumed = 22;
EntityOperationSignaledEvent entityOperationSignaled = 23;
EntityOperationCalledEvent entityOperationCalled = 24;
EntityOperationCompletedEvent entityOperationCompleted = 25;
EntityOperationFailedEvent entityOperationFailed = 26;
EntityOperationCompletedEvent entityOperationCompleted = 25;
EntityOperationFailedEvent entityOperationFailed = 26;
EntityLockRequestedEvent entityLockRequested = 27;
EntityLockGrantedEvent entityLockGranted = 28;
EntityUnlockSentEvent entityUnlockSent = 29;
ExecutionRewoundEvent executionRewound = 30;
}
}

message ScheduleTaskAction {
string name = 1;
google.protobuf.StringValue version = 2;
google.protobuf.StringValue input = 3;
map<string, string> tags = 4;
TraceContext parentTraceContext = 5;
}

message CreateSubOrchestrationAction {
string instanceId = 1;
string name = 2;
google.protobuf.StringValue version = 3;
google.protobuf.StringValue input = 4;
TraceContext parentTraceContext = 5;
map<string, string> tags = 6;
}

message CreateTimerAction {
@@ -282,6 +302,7 @@ message CompleteOrchestrationAction {
google.protobuf.StringValue newVersion = 4;
repeated HistoryEvent carryoverEvents = 5;
TaskFailureDetails failureDetails = 6;
map<string, string> tags = 7;
}

message TerminateOrchestrationAction {
@@ -312,6 +333,11 @@ message OrchestratorAction {
}
}

message OrchestrationTraceContext {
google.protobuf.StringValue spanID = 1;
google.protobuf.Timestamp spanStartTime = 2;
}

message OrchestratorRequest {
string instanceId = 1;
google.protobuf.StringValue executionId = 2;
@@ -320,6 +346,8 @@ message OrchestratorRequest {
OrchestratorEntityParameters entityParameters = 5;
bool requiresHistoryStreaming = 6;
map<string, google.protobuf.Value> properties = 7;

OrchestrationTraceContext orchestrationTraceContext = 8;
}

message OrchestratorResponse {
@@ -331,6 +359,17 @@ message OrchestratorResponse {
// The number of work item events that were processed by the orchestrator.
// This field is optional. If not set, the service should assume that the orchestrator processed all events.
google.protobuf.Int32Value numEventsProcessed = 5;
OrchestrationTraceContext orchestrationTraceContext = 6;

// Whether or not a history is required to complete the original OrchestratorRequest and none was provided.
bool requiresHistory = 7;

// True if this is a partial (chunked) completion. The backend must keep the work item open until the final chunk (isPartial=false).
bool isPartial = 8;

// Zero-based position of the current chunk within a chunked completion sequence.
// This field is omitted for non-chunked completions.
google.protobuf.Int32Value chunkIndex = 9;
}

message CreateInstanceRequest {
@@ -343,6 +382,7 @@ message CreateInstanceRequest {
google.protobuf.StringValue executionId = 7;
map<string, string> tags = 8;
TraceContext parentTraceContext = 9;
google.protobuf.Timestamp requestTime = 10;
}

message OrchestrationIdReusePolicy {
@@ -449,12 +489,28 @@ message QueryInstancesResponse {
google.protobuf.StringValue continuationToken = 2;
}

message ListInstanceIdsRequest {
repeated OrchestrationStatus runtimeStatus = 1;
google.protobuf.Timestamp completedTimeFrom = 2;
google.protobuf.Timestamp completedTimeTo = 3;
int32 pageSize = 4;
google.protobuf.StringValue lastInstanceKey = 5;
}

message ListInstanceIdsResponse {
repeated string instanceIds = 1;
google.protobuf.StringValue lastInstanceKey = 2;
}

message PurgeInstancesRequest {
oneof request {
string instanceId = 1;
PurgeInstanceFilter purgeInstanceFilter = 2;
InstanceBatch instanceBatch = 4;
}
bool recursive = 3;
// used in the case when an instanceId is specified to determine if the purge request is for an orchestration (as opposed to an entity)
bool isOrchestration = 5;
}

message PurgeInstanceFilter {
@@ -468,6 +524,15 @@ message PurgeInstancesResponse {
google.protobuf.BoolValue isComplete = 2;
}

message RestartInstanceRequest {
string instanceId = 1;
bool restartWithNewInstanceId = 2;
}

message RestartInstanceResponse {
string instanceId = 1;
}

message CreateTaskHubRequest {
bool recreateIfExists = 1;
}
@@ -490,10 +555,12 @@ message SignalEntityRequest {
google.protobuf.StringValue input = 3;
string requestId = 4;
google.protobuf.Timestamp scheduledTime = 5;
TraceContext parentTraceContext = 6;
google.protobuf.Timestamp requestTime = 7;
}

message SignalEntityResponse {
// no payload
// no payload
}

message GetEntityRequest {
@@ -553,6 +620,7 @@ message EntityBatchRequest {
string instanceId = 1;
google.protobuf.StringValue entityState = 2;
repeated OperationRequest operations = 3;
map<string, google.protobuf.Value> properties = 4;
}

message EntityBatchResult {
@@ -562,6 +630,8 @@ message EntityBatchResult {
TaskFailureDetails failureDetails = 4;
string completionToken = 5;
repeated OperationInfo operationInfos = 6; // used only with DTS
// Whether or not an entity state is required to complete the original EntityBatchRequest and none was provided.
bool requiresState = 7;
}

message EntityRequest {
@@ -575,6 +645,7 @@ message OperationRequest {
string operation = 1;
string requestId = 2;
google.protobuf.StringValue input = 3;
TraceContext traceContext = 4;
}

message OperationResult {
@@ -591,10 +662,14 @@ message OperationInfo {

message OperationResultSuccess {
google.protobuf.StringValue result = 1;
google.protobuf.Timestamp startTimeUtc = 2;
google.protobuf.Timestamp endTimeUtc = 3;
}

message OperationResultFailure {
TaskFailureDetails failureDetails = 1;
google.protobuf.Timestamp startTimeUtc = 2;
google.protobuf.Timestamp endTimeUtc = 3;
}

message OperationAction {
@@ -610,6 +685,8 @@ message SendSignalAction {
string name = 2;
google.protobuf.StringValue input = 3;
google.protobuf.Timestamp scheduledTime = 4;
google.protobuf.Timestamp requestTime = 5;
TraceContext parentTraceContext = 6;
}

message StartNewOrchestrationAction {
@@ -618,6 +695,8 @@ message StartNewOrchestrationAction {
google.protobuf.StringValue version = 3;
google.protobuf.StringValue input = 4;
google.protobuf.Timestamp scheduledTime = 5;
google.protobuf.Timestamp requestTime = 6;
TraceContext parentTraceContext = 7;
}

message AbandonActivityTaskRequest {
@@ -644,6 +723,17 @@ message AbandonEntityTaskResponse {
// Empty.
}

message SkipGracefulOrchestrationTerminationsRequest {
InstanceBatch instanceBatch = 1;
google.protobuf.StringValue reason = 2;
}

message SkipGracefulOrchestrationTerminationsResponse {
// Those instances which could not be terminated because they had locked entities at the time of this termination call,
// are already in a terminal state (completed, failed, terminated, etc.), are not orchestrations, or do not exist (i.e. have been purged)
repeated string unterminatedInstanceIds = 1;
}

service TaskHubSidecarService {
// Sends a hello request to the sidecar service.
rpc Hello(google.protobuf.Empty) returns (google.protobuf.Empty);
@@ -657,18 +747,21 @@ service TaskHubSidecarService {
// Rewinds an orchestration instance to last known good state and replays from there.
rpc RewindInstance(RewindInstanceRequest) returns (RewindInstanceResponse);

// Restarts an orchestration instance.
rpc RestartInstance(RestartInstanceRequest) returns (RestartInstanceResponse);

// Waits for an orchestration instance to reach a running or completion state.
rpc WaitForInstanceStart(GetInstanceRequest) returns (GetInstanceResponse);

// Waits for an orchestration instance to reach a completion state (completed, failed, terminated, etc.).
rpc WaitForInstanceCompletion(GetInstanceRequest) returns (GetInstanceResponse);

// Raises an event to a running orchestration instance.
rpc RaiseEvent(RaiseEventRequest) returns (RaiseEventResponse);

// Terminates a running orchestration instance.
rpc TerminateInstance(TerminateRequest) returns (TerminateResponse);

// Suspends a running orchestration instance.
rpc SuspendInstance(SuspendRequest) returns (SuspendResponse);

@@ -678,6 +771,9 @@ service TaskHubSidecarService {
// rpc DeleteInstance(DeleteInstanceRequest) returns (DeleteInstanceResponse);

rpc QueryInstances(QueryInstancesRequest) returns (QueryInstancesResponse);

rpc ListInstanceIds(ListInstanceIdsRequest) returns (ListInstanceIdsResponse);

rpc PurgeInstances(PurgeInstancesRequest) returns (PurgeInstancesResponse);

rpc GetWorkItems(GetWorkItemsRequest) returns (stream WorkItem);
@@ -714,6 +810,10 @@ service TaskHubSidecarService {

// Abandon an entity work item
rpc AbandonTaskEntityWorkItem(AbandonEntityTaskRequest) returns (AbandonEntityTaskResponse);

// "Skip" graceful termination of orchestrations by immediately changing their status in storage to "terminated".
// Note that a maximum of 500 orchestrations can be terminated at a time using this method.
rpc SkipGracefulOrchestrationTerminations(SkipGracefulOrchestrationTerminationsRequest) returns (SkipGracefulOrchestrationTerminationsResponse);
}

message GetWorkItemsRequest {
@@ -722,6 +822,7 @@ message GetWorkItemsRequest {
int32 maxConcurrentEntityWorkItems = 3;

repeated WorkerCapability capabilities = 10;
WorkItemFilters workItemFilters = 11;
}

enum WorkerCapability {
@@ -732,6 +833,36 @@ enum WorkerCapability {
// When set, the service may return work items without any history events as an optimization.
// It is strongly recommended that all SDKs support this capability.
WORKER_CAPABILITY_HISTORY_STREAMING = 1;

// Indicates that the worker supports scheduled tasks.
// The service may send schedule-triggered orchestration work items,
// and the worker must handle them, including the scheduledTime field.
WORKER_CAPABILITY_SCHEDULED_TASKS = 2;

// Signals that the worker can handle large payloads stored externally (e.g., Blob Storage).
// Work items may contain URI references instead of inline data, and the worker must fetch them.
// This avoids message size limits and reduces network overhead.
WORKER_CAPABILITY_LARGE_PAYLOADS = 3;
}

message WorkItemFilters {
repeated OrchestrationFilter orchestrations = 1;
repeated ActivityFilter activities = 2;
repeated EntityFilter entities = 3;
}

message OrchestrationFilter {
string name = 1;
repeated string versions = 2;
}

message ActivityFilter {
string name = 1;
repeated string versions = 2;
}

message EntityFilter {
string name = 1;
}

message WorkItem {
@@ -750,7 +881,7 @@ message CompleteTaskResponse {
}

message HealthPing {
// No payload
// No payload
}

message StreamInstanceHistoryRequest {
@@ -764,3 +895,8 @@ message StreamInstanceHistoryRequest {
message HistoryChunk {
repeated HistoryEvent events = 1;
}

message InstanceBatch {
// A maximum of 500 instance IDs can be provided in this list.
repeated string instanceIds = 1;
}
Copy link

Copilot AI Feb 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The protobuf file contains numerous changes unrelated to the PR title "Include Exception Properties at FailureDetails", including new messages (ExecutionRewoundEvent, OrchestrationTraceContext, ListInstanceIdsRequest/Response, RestartInstanceRequest/Response, SkipGracefulOrchestrationTerminationsRequest/Response, InstanceBatch, WorkItemFilters, OrchestrationFilter, ActivityFilter, EntityFilter), new fields in multiple existing messages, new RPC methods, and new worker capabilities. This appears to be a protobuf sync from an upstream repository rather than changes specific to this PR. While the PROTO_SOURCE_COMMIT_HASH file update confirms this is intentional, the PR title and description should clarify that this is a protobuf sync that happens to include the FailureDetails.properties field, or these should be separate PRs to keep changes focused and reviewable.

Copilot uses AI. Check for mistakes.
Comment on lines 150 to 161
public boolean isCausedBy(Class<? extends Exception> exceptionClass) {
String actualClassName = this.getErrorType();
try {
// Try using reflection to load the failure's class type and see if it's a subtype of the specified
// exception. For example, this should always succeed if exceptionClass is System.Exception.
Class<?> actualExceptionClass = Class.forName(actualClassName);
return exceptionClass.isAssignableFrom(actualExceptionClass);
} catch (ClassNotFoundException ex) {
// Can't load the class and thus can't tell if it's related
return false;
}
}
Copy link

Copilot AI Feb 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new innerFailure field is not tested with the isCausedBy method. It's unclear whether isCausedBy should only check the current failure or should traverse the innerFailure chain (similar to how one might need to check both an exception and its causes). Consider adding tests to clarify and document this behavior, or update the isCausedBy method to traverse the chain if that's the intended behavior.

Copilot uses AI. Check for mistakes.
* @return an unmodifiable map of property names to values, or {@code null}
*/
@Nullable
public Map<String, Object> getProperties() {
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant