Error handling was recently introduced. Where supported, it causes a workflow instance to pause instead of failing. The instance can then be resumed at the failed step so that it tries again. This is a great feature!
However, my biggest problem with this feature is that it was implemented to support only certain connectors and certain error codes. If an action receives an unsupported code, the entire instance fails instead of being paused. This is a HUGE inconvenience for complex workflows. By supporting only a specific selection of error codes, Nintex severely limits the situations in which error handling can benefit its customers. In my mind, the current approach is fundamentally flawed because it will result in either:
Nintex updating the supported codes, which involves staying up-to-date with all 3rd-party connectors and their possible errors; or
Nintex not updating the codes, resulting in error handling that becomes more and more obsolete as 3rd-party connectors update their APIs with additional error codes.
My suggestion: please consider making error handling support ALL failures, regardless of the error code. I'm not sure what the reasoning was for taking a targeted approach in the existing implementation, as it puts more onus on Nintex to keep the feature usable while also removing customers' agency to handle these situations properly.
Thank you for considering.
Examples of messages from the Smartsheet connector that error handling hasn't caught, but that succeeded when retried later (which is the whole point of error handling):
Request failed because sheetId 123456789 is currently being updated by another request that uses the same access token. Please retry your request once the previous request has completed.
An unexpected error has occurred. Please contact the Support team at https://help.smartsheet.com/contact for assistance.
AddRow successful but returned empty row data.
Unexpected character encountered while parsing value: <. Path '', line 0, position 0.
Unexpected character encountered while parsing value: U. Path '', line 0, position 0.
Received an error response from the connector: Unexpected Error (ExtAuthN)
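The "retry on any failure" behavior suggested above can be sketched in a few lines. This is a minimal Python illustration, not Nintex's implementation or API: `run_with_retry`, its parameters, and the exponential backoff schedule are hypothetical, and the Smartsheet action is simulated with two of the messages listed above.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry(
    action: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper (not a Nintex API): retry `action` on ANY exception,
    regardless of error code or message.

    Waits base_delay * 2**(attempt - 1) seconds between attempts and
    re-raises the last error once max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last failure
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")


# Simulated connector action: fails twice, then succeeds.
responses = iter([
    RuntimeError("AddRow successful but returned empty row data."),
    RuntimeError("Received an error response from the connector: Unexpected Error (ExtAuthN)"),
    "row added",
])


def flaky_add_row() -> str:
    item = next(responses)
    if isinstance(item, Exception):
        raise item
    return item


delays: list = []
result = run_with_retry(flaky_add_row, sleep=delays.append)  # record delays instead of sleeping
print(result, delays)  # row added [2.0, 4.0]
```

The point of the sketch is that the retry decision does not inspect the error code at all, so new or undocumented connector errors are covered without anyone maintaining a list of them.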
Thank you for providing a thorough explanation. I agree that the current "error handling" isn't acceptable, and we're finding that it actually makes the experience worse for users because they have to manually terminate and restart anyway.
I think this type of "error handling" continues to ignore the nature of the web services that every action is built upon. Writing off every interaction that returns a status code of 400 or greater, and then pausing or terminating, is bad practice. It treats every error as a critical failure when oftentimes the message is as simple as "A file already exists with the file name X." This is an easily handled type of error, and as such it should be handled. Pausing is not handling the error. It is a service disruption for something as simple as "try to upload the file; if the error contains 'a file already...', then use the loop index to append ' (1)' to the file name, and repeat." Pausing doesn't provide any benefit, and it isn't "error handling." It simply is not fair to say that all status codes of 400 or greater require pausing or termination in an automation platform. A large part of any process (automated ones included) is handling edge cases. Otherwise, what is the point of automation that relies on the stars aligning?
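The rename-and-retry handling described above is ordinary control flow once the error is handed back to the workflow instead of pausing it. A minimal Python sketch, assuming a hypothetical `upload` callable that raises an exception whose message contains "A file already exists" on a name conflict; `upload_with_unique_name`, `fake_upload`, and `max_tries` are illustrative names, not part of any Nintex or connector API.

```python
from typing import Callable


def upload_with_unique_name(
    upload: Callable[[str], str],
    file_name: str,
    max_tries: int = 10,
) -> str:
    """Hypothetical helper: try file_name, then 'name (1).ext', 'name (2).ext', ...

    Only the "file already exists" error is handled here; any other error is
    re-raised unchanged.
    """
    stem, dot, ext = file_name.rpartition(".")
    if not dot:  # no extension: rpartition put the whole name in `ext`
        stem, ext = file_name, ""
    candidates = [file_name] + [f"{stem} ({i}){dot}{ext}" for i in range(1, max_tries)]
    for candidate in candidates:
        try:
            return upload(candidate)
        except Exception as err:
            if "a file already exists" not in str(err).lower():
                raise  # a different error: not this handler's job
    raise RuntimeError(f"no free file name found after {max_tries} attempts")


# Simulated destination that already contains two files.
existing = {"report.pdf", "report (1).pdf"}


def fake_upload(name: str) -> str:
    if name in existing:
        raise RuntimeError(f"A file already exists with the file name {name}.")
    existing.add(name)
    return name


print(upload_with_unique_name(fake_upload, "report.pdf"))  # report (2).pdf
```

Matching on the message text mirrors the example in the post; if the connector exposed a structured error code, checking that instead would be less brittle.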