Saturday, March 8, 2014

Working with SqlAzureExecutionStrategy

One of my favorite features of Entity Framework 6 was the SqlAzureExecutionStrategy. At least I thought this. I wanted to use it also in my company’s network since we get periodically timeouts connecting to a SQL Server instance (SQL Server 2008 in this case). So I thought, perfect, the SqlAzureExecutionStrategy is the solution to my problem. However, it didn’t work. So I had to investigate...

My approach was to write a sample program starting two tasks:
  • Task 1 should open a transaction, update a database row and then wait for some time (without committing or rolling back the transaction)
  • Task 2 should try to modify the same database row during Task 1’s wait time
Without any further preparation, this approach raised an exception in Task 2:
System.Data.SqlClient.SqlException (0x80131904): Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding. ---> System.ComponentModel.Win32Exception (0x80004005): The wait operation timed out
Next I added my own implementation of DbConfiguration, which simply configured the execution strategy:
public class MyDbConfiguration : DbConfiguration
{
  public MyDbConfiguration()
  {
    this.SetExecutionStrategy("System.Data.SqlClient", () => new SqlAzureExecutionStrategy()); 
  }
}
Unfortunately, this approach did not work since the "retrying execution strategies" do not support user-initiated transactions (see Limitations with Retrying Execution Strategies (EF6 onwards)). To my rescue, the mentioned article describes also a workaround which prevents the usage of the SqlAzureExecutionStrategy together with the transaction:
public class MyDbConfiguration : DbConfiguration
{
  public MyDbConfiguration()
  {
    this.SetExecutionStrategy("System.Data.SqlClient", () => SuspendExecutionStrategy
      ? (IDbExecutionStrategy)new DefaultExecutionStrategy()
      : new SqlAzureExecutionStrategy()); 
  }
  public static bool SuspendExecutionStrategy
  {
    get { return (bool?)CallContext.LogicalGetData("SuspendExecutionStrategy") ?? false; }
    set { CallContext.LogicalSetData("SuspendExecutionStrategy", value); }
  }
}
Now the program was running again, but I still got the SqlException with the timeout. Therefore I now added some database logging, another great feature of Entity Framework 6 (see Logging and Intercepting Database Operations). The log was interesting, since it provided some additional insights. But with my special problem, it was not really useful.
10:49:56,117 Task 1: UPDATE [dbo].[T_Message] SET [LastUpdate] = @0 WHERE ([Messageid] = @1)
10:49:56,119 Task 1: -- @0: '3/5/2014 10:49:56 AM' (Type = DateTime2)
10:49:56,119 Task 1: -- @1: '3621840d-724e-4a62-b22a-accb215dfb1b' (Type = Guid)
10:49:56,120 Task 1: -- Executing at 3/5/2014 10:49:56 AM +01:00
10:49:56,130 Task 1: -- Completed in 6 ms with result: 1
10:49:56,430 Task 2: UPDATE [dbo].[T_Message] SET [LastUpdate] = @0 WHERE ([Messageid] = @1)
10:49:56,431 Task 2: -- @0: '3/5/2014 10:49:56 AM' (Type = DateTime2)
10:49:56,431 Task 2: -- @1: '3621840d-724e-4a62-b22a-accb215dfb1b' (Type = Guid)
10:49:56,431 Task 2: -- Executing at 3/5/2014 10:49:56 AM +01:00
10:50:01,557 Task 2: -- Failed in 5124 ms with error: Timeout expired.  The timeout period elapsed pr...
10:50:01,732 Task 2: update failed
System.Data.Entity.Infrastructure.DbUpdateException: An error occurred while updating the entries. See the inner exception for details. ---> System.Data.Entity.Core.UpdateException: An error occurred while updating the entries. See the inner exception for details. ---> System.Data.SqlClient.SqlException: Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
The statement has been terminated. ---> System.ComponentModel.Win32Exception: The wait operation timed out
Then I had a look at the source code of SqlAzureExecutionStrategy. Its implementation consists of mainly one method:
protected override bool ShouldRetryOn(Exception exception)
{
  return SqlAzureRetriableExceptionDetector.ShouldRetryOn(exception);
}
Since the method is protected, I decided to implement my own strategy derived from SqlAzureExecutionStrategy. In the first step, I simply implemented my own ShouldRetryOn method, which I used for setting a breakpoint. I found out that the method was called with the SqlException, but SqlAzureExecutionStrategy's implementation returned false.

SqlAzureExecutionStrategy delegates the check of the exception to SqlAzureRetriableExceptionDetector. As you can see in the source code, it returns true for TimeoutException and for SqlException with a specific SqlError. However, "my" SqlException has a SqlError with Number == -2, for which false is returned.

Now I added some real logic to my strategy:
protected override bool ShouldRetryOn(Exception exception)
{
  bool shouldRetry = false;

  SqlException sqlException = exception as SqlException;
  if (sqlException != null)
  {
    foreach (SqlError error in sqlException.Errors)
    {
      if (error.Number == -2)
        shouldRetry = true;
    }
  }

  shouldRetry = shouldRetry || base.ShouldRetryOn(exception);

  Logger.WriteLog("ShouldRetryOn: " + shouldRetry);
  return shouldRetry;
}
With this implementation I had two benefits:
  • The connection attempt was retried also in my deadlock scenario.
  • I got a log entry for every retry.
Now I got a "beautiful" trace of my retry activities. And also an explicit exception, when the problem couldn’t be solved by simply retrying it:
10:56:15,805 Task 1: UPDATE [dbo].[T_Message] SET [LastUpdate] = @0 WHERE ([Messageid] = @1)
10:56:15,807 Task 1: -- @0: '3/5/2014 10:56:15 AM' (Type = DateTime2)
10:56:15,808 Task 1: -- @1: '4e3554be-1e13-461b-af12-848575317beb' (Type = Guid)
10:56:15,808 Task 1: -- Executing at 3/5/2014 10:56:15 AM +01:00
10:56:15,816 Task 1: -- Completed in 5 ms with result: 1
10:56:15,823 Task 2: UPDATE [dbo].[T_Message] SET [LastUpdate] = @0 WHERE ([Messageid] = @1)
10:56:15,824 Task 2: -- @0: '3/5/2014 10:56:15 AM' (Type = DateTime2)
10:56:15,824 Task 2: -- @1: '4e3554be-1e13-461b-af12-848575317beb' (Type = Guid)
10:56:15,824 Task 2: -- Executing at 3/5/2014 10:56:15 AM +01:00
10:56:20,949 Task 2: -- Failed in 5123 ms with error: Timeout expired.  The timeout period elapsed pr...
10:56:21,046 MyExecutionStrategy: retrying
10:56:21,051 Task 2: UPDATE [dbo].[T_Message] SET [LastUpdate] = @0 WHERE ([Messageid] = @1)
10:56:21,051 Task 2: -- @0: '3/5/2014 10:56:15 AM' (Type = DateTime2)
10:56:21,052 Task 2: -- @1: '4e3554be-1e13-461b-af12-848575317beb' (Type = Guid)
10:56:21,052 Task 2: -- Executing at 3/5/2014 10:56:21 AM +01:00
10:56:26,137 Task 2: -- Failed in 5083 ms with error: Timeout expired.  The timeout period elapsed pr...
10:56:26,219 MyExecutionStrategy: retrying
10:56:27,241 Task 2: UPDATE [dbo].[T_Message] SET [LastUpdate] = @0 WHERE ([Messageid] = @1)
10:56:27,241 Task 2: -- @0: '3/5/2014 10:56:15 AM' (Type = DateTime2)
10:56:27,242 Task 2: -- @1: '4e3554be-1e13-461b-af12-848575317beb' (Type = Guid)
10:56:27,242 Task 2: -- Executing at 3/5/2014 10:56:27 AM +01:00
10:56:32,323 Task 2: -- Failed in 5080 ms with error: Timeout expired.  The timeout period elapsed pr...
10:56:32,404 MyExecutionStrategy: retrying
10:56:35,653 Task 2: UPDATE [dbo].[T_Message] SET [LastUpdate] = @0 WHERE ([Messageid] = @1)
10:56:35,653 Task 2: -- @0: '3/5/2014 10:56:15 AM' (Type = DateTime2)
10:56:35,653 Task 2: -- @1: '4e3554be-1e13-461b-af12-848575317beb' (Type = Guid)
10:56:35,654 Task 2: -- Executing at 3/5/2014 10:56:35 AM +01:00
10:56:40,737 Task 2: -- Failed in 5083 ms with error: Timeout expired.  The timeout period elapsed pr...
10:56:40,822 MyExecutionStrategy: retrying
10:56:47,901 Task 2: UPDATE [dbo].[T_Message] SET [LastUpdate] = @0 WHERE ([Messageid] = @1)
10:56:47,901 Task 2: -- @0: '3/5/2014 10:56:15 AM' (Type = DateTime2)
10:56:47,901 Task 2: -- @1: '4e3554be-1e13-461b-af12-848575317beb' (Type = Guid)
10:56:47,902 Task 2: -- Executing at 3/5/2014 10:56:47 AM +01:00
10:56:52,982 Task 2: -- Failed in 5080 ms with error: Timeout expired.  The timeout period elapsed pr...
10:56:53,066 MyExecutionStrategy: retrying
10:57:08,527 Task 2: UPDATE [dbo].[T_Message] SET [LastUpdate] = @0 WHERE ([Messageid] = @1)
10:57:08,529 Task 2: -- @0: '3/5/2014 10:56:15 AM' (Type = DateTime2)
10:57:08,530 Task 2: -- @1: '4e3554be-1e13-461b-af12-848575317beb' (Type = Guid)
10:57:08,531 Task 2: -- Executing at 3/5/2014 10:57:08 AM +01:00
10:57:13,615 Task 2: -- Failed in 5082 ms with error: Timeout expired.  The timeout period elapsed pr...
10:57:13,699 MyExecutionStrategy: retrying
10:57:13,745 Task 2: update failed
System.Data.Entity.Infrastructure.RetryLimitExceededException: Maximum number of retries (5) exceeded while executing database operations with 'MyExecutionStrategy'. See inner exception for the most recent failure. ---> System.Data.Entity.Core.UpdateException: An error occurred while updating the entries. See the inner exception for details. ---> System.Data.SqlClient.SqlException: Timeout expired.  The timeout period elapsed prior to completion of the operation or the server
is not responding.
The statement has been terminated. ---> System.ComponentModel.Win32Exception: The wait operation timed out
As you can see, the System.Data.Entity.Infrastructure.DbUpdateException was changed now into a System.Data.Entity.Infrastructure.RetryLimitExceededException. The inner System.Data.Entity.Core.UpdateException remains the same.

My final issue was that I did misunderstand the optional parameters of SqlAzureExecutionStrategy: maxRetryCount is simply the maximum number of retries. But with maxDelay it is more complicated. The delay between the retries is connected to retry number and the power of 2. This results in the following delay intervals (ignoring some minor random stuff):
0, 1, 3, 7, 15, 31, 63, ... (seconds)
You can see the delay also in the trace above: the timespan between "MyExecutionStrategy: retrying" and "Task 2: UPDATE ...".

maxDelay does not set the duration of the complete operation (from first try until last retry). This was my expectation. Instead it limits only the delay between two retries. With a maxDelay of 5, we get par example:
0, 1, 3, 5, 5, 5, 5, ...

In the last trace, you can see the decreased maxDelay. And also the final success, since here Task 1 rolls back after 50 seconds:
11:06:39,953 Task 1: UPDATE [dbo].[T_Message] SET [LastUpdate] = @0 WHERE ([Messageid] = @1)
11:06:39,955 Task 1: -- @0: '3/5/2014 11:06:39 AM' (Type = DateTime2)
11:06:39,955 Task 1: -- @1: 'c5da0be6-a4a8-4018-9c8f-c1062aa9a958' (Type = Guid)
11:06:39,955 Task 1: -- Executing at 3/5/2014 11:06:39 AM +01:00
11:06:39,964 Task 1: -- Completed in 4 ms with result: 1
11:06:39,971 Task 2: UPDATE [dbo].[T_Message] SET [LastUpdate] = @0 WHERE ([Messageid] = @1)
11:06:39,971 Task 2: -- @0: '3/5/2014 11:06:39 AM' (Type = DateTime2)
11:06:39,972 Task 2: -- @1: 'c5da0be6-a4a8-4018-9c8f-c1062aa9a958' (Type = Guid)
11:06:39,972 Task 2: -- Executing at 3/5/2014 11:06:39 AM +01:00
11:06:45,097 Task 2: -- Failed in 5123 ms with error: Timeout expired.  The timeout period elapsed pr...
11:06:45,193 MyExecutionStrategy: retrying
11:06:45,199 Task 2: UPDATE [dbo].[T_Message] SET [LastUpdate] = @0 WHERE ([Messageid] = @1)
11:06:45,199 Task 2: -- @0: '3/5/2014 11:06:39 AM' (Type = DateTime2)
11:06:45,199 Task 2: -- @1: 'c5da0be6-a4a8-4018-9c8f-c1062aa9a958' (Type = Guid)
11:06:45,200 Task 2: -- Executing at 3/5/2014 11:06:45 AM +01:00
11:06:50,283 Task 2: -- Failed in 5082 ms with error: Timeout expired.  The timeout period elapsed pr...
11:06:50,368 MyExecutionStrategy: retrying
11:06:51,427 Task 2: UPDATE [dbo].[T_Message] SET [LastUpdate] = @0 WHERE ([Messageid] = @1)
11:06:51,427 Task 2: -- @0: '3/5/2014 11:06:39 AM' (Type = DateTime2)
11:06:51,427 Task 2: -- @1: 'c5da0be6-a4a8-4018-9c8f-c1062aa9a958' (Type = Guid)
11:06:51,428 Task 2: -- Executing at 3/5/2014 11:06:51 AM +01:00
11:06:56,510 Task 2: -- Failed in 5080 ms with error: Timeout expired.  The timeout period elapsed pr...
11:06:56,593 MyExecutionStrategy: retrying
11:06:59,686 Task 2: UPDATE [dbo].[T_Message] SET [LastUpdate] = @0 WHERE ([Messageid] = @1)
11:06:59,687 Task 2: -- @0: '3/5/2014 11:06:39 AM' (Type = DateTime2)
11:06:59,687 Task 2: -- @1: 'c5da0be6-a4a8-4018-9c8f-c1062aa9a958' (Type = Guid)
11:06:59,687 Task 2: -- Executing at 3/5/2014 11:06:59 AM +01:00
11:07:04,772 Task 2: -- Failed in 5083 ms with error: Timeout expired.  The timeout period elapsed pr...
11:07:04,853 MyExecutionStrategy: retrying
11:07:09,858 Task 2: UPDATE [dbo].[T_Message] SET [LastUpdate] = @0 WHERE ([Messageid] = @1)
11:07:09,858 Task 2: -- @0: '3/5/2014 11:06:39 AM' (Type = DateTime2)
11:07:09,859 Task 2: -- @1: 'c5da0be6-a4a8-4018-9c8f-c1062aa9a958' (Type = Guid)
11:07:09,859 Task 2: -- Executing at 3/5/2014 11:07:09 AM +01:00
11:07:14,939 Task 2: -- Failed in 5079 ms with error: Timeout expired.  The timeout period elapsed pr...
11:07:15,023 MyExecutionStrategy: retrying
11:07:20,028 Task 2: UPDATE [dbo].[T_Message] SET [LastUpdate] = @0 WHERE ([Messageid] = @1)
11:07:20,029 Task 2: -- @0: '3/5/2014 11:06:39 AM' (Type = DateTime2)
11:07:20,030 Task 2: -- @1: 'c5da0be6-a4a8-4018-9c8f-c1062aa9a958' (Type = Guid)
11:07:20,031 Task 2: -- Executing at 3/5/2014 11:07:20 AM +01:00
11:07:25,113 Task 2: -- Failed in 5080 ms with error: Timeout expired.  The timeout period elapsed pr...
11:07:25,196 MyExecutionStrategy: retrying
11:07:29,968 Task 1: rolling back
11:07:29,975 Task 1: rolled back
11:07:30,201 Task 2: UPDATE [dbo].[T_Message] SET [LastUpdate] = @0 WHERE ([Messageid] = @1)
11:07:30,202 Task 2: -- @0: '3/5/2014 11:06:39 AM' (Type = DateTime2)
11:07:30,204 Task 2: -- @1: 'c5da0be6-a4a8-4018-9c8f-c1062aa9a958' (Type = Guid)
11:07:30,204 Task 2: -- Executing at 3/5/2014 11:07:30 AM +01:00
11:07:30,208 Task 2: -- Completed in 2 ms with result: 1
FInally, I was even more enthusiatstic with SqlAzureExecutionStrategy than before. And I hope, you are, too.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.