Saturday, March 8, 2014

Working with SqlAzureExecutionStrategy

One of my favorite features of Entity Framework 6 was the SqlAzureExecutionStrategy. At least I thought this. I wanted to use it also in my company’s network since we get periodically timeouts connecting to a SQL Server instance (SQL Server 2008 in this case). So I thought, perfect, the SqlAzureExecutionStrategy is the solution to my problem. However, it didn’t work. So I had to investigate...

My approach was to write a sample program starting two tasks:
  • Task 1 should open a transaction, update a database row and then wait for some time (without committing or rolling back the transaction)
  • Task 2 should try to modify the same database row during Task 1’s wait time
Without any further preparation, this approach raised an exception in Task 2:
System.Data.SqlClient.SqlException (0x80131904): Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding. ---> System.ComponentModel.Win32Exception (0x80004005): The wait operation timed out
Next I added my own implementation of DbConfiguration, which simply configured the execution strategy:
public class MyDbConfiguration : DbConfiguration
{
  public MyDbConfiguration()
  {
    this.SetExecutionStrategy("System.Data.SqlClient", () => new SqlAzureExecutionStrategy()); 
  }
}
Unfortunately, this approach did not work since the "retrying execution strategies" do not support user-initiated transactions (see Limitations with Retrying Execution Strategies (EF6 onwards)). To my rescue, the mentioned article describes also a workaround which prevents the usage of the SqlAzureExecutionStrategy together with the transaction:
public class MyDbConfiguration : DbConfiguration
{
  public MyDbConfiguration()
  {
    this.SetExecutionStrategy("System.Data.SqlClient", () => SuspendExecutionStrategy
      ? (IDbExecutionStrategy)new DefaultExecutionStrategy()
      : new SqlAzureExecutionStrategy()); 
  }
  public static bool SuspendExecutionStrategy
  {
    get { return (bool?)CallContext.LogicalGetData("SuspendExecutionStrategy") ?? false; }
    set { CallContext.LogicalSetData("SuspendExecutionStrategy", value); }
  }
}
Now the program was running again, but I still got the SqlException with the timeout. Therefore I now added some database logging, another great feature of Entity Framework 6 (see Logging and Intercepting Database Operations). The log was interesting, since it provided some additional insights. But with my special problem, it was not really useful.
10:49:56,117 Task 1: UPDATE [dbo].[T_Message] SET [LastUpdate] = @0 WHERE ([Messageid] = @1)
10:49:56,119 Task 1: -- @0: '3/5/2014 10:49:56 AM' (Type = DateTime2)
10:49:56,119 Task 1: -- @1: '3621840d-724e-4a62-b22a-accb215dfb1b' (Type = Guid)
10:49:56,120 Task 1: -- Executing at 3/5/2014 10:49:56 AM +01:00
10:49:56,130 Task 1: -- Completed in 6 ms with result: 1
10:49:56,430 Task 2: UPDATE [dbo].[T_Message] SET [LastUpdate] = @0 WHERE ([Messageid] = @1)
10:49:56,431 Task 2: -- @0: '3/5/2014 10:49:56 AM' (Type = DateTime2)
10:49:56,431 Task 2: -- @1: '3621840d-724e-4a62-b22a-accb215dfb1b' (Type = Guid)
10:49:56,431 Task 2: -- Executing at 3/5/2014 10:49:56 AM +01:00
10:50:01,557 Task 2: -- Failed in 5124 ms with error: Timeout expired.  The timeout period elapsed pr...
10:50:01,732 Task 2: update failed
System.Data.Entity.Infrastructure.DbUpdateException: An error occurred while updating the entries. See the inner exception for details. ---> System.Data.Entity.Core.UpdateException: An error occurred while updating the entries. See the inner exception for details. ---> System.Data.SqlClient.SqlException: Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
The statement has been terminated. ---> System.ComponentModel.Win32Exception: The wait operation timed out
Then I had a look at the source code of SqlAzureExecutionStrategy. Its implementation consists of mainly one method:
protected override bool ShouldRetryOn(Exception exception)
{
  return SqlAzureRetriableExceptionDetector.ShouldRetryOn(exception);
}
Since the method is protected, I decided to implement my own strategy derived from SqlAzureExecutionStrategy. In the first step, I simply implemented my own ShouldRetryOn method, which I used for setting a breakpoint. I found out that the method was called with the SqlException, but SqlAzureExecutionStrategy's implementation returned false.

SqlAzureExecutionStrategy delegates the check of the exception to SqlAzureRetriableExceptionDetector. As you can see in the source code, it returns true for TimeoutException and for SqlException with a specific SqlError. However, "my" SqlException has a SqlError with Number == -2, for which false is returned.

Now I added some real logic to my strategy:
protected override bool ShouldRetryOn(Exception exception)
{
  bool shouldRetry = false;

  SqlException sqlException = exception as SqlException;
  if (sqlException != null)
  {
    foreach (SqlError error in sqlException.Errors)
    {
      if (error.Number == -2)
        shouldRetry = true;
    }
  }

  shouldRetry = shouldRetry || base.ShouldRetryOn(exception);

  Logger.WriteLog("ShouldRetryOn: " + shouldRetry);
  return shouldRetry;
}
With this implementation I had two benefits:
  • The connection attempt was retried also in my deadlock scenario.
  • I got a log entry for every retry.
Now I got a "beautiful" trace of my retry activities. And also an explicit exception, when the problem couldn’t be solved by simply retrying it:
10:56:15,805 Task 1: UPDATE [dbo].[T_Message] SET [LastUpdate] = @0 WHERE ([Messageid] = @1)
10:56:15,807 Task 1: -- @0: '3/5/2014 10:56:15 AM' (Type = DateTime2)
10:56:15,808 Task 1: -- @1: '4e3554be-1e13-461b-af12-848575317beb' (Type = Guid)
10:56:15,808 Task 1: -- Executing at 3/5/2014 10:56:15 AM +01:00
10:56:15,816 Task 1: -- Completed in 5 ms with result: 1
10:56:15,823 Task 2: UPDATE [dbo].[T_Message] SET [LastUpdate] = @0 WHERE ([Messageid] = @1)
10:56:15,824 Task 2: -- @0: '3/5/2014 10:56:15 AM' (Type = DateTime2)
10:56:15,824 Task 2: -- @1: '4e3554be-1e13-461b-af12-848575317beb' (Type = Guid)
10:56:15,824 Task 2: -- Executing at 3/5/2014 10:56:15 AM +01:00
10:56:20,949 Task 2: -- Failed in 5123 ms with error: Timeout expired.  The timeout period elapsed pr...
10:56:21,046 MyExecutionStrategy: retrying
10:56:21,051 Task 2: UPDATE [dbo].[T_Message] SET [LastUpdate] = @0 WHERE ([Messageid] = @1)
10:56:21,051 Task 2: -- @0: '3/5/2014 10:56:15 AM' (Type = DateTime2)
10:56:21,052 Task 2: -- @1: '4e3554be-1e13-461b-af12-848575317beb' (Type = Guid)
10:56:21,052 Task 2: -- Executing at 3/5/2014 10:56:21 AM +01:00
10:56:26,137 Task 2: -- Failed in 5083 ms with error: Timeout expired.  The timeout period elapsed pr...
10:56:26,219 MyExecutionStrategy: retrying
10:56:27,241 Task 2: UPDATE [dbo].[T_Message] SET [LastUpdate] = @0 WHERE ([Messageid] = @1)
10:56:27,241 Task 2: -- @0: '3/5/2014 10:56:15 AM' (Type = DateTime2)
10:56:27,242 Task 2: -- @1: '4e3554be-1e13-461b-af12-848575317beb' (Type = Guid)
10:56:27,242 Task 2: -- Executing at 3/5/2014 10:56:27 AM +01:00
10:56:32,323 Task 2: -- Failed in 5080 ms with error: Timeout expired.  The timeout period elapsed pr...
10:56:32,404 MyExecutionStrategy: retrying
10:56:35,653 Task 2: UPDATE [dbo].[T_Message] SET [LastUpdate] = @0 WHERE ([Messageid] = @1)
10:56:35,653 Task 2: -- @0: '3/5/2014 10:56:15 AM' (Type = DateTime2)
10:56:35,653 Task 2: -- @1: '4e3554be-1e13-461b-af12-848575317beb' (Type = Guid)
10:56:35,654 Task 2: -- Executing at 3/5/2014 10:56:35 AM +01:00
10:56:40,737 Task 2: -- Failed in 5083 ms with error: Timeout expired.  The timeout period elapsed pr...
10:56:40,822 MyExecutionStrategy: retrying
10:56:47,901 Task 2: UPDATE [dbo].[T_Message] SET [LastUpdate] = @0 WHERE ([Messageid] = @1)
10:56:47,901 Task 2: -- @0: '3/5/2014 10:56:15 AM' (Type = DateTime2)
10:56:47,901 Task 2: -- @1: '4e3554be-1e13-461b-af12-848575317beb' (Type = Guid)
10:56:47,902 Task 2: -- Executing at 3/5/2014 10:56:47 AM +01:00
10:56:52,982 Task 2: -- Failed in 5080 ms with error: Timeout expired.  The timeout period elapsed pr...
10:56:53,066 MyExecutionStrategy: retrying
10:57:08,527 Task 2: UPDATE [dbo].[T_Message] SET [LastUpdate] = @0 WHERE ([Messageid] = @1)
10:57:08,529 Task 2: -- @0: '3/5/2014 10:56:15 AM' (Type = DateTime2)
10:57:08,530 Task 2: -- @1: '4e3554be-1e13-461b-af12-848575317beb' (Type = Guid)
10:57:08,531 Task 2: -- Executing at 3/5/2014 10:57:08 AM +01:00
10:57:13,615 Task 2: -- Failed in 5082 ms with error: Timeout expired.  The timeout period elapsed pr...
10:57:13,699 MyExecutionStrategy: retrying
10:57:13,745 Task 2: update failed
System.Data.Entity.Infrastructure.RetryLimitExceededException: Maximum number of retries (5) exceeded while executing database operations with 'MyExecutionStrategy'. See inner exception for the most recent failure. ---> System.Data.Entity.Core.UpdateException: An error occurred while updating the entries. See the inner exception for details. ---> System.Data.SqlClient.SqlException: Timeout expired.  The timeout period elapsed prior to completion of the operation or the server
is not responding.
The statement has been terminated. ---> System.ComponentModel.Win32Exception: The wait operation timed out
As you can see, the System.Data.Entity.Infrastructure.DbUpdateException was changed now into a System.Data.Entity.Infrastructure.RetryLimitExceededException. The inner System.Data.Entity.Core.UpdateException remains the same.

My final issue was that I did misunderstand the optional parameters of SqlAzureExecutionStrategy: maxRetryCount is simply the maximum number of retries. But with maxDelay it is more complicated. The delay between the retries is connected to retry number and the power of 2. This results in the following delay intervals (ignoring some minor random stuff):
0, 1, 3, 7, 15, 31, 63, ... (seconds)
You can see the delay also in the trace above: the timespan between "MyExecutionStrategy: retrying" and "Task 2: UPDATE ...".

maxDelay does not set the duration of the complete operation (from first try until last retry). This was my expectation. Instead it limits only the delay between two retries. With a maxDelay of 5, we get par example:
0, 1, 3, 5, 5, 5, 5, ...

In the last trace, you can see the decreased maxDelay. And also the final success, since here Task 1 rolls back after 50 seconds:
11:06:39,953 Task 1: UPDATE [dbo].[T_Message] SET [LastUpdate] = @0 WHERE ([Messageid] = @1)
11:06:39,955 Task 1: -- @0: '3/5/2014 11:06:39 AM' (Type = DateTime2)
11:06:39,955 Task 1: -- @1: 'c5da0be6-a4a8-4018-9c8f-c1062aa9a958' (Type = Guid)
11:06:39,955 Task 1: -- Executing at 3/5/2014 11:06:39 AM +01:00
11:06:39,964 Task 1: -- Completed in 4 ms with result: 1
11:06:39,971 Task 2: UPDATE [dbo].[T_Message] SET [LastUpdate] = @0 WHERE ([Messageid] = @1)
11:06:39,971 Task 2: -- @0: '3/5/2014 11:06:39 AM' (Type = DateTime2)
11:06:39,972 Task 2: -- @1: 'c5da0be6-a4a8-4018-9c8f-c1062aa9a958' (Type = Guid)
11:06:39,972 Task 2: -- Executing at 3/5/2014 11:06:39 AM +01:00
11:06:45,097 Task 2: -- Failed in 5123 ms with error: Timeout expired.  The timeout period elapsed pr...
11:06:45,193 MyExecutionStrategy: retrying
11:06:45,199 Task 2: UPDATE [dbo].[T_Message] SET [LastUpdate] = @0 WHERE ([Messageid] = @1)
11:06:45,199 Task 2: -- @0: '3/5/2014 11:06:39 AM' (Type = DateTime2)
11:06:45,199 Task 2: -- @1: 'c5da0be6-a4a8-4018-9c8f-c1062aa9a958' (Type = Guid)
11:06:45,200 Task 2: -- Executing at 3/5/2014 11:06:45 AM +01:00
11:06:50,283 Task 2: -- Failed in 5082 ms with error: Timeout expired.  The timeout period elapsed pr...
11:06:50,368 MyExecutionStrategy: retrying
11:06:51,427 Task 2: UPDATE [dbo].[T_Message] SET [LastUpdate] = @0 WHERE ([Messageid] = @1)
11:06:51,427 Task 2: -- @0: '3/5/2014 11:06:39 AM' (Type = DateTime2)
11:06:51,427 Task 2: -- @1: 'c5da0be6-a4a8-4018-9c8f-c1062aa9a958' (Type = Guid)
11:06:51,428 Task 2: -- Executing at 3/5/2014 11:06:51 AM +01:00
11:06:56,510 Task 2: -- Failed in 5080 ms with error: Timeout expired.  The timeout period elapsed pr...
11:06:56,593 MyExecutionStrategy: retrying
11:06:59,686 Task 2: UPDATE [dbo].[T_Message] SET [LastUpdate] = @0 WHERE ([Messageid] = @1)
11:06:59,687 Task 2: -- @0: '3/5/2014 11:06:39 AM' (Type = DateTime2)
11:06:59,687 Task 2: -- @1: 'c5da0be6-a4a8-4018-9c8f-c1062aa9a958' (Type = Guid)
11:06:59,687 Task 2: -- Executing at 3/5/2014 11:06:59 AM +01:00
11:07:04,772 Task 2: -- Failed in 5083 ms with error: Timeout expired.  The timeout period elapsed pr...
11:07:04,853 MyExecutionStrategy: retrying
11:07:09,858 Task 2: UPDATE [dbo].[T_Message] SET [LastUpdate] = @0 WHERE ([Messageid] = @1)
11:07:09,858 Task 2: -- @0: '3/5/2014 11:06:39 AM' (Type = DateTime2)
11:07:09,859 Task 2: -- @1: 'c5da0be6-a4a8-4018-9c8f-c1062aa9a958' (Type = Guid)
11:07:09,859 Task 2: -- Executing at 3/5/2014 11:07:09 AM +01:00
11:07:14,939 Task 2: -- Failed in 5079 ms with error: Timeout expired.  The timeout period elapsed pr...
11:07:15,023 MyExecutionStrategy: retrying
11:07:20,028 Task 2: UPDATE [dbo].[T_Message] SET [LastUpdate] = @0 WHERE ([Messageid] = @1)
11:07:20,029 Task 2: -- @0: '3/5/2014 11:06:39 AM' (Type = DateTime2)
11:07:20,030 Task 2: -- @1: 'c5da0be6-a4a8-4018-9c8f-c1062aa9a958' (Type = Guid)
11:07:20,031 Task 2: -- Executing at 3/5/2014 11:07:20 AM +01:00
11:07:25,113 Task 2: -- Failed in 5080 ms with error: Timeout expired.  The timeout period elapsed pr...
11:07:25,196 MyExecutionStrategy: retrying
11:07:29,968 Task 1: rolling back
11:07:29,975 Task 1: rolled back
11:07:30,201 Task 2: UPDATE [dbo].[T_Message] SET [LastUpdate] = @0 WHERE ([Messageid] = @1)
11:07:30,202 Task 2: -- @0: '3/5/2014 11:06:39 AM' (Type = DateTime2)
11:07:30,204 Task 2: -- @1: 'c5da0be6-a4a8-4018-9c8f-c1062aa9a958' (Type = Guid)
11:07:30,204 Task 2: -- Executing at 3/5/2014 11:07:30 AM +01:00
11:07:30,208 Task 2: -- Completed in 2 ms with result: 1
FInally, I was even more enthusiatstic with SqlAzureExecutionStrategy than before. And I hope, you are, too.

12 comments:

  1. Thank you for the great post. I am wondering where, in your MyDbConfiguration class, did you put the ShouldRetryOn override method so that it woluld actually overwrite SqlAzureExecutionStrategy's implementation? Thanks again.

    ReplyDelete
    Replies
    1. I implemented my own strategy, based on SqlAzureExecutionStrategy. There I was able to override ShouldRetryOn(). In MyDbConfiguration I referenced this new strategy.
      Is this clear enough?

      Delete
    2. And where/when did you initialize Logger instance?

      Delete
  2. I cannot get this to work. I still keep getting "The configured execution strategy 'SqlAzureExecutionStrategy' does not support user initiated transactions". I have overridden the ShouldRetryOn - Do I need to register this somewhere?

    ReplyDelete
  3. Yo man, so far so good. Nice examples.
    You could improve a little bit your foreach however, as you don't necessarily need to parse all the errors:

    foreach (SqlError error in sqlException.Errors)
    {
    if (error.Number == -2)
    {
    shouldRetry = true;
    break; // <--- here does the magic. No need to continue once you detected your -2
    }
    }

    As for both posts above me:
    you guys need to implement your own SqlAzureExecutionStrategy to override the ShouldRetryOn.

    So pretty much you just need to create a class that inherits from SqlAzureExecutionStrategy, and then drop the ShouldRetryOn in there.

    And to use that new class, just replace the "new SqlAzureExecutionStrategy()" from the
    MyDbConfiguration with your class.

    Like so:

    public MyDbConfiguration()
    {
    this.SetExecutionStrategy("System.Data.SqlClient", () => SuspendExecutionStrategy
    ? (IDbExecutionStrategy)new DefaultExecutionStrategy()
    : new MyUberCustomThatIJustCreatedAboveSqlAzureExecutionStrategy());
    }

    ReplyDelete
  4. I have read this numerous times now and implemented exactly as above. I have my dBconfiguration class setup and it's hitting it on app start

    public class EfConfig : DbConfiguration
    {
    public EfConfig()
    {
    SetExecutionStrategy("System.Data.SqlClient", () => SuspendExecutionStrategy
    ? (IDbExecutionStrategy)new DefaultExecutionStrategy()
    : new CustomSqlAzureExecutionStrategy());
    }

    public static bool SuspendExecutionStrategy
    {
    get { return (bool?)CallContext.LogicalGetData("SuspendExecutionStrategy") ?? false; }
    set { CallContext.LogicalSetData("SuspendExecutionStrategy", value); }
    }
    }

    Then I have a custom class as referenced above called 'CustomSqlAzureExecutionStrategy' which I have put below and have overridden the ShouldRetryOn

    public class CustomSqlAzureExecutionStrategy : SqlAzureExecutionStrategy
    {
    protected override bool ShouldRetryOn(Exception exception)
    {
    var shouldRetry = false;

    var sqlException = exception as SqlException;
    if (sqlException != null)
    {
    foreach (SqlError error in sqlException.Errors)
    {
    if (error.Number == -2)
    {
    shouldRetry = true;
    }

    }
    }
    shouldRetry = shouldRetry || base.ShouldRetryOn(exception);
    return shouldRetry;
    }
    }

    However, when I run my app I still get the same error I started with, but this time just pointing to the custom class?

    The configured execution strategy 'CustomSqlAzureExecutionStrategy' does not support user initiated transactions.

    Have I missed something obvious here?

    ReplyDelete
  5. This is weird, the only thing I can think of is that somehow the underlying database linked to your EF is not supporting that kinda 'transient' transactions or something.

    I think we used the above code on both SQLServer 2012 and 2014, with databases both in 11 and 12 version. I can vouch that this works 100% on 2014 and v12 on an Azure server since we have our ShouldRetryOn working in our production environment (I put a log in the method to make sure that does the job).

    I'm afraid I can't help you more as I am no EF expert, nor a database experienced enough guy :/

    ReplyDelete
  6. Oh wait, I just remembered something man: "The configured execution strategy 'CustomSqlAzureExecutionStrategy' does not support **user initiated transactions**."

    I didn't have this error, however I have read that yes, the ShouldRetryOn will not work on 'user initiated transactions'; which I don't know what they are.

    Howerer, this is why you have the SuspendExecutionStrategy property: to prevent using this execution strategy where it is not supported.

    Have a look here:
    http://romiller.com/2013/08/19/ef6-suspendable-execution-strategy/

    Pretty much you need to know which query in your code is popping that error. Once you know which one it is, surround it with the DbContext just like in the article example.

    using (var db = new YourContext())
    {
    EfConfig.SuspendExecutionStrategy = true;

    // Perform the user initiated query withou the execution strat/retry
    db.YourStuff.Add(new Stuff { thing = crap });
    db.SaveChanges();

    EfConfig.SuspendExecutionStrategy = false;
    }

    Hope this helps bro

    ReplyDelete
  7. Have a look here. I have implemented it more or less as above, but still get the A transport-level error has occurred when receiving results from the server.

    http://stackoverflow.com/questions/30421695/the-configured-execution-strategy-sqlazureexecutionstrategy-does-not-support-u

    And thanks for taking the time to reply.

    ReplyDelete
  8. It is not safe to retry on timeout exception, now also on the source code is left a warning about that:

    // This exception can be thrown even if the operation completed succesfully, so it's safer to let the application fail.
    // DBNETLIB Error Code: -2
    // Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding. The statement has been terminated.
    //case -2:


    https://github.com/aspnet/EntityFramework6/blob/master/src/EntityFramework.SqlServer/SqlAzureRetriableExceptionDetector.cs

    ReplyDelete
  9. It was really a nice post and I was really impressed by reading this
    Dot Net Online Training Hyderabad

    ReplyDelete