Michael J. Swart


January 6, 2016

Some Changes for 2016

Filed under: SQLServerPedia Syndication — Michael J. Swart @ 8:00 am

So 2016 is going to be fantastic.

Regular Blog Posts

You’re going to see more content coming out of this site. Most of my posts are technical and they’re based on SQL lessons learned in a very busy OLTP SQL Server environment. I do my best to make each one accessible for everyone without shying away from tricky topics.

If the posts come too frequently, you’re going to be tempted to “mark all as read”. But I think most readers will easily be able to keep up with one post a week.


In 2016, you can count on a blog post every Wednesday. But how do you want to get them?

Via Twitter
If you found my site via Twitter, consider subscribing if you want to keep up with this site.

Via RSS
If you’ve already subscribed to the RSS feed, you’re going to continue to get them as you always have, but the world seems to be moving away from RSS.

Via email (new!)
And if you want to get these posts in your inbox, I’ve set up a mailing list. There’s a link at the top of my web site for that. (The mailing list is new; I set it up with tips from Kendra Little.)

Continued Illustrations

Those familiar with this site know that I like to draw. It’s fun to combine that hobby with the blog. And I’m going to continue to include illustrations when I can.

Now Using SVG
One change is that I’m going to start including the images as SVG files instead of PNG. Basically I’m switching from raster to vector illustrations. The file sizes are slightly larger, but they’re still measured in KB. If you do have trouble looking at an illustration, let me know (include device and browser version).

Have fun zooming! If you do, you get to see lots of detail (while I get to notice the flaws).

Talking About Parameter Sniffing

I wrote a talk on parameter sniffing called “Something Stinks: Avoiding Parameter Sniffing Issues & Writing Consistently Fast SQL”.

I gave the talk to work colleagues and I’m really happy with how it went. One NoSQL colleague even told me afterward, “I miss relational data.”

You get to see it if you come to Toronto next Tuesday (January 12, 2016) where I’ll be giving the talk for the user group there. Register here.

Or you get to see it if you come to Cleveland for their SQL Saturday (February 6, 2016). Register for that here.

Cheers! And Happy New Year!


October 6, 2015

Don’t Abandon Your Transactions

Filed under: Miscelleaneous SQL,SQL Scripts,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 11:59 am

About eight years ago, Dan Guzman wrote a post called Use Caution with Explicit Transactions in Stored Procedures. In it, he talks about error handling and transactions, specifically with respect to the XACT_ABORT setting.

XACT_ABORT

Microsoft’s docs for XACT_ABORT are pretty clear. The setting determines whether “SQL Server automatically rolls back the current transaction when a statement raises an error”.

And in nearly every scenario I can think of that uses a transaction, this automatic rollback is the desired behavior. The problem is that it’s not the default behavior. And this leads to Dan Guzman’s advice where he strongly recommends that SET XACT_ABORT ON be included “in all stored procedures with explicit transactions unless you have a specific reason to do otherwise.”
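
To make the difference concrete, here’s a minimal sketch you can run in a scratch database. The demo tables are made up for this example:

-- demo tables (hypothetical, just for this sketch)
CREATE TABLE dbo.DemoParent ( Id INT NOT NULL PRIMARY KEY );
CREATE TABLE dbo.DemoChild
(
  Id INT NOT NULL PRIMARY KEY,
  ParentId INT NOT NULL REFERENCES dbo.DemoParent ( Id )
);
GO
 
-- with the default (XACT_ABORT OFF), the failed statement leaves the transaction open
SET XACT_ABORT OFF;
BEGIN TRAN;
INSERT dbo.DemoChild ( Id, ParentId ) VALUES ( 1, 999 ); -- fails: foreign key violation
SELECT @@TRANCOUNT AS open_transactions; -- returns 1, the transaction is still open
ROLLBACK;
GO
 
-- with XACT_ABORT ON, the same error rolls the transaction back automatically
SET XACT_ABORT ON;
BEGIN TRAN;
INSERT dbo.DemoChild ( Id, ParentId ) VALUES ( 1, 999 ); -- fails and aborts the batch
GO
SELECT @@TRANCOUNT AS open_transactions; -- returns 0, nothing left to abandon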

What Could Go Wrong?

When a statement inside a transaction fails (for whatever reason) and XACT_ABORT is set to off, then…

  • That transaction is abandoned.
  • Any locks taken during that transaction are still held.
  • Even if you close the connection from the application, .NET’s connection pooling will keep that connection alive and the transaction on SQL Server stays open.
  • Fortunately, if someone reuses the same database connection from the connection pool, the old transaction will be rolled back.
  • Unfortunately developers can’t count on that happening immediately.
  • Abandoned transactions can cause excessive blocking leading to a concurrency traffic jam.
  • Also, abandoned transactions can interfere with downstream solutions, specifically ones that depend on the transaction log: the log can grow indefinitely, replication can suffer, and if RCSI is enabled, the version store can get out of hand.

Some (or all) of those things happened to us last week.
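
If you want to see the held-lock symptom for yourself, here’s a rough two-session sketch (the table name is hypothetical; any table you can take a lock on will do):

-- Session 1 (plays the application): take a lock inside a transaction,
-- then never commit or roll back. The transaction is now abandoned.
SET XACT_ABORT OFF;
BEGIN TRAN;
SELECT * FROM dbo.SomeTable WITH (UPDLOCK) WHERE Id = 1;
-- imagine a later statement failed here and the caller gave up without a ROLLBACK
 
-- Session 2 (plays the DBA): the lock is still there, held by a sleeping session.
SELECT request_session_id, resource_type, request_mode, request_status
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID();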

Steps To Take

Here are some things you can do:

Do you have abandoned transactions right now?
It’s not too hard to identify these abandoned transactions:

-- do you have abandoned transactions?
select p.spid, s.text as last_sql
from sys.sysprocesses p
cross apply sys.dm_exec_sql_text(p.sql_handle) s
where p.status = 'sleeping'
and p.open_tran > 0

Also if you use sp_whoisactive, you can identify these processes as those with a sleeping status and at least one open transaction. But there’s a trick I use to identify these quickly. The sql_text value in the output of sp_whoisactive will typically begin with CREATE PROCEDURE. When I see that, I know it’s time to check whether this connection is sleeping or not.
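
If you prefer the newer DMVs to sys.sysprocesses, roughly the same check looks like this (a sketch; it shows the most recent statement on the connection, which isn’t necessarily the statement that opened the transaction):

-- abandoned transactions, via the DMVs
SELECT s.session_id,
       s.status,
       t.text AS last_sql
FROM sys.dm_tran_session_transactions AS st
JOIN sys.dm_exec_sessions AS s
  ON s.session_id = st.session_id
JOIN sys.dm_exec_connections AS c
  ON c.session_id = st.session_id
CROSS APPLY sys.dm_exec_sql_text(c.most_recent_sql_handle) AS t
WHERE s.status = 'sleeping';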

SET XACT_ABORT ON
Follow Dan Guzman’s advice to include SET XACT_ABORT ON in all stored procedures with explicit transactions.
You can actually find the procedures in your database that need a closer look:

-- find procedures that could suffer from abandoned transactions
SELECT * 
FROM sys.procedures 
where OBJECT_DEFINITION(object_id) like '%BEGIN TRAN%'
and OBJECT_DEFINITION(object_id) not like '%XACT_ABORT%'
order by name

Set XACT_ABORT ON server-wide
If you choose, you can set the default value for all connections to your server. You can do that using Management Studio:
[screenshot: setting the default connection options in Management Studio]

Or via a script:

-- turn the server's xact_abort default on
declare @user_options_value bigint;
select @user_options_value = cast(value as bigint)
from sys.configurations 
where name = 'user options';
set @user_options_value = @user_options_value | 0x4000; 
exec sp_configure N'user options', @user_options_value;
RECONFIGURE WITH OVERRIDE;
GO
 
-- (if necessary) turn the server's xact_abort default off
declare @user_options_value bigint;
select @user_options_value = cast(value as bigint)
from sys.configurations 
where name = 'user options';
set @user_options_value = @user_options_value & 0x3fff; 
exec sp_configure N'user options', @user_options_value;
RECONFIGURE WITH OVERRIDE;
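
If you go the server-wide route, it’s easy to double-check afterward. Here’s a small sketch (0x4000 is the XACT_ABORT bit in the user options bitmask, the same bit the script above sets):

-- is the XACT_ABORT bit part of the server default?
SELECT name,
       CAST(value_in_use AS INT) & 0x4000 AS xact_abort_bit -- non-zero means ON by default
FROM sys.configurations
WHERE name = 'user options';
 
-- or open a brand new connection and list its effective SET options
DBCC USEROPTIONS;

Keep in mind that the new default applies to connections made after the change; sessions that are already connected keep the settings they started with.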

Code Review

I love code reviews. They’re more than just a tool for improving quality. They’re learning opportunities and teaching opportunities for all involved.

Last week, I invited readers to have a look at a procedure in a post called Code Review This Procedure. I was looking for anyone to suggest turning on XACT_ABORT as a best practice. It’s a best practice where I work, but things like this slip through. We should have caught this not just during testing, but during development. It’s obvious with hindsight. But I wanted to determine how obvious it was without that hindsight. I guess it was pretty subtle; XACT_ABORT was not mentioned once. That’s either because the setting is not often used by most developers, or because it is easily overlooked.

But here are some other thoughts that readers had:

Concurrency
Many people pointed at concurrency and transaction isolation levels as a problem. It turns out that concurrency is very hard to do right and nearly impossible to verify by inspection. In fact one of my favorite blog posts is about getting concurrency right. It’s called Mythbusting: Concurrent Update/Insert Solutions. The lesson here is just try it.

Cody Konior (blog) submitted my favorite comment. Cody writes “I often can’t disentangle what the actual impact of various isolation levels would be so I go a different route; which is to create a quick and dirty load test”. I can’t determine concurrency solely by inspection either, which is why I never try. Cody determined that after hammering this procedure, it never failed.

He’s entirely right. Concurrency is done correctly here. Ironically, most of the fixes suggested in other people’s code reviews actually introduced concurrency issues like deadlocks or primary key violations.

People also suggested that blocking would become excessive. It turns out that throughput does not suffer either. My testing framework still managed to process 25,000 batches per second on my desktop without error.
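
If you want to try something similar yourself, here’s a quick-and-dirty sketch in the spirit of Cody’s approach. Run it in several SSMS windows at once and watch for errors (the parameter ranges are made up and assume a handful of rows already exist in the parent tables):

-- hammer the procedure from several sessions at once
-- tip: turn on "discard results after execution" in SSMS so the result sets don't pile up
DECLARE @i INT = 0, @event BIGINT, @seat BIGINT;
WHILE @i < 100000
BEGIN
  SELECT @event = ABS(CHECKSUM(NEWID())) % 10 + 1,  -- random-ish EventId
         @seat  = ABS(CHECKSUM(NEWID())) % 100 + 1; -- random-ish VenueSeatId
  EXEC dbo.s_EVENT_TICKETS_GetOrCreate
       @EventId = @event,
       @VenueSeatId = @seat,
       @PurchaserId = 1,
       @PurchaseMethodId = 1;
  SET @i += 1;
END;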

Validating inputs
Some people pointed out that if NULL values or other incorrect values were passed in, then a foreign key violation could be thrown. And they suggested that the procedure should validate the inputs. But what then? If there’s a problem, there are two choices. Choice one: raise no error and exit quietly, which is not ideal. Choice two: raise a new error, which is not a significant improvement over the existing implementation.

Avoiding the transaction altogether
It is possible to rewrite this procedure without using an explicit transaction. Without the explicit transaction, there’s no chance of abandoning it. And no chance of encountering the trouble that goes with abandoned transactions. But it’s still necessary to worry about concurrency. Solutions that use single statements like MERGE or INSERT...WHERE NOT EXISTS still need SERIALIZABLE and UPDLOCK.
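
For illustration, here’s roughly what that single-statement variant might look like inside this procedure (a sketch only, not necessarily how I’d ship it; note that it still carries the UPDLOCK and SERIALIZABLE hints):

-- get-or-create without an explicit transaction (sketch)
INSERT dbo.EVENT_TICKETS
  ( EventId, VenueSeatId, PurchaserId, PurchaseMethodId, PurchaseDate )
SELECT @EventId, @VenueSeatId, @PurchaserId, @PurchaseMethodId, SYSDATETIME()
WHERE NOT EXISTS
(
  SELECT *
  FROM dbo.EVENT_TICKETS WITH (UPDLOCK, SERIALIZABLE)
  WHERE EventId = @EventId
    AND VenueSeatId = @VenueSeatId
);
 
-- return whoever holds the seat now
SELECT PurchaserId, PurchaseMethodId, PurchaseDate
FROM dbo.EVENT_TICKETS
WHERE EventId = @EventId
  AND VenueSeatId = @VenueSeatId;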

Error handling
I think Aaron Mathison (blog) nailed it, so I’m just going to quote his review entirely:

Since your EVENT_TICKETS table has required foreign keys (evidenced by NOT NULL on all columns with foreign key references) the proc should be validating that the input parameter values exist in the foreign key tables before trying to insert into EVENT_TICKETS. If it doesn’t find any one of them it should throw an error and gracefully rollback the transaction and return from the proc.

The way it’s designed currently I think you could get an error on inserting to EVENT_TICKETS that would fail the proc and leave the transaction open.
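
Here’s a hedged sketch of the shape Aaron describes. It’s one possible version, not necessarily the fix we shipped; it adds SET XACT_ABORT ON, up-front validation, and a TRY/CATCH that rolls back and re-raises:

ALTER PROCEDURE dbo.s_EVENT_TICKETS_GetOrCreate (
  @EventId BIGINT,
  @VenueSeatId BIGINT,
  @PurchaserId BIGINT,
  @PurchaseMethodId BIGINT
)
AS
  SET NOCOUNT ON;
  SET XACT_ABORT ON; -- an error now rolls the transaction back instead of abandoning it
 
  -- validate the inputs up front (shown for two of the four foreign keys)
  IF NOT EXISTS ( SELECT * FROM dbo.PURCHASERS WHERE PurchaserId = @PurchaserId )
    THROW 50001, 'Unknown PurchaserId', 1;
  IF NOT EXISTS ( SELECT * FROM dbo.PURCHASE_METHODS WHERE PurchaseMethodId = @PurchaseMethodId )
    THROW 50002, 'Unknown PurchaseMethodId', 1;
 
  BEGIN TRY
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    BEGIN TRAN;
      -- ...the existing get-or-create logic goes here...
    COMMIT TRAN;
  END TRY
  BEGIN CATCH
    IF @@TRANCOUNT > 0
      ROLLBACK TRAN;
    THROW; -- re-raise so the caller knows the call failed
  END CATCH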


October 1, 2015

Code Review This Procedure

Filed under: SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 10:28 am

Earlier this week we encountered a web-site outage caused by a database procedure. I’m going to blog about that problem (in detail) in a post scheduled for next week. But before I publish that post, I want to know how subtle or obvious the problem was. It seems obvious to me now, but I have the benefit of hindsight. I wonder whether we could have avoided this during the code review stage of development.

So before I publish the details, I invite you to do a code review of this procedure in the comment section.

The Procedure

Here’s the procedure. It suffers from the same thing that burned us this week. Do you see any issues with it? Tell me in the comment section.

CREATE PROCEDURE dbo.s_EVENT_TICKETS_GetOrCreate (
  @EventId BIGINT,
  @VenueSeatId BIGINT,
  @PurchaserId BIGINT,
  @PurchaseMethodId BIGINT
)
AS
  SET NOCOUNT ON;
 
  DECLARE @pid BIGINT;
  DECLARE @pmid BIGINT;
  DECLARE @dt DATETIME2
 
  SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
  BEGIN TRAN
 
    -- If the row exists, grab details about the purchaser
    SELECT
      @pid = PurchaserId,
      @pmid = PurchaseMethodId,
      @dt = PurchaseDate
    FROM dbo.EVENT_TICKETS WITH (UPDLOCK)
    WHERE EventId = @EventId
      AND VenueSeatId = @VenueSeatId;
 
    IF ( @pid IS NULL )
    BEGIN
 
      -- The row doesn't exist, insert the row
      SET @dt = SYSDATETIME()
 
      INSERT INTO dbo.EVENT_TICKETS 
        ( EventId, VenueSeatId, PurchaserId, PurchaseMethodId, PurchaseDate )
      VALUES 
        ( @EventId, @VenueSeatId, @PurchaserId, @PurchaseMethodId, @dt );
 
      SELECT @pid = @PurchaserId,
             @pmid = @PurchaseMethodId;
    END
 
  COMMIT TRAN
 
  -- return details about the purchaser
  SELECT 
    @pid as PurchaserId,
    @pmid as PurchaseMethodId,
    @dt as PurchaseDate;

The Schema

Here’s a subset of the table definitions that this procedure is meant to use.

CREATE TABLE dbo.[EVENTS]
(
  EventId BIGINT NOT NULL IDENTITY,
  CONSTRAINT PK_EVENTS 
    PRIMARY KEY (EventId)
  -- etc...
);
 
CREATE TABLE dbo.VENUE_SEATS
(
  VenueSeatId BIGINT NOT NULL IDENTITY,
  CONSTRAINT PK_VENUE_SEATS 
    PRIMARY KEY (VenueSeatId)
  -- etc...
);
 
CREATE TABLE dbo.PURCHASERS
(
  PurchaserId BIGINT NOT NULL IDENTITY,
  CONSTRAINT PK_PURCHASERS 
    PRIMARY KEY (PurchaserId)
  -- etc...
);
 
CREATE TABLE dbo.PURCHASE_METHODS
(
  PurchaseMethodId BIGINT NOT NULL IDENTITY,
  CONSTRAINT PK_PURCHASE_METHODS 
    PRIMARY KEY (PurchaseMethodId)
  -- etc...
);
 
CREATE TABLE dbo.EVENT_TICKETS
(
  EventId BIGINT NOT NULL,
  VenueSeatId BIGINT NOT NULL,
  PurchaserId BIGINT NOT NULL,
  PurchaseMethodId BIGINT NOT NULL,
  PurchaseDate DATETIME2 NOT NULL,
  CONSTRAINT PK_EventId 
    PRIMARY KEY CLUSTERED (EventId, VenueSeatId),
  CONSTRAINT FK_EVENT_TICKETS_EVENTS
    FOREIGN KEY (EventId) REFERENCES dbo.[EVENTS] (EventId),
  CONSTRAINT FK_EVENT_TICKETS_VENUE_SEATS 
    FOREIGN KEY (VenueSeatId) REFERENCES dbo.VENUE_SEATS (VenueSeatId),
  CONSTRAINT FK_EVENT_TICKETS_PURCHASERS 
    FOREIGN KEY (PurchaserId) REFERENCES dbo.PURCHASERS (PurchaserId),
  CONSTRAINT FK_EVENT_TICKETS_PURCHASE_METHODS 
    FOREIGN KEY (PurchaseMethodId) REFERENCES dbo.PURCHASE_METHODS (PurchaseMethodId),
);
GO

September 15, 2015

Troubleshooting Tempdb, a Case Study

Filed under: Miscelleaneous SQL,SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 11:15 am
Tackling A Hairy Problem
This series includes a number of stand-alone posts which fit together to tell a bigger story:

  • Part One: Types of Performance Metrics
  • Part Two: What’s Going On Inside Tempdb?
  • Part Three: Avoid Frequent use of TVPs With Wide Rows
  • Part Four: Troubleshooting Tempdb, a Case Study

At work, we store error messages from our web servers in a database. Up until recently, we were loading them using procedures that use wide TVPs. We discovered that if we saw a large enough spike in errors, then we’d also see tempdb contention which would flood our error logging service with more errors. This positive feedback loop made our database server unresponsive and the system as a whole couldn’t recover without intervention. That’s the problem I tackled, and I want to explain my troubleshooting process.

The Troubleshooting Landscape

Let’s start at the beginning. Here’s an abstract visualization of how I see throughput and performance thresholds.

The blue area represents the load sent to the database. The red bars are examples of performance thresholds that can limit throughput. Based on hardware and configuration, those red lines can be moved up or down. The lowest red line is the performance ceiling. In this example, notice that adding more CPUs would not allow more throughput unless tempdb contention is also tackled:

[figure: The Troubleshooting Landscape, throughput (blue) plotted against performance-threshold ceilings (red)]

I like to show this graph to people in order to illustrate the implicit relationship between a database developer and a database administrator. There’s obviously a lot of overlap, but in general:

  • The red lines are part of the DBA’s job. It is up to the DBA to provide and configure a database server that can support the load sent by the application.
  • The blue area is part of the developer’s job. It is up to the developer to make the most efficient use of the hardware given.

Happy databases are ones where the blue and the red don’t meet.

SQL Server DBAs are the only ones that have to worry about tempdb (Oracle and Postgres DBAs get a break). But look at that tempdb contention limit. I like to point out to anyone who will listen that tempdb contention is Microsoft’s fault. Every minute spent on tempdb problems is time spent working around a defect in SQL Server. It’s frustrating. It’s already hard enough worrying about CPU and IO without worrying about logical contention caused by the database engine. I feel like this guy:

[illustration]

My Troubleshooting Workflow

So if you’ve been following my blog for the past few weeks, this is what I’ve been leading up to. With a ton of hindsight, here’s my workflow for troubleshooting tempdb contention:

[flowchart: my workflow for troubleshooting tempdb contention]

Some of the information in the early steps can be gathered using sp_whoisactive, and some of the last steps are links to other parts of this blog series.

But…

The world is rarely as nice and predictable as we model it to be. Database load is no exception: it’s not a smooth thing. It’s spiky and uneven and it consists of an unpredictable variety of queries.

Once, when I thought that tempdb transactions per second was the best metric to watch, I captured this graph over a couple of days:

[graph: tempdb transactions per second over a couple of days]
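
If you want to capture a similar measurement yourself, here’s one hedged way to sample tempdb’s Transactions/sec counter from the DMVs (the counter value is cumulative, so you sample it twice and take the difference):

-- sample tempdb's Transactions/sec counter over ten seconds
DECLARE @first BIGINT, @second BIGINT;
 
SELECT @first = cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Transactions/sec'
  AND instance_name = 'tempdb'
  AND object_name LIKE '%:Databases%';
 
WAITFOR DELAY '00:00:10';
 
SELECT @second = cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Transactions/sec'
  AND instance_name = 'tempdb'
  AND object_name LIKE '%:Databases%';
 
SELECT ( @second - @first ) / 10.0 AS tempdb_transactions_per_second;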