Latest Posts

Pluralsight Free April

It’s a few days into April, but not too late I hope to mention that Pluralsight is offering their entire library, free to new accounts, for the month of April. Sign up on their promotion page to take advantage of this offer.

And, I do have a few courses up there for anyone interested:

And there’s an upcoming course on index maintenance as well that I hope will be published shortly.

Books of 2019

I got sloppy with my book tracking this year, and completely stopped marking what I read on Goodreads part way through the year, hence no full list of what I read.

The high points of the books read however have to be the Expanse series (Babylon’s Ashes and Persepolis Rising were read this year). This series just keeps getting better, larger stakes, more problems. Looking forward to the final book.

Another series that I’ve been thoroughly enjoying is the Starship’s Mage series. Last year saw the publication of “Sword of Mars”, where we find some answers, get a few massive battles and a really tense cliffhanger. Hope the next book comes out soon.

And the last book that I want to call attention to is “All Those Explosions Were Someone Else’s Fault”, which is a hilarious and interesting take on the superhero/supervillain genre. I highly recommend it.

I will try and do a better job this year of tracking my reading. My goal is again 75 books in a year, and I’m somewhat confident that I can make that number.

A new way of getting the actual execution plan

Getting the actual execution plan (that is, the plan with run-time statistics) for a query from an application has always been a little difficult. It’s fine if you can get the query running in Management Studio and reproducing the behaviour from the app, but that can be difficult.

There’s the query_post_execution_showplan event in Extended Events, but that’s a pretty heavy event and not something that I’d like to run on a busy server.

No more! SQL 2019 adds a new plan-related function to get the last actual plan for a query: sys.dm_exec_query_plan_stats.

The function is not available by default. The LAST_QUERY_PLAN_STATS database scoped configuration has to be set to allow it to run, and it’s going to add some overhead; how much is still to be determined.

ALTER DATABASE SCOPED CONFIGURATION SET LAST_QUERY_PLAN_STATS = ON

It’s a function which takes a plan handle as its parameter. Hence it can be used alone, or it can be on the right-hand side of an apply from any table or DMV that has a plan handle in it. As an example it can be used with Query Store.

WITH hist
AS (SELECT q.query_id, 
           q.query_hash,
           MAX(rs.max_duration)  AS MaxDuration
    FROM 
        sys.query_store_query q INNER JOIN sys.query_store_plan p ON q.query_id = p.query_id
        INNER JOIN sys.query_store_runtime_stats rs ON p.plan_id = rs.plan_id
        INNER JOIN sys.query_store_runtime_stats_interval rsi ON rs.runtime_stats_interval_id = rsi.runtime_stats_interval_id
    WHERE rsi.start_time < DATEADD(HOUR, -1, GETDATE())
    GROUP BY q.query_id, q.query_hash),
recent
AS (SELECT q.query_id, 
           q.query_hash,
           MAX(rs.max_duration)  AS MaxDuration
    FROM 
        sys.query_store_query q INNER JOIN sys.query_store_plan p ON q.query_id = p.query_id
        INNER JOIN sys.query_store_runtime_stats rs ON p.plan_id = rs.plan_id
        INNER JOIN sys.query_store_runtime_stats_interval rsi ON rs.runtime_stats_interval_id = rsi.runtime_stats_interval_id
    WHERE rsi.start_time > DATEADD(HOUR, -1, GETDATE())
    GROUP BY q.query_id, q.query_hash),
regressed_queries 
AS (
    SELECT hist.query_id, 
            hist.query_hash
        FROM hist INNER JOIN recent ON hist.query_id = recent.query_id
        WHERE recent.MaxDuration > 1.2*hist.MaxDuration
    )
SELECT st.text, OBJECT_NAME(st.objectid) AS ObjectName, qs.last_execution_time, qps.query_plan
    FROM sys.dm_exec_query_stats qs 
        CROSS APPLY sys.dm_exec_sql_text (qs.sql_handle) st
        OUTER APPLY sys.dm_exec_query_plan_stats(qs.plan_handle) qps
    WHERE qs.query_hash IN (SELECT rq.query_hash FROM regressed_queries rq)

The above query checks Query Store for any query that has regressed in duration in the last hour (defined as max duration > 120% of previous max duration) and pulls out the last actual plan for that query.

And a look at that plan tells me that I have a bad parameter sniffing problem, a problem that might have been missed or mis-diagnosed with only the estimated plan available.
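
For a quicker look than the Query Store query above, the function can also be applied directly against the plan cache. This is a minimal sketch, assuming LAST_QUERY_PLAN_STATS is already on for the database; the LIKE filter on 'Shipments' is a hypothetical placeholder for whatever text identifies the query being investigated.

```sql
-- Sketch: last actual plan for recently executed cached queries.
-- The '%Shipments%' filter is a placeholder, not something from the original post.
SELECT TOP (10)
       st.text,
       qs.execution_count,
       qs.last_execution_time,
       qps.query_plan            -- XML plan; may be NULL if nothing has been captured yet
FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    OUTER APPLY sys.dm_exec_query_plan_stats(qs.plan_handle) AS qps
WHERE st.text LIKE N'%Shipments%'
ORDER BY qs.last_execution_time DESC;
```

The OUTER APPLY keeps the row even when no plan is returned, which makes it easier to see which queries simply haven’t had a plan captured.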

In-line scalar functions in SQL Server 2019

Yes, yes, yes, finally!

It’s hardly a secret that I’m not a fan of scalar user-defined functions. I refer to them as ‘developer pit-traps’ due to the number of times I’ve seen developers absolutely wreck their database performance by over-using them (or using them at all).

The main problem with them is that they haven’t been in-line, meaning the function gets evaluated on every single row, and the overhead from doing so is usually terrible.

One of the improvements in SQL Server 2019 is that scalar user-defined functions can now be inlined. Not all of them; there are conditions that have to be met. Most scalar UDFs that I’ve seen in client systems will meet them. The requirement that the function not reference table variables will probably be the main limiting factor.

The full requirements are laid out in the documentation: https://docs.microsoft.com/en-us/sql/relational-databases/user-defined-functions/scalar-udf-inlining
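
Rather than checking each function against that list by hand, SQL Server 2019 exposes an is_inlineable column in sys.sql_modules. The following is a sketch of how that could be queried for every scalar T-SQL function in the current database; it isn’t from the original post.

```sql
-- Lists scalar T-SQL functions and whether SQL Server 2019 considers
-- each one eligible for inlining (is_inlineable = 1).
SELECT OBJECT_SCHEMA_NAME(o.object_id) AS SchemaName,
       o.name                          AS FunctionName,
       m.is_inlineable
FROM sys.objects AS o
    INNER JOIN sys.sql_modules AS m ON m.object_id = o.object_id
WHERE o.type = 'FN'                    -- FN = SQL scalar function
ORDER BY SchemaName, FunctionName;
```

A value of 1 means the function is eligible. Whether it actually gets inlined in a particular query is still decided when that query is compiled.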

I’m going to use the same function that I used when I evaluated natively-compiled functions (https://sqlinthewild.co.za/index.php/2016/01/12/natively-compiled-user-defined-functions/), and run it against a table with 860k rows in it, both in compat mode 140 (SQL Server 2017) and compat mode 150 (SQL Server 2019)

CREATE FUNCTION dbo.DateOnly (@Input DATETIME)
  RETURNS DATETIME
AS
BEGIN
  RETURN DATEADD(dd, DATEDIFF (dd, 0, @Input), 0);
END
GO

As in the earlier post, I’ll use extended events to catch the performance characteristics.
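
The session definition isn’t shown here, so the following is only a sketch of one way to capture duration and CPU with Extended Events. The session name UdfTiming and the database name TestDB are hypothetical placeholders.

```sql
-- Sketch of an Extended Events session that records completed batches.
-- UdfTiming and TestDB are placeholder names.
CREATE EVENT SESSION UdfTiming ON SERVER
ADD EVENT sqlserver.sql_batch_completed (
    WHERE (sqlserver.database_name = N'TestDB')
)
ADD TARGET package0.ring_buffer;
GO

ALTER EVENT SESSION UdfTiming ON SERVER STATE = START;
GO

-- ... run the test queries here ...

-- Read the captured events (duration and cpu_time are reported in microseconds).
-- This has to be done before the session is stopped, as the ring buffer is memory-only.
SELECT CAST(t.target_data AS XML) AS CapturedEvents
FROM sys.dm_xe_sessions AS s
    INNER JOIN sys.dm_xe_session_targets AS t ON t.event_session_address = s.address
WHERE s.name = N'UdfTiming'
  AND t.target_name = N'ring_buffer';
GO

ALTER EVENT SESSION UdfTiming ON SERVER STATE = STOP;
DROP EVENT SESSION UdfTiming ON SERVER;
GO
```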

First, something to compare against. The query, without functions, is:

SELECT DATEADD(dd, DATEDIFF (dd, 0, TransactionDate), 0) FROM Transactions

This takes, on average, 343ms to run, and 320ms of CPU time.
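
The function version of the query and the compat mode switch aren’t shown, so this is a sketch of what the test presumably looks like, using the function and table named above. ALTER DATABASE CURRENT is used so that no database name has to be assumed.

```sql
-- Run under SQL Server 2017 behaviour: the scalar function is not inlined.
ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 140;
GO
SELECT dbo.DateOnly(TransactionDate) FROM Transactions;
GO

-- Run under SQL Server 2019 behaviour: the function is eligible for inlining.
ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 150;
GO
SELECT dbo.DateOnly(TransactionDate) FROM Transactions;
GO
```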

The results of the first test are impressive.

Compat Mode    Duration (ms)    CPU (ms)
140            10 666           8594
150            356              353

I keep having people ask about SCHEMABINDING, so same test again, with the function recreated WITH SCHEMABINDING

Compat Mode    Duration (ms)    CPU (ms)
140            5448             3818
150            325              320

Better, but in SQL 2017 and earlier it’s still over an order of magnitude slower than the query without the function.

Last test, what about something with data access? I’ll switch to my Shipments and ShipmentDetails tables for this. The base query without the function is:

SELECT s.ShipmentID, 
    (SELECT SUM(Mass) AS TotalMass FROM ShipmentDetails sd WHERE sd.ShipmentID = s.ShipmentID) TotalShipmentMass
FROM Shipments s;

I’m writing it with a subquery instead of a join to keep it as similar as possible to the version with the function. It should be the same as if I had used a join though. That query takes, on average, 200ms, with 145ms CPU time.

There are 26240 rows in the Shipments table, and on average 34 detail rows per shipment. The function is:

CREATE FUNCTION dbo.ShipmentMass(@ShipmentID INT)
RETURNS NUMERIC(10,2)
AS
BEGIN
    DECLARE @ShipmentMass NUMERIC(10,2);
    SELECT @ShipmentMass = SUM(Mass) FROM ShipmentDetails sd WHERE sd.ShipmentID = @ShipmentID;

    RETURN @ShipmentMass;

END

And the results are:

Compat Mode    Duration (ms)            CPU (ms)
140            961 211 (16 minutes)     959 547
150            3280                     3272

The test under compat mode 140 had to be run overnight. 9 hours to run the query 25 times… And people wonder why I complain about scalar user-defined functions in systems.

Under compat mode 150 with the inline function it’s way better (3 seconds vs 16 minutes for a single execution), but it’s still over an order of magnitude slower than the same query with the subquery. I’ll test this again after RTM, but for the moment it looks like my guidance for functions for SQL 2019 going forward is going to be that scalar functions that don’t access data are fine, but scalar functions that do should still be replaced by inline table-valued functions or no function at all, wherever possible.
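
As an illustration of that last recommendation, the scalar function above could be rewritten as an inline table-valued function along these lines. This is a sketch only: the name dbo.ShipmentMassInline is made up for the example, and it hasn’t been timed against the numbers above.

```sql
-- Inline table-valued equivalent of dbo.ShipmentMass (hypothetical name).
CREATE FUNCTION dbo.ShipmentMassInline (@ShipmentID INT)
RETURNS TABLE
AS
RETURN (
    SELECT SUM(sd.Mass) AS TotalMass
    FROM ShipmentDetails sd
    WHERE sd.ShipmentID = @ShipmentID
);
GO

-- The aggregate always returns exactly one row, so CROSS APPLY
-- keeps shipments that have no detail rows (TotalMass is NULL for those).
SELECT s.ShipmentID,
       sm.TotalMass AS TotalShipmentMass
FROM Shipments s
    CROSS APPLY dbo.ShipmentMassInline(s.ShipmentID) sm;
```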

No, this is not a bug in T-SQL

(or, Column scope and binding order in subqueries)

I keep seeing this in all sorts of places. People getting an unexpected result when working with a subquery, typically an IN subquery, and assuming that they’ve found a bug in SQL Server.

It’s a bug alright, in that developer’s code though.

Let’s see if anyone can spot the mistake.

We’ll start with a table of orders.

CREATE TABLE Orders (
  OrderID INT IDENTITY PRIMARY KEY,
  ClientID INT,
  OrderNumber VARCHAR(20)
)

There would be more to it in a real system, but this will do for a demo. We’re doing some archiving of old orders, of inactive clients. The IDs of those inactive clients have been put into a temp table

CREATE TABLE #TempClients (
ClientD INT
);

And, to check before running the actual delete, we run the following:

SELECT * FROM dbo.Orders
WHERE ClientID IN (SELECT ClientID FROM #TempClients)

And it returns the entire Orders table. The IN appears to have been completely ignored. At least the query was checked before doing the delete, that’s saved an unpleasant conversation with the DBA if nothing else.

Anyone spotted the mistake yet?

It’s a fairly simple one, not easy to see in passing, but if I test the subquery alone it should become obvious.

The column name in the temp table is missing an I, probably just a typo, but it has some rather pronounced effects.
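
Running the subquery on its own, against the temp table exactly as created above, shows the problem immediately:

```sql
-- The subquery by itself, with no outer query to borrow a column from.
-- This fails with error 207: Invalid column name 'ClientID'.
SELECT ClientID FROM #TempClients;
```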

The obvious next question is why the select with the subquery in it didn’t fail. After all, the query asks for ClientID from #TempClients, and there’s no such column. However there is a ClientID column available in that query, and it’s in the Orders table. And that’s a valid column for the subquery, because column binding order, when we have subqueries, is first to tables within the subquery, and then, if no match is found, to tables in the outer query.

It has to work this way, otherwise correlated subqueries would not be possible. For example:

SELECT c.LegalName,
c.HypernetAddress
FROM dbo.Clients AS c
WHERE EXISTS (SELECT 1 FROM dbo.Shipments s WHERE s.HasLivestock = 1 AND c.ClientID = s.ClientID)

In that example, c.ClientID explicitly references the Clients table in the outer query. If I left off the c., the column would be bound to the ClientID column in the Shipments table.

Going back to our original example…

SELECT * FROM dbo.Orders
WHERE ClientID IN (SELECT ClientID FROM #TempClients)

When the query is parsed and bound, the ClientID column mentioned in the subquery does not match any column from any table within the subquery, and hence it’s checked against tables in the outer query, and it does match a column in the Orders table. Hence the query essentially becomes

SELECT * FROM dbo.Orders
WHERE ClientID IN (SELECT dbo.Orders.ClientID FROM #TempClients)

Which is essentially equivalent to

SELECT * FROM dbo.Orders
WHERE 1=1

This is one reason why all columns should always, always, always, be qualified with their tables (or table aliases), especially when there are subqueries involved, as doing so would have completely prevented this problem.

SELECT * FROM dbo.Orders o
WHERE o.ClientID IN (SELECT tc.ClientID FROM #TempClients tc)

With the column in the subquery only allowed to be bound to columns within the #TempClients table, the query throws the expected column not found error.

And we’re no longer in danger of deleting everything from the orders table, as we would have if that subquery had been part of a delete and not a select.
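
For completeness, a sketch of what the delete itself might look like with every column qualified. The delete statement isn’t shown above, so its exact shape is an assumption.

```sql
-- With the misspelled ClientD column still in the temp table, this fails with an
-- invalid column name error instead of deleting every row in dbo.Orders.
DELETE o
FROM dbo.Orders AS o
WHERE o.ClientID IN (SELECT tc.ClientID FROM #TempClients AS tc);
```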

Jobs that beat the caring out of you

Ok, Since Jen and Grant started this, it’s time to share some horrors…

This happened during the five or so years I was doing consulting type work with a small consulting company (which was itself a bad idea, but that’s a whole ‘nother story). Work was a tad on the sparse side at the time and I was looking for anything. Enter a logistics company that needed some integration work doing.

SSIS and ETL work for a couple months. Not ideal, but how bad could it be?

Bad. Very bad indeed, and mostly because of management. There’s one thing at least I can say about this place, they taught me how not to manage an IT department.

I’m not going to cover everything that happened at that place, just some aspects of one project. It was a package tracking system, intended to take waybill data, vehicle tracking and some other bits and pieces and make it so that any delivery/shipment could be identified as being in a warehouse or on a truck, and specifically which warehouse or truck.

First problem. It was 6 months of work at least. We had 6 weeks. Not 6 weeks to a deadline that everyone understood was going to be missed. 6 weeks to the date that the company CEO had been told this new tracking system would be in use. Nothing that can possibly go wrong there.

Then there was the project manager. He managed by Gantt chart, but that’s not all that uncommon. What was less common was that he appeared to have no concept of time management at all. I worked for them 3 days a week. During one Monday afternoon project meeting, I gave the project manager an estimate of 10 days for a chunk of work. I found out later that day that he’d promised it would be in production the following week Thursday. 10 calendar days from the time he was given an estimate, at a point where I’d have had 5 working days to finish it.

That got me yelled at by the head of IT.

Second problem. The project manager and BI specialist (read Excel report writer). They both repeatedly agreed on things in meetings, and then told the head of IT something completely different, something that cast them in a good light and the developers as incompetent idiots. Once is an accident, twice might be coincidence. Three times or more however…

I got into the habit of openly recording the meetings on my phone (for ‘documentation purposes’)

Third problem. The head of IT. I’d say she was a little on the side of micromanaging, but that would be like saying a Joburg thunderstorm is a tad damp. She also had a tendency to overreact, and to listen to only one side of a story before reacting.

The last straw of that particular project was the Monday when the project manager decided that the system we were working on was going into UAT for user testing (not testers, business users). It was not in any way ready and I told him that in the meeting, as did the other developer. After listening to our explanations he agreed and said he’d get another week. It might have been enough.

Next thing I know one of the other devs tells me that the head of IT wanted a word.

No, she didn’t want a word. She wanted to scream at me for over 5 minutes, at the top of her voice, in an open plan office, in front of everyone else about how irresponsible it was to suggest that the project was ready for UAT over the project manager’s recommendations, tell me how incompetent I was, how useless I was, what a terrible developer I was, that I was a liar, lazy, and that she would have me fired and ensure I never got another IT job.

I didn’t walk out. Not quite, but I did call my boss immediately afterwards. See, I didn’t work for her. I was doing the work on contract. She couldn’t fire me.

My boss at the time was the softest spoken person I know, he never raised his voice, never lost his temper, never sounded irritated no matter what. That afternoon, when he had to drop all the other work he had planned and come out to the logistics company, that afternoon I heard him angry.

Somehow the logistics company is still in business. I have no idea how.

Comparing plans in Management Studio

Previously I looked at using Query Store to compare execution plans, but it’s not the only way that two execution plans can be compared. The other method requires a saved execution plan and the Management Studio execution plan viewer.

Let’s start by assuming I have a saved execution plan for a query and I want to compare it to the execution plan that the same query currently has. The first step is to run the query with the actual execution plan on. Then right-click on the execution plan and select ‘Compare Showplan’.

Pick a saved execution plan. It doesn’t have to be for the same query, but the comparison will be of little use if the two plans being compared are not from the same query.

And then we get the same comparison screen as we saw last time with the comparison via Query Store. Similar portions of the plan are marked by coloured blocks, and the properties window shows which properties differ between the two plans.

Comparing plans in Query Store

One feature that was added in the 2016 version of SSMS, and that hasn’t received a lot of attention, is the ability to compare execution plans.

There are two ways of doing this, from Query Store and from saved files.

Let’s start with Query Store, and I’m going to use a demo database that I’ve been working on for a few months – Interstellar Transport (IST). I’ve got a stored procedure in there that has a terrible parameter sniffing problem (intentionally). I’m going to run it a few times with one parameter value, then run it a few more times with another parameter value, remove the plan from cache and repeat the executions in the reverse order.

With that done, the query should show up in the ‘Queries with High Variance’ report (SQL 2017).

The query has the two expected plans, and they are quite different from each other.
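
The same thing can be checked without the SSMS report. As a sketch that isn’t part of the original walkthrough, the Query Store catalog views can list every query that has more than one plan recorded:

```sql
-- Queries in Query Store with more than one distinct plan.
SELECT q.query_id,
       COUNT(DISTINCT p.plan_id) AS PlanCount
FROM sys.query_store_query AS q
    INNER JOIN sys.query_store_plan AS p ON p.query_id = q.query_id
GROUP BY q.query_id
HAVING COUNT(DISTINCT p.plan_id) > 1
ORDER BY PlanCount DESC;
```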

I can click on the points on the graph individually to see the plans, but comparing the plans in that way is difficult and requires that I make notes somewhere else. What I can do instead is select two different points on the graph and choose the ‘compare plans’ option.

This brings up a window where the two plans are displayed one above the other, and areas in the plan which are similar are highlighted.

Select an operator and pull up the properties, and the properties of the operator from both plans are shown, with the differences highlighted.

This isn’t the only way to compare query plans. The next post will show how it can be done without using Query Store at all.

Books of 2018

I set a reading goal of 75 books again in 2018. Fell a little short, only managed 70. All in all I’m not too unhappy about that.

The full details of all the books read are available on Goodreads’s yearly review https://www.goodreads.com/user/year_in_books/2018/19743140.

There are some books that need special mention.

Oathbringer

Oh my! I knew this would be good, and it was. From Kaladin’s fight with depression (In which I saw reflected some of my own struggles over the past decade) to the revelations of Dalinar’s past and of the history of the Radiants. Not to mention the declaration “I am Unity!” Absolutely spectacular.

The only downside is that my paperback copy is over 1200 pages and is too heavy to read comfortably. If there’s a split version as there was for the previous two books, I’ll probably buy them and donate the doorstop to the local library.

Starship’s Mage series

Imagine a world where technology has advanced to the point of kilometer-long starships and massive space stations, but where travel between the stars is only possible with magic. That’s the premise here, and it does make for a very interesting setting.

In the first book, the main character is a just-graduated jump mage looking for a ship to serve on. He finds a ship, and a lot more.

The Lions of Al-Rassan

I’ve been a fan of Guy Gavriel Kay for years, and this is another outstanding work from him. Set in a fantasy version of Spain, it follows characters from three different religions destined to clash and shows how the wars affect them and those around them.

Exquisitely written.

Redemption’s Blade

What happens after the Chosen One has defeated the Dark Lord? How does a world torn apart by war settle back into its old ways?

A rant about presentations

My company’s internal conference is in a couple of weeks, so this seems like a good time to have a quick rant about some presentation failings I’ve seen over the last year or so.

If you want to present, or are planning to present, at a conference (or even just a user group), please, please, please pay attention to the following.

Don’t read your presentation

Please don’t read the bullets on your slides one by one. Please also don’t read a speech off your phone. If I wanted to have something read to me, I’d get an audio book.

A presentation should feel dynamic. It is, and should feel like, a live performance.

If you need reminders or cue cards, that’s fine, but put keywords on them, points that need to be discussed, not the entire speech.

Watch your font size

This is for the slides but especially for the demos. Font size of 30 is probably the smallest you should be using on slides.

In demos, if I’m sitting in the back row and can’t read the code, there may be a problem. My eyes are not the best though, so that might be a failing on my part. If, however, I’m sitting in the second row and can’t read the code, there’s definitely a problem.

If the conference insists on, or offers time for a tech check, take the opportunity to check your fonts. A tech check isn’t just ‘does my laptop see the projector? Yes, done.’ Walk to the back of the room, go through all the slides, start your demo, walk back to the back of the room. Make sure that everything is clearly visible.

Minimalistic slides

Please don’t put an essay on your slide. Please don’t have fancy animation (unless you’re doing a presentation on animation). Don’t have things that flash, flicker, spin or dance.

It’s distracting, and it probably means your audience is watching your slides and not listening to you. You should be the star of the presentation, not your slides. They’re a support character.

Themes

I like the Visual Studio dark theme. It’s nice to code with, but it’s absolutely terrible on a projector, especially if the room is not dark. For projectors you want strong contrast. Dark font on light background usually works. Dark blue on black does not, and neither do two similar shades of blue.

Check that your demos are visible, check that the code is readable from the back of the room.

Learn how to zoom in, whether with the Windows built-in tools or installed apps. Use the zoom any time that what you’re showing may not be clear.

Repeat the question

Especially if the session is being recorded. Your voice is being recorded, the audience isn’t. It is so frustrating to listen to a recorded session, hear a minute of silence followed by the presenter giving a single word answer.

Even if the session is not being recorded, acoustics often mean that the presenter can hear a question that part of the audience can’t.

It also gives you a chance to confirm that you heard the question correctly and gives you a few moments to think on an answer.