One SQL Cheat Code For Blazing Fast JSON Queries

Written by bertwagner | Published 2017/06/22
Tech Story Tags: programming | sql-server | computed-columns | json | performance

TLDRvia the TL;DR App

How non-persisted computed column indexes make your JSON queries high performance

Recently I’ve been working with JSON in SQL Server 2016 a lot.

One of the hesitations many people have with using JSON in SQL Server is that they think that querying it must be really slow — SQL is supposed to excel at relational data, not string parsing right?

It turns out that performance is pretty good with the standalone SQL Server JSON functions. Even better is that it’s possible to make queries against JSON data run at ludicrous speeds by using indexes on JSON parsed computed columns. In this post I want to take a look at how SQL is able to parse* with such great performance.

*“Parse” here is actually a lie — it’s doing something else behind the scenes. You’ll see what I mean, keep reading!

Computed Columns in SQL Server

The only way to get JSON indexes working on SQL server is to use a computed column. A computed column is basically a column that performs a function to calculate its values.

For example, let’s say we have a table with some car JSON data in it:

DROP TABLE IF EXISTS dbo.DealerInventory;CREATE TABLE dbo.DealerInventory(Id int IDENTITY(1,1) PRIMARY KEY,Year int,JsonData nvarchar(300));

INSERT INTO dbo.DealerInventory (Year, JsonData) VALUES (2017, '{ "Make" : "Volkswagen", "Model" : "Golf" }');

INSERT INTO dbo.DealerInventory (Year, JsonData) VALUES (2017, '{ "Make" : "Honda", "Model" : "Civic" }');

INSERT INTO dbo.DealerInventory (Year, JsonData) VALUES (2017, '{ "Make" : "Subaru", "Model" : "Impreza" }');

SELECT * FROM dbo.DealerInventory;

/* Output:Id Year JsonData----- -------- ---------------------------------------------1 2017 { "Make" : "Volkswagen", "Model" : "Golf" }2 2017 { "Make" : "Honda", "Model" : "Civic" }3 2017 { "Make" : "Subaru", "Model" : "Impreza" }*/

We can add a new computed column to the table, “Make”, which parses and extracts the Make property from each row’s JSON string:

ALTER TABLE dbo.DealerInventoryADD Make AS JSON_VALUE(JsonData, '$.Make');

SELECT * FROM dbo.DealerInventory;

/* Output:Id Year JsonData Make-- ----- ------------------------------------------- ----------1 2017 { "Make" : "Volkswagen", "Model" : "Golf" } Volkswagen2 2017 { "Make" : "Honda", "Model" : "Civic" } Honda3 2017 { "Make" : "Subaru", "Model" : "Impreza" } Subaru*/

By default, the above Make computed column is non-persisted, meaning its values are never stored to the database (persisted computed columns can also be created, but that’s a topic for a different time). Instead, every time a query runs against our dbo.DealerInventory table, SQL Server will calculate the value for each row.

The performance of this isn’t great — it’s essentially a scalar function running for each row of our output :(. However, when you combine a computed column with an index, something interesting happens.

Time to dive in with DBCC Page

DBCC Page is an undocumented SQL Server function that shows what the raw data stored in a SQL page file looks like. Page files are how SQL Server stores its data.

In the rest of this post we’ll be looking at how data pages (where the actual table data in SQL is stored) and index pages (where our index data is stored) are affected by non-persisted computed columns — and how they make JSON querying super fast.

First, let’s take a look at the existing data we have. We do this by first turning on trace flag 3604 and using DBCC IND to get the page ids of our data. Additional details on the column definitions in DBCC IND and DBCC PAGE can be found in Paul Randal’s blog post on the topic.

DBCC TRACEON(3604);

-- "Sandbox" is the name of my databaseDBCC IND('Sandbox','dbo.DealerInventory',-1);

If you look at the results above, row 2 contains our data page (indicated by PageType = 1) and the PagePID of that page is 305088 (if you are playing along at home, your PagePID is most likely something else). If we then look up that PagePID using DBCC PAGE we get something like this:

DBCC PAGE('Sandbox',1,305088,3) WITH TABLERESULTS

You can see our three rows of data highlighted in red. The important thing to note here is that our computed column of the parsed “Make” value is truly non-persisted and no where to be found, meaning it has to get generated for every row during query execution.

Now, what if we add an index to our non-persisted computed column and then run DBCC IND again:

CREATE NONCLUSTERED INDEX IX_ParsedMake ON dbo.DealerInventory (Make)

DBCC IND('Sandbox','dbo.DealerInventory',-1);

You’ll now notice that in addition to data page 305088 (PageType = 1), we also have an index page 305096 (PageType = 2). If we examine both the data page and the index page we see something interesting:

DBCC PAGE('Sandbox',1,305088,3) WITH TABLERESULTS

DBCC PAGE('Sandbox',1,305096,3) WITH TABLERESULTS

Nothing has changed with our data page:

But our index page contains the parsed values for our “Make” column:

What does this mean? I thought non-persisted computed columns aren’t saved to disk!

Exactly right: our non-persisted computed column “Make” isn’t saved to the data page on the disk. However if we create an index on our non-persisted computed column, the computed value is persisted on the index page!

This is basically a cheat code for indexing computed columns.

SQL will only compute the “Make” value on a row’s insert or update into the table (or during the initial index creation) — all future retrievals of our computed column will come from the pre-computed index page.

This is how SQL is able to parse indexed JSON properties so fast; instead of needing to do a table scan and parsing the JSON data for each row of our table, SQL Server can go look up the pre-parsed values in the index and return the correct data incredibly fast.

Personally, I think this makes JSON that much easier (and practical) to use in SQL Server 2016. Even though we are storing large JSON strings in our database, we can still index individual properties and return results incredibly fast.

Enjoyed this post? Please recommend it by giving it a green heart 💚 below.

Do you use computed column indexes for any of your tables? Tell me about them in the comments below!


Published by HackerNoon on 2017/06/22