Developers Club geek daily blog

2 years, 7 months ago
Trip to Dnipropetrovsk, chronic sleep debt the last couple of days, but a pleasant bonus on arrival to Kharkiv … A winter weather which motivates on writing something interesting …

For a long time in plans was to tell about "reefs" during the work with XML and XQuery which can lead to tricky problems with performance.

Generally, for those who often uses SQL Server, XQuery and likes to parsit values from XML it is recommended to get acquainted with the following material …

For a start we will generate test XML on which we will make experiments:

USE AdventureWorks2012
GO

IF OBJECT_ID('tempdb.dbo.##temp') IS NOT NULL
    DROP TABLE ##temp
GO

SELECT val = (
    SELECT
          [@obj_id] = o.[object_id]
        , [@obj_name] = o.name
        , [@sch_name] = s.name
        , (
            SELECT i.name, i.column_id, i.user_type_id, i.is_nullable, i.is_identity
            FROM sys.all_columns i
            WHERE i.[object_id] = o.[object_id]
            FOR XML AUTO, TYPE
        )
    FROM sys.all_objects o
    JOIN sys.schemas s ON o.[schema_id] = s.[schema_id]
    WHERE o.[type] IN ('U', 'V')
    FOR XML PATH('obj'), ROOT('objects')
)
INTO ##temp

DECLARE @sql NVARCHAR(4000) = 'bcp "SELECT * FROM ##temp" queryout "D:\sample.xml" -S ' + @@servername + ' -T -w -r -t'
EXEC sys.xp_cmdshell @sql

IF OBJECT_ID('tempdb.dbo.##temp') IS NOT NULL
    DROP TABLE ##temp

For those at whom xp_cmdshell is disconnected it is necessary to execute:

EXEC sp_configure 'show advanced options', 1
GO
RECONFIGURE
GO
EXEC sp_configure 'xp_cmdshell', 1
GO
RECONFIGURE
GO

As a result on the specified way we will create the file with such here structure:

<objects>
  <obj obj_id="245575913" obj_name="DatabaseLog" sch_name="dbo">
    <i name="DatabaseLogID" column_id="1" user_type_id="56" is_nullable="0" is_identity="1" />
    <i name="PostTime" column_id="2" user_type_id="61" is_nullable="0" is_identity="0" />
    <i name="DatabaseUser" column_id="3" user_type_id="256" is_nullable="0" is_identity="0" />
    <i name="Event" column_id="4" user_type_id="256" is_nullable="0" is_identity="0" />
    <i name="Schema" column_id="5" user_type_id="256" is_nullable="1" is_identity="0" />
    <i name="Object" column_id="6" user_type_id="256" is_nullable="1" is_identity="0" />
    <i name="TSQL" column_id="7" user_type_id="231" is_nullable="0" is_identity="0" />
    <i name="XmlEvent" column_id="8" user_type_id="241" is_nullable="0" is_identity="0" />
  </obj>
  ...
  <obj obj_id="1237579447" obj_name="Employee" sch_name="HumanResources">
    <i name="BusinessEntityID" column_id="1" user_type_id="56" is_nullable="0" is_identity="0" />
    <i name="NationalIDNumber" column_id="2" user_type_id="231" is_nullable="0" is_identity="0" />
    <i name="LoginID" column_id="3" user_type_id="231" is_nullable="0" is_identity="0" />
    <i name="OrganizationNode" column_id="4" user_type_id="128" is_nullable="1" is_identity="0" />
    <i name="OrganizationLevel" column_id="5" user_type_id="52" is_nullable="1" is_identity="0" />
    <i name="JobTitle" column_id="6" user_type_id="231" is_nullable="0" is_identity="0" />
    <i name="BirthDate" column_id="7" user_type_id="40" is_nullable="0" is_identity="0" />
    ...
  </obj>
  ...
</objects>

Now we will begin intaking experiments …

How it is the most effective to load data from XML? Probably, it is not necessary to open the file a notepad, to copy contents and to insert into a variable … I think that OPENROWSET will be more correct to use:

DECLARE @xml XML
SELECT @xml = BulkColumn
FROM OPENROWSET(BULK 'D:\sample.xml', SINGLE_BLOB) x

SELECT @xml

But there is an amusing dirty trick. As it appeared, combination of operations of loading and parsing of values from XML can lead to essential decline in production. Let's allow to us it is necessary to receive obj_id values from earlier created file:

;WITH cte AS 
(
    SELECT x = CAST(BulkColumn AS XML)
    FROM OPENROWSET(BULK 'D:\sample.xml', SINGLE_BLOB) x
)
SELECT t.c.value('@obj_id', 'INT')
FROM cte
CROSS APPLY x.nodes('objects/obj') t(c)

By my machine this request is executed very long:

(495 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 20788, ..., lob logical reads 7817781, ..., lob read-ahead reads 1022368.
 SQL Server Execution Times:
   CPU time = 53688 ms,  elapsed time = 53911 ms.

Let's try to separate loading and parsing:

DECLARE @xml XML
SELECT @xml = BulkColumn
FROM OPENROWSET(BULK 'D:\sample.xml', SINGLE_BLOB) x

SELECT t.c.value('@obj_id', 'INT')
FROM @xml.nodes('objects/obj') t(c)

Everything fulfilled very quickly:

(1 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 7, ..., lob logical reads 2691, ..., lob read-ahead reads 344.
 SQL Server Execution Times:
   CPU time = 15 ms,  elapsed time = 51 ms.

(495 row(s) affected)
 SQL Server Execution Times:
   CPU time = 47 ms,  elapsed time = 125 ms.

So in what there was a problem? Let's analyze the execution plan:

XML, XPath and threefold grief with a performance

As it appeared, the problem is covered in type conversion therefore try to give initially to the nodes function parameter in the XML type.

Further we will consider a typical situation when when parsing it is necessary to execute filtering … In such cases it is necessary to remember that SQL Server does not optimize challenges of functions for work with XML.

I will show for descriptive reasons told that in this request the value function will be executed twice:

SELECT t.c.value('@obj_id', 'INT')
FROM @xml.nodes('objects/obj') t(c)
WHERE t.c.value('@obj_id', 'INT') < 0

(404 row(s) affected)
 SQL Server Execution Times:
   CPU time = 116 ms,  elapsed time = 120 ms.

This nuance can reduce performance therefore it is recommended to reduce challenges of functions:

SELECT *
FROM (
    SELECT id = t.c.value('@obj_id', 'INT')
    FROM @xml.nodes('objects/obj') t(c)
) t
WHERE t.id < 0

(404 row(s) affected)
 SQL Server Execution Times:
   CPU time = 62 ms,  elapsed time = 74 ms.

As the option can be filtered still so:

SELECT t.c.value('@obj_id', 'INT')
FROM @xml.nodes('objects/obj[@obj_id < 0]') t(c)

(404 row(s) affected)
 SQL Server Execution Times:
   CPU time = 110 ms,  elapsed time = 119 ms.

but to speak about an essential prize not to have. Though execution plans indicate the opposite a little:

XML, XPath and threefold grief with a performance

It is shown that the third option the most optimum … Let it is one more argument will not trust QueryCost which is only an internal assessment.

And the most interesting example on snack … There is one more VERY important feature when parsing from XML. Let's execute request:

SELECT
      t.c.value('../@obj_name', 'SYSNAME')
    , t.c.value('@name', 'SYSNAME')
FROM @xml.nodes('objects/obj/*') t(c)

also we will look at runtime which can suit only those who do not hurry already anywhere:

(5273 row(s) affected)
 SQL Server Execution Times:
   CPU time = 66578 ms,  elapsed time = 66714 ms.

Why it occurs? SQL Server the server has problems in read operations of parent nodes:

XML, XPath and threefold grief with a performance

How to us in that case to be? Everything is very simple … to begin reading with parent nodes and to read child by means of CROSS/OUTER APPLY:

SELECT
      t.c.value('@obj_name', 'SYSNAME')
    , t2.c2.value('@name', 'SYSNAME')
FROM @xml.nodes('objects/obj') t(c)
CROSS APPLY t.c.nodes('*') t2(c2)

(5273 row(s) affected)
 SQL Server Execution Times:
   CPU time = 156 ms,  elapsed time = 184 ms.

It is still interesting to consider a situation when we need to look at 2 levels above. The problem with reading a parent member at me was not reproduced:

USE AdventureWorks2012
GO

DECLARE @xml XML
SELECT @xml = (
    SELECT
          [@obj_name] = o.name
        , [columns] = (
            SELECT i.name
            FROM sys.all_columns i
            WHERE i.[object_id] = o.[object_id]
            FOR XML AUTO, TYPE
        )
    FROM sys.all_objects o
    WHERE o.[type] IN ('U', 'V')
    FOR XML PATH('obj')
)

SELECT
      t.c.value('../../@obj_name', 'SYSNAME')
    , t.c.value('@name', 'SYSNAME')
FROM @xml.nodes('obj/columns/*') t(c)

Still wanted to mention one interesting feature. Has no problems with reading parent members of OPENXML:

DECLARE
      @xml XML
    , @idoc INT

SELECT @xml = BulkColumn
FROM OPENROWSET(BULK 'D:\sample.xml', SINGLE_BLOB) x

EXEC sys.sp_xml_preparedocument @idoc OUTPUT, @xml 

SELECT *
FROM OPENXML(@idoc, '/objects/obj/*') 
WITH (
    name  SYSNAME '../@obj_name', 
    col   SYSNAME '@name'
)

EXEC sys.sp_xml_removedocument @idoc

(5273 row(s) affected)
 SQL Server Execution Times:
   CPU time = 47 ms,  elapsed time = 137 ms.

But it is not necessary to think now that OPENXML has clear advantages over XQuery. OPENXML have jambs too. For example, if we forget to cause sp_xml_removedocument, then there can be strong memory leaks.

Everything was tested for SQL Server 2012 SP3 (11.00.6020).
Took execution plans from dbForge.

This article is a translation of the original post at habrahabr.ru/post/273189/
If you have any questions regarding the material covered in the article above, please, contact the original author of the post.
If you have any complaints about this article or you want this article to be deleted, please, drop an email here: sysmagazine.com@gmail.com.

We believe that the knowledge, which is available at the most popular Russian IT blog habrahabr.ru, should be accessed by everyone, even though it is poorly translated.
Shared knowledge makes the world better.
Best wishes.

comments powered by Disqus