承接上一篇：SQL Server 2008性能故障排查（一）——概论

说明一下，CSDN的博客编辑非常不人性化，我在word里面都排好了版，贴上来就乱得不成样了。建议CSDN改进这部分。也请大家关注内容不要关注排版。同时在翻译的过程中本人也整理了一次思路，所以还似乎非常愿意翻译，虽然有点自娱自乐，但是分享给大家也是件好事

CPU 瓶颈：
CPU瓶颈可能因为某个负载所需的硬件资源不足而引起。但是过多的CPU使用通常可以通过查询优化（特别是在一个服务器上没有增加额外负载或不同查询时CPU突然上升的情况）、寻找应用程序设计方面的问题和优化系统配置来减少。在你急于买更快、更多的CPU之前，先要找出最耗费CPU资源的查询，并且尝试去调整那些查询或者调整设计/系统配置因素。
性能监视器是用于监测和识别CPU是否已经成为了性能瓶颈的其中一个有用的工具。你要查看：Processor:%Processor Time计数器是否很高。如果每个CPU的该值持续在80%以上，那么意味着CPU已经成为瓶颈。
通过SQL Server你也可以通过DMVs检查CPU是否存在瓶颈。如果在请求等待类型中发现有SOS_SCHEDULER_YIELD等待类型或者一个高值的runnable任务都可是提示可运行线程在计划等待中。这意味着处理器上出现了CPU瓶颈。如果你有可用的数据收集器，SQLServer的等待图表可以很容易地在任何时候查看到现有活动中的是否有CPU瓶颈。消耗CPU和SOS_SCHEDULER_YIELD等待都在报表中被纳入CPU等待范畴。当你看到CPU在高比例使用时，你可以深入查找那些耗资源最多的查询。
下面的查询为你提供一个高级视图去查找目前缓存批处理或者存储过程中使用做多CPU资源的查询。这个查询聚合计算所有执行相同计划句柄（Plan handle意味着他们来自相同批处理或者存储过程）CPU消耗。如果计划句柄超过一个语句，你就必须去深入分析以便找到在整个CPU中耗费最大的那个特定查询：

select top 50 
    sum(qs.total_worker_time) as total_cpu_time, 
    sum(qs.execution_count) as total_execution_count,
    count(*) as  number_of_statements, 
    qs.plan_handle 
from 
    sys.dm_exec_query_stats qs
group by qs.plan_handle
order by sum(qs.total_worker_time) desc

这节剩余部分将讨论一下通过SQL Server和其他一些有效的方法来增强CPU以解决这些问题。

过度的查询编译和优化：
查询编译和优化是一个高CPU消耗的过程。开销根据查询的复杂度和优先计划增加而增加。但是即使一个简单的查询也会耗用10-20毫秒的CPUT时间去解析和编译。
为了检查这种开销，SQLServer缓存并重用经过编译的查询计划。每次接收到来自客户端的查询时，SQLServer首先回去查找计划缓存，是否已经存在一个可以重复使用的编译计划。如果找不到适合的计划，SQLServer将对查询进行分合和编译。然后再执行。
• 对于OLTP系统。批处理的提交相对来说是小而固定的。一般情况下最佳的执行计划不会依赖于某些值或者作为谓词的值，因为查询是基于键值的。重用执行计划在这种类型的系统中非常重要，因为编译的开销往往接近甚至高于直接运行的开销。但是，对于一个数据仓库负载将能从专用SQL和允许查询优化器根据不同的值选择不同的执行计划中得到好处。因为运行这些查询的时间通常比编译时间要大得多。并且查询优化计划经常根据查询谓词而改变。使用参数化查询或者存储过程对OLTP系统能充分重用已经编译的执行计划。从而降低SQLServer对CPU的耗用。你可以在数据库或者查询级别上使用PARAMETERIZATION FORCED数据库选项或者查询提示来实现参数化。更多关于该特性使用的限制，比如当你依赖于计算列的索引或者索引视图等，请查看联机丛书。但是，参数化最好的使用地方还是在应用程序自己内部。同时能通过参数化减少被SQL注入的机会。相关只是可以查看联机丛书部分：
• SQL Injection (http://msdn.microsoft.com/en-us/library/ms161953.aspx)
• Using sp_executesql (http://msdn.microsoft.com/en-us/library/ms175170.aspx)

侦测问题：
在编译过程中，SQLServer2008计算查询的“签名”并作为sys.dm_exec_requests和sys.dm_exec_query_stats动态视图中的query_hash列的信息展示。这个queryhash属性在showplan/statistics xml实体中对具有相同query_hash值的高可能行设计相同查询文本，如果它被写成一个query_hash参数化形式。查询中仅是字面值不同但拥有相同的值。举个例子：有两个共享相同query hash的查询，当第三个查询有不同的query hash时。由于它执行不同的操作：

select * from sys.objects where object_id = 100
select * from sys.objects where object_id = 101
select * from sys.objects where name = 'sysobjects'

query hash将在编译阶段从产生结构中被重新计算。空格将被忽略，就像在SELECT 中，显式指定列和使用*号的query hash是不一样的。此外，在一个查询中使用完全限定名和另外一个使用全表名作为前缀的查询被认为是同一个查询，具有相同的query_hash：

Use AdventureWorks
Go

set showplan_xml on
go

-- Assume this is run by a user whose default schema is Sales
select * from SalesOrderHeader h

select * from Sales.SalesOrderHeader h

select SalesOrderID,
	RevisionNumber,
	OrderDate,
	DueDate,
	ShipDate,
	Status,
	OnlineOrderFlag,
	SalesOrderNumber,
	PurchaseOrderNumber,
	AccountNumber,
	CustomerID,
	ContactID,
	SalesPersonID,
	TerritoryID,
	BillToAddressID,
	ShipToAddressID,
	ShipMethodID,
	CreditCardID,
	CreditCardApprovalCode,
	CurrencyRateID,
	SubTotal,
	TaxAmt,
	Freight,
	TotalDue,
	Comment,
	rowguid,
	ModifiedDate
from Sales.SalesOrderHeader h
go
set showplan_xml off
go

注意当query_hash值产生时，数据库部分的完全限定名被忽略。这允许在相同实例的很多数据库中执行查询而不至于因为指定了数据库前缀使得查询不成功。
一个简单的识别应用程序提交的特殊查询是查看sys.dm_exec_query_stats.query_hash列：

select q.query_hash, 
	q.number_of_entries, 
	t.text as sample_query, 
	p.query_plan as sample_plan
from (select top 20 query_hash, 
			count(*) as number_of_entries, 
			min(sql_handle) as sample_sql_handle, 
			min(plan_handle) as sample_plan_handle
		from sys.dm_exec_query_stats
		group by query_hash
		having count(*) > 1
		order by count(*) desc) as q
	cross apply sys.dm_exec_sql_text(q.sample_sql_handle) as t
	cross apply sys.dm_exec_query_plan(q.sample_plan_handle) as p
go

查询结果中额number_of_entries值在千百级别意味着参数化很优秀。如果你在XML执行计划的<QueryPlan>标签中查看CompileTime和CompileCPU属性值是number_of_entries值的翻倍，你应该预测到多少编译时间和CPU用于参数化查询（意味着查询只编译了一次但是被重用了多次）。去除不必要的缓存计划还有一个难懂的好处。释放内存中的缓存用于其他编译计划，并且留下更多内存给缓存。

解决方案：
SQLServer2008同时也在执行计划的访问路径（即join算法、join顺序、索引选择等等）产生一个名为“签名”的query_plan_hash值。某些应用程序通过判断传入查询的参数来评估优化器给出的不同执行计划。在这种情况下如果你不想使得查询参数化，你可以使用query_hash和query_plan_hash值共同决定一个具有相同query_hash值的特定的查询结果是否拥有相同或不同的query_plan_hash值，或者访问路径。在之前提到的查询中稍作修改：

select q.query_hash, 
	q.number_of_entries, 
	q.distinct_plans,
	t.text as sample_query, 
	p.query_plan as sample_plan
from (select top 20 query_hash, 
			count(*) as number_of_entries, 
			count(distinct query_plan_hash) as distinct_plans,
			min(sql_handle) as sample_sql_handle, 
			min(plan_handle) as sample_plan_handle
		from sys.dm_exec_query_stats
		group by query_hash
		having count(*) > 1
		order by count(*) desc) as q
	cross apply sys.dm_exec_sql_text(q.sample_sql_handle) as t
	cross apply sys.dm_exec_query_plan(q.sample_plan_handle) as p
go

注意本查询根据给定的query_hash值返回一个唯一的查询计划（query_plan_hash值）。即使唯一的计划数量超过1个，你也可以使用sys.dm_exec_query_plan去检索不同的执行计划和检验它们对实现性能优化是否不同。
当你决定哪个查询需要被参数化时，参数化的发生地方最好是在客户端应用程序。具体的实现方法受到你所选择的API。但是有一件不变的事实就是所有API都用于替代创建文本化谓词，你可以创建一个带有问号（？）的字符串作为参数占位符。

-- Submitting as ad hoc query
select * from Sales.SalesOrderHeader where SalesOrderID = 100

-- Submitting as parameterized
select * from Sales.SalesOrderHeader where SalesOrderID = ?

你应该为你绑定参数值使用一个合适的APIs（ODBC,OLE DB,或者SQL Client）。客户端使用sp_executesql来执行参数化：
exec sp_executesql N’select * from Sales.SalesOrderHeader where SalesOrderID = @P1’, N’@P1 int’, 100
由于查询是参数化，能重用现有的缓存计划。如果希望整个数据库合理地参数化，并且你不需要控制或者修改客户端应用程序，你同样可以使用强制数据库参数来实现。注意前面的警告，这会阻止优化器选择有效的索引视图和索引：会阻止优化器选择有效的索引视图和索引：会阻止优化器选择有效的索引视图和索引：
ALTER DATABASE AdventureWorks SET PARAMETERIZATION FORCED
如果你不能在客户端应用程序参数化或者强制整个数据库参数化，你依然能对某些查询使用临时的计划暗示，使用OPTION（PARAMETERIZATION FORCED）。

非必要的重编译：
当一个批处理或者一个远程存储调用（RPC）被提交到SQLServer时，服务器会在开始执行钱检查查询计划的有效性和正确性。如果这些检查不通过，批处理就必须重新编译并产生一个不同的执行计划。这种编译叫“重编译”。重编译很多时候是必须的，用于确保正确性或者当服务器觉得底层数据更改时重编译更加有效。编译是一件耗费CPU资源的动作，因此过度的重编译结果可能是影响系统的CPU性能。
在SQLServer2000中，当SQLServer重编译一个存储过程时，整个存储过程都会被重编译。在SQLServer2005、2008中，存储过程的重编译是语句级别的。当2005、2008重编译存储过程时。只重编译需要编译的语句，而不是整个存储过程编译。这样能减少CPU频宽和更少的锁资源（比如COMPILE锁）。重编译一般出现在以下情况：

架构变更。
统计信息变更
延迟编译
SET选项更改
临时表变更
在存储过程创建时使用了RECOMPILE提示。

探测：
可以使用性能监视器和SQLServer Profiler去探测过多的编译和重编译：
性能监视器（Performance Monitor）：
SQL Statistics对象提供编译监视计数器和SQLServer实例被发送请求的类型。你需要监控批处理的编译和重编译数来发现哪些制造了高CPU使用率。理想情况下，SQL Recompilations/sec和Batch Requests/sec都应该很低。除非用户提交了特殊查询。
主要计数器：
SQL Server:SQL Statistics:Batch Requests/sec
SQL Server:SQL Statistics:SQL Compilations/sec
SQL Server:SQL Statistics:SQL Recompilations/sec
详细信息请参考联机丛书中的SQL Statistics Object
SQL Server Profiler Trace：
如果性能监视器暗示重编译次数过高，重编译就有可能成为SQLServer耗费CPU资源的罪魁祸首。可以通过Profiler来跟踪存储过程重编译情况。Profiler会连同重编译的原因也追踪出来：
SP:Recompile/SQL:StmtRecompile：前者是存储过程，后者是语句的编译。当你编译一个存储过程时，会产生这个存储过程及其每个语句的相应事件。但是当存储过程重编译时，只有引起重编译的语句会被重编译。更重要的数据列是SP:Recompile事件。EventSubClass数据列对于发现重编译的原因是非常重要的。SP:Recompile会在存储过程中触发一次或者重编译时触发一次，并且不对不重编译操作的特殊批处理不触发。在05、08中，更加游泳的是SQL:StmtRecomile。因为这个计数器会在所有重编译时触发。重要的事件列：

EventClass
EvnetSubClass
ObjectID(包含该语句的存储过程ID)
SPID
StartTime
SqlHandle
TextData

更多信息请查看联机丛书。
如果你已经有了一个跟踪文件，可以使用下面语句查看所有重编译事件：

select 
    spid,
    StartTime,
    Textdata,
    EventSubclass,
    ObjectID,
    DatabaseID,
    SQLHandle 
from 
    fn_trace_gettable ( 'e:\recompiletrace.trc' , 1)
where 
    EventClass in(37,75,166)

其中：EventClass 37 = Sp:Recompile, 75 = CursorRecompile, 166 = SQL:StmtRecompile
你也可以更进一步组织结果，把sqlhandle和ObjectID列分组或者更多的列。要查看是否存储过程有过多的重编译或者因为某种原因重编译（比如SET选项）
Showplan XML For Query Compile：
该事件会在SQLServer编译或者重编译一个T-SQL语句时触发。这个事件是关于语句编译或重编译的信息。包括执行计划、ObjectID。获取这个时间对总体性能是有意义的，因为它能捕获每一个编译或重编译。如果你在SQL Compilations/sec指针长期处于高值。就要注意这个事件了。可以通过这个时间查看什么语句经常重编译。然后通过改变语句的参数来降低重编译次数。
DMVs：
当你使用sys.dm_exec_query_optimizer_info DMV是，可以看到SQLServer在优化时间方面的一些好建议。如果你执行该DMV两次，你将感受到在给定时间内用于优化的时间：
select * from sys.dm_exec_query_optimizer_info
counter occurrence value
---------------- -------------------- ---------------------
optimizations 81 1.0
elapsed time 81 6.4547820702944486E-2
要重点关注elapsed time，因为当优化时，这个时间通常接近用于优化的CPU时间。因为优化处理是非常耗费CPU资源的。你可以看到那些编译和重编译操作时最好CPU资源的。
另外一个有用的DMV是：sys.dm_exec_query_stats，主要关注列有：
• Sql_handle
• Total worker time
• Plan generation number
• Statement Start Offset
特别关注Plan_generation_num，因为是查询被重编译的次数。下面是一个查询前25个重编译最多的存储过程信息：

select * from sys.dm_exec_query_optimizer_info

select top 25
    sql_text.text,
    sql_handle,
    plan_generation_num,
    execution_count,
    dbid,
    objectid 
from 
    sys.dm_exec_query_stats a
    cross apply sys.dm_exec_sql_text(sql_handle) as sql_text
where 
    plan_generation_num >1
order by plan_generation_num desc

解决方案：
如果你发现了耗时的编译和重编译，可以考虑以下步骤：
 如果重编译的发生原因是因为SET选项，那么用SQLServer Profiler来抓去那个SET选项改变了，然后把它禁用掉（程序级别，不是数据库级别）。最好是把这个SET操作放到数据库连接级别，因为这能保证在连接的生命周期中有效。
 临时表的重编译临界值比实体表要低，如果因为临时表的统计信息改变而造成重编译，可以把临时表改成表变量。表变量在基数上的更改不引发重编译。但是这种方法的缺点是查询优化器不保持跟踪表变量的基数，因为表变量不产生统计信息和维护这些信息。这回导致执行计划性能优化方面的不完整。你可以测试各种方法然后选择最好的一种。
 另外一种选项是使用KEEP PLAN查询暗示（query hint）。这种设置使的临时表和实体表具有相同临界值。EventSubClass列显示在临时表上一个操作的统计信息改变情况。
 为了避免统计信息的改变而重编译（比如，如果执行计划因为数据的统计信息改变而变成次优级别时）。可以特别执行query hint。只当这个执行计划需要检查正确性时才发生重编译（比如底层结构更改或者执行计划长时间没有使用）。并且不依赖统计信息的改变。比如表架构的更改、或者使用了sp_recompile存储过程去标注表。
 关闭了自动更新统计信息能防止重编译。因为统计信息的改变也会导致重编译的发生。但是注意，关闭自动更新不是一个好的建议，因为查询优化器不在对对象上的数据更改敏感，会导致执行计划不是最优化。只是在使用了所有其他改变都无效时的不得已手段。
 数据库中的对象应该使用完全限定名，如dbo.table1，这样避免重编译和避免引起对象歧义。
 为了避免重编译，可以延迟编译，不要交错使用DML和DDL或者使用有条件的DDL比如IF语句。
 运行DTA看看是否可以通过调整索引来改善编译时间和执行时间。
 检查存储过程是否有WITH RECOMPILE的选项。或者是否在创建存储过程时有WITH RECOMPILE选项。在2005、2008中如果有需要，可以在语句级别加上RECOMPILE提示。在语句级别使用这个提示可以避免整个存储过程重编译。

低效查询计划：
当查询产生一个执行计划时，sqlser优化器会尝试选择响应时间最短的计划。最快响应时间并不一定意味着最小化I/O。或者最小化CPU，而是各种资源上的一个平衡。
某些操作与生俱来就比普通操作更占用CPU资源。如Hash操作和排序操作会分别扫描各自的输入数据。如果在类似扫描中【预读】，那么在缓存中的所需页面几乎总之可用的。因此，物理I/O的等待将最小化或者完全消失。如果这些类型的操作不再受制于物理I / O时,他们倾向于出现在高CPU的消耗。相比之下，具有很多索引查找的嵌套循环关联如果需要跨越表的很多部分才能找到合适的数据，那么会很快成为I/O瓶颈。
查询优化器会对最有意义的输入，使用基于成本的各种方法来评估各种基数操作，以便找出最有效的查询计划。（即执行计划中的EstimateRows和EstimateExecutions属性）。没有明确的基数评估，那么执行计划是有缺陷的，并且往往这个却是最终方案。
关于描述SQLServer优化器所使用的统计信息，请查阅：Statistics Used by the Query Optimizer in Microsoft SQL Server 2005

侦查：
低效的执行计划相对来说比较容易发现，一个低效的执行计划能引起CPU的消耗上升。下面的查询是有效标识高消耗CPU的方法：

select 
    highest_cpu_queries.plan_handle, 
    highest_cpu_queries.total_worker_time,
    q.dbid,
    q.objectid,
    q.number,
    q.encrypted,
    q.[text]
from 
    (select top 50 
        qs.plan_handle, 
        qs.total_worker_time
    from 
        sys.dm_exec_query_stats qs
    order by qs.total_worker_time desc) as highest_cpu_queries
    cross apply sys.dm_exec_sql_text(plan_handle) as q
order by highest_cpu_queries.total_worker_time desc

另外，也可以使用sys.dm_exec_cached_plans来过滤可能引起CPU高消耗的各种操作，比如：’%Hash Match%’、’%Sort%’等。

解决方案：
如果你发现有不高效的执行计划，可以尝试以下方法：
 使用DTA优化程序，检查是否有索引问题。
 检查坏基数评估
 检查语句的where子句是否可以更好地筛选数据。没有严格限定的查询天生就很耗资源。
 在表上运行UPDATE STATISTICS检查是否还存在问题。
 检查是否使用了构造器使得优化器不能准确评估基数？考虑是否可以改写查询来避免这种情况？
 如果无法修改架构或者查询，你可以在查询中使用查询提示（query hints）来限定执行计划。计划向导同时也对存储过程中创建的特殊查询有效。Hints能脱离优化器对查询计划的潜在更改。
 SQLServer 2008 同样提供一个叫做【计划冻结】的新特性。允许你冻结在执行计划缓存中存在的特定计划。这个选项类似于在执行计划想到中使用USE PLAN查询提示来定义计划。但是，它消除了使用很长的命令来创建执行计划向导。并且能最小化在长于据中出现的用户错误。示例如下：

DECLARE @plan_handle varbinary(64);

-- Extract the query's plan_handle.
SELECT @plan_handle = plan_handle FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(sql_handle)
WHERE text LIKE N'Some query matching criteria%';

EXECUTE sp_create_plan_guide_from_handle
    @name =  N'Sample_PG1',
    @plan_handle = @plan_handle,
    @statement_start_offset = NULL;
GO

查询内并行:
当生成了执行计划以后，SQLServer优化器会尝试选择响应时间最短的计划来实现查询。如果查询消耗超过了并行度花销临界值，查询优化器会尝试生成一个计划来做并行运行。并行执行计划使用处理器的多个线程来处理查询。最大并行度可以在服务器级别做限制，使用max degree of parallelism选项。在资源负载组级别或者每个查询级别使用OPTION(MAXDOP)提示。实际执行的并行度（实际用于并行操作的标准）被延迟到执行时才实现。在执行之前，SQLServer会根据空闲程度决定可用于执行DOP的可用调度器。在DOP被选定以后，查询会在该并行度中执行直到结束。并行执行会稍微耗用更多的CPU时间，但只是持续一段很短的时间。当没有其他瓶颈时，比如物理I/O等待等，并行计划会完全使用100%的CPU。
一个关键的因素是在查询开始执行后，引导查询使用并行计划。但是这种情况在执行开始后还是可以变更的。比如，如果查询在空闲发起，服务器可能会选择使用使用4个处理器来并行执行。在这些线程开始执行,现有的连接可以提交其他查询这也需要大量的CPU。在那时，其他不同的线程会共享可用CPU的短时间片。这将导致高查询持续时间。
使用并行计划运行不是一件坏事，因为它能提供更快的响应时间。但是，对给定查询的响应时间需要权衡，应该从整体去权衡。并行查询适合批处理，并且会根据负载来选择运行环境。SQLServer2008对分区表查询具有很好的扩展性，所以SQLServer2008会在并行运行时使用比旧版本更高的CPU数量。如果这不是你想要的，就应该限制甚至禁用并行性。

侦查：
并行查询所带来的影响可以使用以下方法来侦测：
性能监视器（Performance Monitor）：
重点关注：SQL Server:SQL Statistics – Batch Requests/sec计数器和SQL Statistics Objects。由于执行并行查询需要预估开销不能超过设定的阈值（默认为5，可以在配置里面设定），所以如果每秒服务器要处理的批处理过多，那么将只有少量的批处理以并行方式执行。服务器通常情况下会并行运行一些小批处理，比如100个以下。
DMVs：
从服务器运行开始，可以使用以下查询来监控：

select 
    r.session_id,
    r.request_id,
    max(isnull(exec_context_id, 0)) as number_of_workers,
    r.sql_handle,
    r.statement_start_offset,
    r.statement_end_offset,
    r.plan_handle
from 
    sys.dm_exec_requests r
    join sys.dm_os_tasks t on r.session_id = t.session_id
    join sys.dm_exec_sessions s on r.session_id = s.session_id
where 
    s.is_user_process = 0x1
group by 
    r.session_id, r.request_id, 
    r.sql_handle, r.plan_handle, 
    r.statement_start_offset, r.statement_end_offset
having max(isnull(exec_context_id, 0)) > 0

在结果中，可以使用sys.dm_exec_sql_text轻易找出查询内容，并使用sys.dm_exec_cached_plan来查看执行计划。
同时可以查找正在并行运行的执行计划，可以查看是否有Parallel操作符且其属性非0的执行计划。这些计划可能不是以并行方式运行，但是他们如果在系统不是非常繁忙的时候，还是能按并行方式运行

--
-- Find query plans that can run in parallel
--
select 
    p.*, 
    q.*,
    cp.plan_handle
from 
    sys.dm_exec_cached_plans cp
    cross apply sys.dm_exec_query_plan(cp.plan_handle) p
    cross apply sys.dm_exec_sql_text(cp.plan_handle) as q
where 
    cp.cacheobjtype = 'Compiled Plan' and
    p.query_plan.value('declare namespace p="http://schemas.microsoft.com/sqlserver/2004/07/showplan";
        max(//p:RelOp/@Parallel)', 'float') > 0

一般情况下，一个查询的持续时间会比CPU时间更长，因为其中一些时间是花费在等待别的资源如锁或者物理I/O上。但在一种情况下查询的花费CPU时间会比持续时间更长，就是当查询以并行方式运行在目前可用的多个线程上。但是注意不是所有的并行查询都会以这种方式运行。

select 
    qs.sql_handle, 
    qs.statement_start_offset, 
    qs.statement_end_offset, 
    q.dbid,
    q.objectid,
    q.number,
    q.encrypted,
    q.text
from 
    sys.dm_exec_query_stats qs
    cross apply sys.dm_exec_sql_text(qs.plan_handle) as q
where 
    qs.total_worker_time > qs.total_elapsed_time
SQL Trace
Look for the following signs of parallel queries, which could be either 
statements or batches that have CPU time greater than the duration.

select 
    EventClass, 
    TextData 
from 
    ::fn_trace_gettable('c:\temp\high_cpu_trace.trc', default)
where 
    EventClass in (10, 12)	-- RPC:Completed, SQL:BatchCompleted
    and CPU > Duration/1000	-- CPU is in milliseconds, Duration in 
microseconds oOr can be Showplans (un-encoded) that have Parallelism 
operators in them
select 
    EventClass, 
    TextData 
from 
    ::fn_trace_gettable('c:\temp\high_cpu_trace.trc', default)
where 
    TextData LIKE '%Parallelism%'

解决方法：
 以并行计划运行的查询，优化器会确认是否预计花销超过了默认阈值（5）。一些查询会被标记，作为候选优化方案。
 使用DTA查看是否有索引需要修改。改动索引能减少查询开销。
 检查预估执行计划和实际执行计划，因为预估是基于统计信息，而且基于成本。如果下面的情况出现了，就要检查问题：

o 是否禁用了自动开启统计，确保在执行计划的警告部分不会出现丢失状态等信息

o 如果预估计划被禁用，那么在对应表上运行：UPDATE STATISTICS

o 确保查询没有使用到优化器无法明确预估的构造器，如多语句的表值函数或者CLR函数、表变量或者带有比较功能的T-SQL函数（带有参数的比较是允许的）。

o 评估语句是否能用更有效更新的方式去实现。

低效的游标使用：
在2005之前的版本只支持单独的活动连接。正在被查询或者等待被发送到客户端的查询被认为是活动的。在某些情况下，客户端可能接收结果后，会把结果里面的语句返回SQLServer再继续执行，这样会引起新一轮的挂起。一个比较好的解决办法是改变连接属性，在服务器端使用游标。
当使用服务器端的游标时，数据库客户端软件（OLE DB或者ODBC）会透明地压缩客户端请求到一个特殊的扩展存储过程，如sp_cursoropen /sp_cursoffetch。这是引用一个API游标（于T-SQL游标相反）。当用户执行查询时，查询文本将通过sp_cursoropen被发送到服务器。请求将从sp_cursorfetch通过服务器返回的特定行中读取数据。为了控制数据行的返回，ODBC或者OLE DB可以使用行缓存。这避免了服务器等待客户端读取它返回的所有数据，这样，服务器就可以准备接收该连接的新请求。
应用程序打开游标开始读取时，容易由于网络延迟造成瓶颈。特别在广域网。在多用户连接的快速网络，从总体上进程请求很多游标带来的性能问题也会变得很明显。因为总体响应时间会因为游标定位每个结果集、预处理和类似处理。

侦测：
你可以使用下面工具来排查低效的游标应用：
性能监视器：
查看SQL Server: Cursor Manager By Type-Cursor Requests/Sec计数器。可以感受到大概游标使用情况。系统可能因为小型但每秒有几百个提取操作而导致存在高CPU利用。没有专门的计数器列举提取所使用的缓存大小。
DMVs：
可以使用下面的DMV来查询每个连接的API游标提取的缓存大小。

select 
    cur.* 
from 
    sys.dm_exec_connections con
    cross apply sys.dm_exec_cursors(con.session_id) as cur
where
    cur.fetch_buffer_size = 1 
    and cur.properties LIKE 'API%'	-- API cursor (Transact-SQL cursors 
always 
have a fetch buffer of 1)

SQL Trace：
使用包含RPC:Completed事件去查找sp_cursorfetch语句。第四个参数的只是提取操作所返回的行数。具体大小根据输出而定。可以看相应的RPC:Starting事件。

解决方法：
 检查是否可以使用基于集合的操作来替代游标，因为这样几乎一定比游标高效。
 考虑在连接SQLServer2008是，开启多活动结果（MARS）
 查阅你使用的API文档。决定如何定义一个对于游标的大的提取缓存：
o ODBC-SQL_ATTR_ROW_ARRAY_SIZE
o OLE DB –Irowset::GetNextRows 或者IRowsetLocate::GetRowsAt
下一章： I/O瓶颈

原文：

CPU Bottlenecks
A CPU bottleneck can be caused by hardware resources that are insufficient for the load. However, excessive CPU utilization can commonly be reduced by query tuning (especially if there was a sudden increase without additional load or different queries on the server), addressing any application design factors, and optimizing the system configuration. Before you rush out to buy faster and/or more processors, identify the largest consumers of CPU bandwidth and see whether you can tune those queries or adjust the design/configuration factors.
Performance Monitor is generally one of the easiest means to determine whether the server is CPU bound. You should look to see whether the Processor:% Processor Time counter is high; sustained values in excess of 80% of the processor time per CPU are generally deemed to be a bottleneck.
From within SQL Server, you can also check for CPU bottlenecks by checking the DMVs. Requests waiting with the SOS_SCHEDULER_YIELD wait type or a high number of runnable tasks can indicate that runnable threads are waiting to be scheduled and that there might be a CPU bottleneck on the processor. If you have enabled the data collector, the SQL Server Waits chart on the Server Activity report is a very easy way to monitor for CPU bottlenecks over time. Consumed CPU and SOS_SCHEDULER_YIELD waits are rolled up into the CPU Wait Category in this report, and if you do see high CPU utilization, you can drill through to find the queries that are consuming the most resources.

The following query gives you a high-level view of which currently cached batches or procedures are using the most CPU. The query aggregates the CPU consumed by all statements with the same plan_handle (meaning that they are part of the same batch or procedure). If a given plan_handle has more than one statement, you may have to drill in further to find the specific query that is the largest contributor to the overall CPU usage.

select top 50
sum(qs.total_worker_time) as total_cpu_time,
sum(qs.execution_count) as total_execution_count,
count(*) as number_of_statements,
qs.plan_handle
from
sys.dm_exec_query_stats qs
group by qs.plan_handle
order by sum(qs.total_worker_time) desc

The remainder of this section discusses some common CPU-intensive operations that can occur with SQL Server, as well as efficient methods for detecting and resolving these problems.
Excessive Query Compilation and Optimization
Query compilation and optimization is a CPU-intensive process. The cost of optimization increases as the complexity of the query and the underlying schema increases, but even a relatively simply query can take 10-20 milliseconds of CPU time to parse and compile.
To mitigate this cost, SQL Server caches and reuses compiled query plans. Each time a new query is received from the client, SQL Server first searches the plan cache (sometimes referred to as the procedure cache) to see whether there is already a compiled plan that can be reused. If a matching query plan cannot be found, SQL Server parses and compiles the incoming query before running it.
For an OLTP-type workload, the set of queries that are submitted is relatively small and static. Quite commonly the optimal query plan does not depend on the exact value or values used as predicates in the query because the lookups are based on keys. Reusing query plans in this type of workload is very important because the cost of compilation may be as high as or higher than the cost of executing the query itself. However, a data-warehousing workload may benefit greatly from using ad hoc SQL and letting the query optimizer search for the optimal plan for each set of values, because the run time for these queries is typically much longer than the compile time, and the optimal query plan is more likely to change depending on the predicates in the query. Using parameterized queries or stored procedures for OLTP-based applications substantially increases the chance of reusing a cached plan and can result in substantial reductions in SQL Server CPU consumption. You can enable parameterization at the database or query level by using the PARAMETERIZATION FORCED database option or query hint, respectively. For more information about important limitations, especially if you rely on indexes on computed columns or indexed views, see SQL Server 2008 Books Online.
However, the best place to parameterize queries is within the application itself (at design time), which also helps mitigate the risk of SQL injection by avoiding string concatenation using parameter values. For more information, see the following topics in SQL Server 2008 Books Online:
• SQL Injection (http://msdn.microsoft.com/en-us/library/ms161953.aspx)
• Using sp_executesql (http://msdn.microsoft.com/en-us/library/ms175170.aspx)
Detection
During compilation, SQL Server 2008 computes a “signature” of the query and exposes this as the query_hash column in sys.dm_exec_requests and sys.dm_exec_query_stats, and the QueryHash attribute in Showplan/Statistics XML. Entities with the same query_hash value have a high probability of referring to the same query text if it had been written in a query_hash parameterized form. Queries that vary only in literal values should have the same value. For example, the first two queries share the same query hash, while the third query has a different query hash, because it is performing a different operation.

select * from sys.objects where object_id = 100
select * from sys.objects where object_id = 101
select * from sys.objects where name = 'sysobjects'

The query hash is computed from the tree structure produced during compilation. Whitespace is ignored, as are differences in the use of explicit column lists compared to using an asterisk (*) in the SELECT list. Furthermore, it does not matter if one query uses fully qualified name and another uses just the table name as long as they both refer to the same object. All of the following should produce the same query_hash value.

Use AdventureWorks
Go

set showplan_xml on
go

-- Assume this is run by a user whose default schema is Sales
select * from SalesOrderHeader h

select * from Sales.SalesOrderHeader h

select SalesOrderID,
RevisionNumber,
OrderDate,
DueDate,
ShipDate,
Status,
OnlineOrderFlag,
SalesOrderNumber,
PurchaseOrderNumber,
AccountNumber,
CustomerID,
ContactID,
SalesPersonID,
TerritoryID,
BillToAddressID,
ShipToAddressID,
ShipMethodID,
CreditCardID,
CreditCardApprovalCode,
CurrencyRateID,
SubTotal,
TaxAmt,
Freight,
TotalDue,
Comment,
rowguid,
ModifiedDate
from Sales.SalesOrderHeader h
go
set showplan_xml off
go

Note that the database portion of the fully qualified name is ignored when the query_hash value is generated. This allows resource usage to be aggregated across all queries in systems that replicate the same schema and queries against many databases on the same instance.
An easy way to detect applications that submit lots of ad hoc queries is by grouping on the sys.dm_exec_query_stats.query_hash column as follows.

select q.query_hash,
q.number_of_entries,
t.text as sample_query,
p.query_plan as sample_plan
from (select top 20 query_hash,
count(*) as number_of_entries,
min(sql_handle) as sample_sql_handle,
min(plan_handle) as sample_plan_handle
from sys.dm_exec_query_stats
group by query_hash
having count(*) > 1
order by count(*) desc) as q
cross apply sys.dm_exec_sql_text(q.sample_sql_handle) as t
cross apply sys.dm_exec_query_plan(q.sample_plan_handle) as p
go

Queries that have a number_of_entries value in the hundreds or thousands are excellent candidates for parameterization. If you look at the CompileTime and CompileCPU attributes under the <QueryPlan> tag of the sample XML query plan and multiply those values times the number_of_entries value for that query, you can get an estimate of how much compile time and CPU you can eliminate by parameterizing the query (which means that the query is compiled once, and then it is cached and reused for subsequent executions). Eliminating these unnecessary cached plans has other intangible benefits as well, such as freeing memory to cache other compiled plans (thereby further reducing compilation overhead) and leaving more memory for the buffer cache.
Resolution
SQL Server 2008 also produces a query_plan_hash value that represents a “signature” of the query plan’s access path (that is, what join algorithm is used, the join order, index selection, and so forth). Some applications might rely on getting a different query plan based on the optimizer evaluating the specific parameter values passed to that execution of the query. If that is the case, you do not want to parameterize the queries.
You can use the query_hash and query_plan_hash values together to determine whether a set of ad hoc queries with the same query_hash value resulted in query plans with the same or different query_plan_hash values, or access path. This is done via a small modification to the earlier query.

select q.query_hash,
q.number_of_entries,
q.distinct_plans,
t.text as sample_query,
p.query_plan as sample_plan
from (select top 20 query_hash,
count(*) as number_of_entries,
count(distinct query_plan_hash) as distinct_plans,
min(sql_handle) as sample_sql_handle,
min(plan_handle) as sample_plan_handle
from sys.dm_exec_query_stats
group by query_hash
having count(*) > 1
order by count(*) desc) as q
cross apply sys.dm_exec_sql_text(q.sample_sql_handle) as t
cross apply sys.dm_exec_query_plan(q.sample_plan_handle) as p
go

Note that this new query returns a count of the number of distinct query plans (query_plan_hash values) for a given query_hash value. Rows that return a large number for number_of_entries and a distinct_plans count of 1 are good candidates for parameterization. Even if the number of distinct plans is more than one, you can use sys.dm_exec_query_plan to retrieve the different query plans and examine them to see whether the difference is important and necessary for achieving optimal performance.
After you determine which queries should be parameterized, the best place to parameterize them is the client application. The details of how you do this vary slightly depending on which client API you use, but the one consistent thing across all of the APIs is that instead of building the query string with literal predicates, you build a string with a question mark (?) as a parameter marker.

-- Submitting as ad hoc query
select * from Sales.SalesOrderHeader where SalesOrderID = 100

-- Submitting as parameterized
select * from Sales.SalesOrderHeader where SalesOrderID = ?

You should use the appropriate APIs for your technology (ODBC, OLE DB, or SQLClient) to bind a value to the parameter marker. The client driver or provider then submits the query in its parameterized form using sp_executesql.

exec sp_executesql N’select * from Sales.SalesOrderHeader where SalesOrderID = @P1’, N’@P1 int’, 100

Because the query is parameterized, it matches and reuses an existing cached plan.
If the entire workload for a given database is appropriate for parameterization and you do not have control over (or can’t change) the client application, you can also enable the forced parameterization option for the database. Note the caveats mentioned earlier; this can prevent the optimizer from matching indexed views and indexes on computed columns.

ALTER DATABASE AdventureWorks SET PARAMETERIZATION FORCED

If you can’t parameterize the client application or enable forced parameterization for the entire database, you can still create a template plan guide for specific queries with the OPTION (PARAMETERIZATION FORCED) hint. For more information about the steps required to do this, see Forced Parameterization (http://technet.microsoft.com/en-us/library/ms175037.aspx) in SQL Server 2008 Books Online.

Unnecessary Recompilation
When a batch or remote procedure call (RPC) is submitted to SQL Server, the server checks for the validity and correctness of the query plan before it begins executing. If one of these checks fails, the batch may have to be compiled again to produce a different query plan. Such compilations are known as recompilations. These recompilations are generally necessary to ensure correctness and are often performed when the server determines that there could be a more optimal query plan due to changes in underlying data. Compilations by nature are CPU intensive and hence excessive recompilations could result in a CPU-bound performance problem on the system.
In SQL Server 2000, when SQL Server recompiles a stored procedure, the entire stored procedure is recompiled, not just the statement that triggered the recompilation. In SQL Server 2008 and SQL Server 2005, the behavior is changed to statement-level recompilation of stored procedures. When SQL Server 2008 or SQL Server 2005 recompiles stored procedures, only the statement that caused the recompilation is compiled—not the entire procedure. This uses less CPU bandwidth and results in less contention on lock resources such as COMPILE locks. Recompilation can happen in response to various conditions, such as:
• Schema changes
• Statistics changes
• Deferred compilation
• SET option changes
• Temporary table changes
• Stored procedure creation with the RECOMPILE query hint or the OPTION (RECOMPILE) query hint
Detection
You can use Performance Monitor and SQL Server Profiler to detect excessive compilation and recompilation.
Performance Monitor
The SQL Statistics object provides counters to monitor compilation and the type of requests that are sent to an instance of SQL Server. You must monitor the number of query compilations and recompilations in conjunction with the number of batches received to find out whether the compilations are contributing to high CPU use. Ideally, the ratio of SQL Recompilations/sec to Batch Requests/sec should be very low, unless users are submitting ad hoc queries.
These are the key data counters:
• SQL Server: SQL Statistics: Batch Requests/sec
• SQL Server: SQL Statistics: SQL Compilations/sec
• SQL Server: SQL Statistics: SQL Recompilations/sec
For more information, see SQL Statistics Object (http://msdn.microsoft.com/en-us/library/ms190911.aspx) in SQL Server 2008 Books Online.
SQL Server Profiler Trace
If the Performance Monitor counters indicate a high number of recompilations, the recompilations could be contributing to the high CPU consumed by SQL Server. Look at the profiler trace to find the stored procedures that are being recompiled. The SQL Server Profiler trace provides that information along with the reason for the recompilation. You can use the following events to get this information.
SP:Recompile / SQL:StmtRecompile
The SP:Recompile and the SQL:StmtRecompile event classes indicate which stored procedures and statements have been recompiled. When you compile a stored procedure, one event is generated for the stored procedure and one for each statement that is compiled. However, when a stored procedure recompiles, only the statement that caused the recompilation is recompiled. Some of the more important data columns for the SP:Recompile event class are listed here. The EventSubClass data column in particular is important for determining the reason for the recompilation. SP:Recompile is triggered once for the procedure or trigger that is recompiled and is not fired for an ad hoc batch that could likely be recompiled. In SQL Server 2008 and SQL Server 2005, it is more useful to monitor SQL:StmtRecompile, because this event class is fired when any type of batch, ad hoc, stored procedure, or trigger is recompiled.
The key data columns to look at in these events are as follows.
• EventClass
• EventSubClass
• ObjectID (represents stored procedure that contains this statement)
• SPID
• StartTime
• SqlHandle
• TextData
For more information, see SQL:StmtRecompile Event Class (http://technet.microsoft.com/en-us/library/ms179294.aspx) in SQL Server 2008 Books Online.
If you have a trace file saved, you can use the following query to see all the recompilation events that were captured in the trace.

select
spid,
StartTime,
Textdata,
EventSubclass,
ObjectID,
DatabaseID,
SQLHandle
from
fn_trace_gettable ( 'e:\recompiletrace.trc' , 1)
where
EventClass in(37,75,166)
EventClass 37 = Sp:Recompile, 75 = CursorRecompile, 166 = SQL:StmtRecompile

For more information about trace events, see sp_trace_setevent (http://msdn.microsoft.com/en-us/library/ms186265.aspx) in SQL Server 2008 Books Online.
You could further group the results from this query by the SqlHandle and ObjectID columns, or by various other columns, to see whether most of the recompilations are attributed by one stored procedure or are due to some other reason (such as a SET option that has changed).
Showplan XML For Query Compile
The Showplan XML For Query Compile event class occurs when SQL Server compiles or recompiles a Transact-SQL statement. This event has information about the statement that is being compiled or recompiled. This information includes the query plan and the object ID of the procedure in question. Capturing this event has significant performance overhead, because it is captured for each compilation or recompilation. If you see a high value for the SQL Compilations/sec counter in Performance Monitor, you should monitor this event. With this information, you can see which statements are frequently recompiled. You can use this information to change the parameters of those statements. This should reduce the number of recompilations.
DMVs
When you use the sys.dm_exec_query_optimizer_info DMV, you can get a good idea of the time SQL Server spends optimizing. If you take two snapshots of this DMV, you can get a good feel for the time that is spent optimizing in the given time period.

select * from sys.dm_exec_query_optimizer_info
counter occurrence value
---------------- -------------------- ---------------------
optimizations 81 1.0
elapsed time 81 6.4547820702944486E-2

In particular, look at the elapsed time, which is the time elapsed due to optimizations. Because the elapsed time during optimization is generally close to the CPU time that is used for the optimization (because the optimization process is very CPU bound), you can get a good measure of the extent to which the compilation and recompilation time is contributing to the high CPU use.
Another DMV that is useful for capturing this information is sys.dm_exec_query_stats.
The data columns to look at are as follows:
• Sql_handle
• Total worker time
• Plan generation number
• Statement Start Offset
For more information, see sys.dm_exec_query_stats (http://msdn.microsoft.com/en-us/library/ms189741.aspx) in SQL Server 2008 Books Online.
In particular, plan_generation_num indicates the number of times the query has recompiled. The following sample query gives you the top 25 stored procedures that have been recompiled.

select * from sys.dm_exec_query_optimizer_info

select top 25
sql_text.text,
sql_handle,
plan_generation_num,
execution_count,
dbid,
objectid
from
sys.dm_exec_query_stats a
cross apply sys.dm_exec_sql_text(sql_handle) as sql_text
where
plan_generation_num >1
order by plan_generation_num desc

For more information, see Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005 (http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx) on Microsoft TechNet.

Resolution
If you detect excessive compilation and recompilation, consider the following options:
• If the recompilation occurred because a SET option changed, use SQL Server Profiler to determine which SET option changed. Avoid changing SET options within stored procedures. It is better to set them at the connection level. Ensure that SET options are not changed during the lifetime of the connection.
• Recompilation thresholds for temporary tables are lower than for normal tables. If the recompilations on a temporary table are due to statistics changes, you can change the temporary tables to table variables. A change in the cardinality of a table variable does not cause a recompilation. The drawback of this approach is that the query optimizer does not keep track of a table variable’s cardinality because statistics are not created or maintained on table variables. This can result in less optimal query plans. You can test the different options and choose the best one.
• Another option is to use the KEEP PLAN query hint. This sets the threshold of temporary tables to be the same as that of permanent tables. The EventSubclass column displays “Statistics Changed” for an operation on a temporary table.
• To avoid recompilations that are due to changes in statistics (for example, if the plan becomes suboptimal due to change in the data statistics), specify the KEEPFIXED PLAN query hint. With this option in effect, recompilations can only happen to ensure correctness (for example, when the underlying table structure has changed and the plan no longer applies) and not to respond to changes in statistics. For example, a recompilation can occur if the schema of a table that is referenced by a statement changes, or if a table is marked with the sp_recompile stored procedure.
• Turning off the automatic updates of statistics for indexes and statistics that are defined on a table or indexed view prevents recompilations that are due to statistics changes on that object. Note, however, that turning off the auto-stats feature by using this method is not usually a good idea. This is because the query optimizer is no longer sensitive to data changes in those objects and suboptimal query plans might result. Use this method only as a last resort after exhausting all other alternatives.
• Batches should have qualified object names (for example, dbo.Table1) to avoid recompilation and to avoid ambiguity between objects.
• To avoid recompilations that are due to deferred compilations, do not interleave DML and DDL or create the DDL from conditional constructs such as IF statements.
• Run Database Engine Tuning Advisor (DTA) to see whether any indexing changes improve the compile time and the execution time of the query.
• Check to see whether the stored procedure was created with the WITH RECOMPILE option or whether the RECOMPILE query hint was used. If a procedure was created with the WITH RECOMPILE option, in SQL Server 2008 or SQL Server 2005, you may be able to take advantage of a statement-level RECOMPILE hint if a particular statement within that procedure needs to be recompiled. Using this hint at the statement level avoids the necessity of recompiling the whole procedure each time it executes, while at the same time allowing the individual statement to be compiled. For more information about the RECOMPILE hint, see Query Hints (Transact-SQL) (http://msdn.microsoft.com/en-us/library/ms181714.aspx) in SQL Server 2008 Books Online.
Inefficient Query Plan
When generating an execution plan for a query, the SQL Server optimizer attempts to choose a plan that provides the fastest response time for that query. Note that the fastest response time doesn’t necessarily mean minimizing the amount of I/O that is used, nor does it necessarily mean using the least amount of CPU—it is a balance of the various resources.
Certain types of operators are more CPU-intensive than others. By their nature, the Hash operator and Sort operator scan through their respective input data. If read-ahead (prefetch) is used during such a scan, the pages are almost always available in the buffer cache before the page is needed by the operator. Thus, waits for physical I/O are minimized or eliminated. If these types of operations are no longer constrained by physical I/O, they tend to manifest themselves in high CPU consumption. By contrast, nested loop joins have many index lookups and can quickly become I/O bound if the index lookups are traversing to many different parts of the table so that the pages can’t fit into the buffer cache.
The most significant input the optimizer uses in evaluating the cost of various alternative query plans is the cardinality estimates for each operator, which you can see in the Showplan (EstimateRows and EstimateExecutions attributes). Without accurate cardinality estimates, the primary input used in optimization is flawed, and many times so is the final plan.
For an excellent white paper that describes in detail how the SQL Server optimizer uses statistics, see Statistics Used by the Query Optimizer in Microsoft SQL Server 2005 (http://www.microsoft.com/technet/prodtechnol/sql/2005/qrystats.mspx). The white paper discusses how the optimizer uses statistics, best practices for maintaining up-to-date statistics, and some common query design issues that can prevent accurate estimate cardinality and thus cause inefficient query plans.

Detection
Inefficient query plans are usually detected comparatively. An inefficient query plan can cause increased CPU consumption.
The following query against sys.dm_exec_query_stats is an efficient way to determine which query is using the most cumulative CPU.

select
highest_cpu_queries.plan_handle,
highest_cpu_queries.total_worker_time,
q.dbid,
q.objectid,
q.number,
q.encrypted,
q.[text]
from
(select top 50
qs.plan_handle,
qs.total_worker_time
from
sys.dm_exec_query_stats qs
order by qs.total_worker_time desc) as highest_cpu_queries
cross apply sys.dm_exec_sql_text(plan_handle) as q
order by highest_cpu_queries.total_worker_time desc

Alternatively, you can query against sys.dm_exec_cached_plans by using filters for various operators that may be CPU intensive, such as ‘%Hash Match%’, ‘%Sort%’ to look for suspects.
Resolution
Consider the following options if you detect inefficient query plans:
• Tune the query with the Database Engine Tuning Advisor to see whether it produces any index recommendations.
• Check for issues with bad cardinality estimates.
• Are the queries written so that they use the most restrictive WHERE clause that is applicable? Unrestricted queries are resource intensive by their very nature.
• Run UPDATE STATISTICS on the tables involved in the query and check to see whether the problem persists.
• Does the query use constructs for which the optimizer is unable to accurately estimate cardinality? Consider whether the query can be modified in a way so that the issue can be avoided.
• If it is not possible to modify the schema or the query, you can use the plan guide feature to specify query hints for queries that match certain text criteria. Plan guides can be created for ad hoc queries as well as queries inside a stored procedure. Hints such as OPTION (OPTIMIZE FOR) enable you to impact the cardinality estimates while leaving the optimizer its full array of potential plans. Other hints such as OPTION (FORCE ORDER) or OPTION (USE PLAN) provide you with varying degrees of control over the query plan. SQL Server 2008 offers full DML support for plan guides, which means that that they can be created for SELECT, INSERT, UPDATE, DELETE or MERGE statements.
• SQL Server 2008 also offers a new feature called plan freezing that allows you to freeze a plan exactly as it exists in the plan cache. This option is similar to creating a plan guide with the USE PLAN query hint specified. However, it eliminates the need to execute lengthy commands as required when creating a plan guides. It also minimizes the user errors with go along with those lengthy commands. For example, the simple two-statement batch presented below is all that’s needed to freeze a plan for a query that matches the specified text criteria.

DECLARE @plan_handle varbinary(64);

-- Extract the query's plan_handle.
SELECT @plan_handle = plan_handle FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(sql_handle)
WHERE text LIKE N'Some query matching criteria%';

EXECUTE sp_create_plan_guide_from_handle
@name = N'Sample_PG1',
@plan_handle = @plan_handle,
@statement_start_offset = NULL;
GO

This statement creates a plan guide (Sample_PG1) in the sys.plan_guides table.
Intraquery Parallelism
When generating an execution plan for a query, the SQL Server optimizer attempts to choose the plan that provides the fastest response time for that query. If the query’s cost exceeds the value specified in the cost threshold for parallelism option and parallelism has not been disabled, the optimizer attempts to generate a plan that can be run in parallel. A parallel query plan uses multiple threads to process the query, with each thread distributed across the available CPUs and concurrently utilizing CPU time from each processor. The maximum degree of parallelism can be limited server-wide by using the max degree of parallelism option, on a resource workload group level, or on a per-query level by using the OPTION (MAXDOP) hint.
The decision on the actual degree of parallelism (DOP) used for execution—a measure of how many threads will do a given operation in parallel—is deferred until execution time. Before executing the query, SQL Server determines how many schedulers are underutilized and chooses a DOP for the query that fully utilizes the remaining schedulers. After a DOP is chosen, the query runs with the chosen degree of parallelism until completion. A parallel query typically uses a similar but slightly higher amount of CPU time as compared to the corresponding serial execution plan, but it does so in a shorter amount of time. As long as there are no other bottlenecks, such as waits for physical I/O, parallel plans generally should use 100% of the CPU across all of the processors.
One key factor (how idle the system is) that led to running a parallel plan can change after the query starts executing. This can change, however, after the query starts executing. For example, if a query comes in during an idle time, the server might choose to run with a parallel plan and use a DOP of four and spawn up threads on four different processors. After those threads start executing, existing connections can submit other queries that also require a lot of CPU. At that point, all the different threads will share short time slices of the available CPU, resulting in higher query duration.
Running with a parallel plan is not inherently bad and should provide the fastest response time for that query. However, the response time for a given query must be weighed against the overall throughput and responsiveness of the rest of the queries on the system. Parallel queries are generally best suited to batch processing and decision support workloads and might not be useful in a transaction processing environment.
SQL Server 2008 implemented significant scalability improvements to fully utilize available hardware with partitioned table queries. Consequently, SQL Server 2008 might use higher amounts of CPU during parallel query execution than older versions. If this is not desired, you should limit or disable parallelism.

Detection
Intraquery parallelism problems can be detected by using the following methods.
Performance Monitor
For more information, see the SQL Server:SQL Statistics – Batch Requests/sec counter and SQL Statistics Object (http://msdn.microsoft.com/en-us/library/ms190911.aspx) in SQL Server 2008 Books Online.
Because a query must have an estimated cost that exceeds the cost threshold for the parallelism configuration setting (which defaults to 5) before it is considered for a parallel plan, the more batches a server is processing per second, the less likely it is that the batches are running with parallel plans. Servers that are running many parallel queries normally have small batch requests per second (for example, values less than 100).
DMVs
From a running server, you can determine whether any active requests are running in parallel for a given session by using the following query.

select
r.session_id,
r.request_id,
max(isnull(exec_context_id, 0)) as number_of_workers,
r.sql_handle,
r.statement_start_offset,
r.statement_end_offset,
r.plan_handle
from
sys.dm_exec_requests r
join sys.dm_os_tasks t on r.session_id = t.session_id
join sys.dm_exec_sessions s on r.session_id = s.session_id
where
s.is_user_process = 0x1
group by
r.session_id, r.request_id,
r.sql_handle, r.plan_handle,
r.statement_start_offset, r.statement_end_offset
having max(isnull(exec_context_id, 0)) > 0

With this information, you can easily retrieve the text of the query by using sys.dm_exec_sql_text, and you can retrieve the plan by using sys.dm_exec_cached_plan.
You can also search for plans that are eligible to run in parallel. To do this, search the cached plans to see whether a relational operator has its Parallel attribute as a nonzero value. These plans might not run in parallel, but they can to do so if the system is not too busy.

--
-- Find query plans that can run in parallel
--
select
p.*,
q.*,
cp.plan_handle
from
sys.dm_exec_cached_plans cp
cross apply sys.dm_exec_query_plan(cp.plan_handle) p
cross apply sys.dm_exec_sql_text(cp.plan_handle) as q
where
cp.cacheobjtype = 'Compiled Plan' and
p.query_plan.value('declare namespace p="http://schemas.microsoft.com/sqlserver/2004/07/showplan";
max(//p:RelOp/@Parallel)', 'float') > 0

In general, the duration of a query is longer than the amount of CPU time, because some of the time was spent waiting on resources such as a lock or physical I/O. The only scenario where a query can use more CPU time than the elapsed duration is when the query runs with a parallel plan such that multiple threads concurrently use CPU. Note that not all parallel queries demonstrate this behavior (where the CPU time is greater than the duration).

select
qs.sql_handle,
qs.statement_start_offset,
qs.statement_end_offset,
q.dbid,
q.objectid,
q.number,
q.encrypted,
q.text
from
sys.dm_exec_query_stats qs
cross apply sys.dm_exec_sql_text(qs.plan_handle) as q
where
qs.total_worker_time > qs.total_elapsed_time
SQL Trace
Look for the following signs of parallel queries, which could be either
statements or batches that have CPU time greater than the duration.

select
EventClass,
TextData
from
::fn_trace_gettable('c:\temp\high_cpu_trace.trc', default)
where
EventClass in (10, 12) -- RPC:Completed, SQL:BatchCompleted
and CPU > Duration/1000 -- CPU is in milliseconds, Duration in
microseconds oOr can be Showplans (un-encoded) that have Parallelism
operators in them
select
EventClass,
TextData
from
::fn_trace_gettable('c:\temp\high_cpu_trace.trc', default)
where
TextData LIKE '%Parallelism%'

Resolution
• Any query that runs with a parallel plan is one that the optimizer identifies as expensive enough to exceed the cost threshold of parallelism, which defaults to 5 (roughly a 5-second execution time on a reference computer). Any queries identified through the previous methods are candidates for further tuning.
• Use the Database Engine Tuning Advisor to see whether any indexing changes, changes to indexed views, or partitioning changes could reduce the cost of the query.
• Check for significant differences in the actual versus the estimated cardinality, because the cardinality estimates are the primary factor in estimating the cost of the query. If any significant differences are found:
o If the auto create statistics database option is disabled, make sure that there are no MISSING STATS entries in the Warnings column of the Showplan output.
o Try running UPDATE STATISTICS on the tables where the cardinality estimates are off.
o Verify that the query doesn’t use a query construct that the optimizer can’t accurately estimate, such as multistatement table-valued functions or CLR functions, table variables, or comparisons with a Transact-SQL variable (comparisons with a parameter are okay).
o Evaluate whether the query could be written in a more efficient fashion using different Transact-SQL statements or expressions.
Poor Cursor Usage
Versions of SQL Server prior to SQL Server 2005 only supported a single active common per connection. A query that was executing or had results pending to send to the client was considered active. In some situations, the client application might need to read through the results and submit other queries to SQL Server based on the row just read from the result set. This could not be done with a default result set, because it could have other pending results. A common solution was to change the connection properties to use a server-side cursor.
When a server-side cursor is used, the database client software (the OLE DB provider or ODBC driver) transparently encapsulates client requests inside special extended stored procedures, such as sp_cursoropen or sp_cursorfetch. This is referred to as an API cursor (as opposed to a Transact-SQL cursor). When the user executes the query, the query text is sent to the server via sp_cursoropen; requests to read from the result set result in a sp_cursorfetch instructing the server to send back only a certain number of rows. By controlling the number of rows that are fetched, the ODBC driver or OLE DB provider can cache the row or rows. This prevents a situation where the server is waiting for the client to read all the rows it has sent. Thus, the server is ready to accept a new request on that connection.

Applications that open cursors and fetch one row (or a small number of rows) at a time can easily become bottlenecked by the network latency, especially on a wide area network (WAN). On a fast network with many different user connections, the overhead required to process many cursor requests can become significant. Because of the overhead associated with repositioning the cursor to the appropriate location in the result set, per-request processing overhead, and similar processing, it is more efficient for the server to process a single request that returns 100 rows than to process 100 separate requests that return the same 100 rows one row at a time.
Detection
You can use the following tools to troubleshoot poor cursor usage.
Performance Monitor
By looking at the SQL Server:Cursor Manager By Type – Cursor Requests/Sec counter, you can get a general feel for how many cursors are being used on the system. Systems that have high CPU utilization because of small fetch sizes typically have hundreds of cursor requests per second. There are no specific counters that list the fetch buffer size.
DMVs
You can use following query to determine the connections with API cursors (as opposed to Transact-SQL cursors) that are using a fetch buffer size of one row. It is much more efficient to use a larger fetch buffer, such as 100 rows.

select
cur.*
from
sys.dm_exec_connections con
cross apply sys.dm_exec_cursors(con.session_id) as cur
where
cur.fetch_buffer_size = 1
and cur.properties LIKE 'API%' -- API cursor (Transact-SQL cursors
always
have a fetch buffer of 1)

SQL Trace
Use a trace that includes the RPC:Completed event class search for sp_cursorfetch statements. The value of the fourth parameter is the number of rows returned by the fetch. The maximum number of rows that are requested to be returned is specified as an input parameter in the corresponding RPC:Starting event class.

Resolution
• Determine whether cursors are the most appropriate means to accomplish the processing or whether a set-based operation, which is generally more efficient, is possible.
• Consider enabling multiple active results (MARS) when connecting to SQL Server 2008.
• Consult the appropriate documentation for your specific API to determine how to specify a larger fetch buffer size for the cursor:
o ODBC - SQL_ATTR_ROW_ARRAY_SIZE
o OLE DB – IRowset::GetNextRows or IRowsetLocate::GetRowsAt

SQL Server 2008性能故障排查（二）——CPU

承接上一篇：SQL Server 2008性能故障排查（一）——概论

o 是否禁用了自动开启统计，确保在执行计划的警告部分不会出现丢失状态等信息

o 如果预估计划被禁用，那么在对应表上运行：UPDATE STATISTICS

o 确保查询没有使用到优化器无法明确预估的构造器，如多语句的表值函数或者CLR函数、表变量或者带有比较功能的T-SQL函数（带有参数的比较是允许的）。

o 评估语句是否能用更有效更新的方式去实现。

热门文章

最新文章

相关课程

相关电子书

相关实验场景

热门

活动广场

任务中心

开发者评测

高校计划

乘风者计划

训练营

阿里云MVP

话题

直播

下载

镜像站

技术资料

插件

SQL Server 2008性能故障排查（二）——CPU

承接上一篇：SQL Server 2008性能故障排查（一）——概论

o 是否禁用了自动开启统计，确保在执行计划的警告部分不会出现丢失状态等信息

o 如果预估计划被禁用，那么在对应表上运行：UPDATE STATISTICS

o 确保查询没有使用到优化器无法明确预估的构造器，如多语句的表值函数或者CLR函数、表变量或者带有比较功能的T-SQL函数（带有参数的比较是允许的）。

o 评估语句是否能用更有效更新的方式去实现。

热门文章

最新文章

相关课程

相关电子书

相关实验场景