SQL Server Performance Tuning Question and Answers
Q. How to read the graphical execution plan?
A: The plan should be read from right to left and top to bottom.
- Check the graphical execution plan of the stored procedure / query.
- Table Scan – usually means an index is missing.
- Index Scan – the existing indexes are not being used effectively (for example, the predicate is not SARGable).
- Bookmark / Key Lookup – limit the number of columns in the SELECT list, or create a covering index.
- Filter – remove functions from the WHERE clause; additional indexes may be required.
- Sort – Does the data really need to be sorted? Can an index be used to avoid sorting? Can sorting be done at the client more efficiently?
- Data Flow Arrow – thickness: the final result may contain only a few rows, but a thick arrow shows that the query/procedure is processing a huge number of rows at that step.
- Cost – the relative cost of each operator quickly shows which table/operation is taking the most time.
- From the execution plan we can identify the bottleneck and suggest a possible solution to reduce the latency (see the sketch below).
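As an illustration, a minimal sketch (dbo.Orders, its columns, and the index name are hypothetical) of reading a plan and removing a Key Lookup with a covering index:

-- In SSMS, press Ctrl+M (Include Actual Execution Plan) and run the query,
-- then read the resulting plan from right to left.
SELECT CustomerID, OrderDate, Amount
FROM dbo.Orders
WHERE CustomerID = 42;

-- If the plan shows an Index Seek followed by a Key Lookup, a covering index
-- that INCLUDEs the selected columns removes the lookup:
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_Covering
ON dbo.Orders (CustomerID)
INCLUDE (OrderDate, Amount);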
Q. What can cause the Actual and Estimated Execution Plans to differ?
A: - When Statistics are Stale:
The main cause of a difference between the plans is differences between the statistics and the actual data. This generally occurs over time as data is added and deleted.
- When the Estimated plan is invalid:
When the batch contains temporary tables or T-SQL statements that reference objects which do not yet exist in the database but will be created when the batch runs (for example, a CREATE TABLE statement inside the batch).
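A minimal sketch of the second case (dbo.NewOrders is a hypothetical table): requesting only the estimated plan for this batch fails, because the table is created only when the batch actually executes.

SET SHOWPLAN_XML ON;
GO
-- Estimated-plan generation fails with "Invalid object name 'dbo.NewOrders'",
-- because the table does not exist until the batch really runs.
CREATE TABLE dbo.NewOrders (OrderID INT PRIMARY KEY, Amount MONEY);
SELECT OrderID, Amount FROM dbo.NewOrders;
GO
SET SHOWPLAN_XML OFF;
GO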
Q. What are the types of SQL Server Waits?
A: In general there are three categories of waits that could affect any given request:
- Resource waits: are caused by a particular resource, perhaps a specific lock that is unavailable when the request is submitted.
- External waits: occur when a SQL Server worker thread is waiting on an external process.
- Queue waits: normally apply to internal background tasks, such as ghost cleanup, which physically removes records that have been previously deleted.
Using the DMV below we can get the cumulative wait statistics for the instance (per-session waits are exposed through sys.dm_exec_requests and sys.dm_os_waiting_tasks):
SELECT * FROM sys.dm_os_wait_stats;
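A more focused sketch: the top waits for the instance, ignoring a few benign background wait types (the exclusion list is illustrative, not exhaustive).

SELECT TOP (10)
       wait_type,
       wait_time_ms,
       signal_wait_time_ms,
       waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN ('LAZYWRITER_SLEEP', 'SLEEP_TASK', 'SQLTRACE_BUFFER_FLUSH',
                        'WAITFOR', 'BROKER_TASK_STOP', 'BROKER_TO_FLUSH')
ORDER BY wait_time_ms DESC;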
Q. Explain three different approaches to capture a query plan?
A: Here are three ways to capture/read a query plan:
i) SHOWPLAN_TEXT
ii) SHOWPLAN_ALL
iii) Graphical Query Plan
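For example, the two SHOWPLAN options are switched on per session before running the statement (no statements are actually executed while they are on); the graphical plan is produced in SSMS with Ctrl+L (estimated) or Ctrl+M plus execution (actual):

SET SHOWPLAN_TEXT ON;    -- plan as an indented tree of operators
GO
SELECT name FROM sys.objects WHERE type = 'U';
GO
SET SHOWPLAN_TEXT OFF;
GO

SET SHOWPLAN_ALL ON;     -- adds estimated rows, IO/CPU cost and other columns
GO
SELECT name FROM sys.objects WHERE type = 'U';
GO
SET SHOWPLAN_ALL OFF;
GO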
Q. Explain different types of Locks in SQL Server?
A: The three most commonly discussed lock types in SQL Server are:
i) Shared (S) locks - used for read operations that do not change data, e.g. SELECT.
ii) Update (U) locks - taken when SQL Server reads a resource it intends to modify; the update lock is converted to an exclusive lock before the change is actually made.
iii) Exclusive (X) locks - used for data modification operations, e.g. UPDATE, INSERT, or DELETE.
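A minimal sketch (dbo.Orders is a hypothetical table) of watching the locks a session holds while a transaction is open:

BEGIN TRAN;

UPDATE dbo.Orders
SET    Amount = Amount + 1
WHERE  OrderID = 42;                  -- takes an exclusive (X) lock on the modified row/key

SELECT resource_type, request_mode, request_status
FROM   sys.dm_tran_locks
WHERE  request_session_id = @@SPID;   -- expect IX on the object/page and X on the key

ROLLBACK;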
Q. What are the basic Perfmon counters you would consider while troubleshooting a performance issue?
A: Here are a few basic counters that can help troubleshoot the issue:
CPU bottlenecks:
- Signal waits > 25% of total waits.
(See sys.dm_os_wait_stats for Signal waits and Total waits. Signal waits measure the time spent in the runnable queue waiting for CPU. High signal waits indicate a CPU bottleneck.)
- Plan re-use < 90%.
(A query plan is used to execute a query. Plan re-use is desirable for OLTP workloads because re-creating the same plan (for similar or identical transactions) is a waste of CPU resources. Compare SQL Server SQL Statistics: batch requests/sec to SQL compilations/sec. Compute plan re-use as follows: Plan re-use = (Batch requests – SQL compilations) / Batch requests. Special exception to the plan re-use rule: Zero cost plans will not be cached (not re-used) in SQL 2005 SP2. Applications that use zero cost plans will have a lower plan re-use but this is not a performance issue.)
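Both CPU indicators above can be checked from the DMVs; a sketch (the values are cumulative since the instance started):

-- Signal waits as a percentage of total waits
SELECT CAST(100.0 * SUM(signal_wait_time_ms) / SUM(wait_time_ms) AS DECIMAL(5, 2))
       AS signal_wait_pct
FROM sys.dm_os_wait_stats;

-- Plan re-use = (Batch requests - SQL compilations) / Batch requests
SELECT CAST(100.0 *
       ( MAX(CASE WHEN counter_name = 'Batch Requests/sec'   THEN cntr_value END)
       - MAX(CASE WHEN counter_name = 'SQL Compilations/sec' THEN cntr_value END) )
       / NULLIF(MAX(CASE WHEN counter_name = 'Batch Requests/sec' THEN cntr_value END), 0)
       AS DECIMAL(5, 2)) AS plan_reuse_pct
FROM sys.dm_os_performance_counters
WHERE [object_name] LIKE '%SQL Statistics%'
  AND counter_name IN ('Batch Requests/sec', 'SQL Compilations/sec');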
Memory bottleneck:
- Page Life Expectancy – the number of seconds a page will stay in the buffer pool without being referenced.
Consistently low average page life expectancy. (MSSQL$Instance:Buffer Manager\Page Life Expectancy)
(See the Page Life Expectancy counter in the Perfmon object SQL Server: Buffer Manager; it represents the average number of seconds a page stays in cache. For OLTP, an average page life expectancy of 300 seconds is 5 minutes. Anything less could indicate memory pressure, missing indexes, or a cache flush.)
- Buffer Cache Hit Ratio – indicates the percentage of pages found in the buffer cache without having to read from disk. The ratio is the total number of cache hits divided by the total number of cache lookups over the last few thousand page accesses.
Consistently low SQL cache hit ratio. (MSSQL$Instance:Plan Cache\Cache Hit Ratio)
(OLTP applications (i.e. small transactions) should have a high cache hit ratio. Since OLTP transactions are small, there should not be (1) big drops in SQL cache hit rates or (2) consistently low cache hit rates < 90%. Drops or consistently low cache hit ratios may indicate memory pressure or missing indexes.)
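The Page Life Expectancy counter mentioned above can also be read directly from the DMV that exposes the Perfmon counters; a sketch:

SELECT [object_name], cntr_value AS page_life_expectancy_seconds
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page life expectancy'
  AND [object_name] LIKE '%Buffer Manager%';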
IO bottleneck:
- High average disk seconds per read.
(When the IO subsystem is queued, disk seconds per read increases. See Perfmon Logical or Physical disk (disk seconds/read counter). Normally it takes 4-8ms to complete a read when there is no IO pressure. When the IO subsystem is under pressure due to high IO requests, the average time to complete a read increases, showing the effect of disk queues. Periodic higher values for disk seconds/read may be acceptable for many applications. For high performance OLTP applications, sophisticated SAN subsystems provide greater IO scalability and resiliency in handling spikes of IO activity. Sustained high values for disk seconds/read (>15ms) indicate a disk bottleneck.)
- High average disk seconds per write.
(See Perfmon Logical or Physical disk. The throughput for high volume OLTP applications is dependent on fast sequential transaction log writes. A transaction log write can be as fast as 1ms (or less) for high performance SAN environments. For many applications, a periodic spike in average disk seconds per write is acceptable considering the high cost of sophisticated SAN subsystems. However, sustained high values for average disk seconds/write are a reliable indicator of a disk bottleneck.)
- Big IOs such as table and range scans due to missing indexes.
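The same read/write latencies described above can also be measured per database file from inside SQL Server; a sketch:

SELECT DB_NAME(vfs.database_id)                              AS database_name,
       mf.physical_name,
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)   AS avg_read_latency_ms,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0)  AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON  mf.database_id = vfs.database_id
  AND mf.file_id     = vfs.file_id
ORDER BY avg_read_latency_ms DESC;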
Blocking bottleneck:
- High average row lock or latch waits.
(The average row lock or latch wait is computed by dividing the lock and latch wait milliseconds (ms) by the lock and latch wait count. The average lock wait in ms computed from sys.dm_db_index_operational_stats represents the average duration of each blocking occurrence.)
- Top wait statistics in sys.dm_os_wait_stats dominated by lock waits (LCK_M_*).
- High number of deadlocks.
(See the Profiler “Deadlock graph” event under the Locks event category to identify the statements involved in the deadlock.)
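A sketch of the average row lock wait calculation described above, per index in the current database:

SELECT OBJECT_NAME(ios.object_id)                                   AS table_name,
       ios.index_id,
       ios.row_lock_wait_count,
       ios.row_lock_wait_in_ms / NULLIF(ios.row_lock_wait_count, 0) AS avg_row_lock_wait_ms
FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS ios
WHERE ios.row_lock_wait_count > 0
ORDER BY avg_row_lock_wait_ms DESC;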
Network bottleneck:
- High network latency coupled with an application that incurs many round trips to the database.
- Network bandwidth is used up.
(See the Packets/sec and Current Bandwidth counters in the Network Interface object of Performance Monitor. For TCP/IP with standard 1500-byte frames, the bandwidth in use is computed as packets/sec * 1500 * 8 / 1,000,000 Mbps; a worked example follows.)
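For example, at 10,000 packets/sec the interface is carrying roughly 10,000 * 1500 * 8 / 1,000,000 = 120 Mbps, which would saturate a 100 Mbps link but use only about 12% of a 1 Gbps link.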
For more details, refer to the TechNet article on Perfmon counters.