SQL Server Indexes Interview Questions and Answers
Q. What are the different types of Indexes in SQL Server?
A: There main two types of indexes in SQL Server:
1. Clustered Index:A clustered index sorts and stores the data rows of the table or view in order based on the clustered index key. The clustered index is implemented as a B-tree index structure that supports fast retrieval of the rows, based on their clustered index key values.
2. Non-Clustered Index:A non-clustered index can be defined on a table or view with a clustered index or on a heap. Each index row in the non-clustered index contains the non-clustered key value and a row locator. This locator points to the data row in the clustered index or heap having the key value. The rows in the index are stored in the order of the index key values, but the data rows are not guaranteed to be in any particular order unless a clustered index is created on the table.
apart from these, here are some other types of indexes:
3. Unique Index:An index that ensures the uniqueness of each value in the indexed column. If the index is a composite, the uniqueness is enforced across the columns as a whole, not on the individual columns.A unique index is automatically created when you define a primary key or unique constraint:
--Primary key: When you define a primary key constraint on one or more columns, SQL Server automatically creates a unique, clustered index if a clustered index does not already exist on the table or view. However, you can override the default behaviour and define a unique, non-clustered index on the primary key.
--Unique: When you define a unique constraint, SQL Server automatically creates a unique, non-clustered index. You can specify that a unique clustered index be created if a clustered index does not already exist on the table.
4. Full Text Indexes:A special type of token-based functional index that is built and maintained by the Microsoft Full-Text Engine for SQL Server. It provides efficient support for sophisticated word searches in character string data.
5. Spatial Indexes: A spatial index provides the ability to perform certain operations more efficiently on spatial objects (spatial data) in a column of the geometry data type. The spatial index reduces the number of objects on which relatively costly spatial operations need to be applied.
6. Filtered Index: An optimized nonclustered index, especially suited to cover queries that select from a well-defined subset of data. It uses a filter predicate to index a portion of rows in the table. A well-designed filtered index can improve query performance, reduce index maintenance costs, and reduce index storage costs compared with full-table indexes. Use where clause in create index statement.
7. Composite index: An index that contains more than one column. In both SQL Server 2005 and 2008, you can include up to 16 columns in an index, as long as the index doesn’t exceed the 900-byte limit. Both clustered and non-clustered indexes can be composite indexes.
8. XML Indexes: A shredded, and persisted, representation of the XML binary large objects (BLOBs) in the xml data type column.
9. ColumnStore Index: Refer Post
Q. What is the Covering Index?
A: A covering index, which is a form of a composite non clustered index, includes all of the columns referenced in SELECT, JOIN, and WHERE clauses of a query. Because of this, the index contains the data you are looking for and SQL Server doesn’t have to look up the actual data in the table, reducing logical and/or physical I/O, and boosting performance.
On the other hand, if the covering index gets too big (has too many columns), this could actually increase I/O and degrade performance.
Q. What is Online Indexing?
A: Online Indexing is a new feature available from SQL Server 2005. In SQL Server 2005, DBAs can create, rebuild, or drop indexes online. The index operations on the underlying table can be performed concurrently with update or query operations. This was not possible in previous versions of SQL Server. In the past, indexing operations (reorganizing or rebuilding) were usually performed as a part of other maintenance tasks running during off-peak hours. During these offline operations, the indexing operations hold exclusive locks on the underlying table and associated indexes. During online index operations, SQL Server 2005 eliminates the need of exclusive locks.
The Online indexing feature is very helpful for environments that run 24 hours a day, seven days a week. The Online Indexing feature is available only in the Enterprise Edition of SQL Server 2005 and above.
Q. How Online Indexing works?
A: The online index operation can be divided into three phases:
-Preparation
-Build
-Final
The Build phase is a longest phase of all. It is in this phase where the creation, dropping, or rebuilding of indexes take place. The duration of the Build phase depends on the size of the data and the speed of the hardware. Exclusive locks are not held in this phase, so concurrent DML operations can be performed during this phase. The Preparation and Final phases are for shorter durations. They are independent of the size factor of the data. During these two short phases, the table or the indexed data is not available for concurrent DML operations.
Q. How will you identify unused indexes in a table?
A: You can identify using DMV sys.dm_db_index_usage_stats” which retrieves the statistics of indexes , if an index id is not in that dmv then we can say that index is not been using from a long time.
Q. What are the primary differences between an index reorganization and an index rebuild?
A: Here are few main differences:
-Reorganization is an “online” operation by default; a rebuild is an “offline” operation by default.
-Reorganization only affects the leaf level of an index.
-Reorganization swaps data pages in-place by using only the pages already allocated to the index; a rebuild uses new pages/allocations.
-Reorganization is always a fully-logged operation; a rebuild can be a minimally-logged operation.
-Reorganization can be stopped mid-process and all completed work is retained; a rebuild is transactional and must be completed in entirety to keep changes.
Q. What is fill factor? How to choose the fill factor while creating an index?
A: The Fill Factor specifies the % of fullness of the leaf level pages of an index. When an index is created or rebuilt the leaf level pages are written to the level where the pages are filled up to the fill factor value and the remainder of the page is left blank for future usage. This is the case when a value other than 0 or 100 is specified. For example, if a fill factor value of 70 is chosen, the index pages are all written with the pages being 70 % full, leaving 30 % of space for future usage.
Q. When to choose High or Low Fillfactor Value?
A: You might choose a high fill factor value when there is very little or no change in the underlying table’s data, such as a decision support system where data modification is not frequent, but on a regular and scheduled basis. Such a fill factor value would be better, since it creates an index smaller in size and hence queries can retrieve the required data with less disk I/O operations since it has to read less pages.
On the other hand if you have an index that is constantly changing you would want to have a lower value to keep some free space available for new index entries. Otherwise SQL Server would have to constantly do page splits to fit the new values into the index pages.
Q. How to make forcefully use an index in a query? Or What table hint needs to be specified to forcefully use an index in a query?
A: We can specify “Index” table hint in a query to forcefully use an index.
Ex: SELECT Emp_ID, Name FROM Emp WITH(INDEX(NIX_NAME_EMP))
Q. How to identify the correct (Clustered/ Non- Clustered) index on a column?
A: Here are the few dependent factor:
Selecting Clustered Index:
- Clustered indexes are ideal for queries that select by a range of values or where you need sorted results. Examples of this include when you are using BETWEEN, <, >, GROUP BY, ORDER BY, and aggregates such as MAX, MIN, and COUNT in your queries.
- Clustered indexes are good for queries that look up a record with a unique value (such as an employee number) and when you need to retrieve most or all of the data in the record.
- Clustered indexes are good for queries that access columns with a limited number of distinct values, such as columns that holds country or state data. But if column data has little distinctiveness, such as columns with a yes or no, or male or female, then these columns should not be indexed at all.
- Avoid putting a clustered index on columns that increment, such as an identity, date, or similarly incrementing columns, if your table is subject to a high level of INSERTS.
Selecting Non – Clustered Index:
- Non-clustered indexes are best for queries that return few rows (including just one row) and where the index has good selectivity (above 95%).
- If a column in a table is not at least 95% unique, then most likely the SQL Server Query Optimizer will not use a non-clustered index based on that column. For example, a column with “yes” or “no” as the data won’t be at least 95% unique.
- Keep the “width” of your indexes as narrow as possible, especially when creating composite (multi-column) indexes. This reduces the size of the index and reduces the number of reads required to read the index, boosting performance.
- If possible, try to create indexes on columns that have integer values instead of characters. Integer values have less overhead than character values.
- If you know that your application will be performing the same query over and over on the same table, consider creating a covering index on the table. A covering index includes all of the columns referenced in the query.
- An index is only useful to a query if the WHERE clause of the query matches the column(s) that are leftmost in the index. So if you create a composite index, such as “City, State”, then a query such as “WHERE City = ‘Houston'” will use the index, but the query “WHERE STATE = ‘TX'” will not use the index.
Q. What are the index statistics?
A: Index statistics contain information about the distribution of index key values. By distribution, I mean the number of rows associated with each key value. SQL Server uses this information to determine what kind of execution plan to use when processing a query.
Q. When are index statistics updated?
A: The AUTO_UPDATE_STATISTICS database setting controls when statistics are automatically updated. Once statistics have been created, SQL Server then determines when to update those statistics based on how out-of-date the statistics might be. SQL Server identifies out of date statistics based on the number of inserts, updates, and deletes that have occurred since the last time statistics were updated, and then recreates the statistics based on a threshold. The threshold is relative to the number of records in the table. (Enable the properties – “auto create statistics” and “auto update statistics” for OLTP)
Q. Explain database options “Auto Update Statistics” and “Auto Update Statistics Asynchronous”?
A:
Auto Update Statistics: If there is an incoming query but statistics are stale then sql server first update the statistics before building the execution plan.
Auto Update Statistics Asynchronous: If there is an incoming query but statistics are stale then sql servers uses the stale statistics, builds the execution plan and then update the statistics.
Q. How to find out when statistics updated last time?
A: A simple logic is, run the query and observe the values for both “estimated rows” and “actual rows”, if they both are close to each other you need not worried about the statistics. If you find big difference between them then you need to think about updating statistics.
In general we can find out last statistics updated info from below query:
select object_name(object_id) as table_name
,name as stats_name
,stats_date(object_id, stats_id) as last_update
from sys.stats
where objectproperty(object_id, ‘IsUserTable’) = 1
order by last_update
Q. What is an Indexed View?
A: Views allow you to create a virtual table by defining a query against one or more tables. With a standard view, the result is not stored in the database. Instead, the result set is determined at the time a query utilizing that view is executed.
Creating a unique clustered index on a view changes it to an indexed view. The clustered index is stored in SQL Server and updated like any other clustered index, providing SQL Server with another place to look to potentially optimize a query utilizing the indexed view.
Typically, environments that are best suited for indexed views are data warehouses, data marts, OLAP databases, and the like. Transactional environments are less suitable for indexed views.
Q. What are the best practice for index creation?
A: Here are few basic factors:
- Understand the characteristics of the database (OLTP / OLAP)
- Understand the characteristics of the most frequently used queries.
- Understand the characteristics of the columns used in the queries.
- Choose the right index at the right place. For example, creating a clustered index on an existing large table would benefit from the ONLINE index option.
- Determine the optimal storage location for the index. A non-clustered index can be stored in the same filegroup as the underlying table, or on a different filegroup. If those filegroups are in different physical drives it will improve the performance.
Database Considerations:
- Avoid over indexing
- Use many indexes to improve query performance on tables with low update requirements, but large volumes of data.
- Indexing small tables may not be optimal.
- Indexes on views can provide significant performance gains when the view contains aggregations and/or table joins. The view does not have to be explicitly referenced in the query for the query optimizer to use it.
Column considerations:
- Keep the length of the index key short for clustered indexes. Additionally, clustered indexes benefit from being created on unique or nonnull columns.
- Columns that are of the ntext, text, image, varchar(max), nvarchar(max), and varbinary(max) data types cannot be specified as index key columns. However, varchar(max), nvarchar(max), varbinary(max), and xml data types can participate in a non-clustered index as nonkey index columns.
- Columns that are of the ntext, text, image, varchar(max), nvarchar(max), and varbinary(max) data types cannot be specified as index key columns. However, varchar(max), nvarchar(max), varbinary(max), and xml data types can participate in a nonclustered index as nonkey index columns.
- Consider using filtered indexes on columns that have well-defined subsets.
- Consider the order of the columns if the index will contain multiple columns. The column that is used in the WHERE clause in an equal to (=), greater than (>), less than (<), or BETWEEN search condition, or participates in a join, should be placed first.
Index Characteristics:
- Determine the right index depends on the business need
- Clustered versus non-clustered
- Unique versus non-unique
- Single column versus multicolumn
- Ascending or descending order on the columns in the index
- Full-table versus filtered for non-clustered indexes
- Determine the Fill Factor
Q. Can we disable the Clustered index in a table? If yes, what will be the impact?
A: Yes, we can disable the clustered index in a table.
In result that will trigger the warnings and it will disable all Non-Clustered Indexes in a table, all relationship with reference tables.
Q. What are the different types of Indexes in SQL Server?
A: There main two types of indexes in SQL Server:
1. Clustered Index:A clustered index sorts and stores the data rows of the table or view in order based on the clustered index key. The clustered index is implemented as a B-tree index structure that supports fast retrieval of the rows, based on their clustered index key values.
2. Non-Clustered Index:A non-clustered index can be defined on a table or view with a clustered index or on a heap. Each index row in the non-clustered index contains the non-clustered key value and a row locator. This locator points to the data row in the clustered index or heap having the key value. The rows in the index are stored in the order of the index key values, but the data rows are not guaranteed to be in any particular order unless a clustered index is created on the table.
apart from these, here are some other types of indexes:
3. Unique Index:An index that ensures the uniqueness of each value in the indexed column. If the index is a composite, the uniqueness is enforced across the columns as a whole, not on the individual columns.A unique index is automatically created when you define a primary key or unique constraint:
--Primary key: When you define a primary key constraint on one or more columns, SQL Server automatically creates a unique, clustered index if a clustered index does not already exist on the table or view. However, you can override the default behaviour and define a unique, non-clustered index on the primary key.
--Unique: When you define a unique constraint, SQL Server automatically creates a unique, non-clustered index. You can specify that a unique clustered index be created if a clustered index does not already exist on the table.
4. Full Text Indexes:A special type of token-based functional index that is built and maintained by the Microsoft Full-Text Engine for SQL Server. It provides efficient support for sophisticated word searches in character string data.
5. Spatial Indexes: A spatial index provides the ability to perform certain operations more efficiently on spatial objects (spatial data) in a column of the geometry data type. The spatial index reduces the number of objects on which relatively costly spatial operations need to be applied.
6. Filtered Index: An optimized nonclustered index, especially suited to cover queries that select from a well-defined subset of data. It uses a filter predicate to index a portion of rows in the table. A well-designed filtered index can improve query performance, reduce index maintenance costs, and reduce index storage costs compared with full-table indexes. Use where clause in create index statement.
7. Composite index: An index that contains more than one column. In both SQL Server 2005 and 2008, you can include up to 16 columns in an index, as long as the index doesn’t exceed the 900-byte limit. Both clustered and non-clustered indexes can be composite indexes.
8. XML Indexes: A shredded, and persisted, representation of the XML binary large objects (BLOBs) in the xml data type column.
9. ColumnStore Index: Refer Post
Q. What is the Covering Index?
A: A covering index, which is a form of a composite non clustered index, includes all of the columns referenced in SELECT, JOIN, and WHERE clauses of a query. Because of this, the index contains the data you are looking for and SQL Server doesn’t have to look up the actual data in the table, reducing logical and/or physical I/O, and boosting performance.
On the other hand, if the covering index gets too big (has too many columns), this could actually increase I/O and degrade performance.
Q. What is Online Indexing?
A: Online Indexing is a new feature available from SQL Server 2005. In SQL Server 2005, DBAs can create, rebuild, or drop indexes online. The index operations on the underlying table can be performed concurrently with update or query operations. This was not possible in previous versions of SQL Server. In the past, indexing operations (reorganizing or rebuilding) were usually performed as a part of other maintenance tasks running during off-peak hours. During these offline operations, the indexing operations hold exclusive locks on the underlying table and associated indexes. During online index operations, SQL Server 2005 eliminates the need of exclusive locks.
The Online indexing feature is very helpful for environments that run 24 hours a day, seven days a week. The Online Indexing feature is available only in the Enterprise Edition of SQL Server 2005 and above.
Q. How Online Indexing works?
A: The online index operation can be divided into three phases:
-Preparation
-Build
-Final
The Build phase is a longest phase of all. It is in this phase where the creation, dropping, or rebuilding of indexes take place. The duration of the Build phase depends on the size of the data and the speed of the hardware. Exclusive locks are not held in this phase, so concurrent DML operations can be performed during this phase. The Preparation and Final phases are for shorter durations. They are independent of the size factor of the data. During these two short phases, the table or the indexed data is not available for concurrent DML operations.
Q. How will you identify unused indexes in a table?
A: You can identify using DMV sys.dm_db_index_usage_stats” which retrieves the statistics of indexes , if an index id is not in that dmv then we can say that index is not been using from a long time.
Q. What are the primary differences between an index reorganization and an index rebuild?
A: Here are few main differences:
-Reorganization is an “online” operation by default; a rebuild is an “offline” operation by default.
-Reorganization only affects the leaf level of an index.
-Reorganization swaps data pages in-place by using only the pages already allocated to the index; a rebuild uses new pages/allocations.
-Reorganization is always a fully-logged operation; a rebuild can be a minimally-logged operation.
-Reorganization can be stopped mid-process and all completed work is retained; a rebuild is transactional and must be completed in entirety to keep changes.
Q. What is fill factor? How to choose the fill factor while creating an index?
A: The Fill Factor specifies the % of fullness of the leaf level pages of an index. When an index is created or rebuilt the leaf level pages are written to the level where the pages are filled up to the fill factor value and the remainder of the page is left blank for future usage. This is the case when a value other than 0 or 100 is specified. For example, if a fill factor value of 70 is chosen, the index pages are all written with the pages being 70 % full, leaving 30 % of space for future usage.
Q. When to choose High or Low Fillfactor Value?
A: You might choose a high fill factor value when there is very little or no change in the underlying table’s data, such as a decision support system where data modification is not frequent, but on a regular and scheduled basis. Such a fill factor value would be better, since it creates an index smaller in size and hence queries can retrieve the required data with less disk I/O operations since it has to read less pages.
On the other hand if you have an index that is constantly changing you would want to have a lower value to keep some free space available for new index entries. Otherwise SQL Server would have to constantly do page splits to fit the new values into the index pages.
Q. How to make forcefully use an index in a query? Or What table hint needs to be specified to forcefully use an index in a query?
A: We can specify “Index” table hint in a query to forcefully use an index.
Ex: SELECT Emp_ID, Name FROM Emp WITH(INDEX(NIX_NAME_EMP))
Q. How to identify the correct (Clustered/ Non- Clustered) index on a column?
A: Here are the few dependent factor:
Selecting Clustered Index:
- Clustered indexes are ideal for queries that select by a range of values or where you need sorted results. Examples of this include when you are using BETWEEN, <, >, GROUP BY, ORDER BY, and aggregates such as MAX, MIN, and COUNT in your queries.
- Clustered indexes are good for queries that look up a record with a unique value (such as an employee number) and when you need to retrieve most or all of the data in the record.
- Clustered indexes are good for queries that access columns with a limited number of distinct values, such as columns that holds country or state data. But if column data has little distinctiveness, such as columns with a yes or no, or male or female, then these columns should not be indexed at all.
- Avoid putting a clustered index on columns that increment, such as an identity, date, or similarly incrementing columns, if your table is subject to a high level of INSERTS.
Selecting Non – Clustered Index:
- Non-clustered indexes are best for queries that return few rows (including just one row) and where the index has good selectivity (above 95%).
- If a column in a table is not at least 95% unique, then most likely the SQL Server Query Optimizer will not use a non-clustered index based on that column. For example, a column with “yes” or “no” as the data won’t be at least 95% unique.
- Keep the “width” of your indexes as narrow as possible, especially when creating composite (multi-column) indexes. This reduces the size of the index and reduces the number of reads required to read the index, boosting performance.
- If possible, try to create indexes on columns that have integer values instead of characters. Integer values have less overhead than character values.
- If you know that your application will be performing the same query over and over on the same table, consider creating a covering index on the table. A covering index includes all of the columns referenced in the query.
- An index is only useful to a query if the WHERE clause of the query matches the column(s) that are leftmost in the index. So if you create a composite index, such as “City, State”, then a query such as “WHERE City = ‘Houston'” will use the index, but the query “WHERE STATE = ‘TX'” will not use the index.
Q. What are the index statistics?
A: Index statistics contain information about the distribution of index key values. By distribution, I mean the number of rows associated with each key value. SQL Server uses this information to determine what kind of execution plan to use when processing a query.
Q. When are index statistics updated?
A: The AUTO_UPDATE_STATISTICS database setting controls when statistics are automatically updated. Once statistics have been created, SQL Server then determines when to update those statistics based on how out-of-date the statistics might be. SQL Server identifies out of date statistics based on the number of inserts, updates, and deletes that have occurred since the last time statistics were updated, and then recreates the statistics based on a threshold. The threshold is relative to the number of records in the table. (Enable the properties – “auto create statistics” and “auto update statistics” for OLTP)
Q. Explain database options “Auto Update Statistics” and “Auto Update Statistics Asynchronous”?
A:
Auto Update Statistics: If there is an incoming query but statistics are stale then sql server first update the statistics before building the execution plan.
Auto Update Statistics Asynchronous: If there is an incoming query but statistics are stale then sql servers uses the stale statistics, builds the execution plan and then update the statistics.
Q. How to find out when statistics updated last time?
A: A simple logic is, run the query and observe the values for both “estimated rows” and “actual rows”, if they both are close to each other you need not worried about the statistics. If you find big difference between them then you need to think about updating statistics.
In general we can find out last statistics updated info from below query:
select object_name(object_id) as table_name
,name as stats_name
,stats_date(object_id, stats_id) as last_update
from sys.stats
where objectproperty(object_id, ‘IsUserTable’) = 1
order by last_update
Q. What is an Indexed View?
A: Views allow you to create a virtual table by defining a query against one or more tables. With a standard view, the result is not stored in the database. Instead, the result set is determined at the time a query utilizing that view is executed.
Creating a unique clustered index on a view changes it to an indexed view. The clustered index is stored in SQL Server and updated like any other clustered index, providing SQL Server with another place to look to potentially optimize a query utilizing the indexed view.
Typically, environments that are best suited for indexed views are data warehouses, data marts, OLAP databases, and the like. Transactional environments are less suitable for indexed views.
Q. What are the best practice for index creation?
A: Here are few basic factors:
- Understand the characteristics of the database (OLTP / OLAP)
- Understand the characteristics of the most frequently used queries.
- Understand the characteristics of the columns used in the queries.
- Choose the right index at the right place. For example, creating a clustered index on an existing large table would benefit from the ONLINE index option.
- Determine the optimal storage location for the index. A non-clustered index can be stored in the same filegroup as the underlying table, or on a different filegroup. If those filegroups are in different physical drives it will improve the performance.
Database Considerations:
- Avoid over indexing
- Use many indexes to improve query performance on tables with low update requirements, but large volumes of data.
- Indexing small tables may not be optimal.
- Indexes on views can provide significant performance gains when the view contains aggregations and/or table joins. The view does not have to be explicitly referenced in the query for the query optimizer to use it.
Column considerations:
- Keep the length of the index key short for clustered indexes. Additionally, clustered indexes benefit from being created on unique or nonnull columns.
- Columns that are of the ntext, text, image, varchar(max), nvarchar(max), and varbinary(max) data types cannot be specified as index key columns. However, varchar(max), nvarchar(max), varbinary(max), and xml data types can participate in a non-clustered index as nonkey index columns.
- Columns that are of the ntext, text, image, varchar(max), nvarchar(max), and varbinary(max) data types cannot be specified as index key columns. However, varchar(max), nvarchar(max), varbinary(max), and xml data types can participate in a nonclustered index as nonkey index columns.
- Consider using filtered indexes on columns that have well-defined subsets.
- Consider the order of the columns if the index will contain multiple columns. The column that is used in the WHERE clause in an equal to (=), greater than (>), less than (<), or BETWEEN search condition, or participates in a join, should be placed first.
Index Characteristics:
- Determine the right index depends on the business need
- Clustered versus non-clustered
- Unique versus non-unique
- Single column versus multicolumn
- Ascending or descending order on the columns in the index
- Full-table versus filtered for non-clustered indexes
- Determine the Fill Factor
Q. Can we disable the Clustered index in a table? If yes, what will be the impact?
A: Yes, we can disable the clustered index in a table.
In result that will trigger the warnings and it will disable all Non-Clustered Indexes in a table, all relationship with reference tables.
Warning: Foreign key 'FK_Contbl_EmpTbl' on table 'Contbl' referencing table 'EmpTbl' was disabled as a result of disabling the index 'PK_EmpTbl'. Warning: Index 'XI_FName' on table 'EmpTbl' was disabled as a result of disabling the clustered index on the table.When we execute SELECT command will get an error as below:
Msg 8655, Level 16, State 1, Line 1 The query processor is unable to produce a plan because the index 'PK_EmpTbl' on table or view 'emptbl' is disabled.
No comments:
Post a Comment