SQL Server Indexes Guide
What is a Table?
Tables can be thought of in either logical or physical terms.
Each table in a database can be broken into multiple components. Let’s take a brief look at each of them so we better understand what makes up a physical table within SQL Server.
Partitions contain data rows stored in either the physical form of a heap (a table without a clustered index) or a B-Tree structure (a table with a clustered index). We will go into a more detailed explanation of these structures in the next section.
Heap and B-Tree structures contain one or more allocation units. This is just a fancy way of subdividing different data types into three categories, which are:
In_Row_Data: This is the most common type of allocation unit, and is used to store data and index pages, except for Large Object (LOB) data. In other words, most of your row data (and related indexes) are stored in this type of allocation unit.
Lob_Data: Used to store Large Object (LOB) data, such as text, ntext, xml, image, varchar(max), nvarchar(max), varbinary(max), and CLR user-defined data types.
Row_Overflow_Data: Used to store data pages when the variable data types—varchar, nvarchar, varbinary, and sql_variant data columns—that exceed the 8,060 bytes that can fit onto a single data page.
If a heap or Be-Tree structure does not have a need for all three allocations units, then they may or may not have them. Besides the data described above, each of these allocation units also include various metadata to help manage the data within each allocation unit.
What is a Heap?
A heap is simply a table without a clustered index. When rows are added to a heap, they are not stored in any particular order on a data page, and data pages are not stored in any particular sequence within a database. In other words, rows are stored wherever there is room available. This means that the data pages that contain the rows of the heap may be scattered throughout a database, in no particular order.
Note: A heap can have non-clustered indexes, but it does not have a clustered index.
Since a table can’t exist as a bunch of scattered pages, SQL Server provides a way to link them all together so that they act as a single table. This is done using what are called Index Allocation Map (IAM) pages. IAM pages manage the space allocated to heaps (among other tasks), and is what is used to connect the scattered pages (and their rows) into a table.
What is an Index?
An index is simply a way to help queries return data faster from tables. In SQL Server, all indexes physically take the form of what is called a B-Tree.
Notice that B-Trees have several levels. They include:
Root Level: A B-Tree starts with a single index page called the root level. This is where a query begins to look for specific data within the B-Tree. In our example, our root level page only contains two values. Most root level pages have many values, each referring to a page in the intermediate level.
Intermediate Level: Most B-Trees have one or more intermediate levels. The number of intermediate levels depends on the amount of data stored in the table being indexed. Our example has only one intermediate level, which includes two index pages.
Leaf Level: A B-Tree has a single leaf level which may include many, many data pages, depending on the amount of the data stored as part of the leaf level. The leaf level is where the data you are looking for is stored. For example, in the leaf level page that starts with “1”, all the rows with a customer number that ranges from “1” through “2499” are located. On the leaf level page that begins with “2500”, all the rows that range from “2500” to “4999” are located, and so on.
Types of Indexes
1. Clustered:A clustered index sorts and stores the data rows of the table or view in order based on the clustered index key. The clustered index is implemented as a B-tree index structure that supports fast retrieval of the rows, based on their clustered index key values.
2. Non-Clustered:A nonclustered index can be defined on a table or view with a clustered index or on a heap. Each index row in the nonclustered index contains the nonclustered key value and a row locator. This locator points to the data row in the clustered index or heap having the key value. The rows in the index are stored in the order of the index key values, but the data rows are not guaranteed to be in any particular order unless a clustered index is created on the table.
3. XML:XML data in SQL Server is generally stored in XML data type columns in the form of Large Binary Objects (LOBs) that can reach up to 2 GB in size. If you run a query against an XML data column (without an index), the data has to be shredded before the query can parse the data to see if there is a match. Shredding, which means converting the XML into a relational format, can be a time-consuming process that greatly hinders query performance.
One way to avoid on-the-fly shredding, and to speed up queries against XML data, is to index your XML data columns. SQL Server offers two different types of XML indexes.
--Primary XML Index: When an XML index is created, the XML is shredded and materialized in a physical form. In many ways, the resulting index is similar in concept to an indexed view. In other words, the XML stored as a BLOB is converted to a relational form that can easily be indexed and queried. If the XML data is large, adding a primary XML index can take up a lot of space because all of the XML data is physically stored in its shredded form.
--Secondary XML Index: There are three different types of secondary XML indexes: PATH, VALUE, and PROPERTY. These are created on the primary XML index, and in many ways, are similar to adding a non-clustered index to an indexed view.
4. Filtered Index:A filtered index is a variation of a non-clustered index. While a non-clustered index indexes all of the rows in a table, a filtered index only indexes those rows that you want indexed. In other words, you choose what rows you want indexed (using a WHERE clause in your CREATE NONCLUSTERED INDEX command), and only those rows that match your WHERE clause will be included as part of the filtered index.Its new from SQL Server 2008.
Filtered indexes are best used on columns that can contain easily be divided into well-defined subsets. For example:
--Columns that contain few values, but many NULL, such as in the example above.
--Columns that have heterogeneous data, such as a column that stores the colour of a product. For example, green, red, blue, and so on.
--Columns that contain ranges of values, such as time, date, money, and so on.
--Columns that have data that can be easily defined using comparison logic within a WHERE clause.
5. Spatial Index:Spatial indexes can be created on either geometry or geography spatial columns. Before a spatial index can be created as a B-Tree structure, spatial data has to be decomposed into a grid hierarchy. This is because a spatial index refers to finite space. For example, an index on a geometry data column could map an area on a plane, and an index on a geography column could map geographic data to a two-dimensional space.Its new from SQL Server 2008.
6. ColumnStore Indexes:An in-memory columnstore index stores and manages data by using column-based data storage and column-based query processing. This type of Index is introduced from SQL 2012.
Columnstore indexes work well for data warehousing workloads that primarily perform bulk loads and read-only queries. Use the columnstore index to achieve up to 10x query performance gains over traditional row-oriented storage, and up to 7x data compression over the uncompressed data size.
There are two types of ColumnStore Indexes:
--NonClustered ColumnStore Indexes(NonCCI):
Non-CCI is introduced in SQL 2012. It is Read only(which means non-Updateable) and require separate heap and Cluster index before create it.
--Clustered ColumnStore Indexes (CCI): CCI is introduced in SQL 2014. It is updateable and this is achieve by adding the row group into the data. New inserted row will be place in the new row group until Reorg.
We will check in the next post about ColumnStore Index in more details.
Compression
In SQL Server 2008, there is no special “compressed” index type. What I am referring to in this section is the ability in SQL Server 2008 (Enterprise Edition) to perform compression on clustered indexes, non-clustered indexes, and indexed views.
Compression comes in two forms:
--Row-level Data Compression: Row-level data compression is essentially turning fixed length data types into variable length data types, freeing up empty space. It also has the ability to ignore zero and null values, saving additional space. In turn, more rows can fit into a single data page.
--Page-level Data Compression: Page-level data compression starts with row-level data compression, then adds two additional compression features: prefix and dictionary compression, providing for greater compression than row-level compression alone.
While two types of data compression are available, this is only available for the leaf level pages of an index. The root and intermediate levels of an index can only be compressed using row-level compression.
What is a Table?
Tables can be thought of in either logical or physical terms.
Each table in a database can be broken into multiple components. Let’s take a brief look at each of them so we better understand what makes up a physical table within SQL Server.
Partitions contain data rows stored in either the physical form of a heap (a table without a clustered index) or a B-Tree structure (a table with a clustered index). We will go into a more detailed explanation of these structures in the next section.
Heap and B-Tree structures contain one or more allocation units. This is just a fancy way of subdividing different data types into three categories, which are:
In_Row_Data: This is the most common type of allocation unit, and is used to store data and index pages, except for Large Object (LOB) data. In other words, most of your row data (and related indexes) are stored in this type of allocation unit.
Lob_Data: Used to store Large Object (LOB) data, such as text, ntext, xml, image, varchar(max), nvarchar(max), varbinary(max), and CLR user-defined data types.
Row_Overflow_Data: Used to store data pages when the variable data types—varchar, nvarchar, varbinary, and sql_variant data columns—that exceed the 8,060 bytes that can fit onto a single data page.
If a heap or Be-Tree structure does not have a need for all three allocations units, then they may or may not have them. Besides the data described above, each of these allocation units also include various metadata to help manage the data within each allocation unit.
What is a Heap?
A heap is simply a table without a clustered index. When rows are added to a heap, they are not stored in any particular order on a data page, and data pages are not stored in any particular sequence within a database. In other words, rows are stored wherever there is room available. This means that the data pages that contain the rows of the heap may be scattered throughout a database, in no particular order.
Note: A heap can have non-clustered indexes, but it does not have a clustered index.
Since a table can’t exist as a bunch of scattered pages, SQL Server provides a way to link them all together so that they act as a single table. This is done using what are called Index Allocation Map (IAM) pages. IAM pages manage the space allocated to heaps (among other tasks), and is what is used to connect the scattered pages (and their rows) into a table.
What is an Index?
An index is simply a way to help queries return data faster from tables. In SQL Server, all indexes physically take the form of what is called a B-Tree.
Notice that B-Trees have several levels. They include:
Root Level: A B-Tree starts with a single index page called the root level. This is where a query begins to look for specific data within the B-Tree. In our example, our root level page only contains two values. Most root level pages have many values, each referring to a page in the intermediate level.
Intermediate Level: Most B-Trees have one or more intermediate levels. The number of intermediate levels depends on the amount of data stored in the table being indexed. Our example has only one intermediate level, which includes two index pages.
Leaf Level: A B-Tree has a single leaf level which may include many, many data pages, depending on the amount of the data stored as part of the leaf level. The leaf level is where the data you are looking for is stored. For example, in the leaf level page that starts with “1”, all the rows with a customer number that ranges from “1” through “2499” are located. On the leaf level page that begins with “2500”, all the rows that range from “2500” to “4999” are located, and so on.
Types of Indexes
1. Clustered:A clustered index sorts and stores the data rows of the table or view in order based on the clustered index key. The clustered index is implemented as a B-tree index structure that supports fast retrieval of the rows, based on their clustered index key values.
2. Non-Clustered:A nonclustered index can be defined on a table or view with a clustered index or on a heap. Each index row in the nonclustered index contains the nonclustered key value and a row locator. This locator points to the data row in the clustered index or heap having the key value. The rows in the index are stored in the order of the index key values, but the data rows are not guaranteed to be in any particular order unless a clustered index is created on the table.
3. XML:XML data in SQL Server is generally stored in XML data type columns in the form of Large Binary Objects (LOBs) that can reach up to 2 GB in size. If you run a query against an XML data column (without an index), the data has to be shredded before the query can parse the data to see if there is a match. Shredding, which means converting the XML into a relational format, can be a time-consuming process that greatly hinders query performance.
One way to avoid on-the-fly shredding, and to speed up queries against XML data, is to index your XML data columns. SQL Server offers two different types of XML indexes.
--Primary XML Index: When an XML index is created, the XML is shredded and materialized in a physical form. In many ways, the resulting index is similar in concept to an indexed view. In other words, the XML stored as a BLOB is converted to a relational form that can easily be indexed and queried. If the XML data is large, adding a primary XML index can take up a lot of space because all of the XML data is physically stored in its shredded form.
--Secondary XML Index: There are three different types of secondary XML indexes: PATH, VALUE, and PROPERTY. These are created on the primary XML index, and in many ways, are similar to adding a non-clustered index to an indexed view.
4. Filtered Index:A filtered index is a variation of a non-clustered index. While a non-clustered index indexes all of the rows in a table, a filtered index only indexes those rows that you want indexed. In other words, you choose what rows you want indexed (using a WHERE clause in your CREATE NONCLUSTERED INDEX command), and only those rows that match your WHERE clause will be included as part of the filtered index.Its new from SQL Server 2008.
Filtered indexes are best used on columns that can contain easily be divided into well-defined subsets. For example:
--Columns that contain few values, but many NULL, such as in the example above.
--Columns that have heterogeneous data, such as a column that stores the colour of a product. For example, green, red, blue, and so on.
--Columns that contain ranges of values, such as time, date, money, and so on.
--Columns that have data that can be easily defined using comparison logic within a WHERE clause.
5. Spatial Index:Spatial indexes can be created on either geometry or geography spatial columns. Before a spatial index can be created as a B-Tree structure, spatial data has to be decomposed into a grid hierarchy. This is because a spatial index refers to finite space. For example, an index on a geometry data column could map an area on a plane, and an index on a geography column could map geographic data to a two-dimensional space.Its new from SQL Server 2008.
6. ColumnStore Indexes:An in-memory columnstore index stores and manages data by using column-based data storage and column-based query processing. This type of Index is introduced from SQL 2012.
Columnstore indexes work well for data warehousing workloads that primarily perform bulk loads and read-only queries. Use the columnstore index to achieve up to 10x query performance gains over traditional row-oriented storage, and up to 7x data compression over the uncompressed data size.
There are two types of ColumnStore Indexes:
--NonClustered ColumnStore Indexes(NonCCI):
Non-CCI is introduced in SQL 2012. It is Read only(which means non-Updateable) and require separate heap and Cluster index before create it.
--Clustered ColumnStore Indexes (CCI): CCI is introduced in SQL 2014. It is updateable and this is achieve by adding the row group into the data. New inserted row will be place in the new row group until Reorg.
We will check in the next post about ColumnStore Index in more details.
Compression
In SQL Server 2008, there is no special “compressed” index type. What I am referring to in this section is the ability in SQL Server 2008 (Enterprise Edition) to perform compression on clustered indexes, non-clustered indexes, and indexed views.
Compression comes in two forms:
--Row-level Data Compression: Row-level data compression is essentially turning fixed length data types into variable length data types, freeing up empty space. It also has the ability to ignore zero and null values, saving additional space. In turn, more rows can fit into a single data page.
--Page-level Data Compression: Page-level data compression starts with row-level data compression, then adds two additional compression features: prefix and dictionary compression, providing for greater compression than row-level compression alone.
While two types of data compression are available, this is only available for the leaf level pages of an index. The root and intermediate levels of an index can only be compressed using row-level compression.
No comments:
Post a Comment