Data and AI Knowledge: SQL Server : Data Compression

Today's Post is all about Data Compression.
SQL Server’s data compression doesn't automatically compress the data in the database; instead, data compression can only be used for these types of database objects:

- A table stored as a heap
- A table stored as a clustered index
- A non-clustered indexes.
- An indexed view.
- Partitioned tables and indexes.

There are two different types of Data Compressions:
Row‐Level Data Compression
Row-level compression is essentially turning fixed length data types into variable length data types, freeing up empty space. It also has the ability to ignore zero and null values, saving additional space. In turn, more rows can fit into a single data page.

This is the simplest method of data compression is row-level compression, works by:
- Reducing the amount of metadata used to store a row.
- Storing fixed length numeric data types as if they were variable-length data types. For example, if you store the value 1 in a bigint data type, storage will only take 1 byte, not 8 bytes, which the bigint data types normally takes.
- Storing CHAR data types as variable-length data types. For example, if you have a CHAR (100) data type, and only store 10 characters in it, blank characters are not stored, thus reducing the space needed to the store data.
- Not storing NULL or 0 values.
Row-level data compression offers less compression than page-level data compression, but it also incurs less overhead, reducing the amount of CPU resources required to implement it.

Page Level Data Compression
Page Level Compression includes three additional features:
- Row Level Compression.
- Prefix Compression.
- Dictionary Compression.
Its starts with row-level data compression, then other two additional compression features: prefix and dictionary compression. We'll get more detail about these features later in this article. As you can imagine, page-level compression offers better data compression in compare to row-level compression.

Page-level data compression offers greater compression, but at the expense of greater CPU utilization. It works using these techniques:
- It starts out by using row-level data compression to get as many rows as it can on a single page.
- Next, prefix compression is run. Essentially, repeating patterns of data at the beginning of the values of a given column are removed and substituted with an abbreviated reference that is stored in the compression information (CI) structure that immediately follows the page header of a data page.
- And last, dictionary compression is used. Dictionary compression searches for repeated values anywhere on a page and stores them in the CI. One of the major differences between prefix and dictionary compression is that prefix compression is restricted to one column, while dictionary compression works anywhere on a data page.

The amount of compression provided by page-level data compression is highly dependent on the data stored in a table or index. If a lot of the data repeats itself, then compression is more efficient. If the data is more random, then little benefits can be gained using page-level compression.

Advantages
- Reducing the amount of physical disk space required to store data in result less number of reads and writes. - Reduces the amount of Buffer Cache memory needed to store data in the Buffer Cache. This in turn allows more data to be stored in the Buffer Cache, reducing the need for SQL Server to access the disk to get data, as the data is now more likely to be in memory than disk, further reducing disk I/O.

Disadvantages
- Using compression uses up additional CPU cycles. If your server has plenty to spare, then you have no problem. But if your server is already experiencing a CPU bottleneck, then perhaps compression is better left turned off.

Steps to enable Compression on a table or index refer MSDN Article

Here are the T-SQL syntax with example of compression:
To enable compression on a table:

USE AdventureWorks2012;
GO
EXEC sp_estimate_data_compression_savings 'Production', 'TransactionHistory', NULL, NULL, 'ROW' ;

ALTER TABLE Production.TransactionHistory REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = ROW); 
GO

To enable compression on an index:

USE AdventureWorks2012; 
GO
SELECT name, index_id
FROM sys.indexes
WHERE OBJECT_NAME (object_id) = N'TransactionHistory';

EXEC sp_estimate_data_compression_savings 
    @schema_name = 'Production', 
    @object_name = 'TransactionHistory',
    @index_id = 2, 
    @partition_number = NULL, 
    @data_compression = 'PAGE' ; 

ALTER INDEX IX_TransactionHistory_ProductID ON Production.TransactionHistory REBUILD PARTITION = ALL 
WITH (DATA_COMPRESSION = PAGE);
GO

Customize Script to get the compression details of all the tables in a database with estimated saving size:

CREATE PROCEDURE usp_CompressTbl_Report (@compress_type char(4))
AS 
SET NOCOUNT ON
BEGIN
DECLARE @schema_name sysname, @table_name sysname
CREATE TABLE #CompressTbl_Report 
(ObjName sysname,
schemaName sysname,
indx_ID int,
partit_number int,
size_with_current_compression_setting bigint,
size_with_requested_compression_setting bigint,
sample_size_with_current_compression_setting bigint,
sample_size_with_requested_compression_setting bigint)
DECLARE c_sch_tb_crs cursor for 
SELECT TABLE_SCHEMA,TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE LIKE 'BASE%' 
AND TABLE_CATALOG = upper(db_name())
OPEN c_sch_tb_crs
FETCH NEXT FROM c_sch_tb_crs INTO @schema_name, @table_name
WHILE @@Fetch_Status = 0 
BEGIN
INSERT INTO #CompressTbl_Report
EXEC sp_estimate_data_compression_savings
@schema_name = @schema_name,
@object_name = @table_name,
@index_id = NULL,
@partition_number = NULL,
@data_compression = @compress_type
FETCH NEXT FROM c_sch_tb_crs INTO @schema_name, @table_name
END
CLOSE c_sch_tb_crs 
DEALLOCATE c_sch_tb_crs
SELECT schemaName AS [schema_name]
, ObjName AS [table_name]
, avg(size_with_current_compression_setting) as avg_size_with_current_compression_setting
, avg(size_with_requested_compression_setting) as avg_size_with_requested_compression_setting
, avg(size_with_current_compression_setting - size_with_requested_compression_setting) AS avg_size_saving
FROM #CompressTbl_Report
GROUP BY schemaName,ObjName
ORDER BY schemaName ASC, avg_size_saving DESC 
DROP TABLE #CompressTbl_Report
END
SET NOCOUNT OFF
GO

Here is a sample execution and result set:

USE AdventureWorks2012 
GO
EXEC usp_CompressTbl_Report @compress_type = 'ROW'
GO

Data and AI Knowledge

Tiles

Thursday, 10 September 2015

SQL Server : Data Compression

No comments:

Post a Comment