Wednesday 1 January 2020

SQL Server Database Partitioning

Today, We will be going with the example of SQL Database table Partitioning.
First we have to identify on the bases of which column we are going to partition the rows. mostly rows are partition on the bases of DateTime data type column which we are going to implement today.
Here the case is, we are creating table partition on basis of DateTime column in existing running database. Follow the below steps:
1. Add new file groups to existing database. So You could add files in multiple file groups on the same disk or on different disks. If you plan on using multiple aligned tables and data retrieval statements will almost always affect a small subset of rows in each table that could be grouped in partitions you should consider using multiple disk arrays. However, if you anticipate many SELECT statements retrieving a large majority of rows from each table, then performing input / output operations against multiple disk arrays could actually worsen performance. In such a case, you can still benefit from partitioning but should try to create multiple filegroups on the same disk array.
The following statements create filegroups in DBADB001 database:
ALTER DATABASE DBADB001 ADD FILEGROUP [FG_2012]  
GO  
ALTER DATABASE DBADB001 ADD FILEGROUP [FG_2013]  
GO  
ALTER DATABASE DBADB001 ADD FILEGROUP [FG_2014]  
GO  
ALTER DATABASE DBADB001 ADD FILEGROUP [FG_2015]  
GO
Each filegroup can have one or multiple files associated with it. 2. Add one file to each filegroup so that you can store partition data in each filegroup:
ALTER DATABASE DBADB001
  ADD FILE
  (NAME = N'DataFG_2012',
  FILENAME = N'D:\mssql\mssqlserver\mdf\DataFG_2012.ndf',
  SIZE = 50MB,
  MAXSIZE = 100MB,
  FILEGROWTH = 5MB)
  TO FILEGROUP [FG_2012]  
GO  
ALTER DATABASE DBADB001
  ADD FILE
  (NAME = N'DataFG_2013',
  FILENAME = N'D:\mssql\mssqlserver\mdf\DataFG_2013.ndf',
  SIZE = 50MB,
  MAXSIZE = 100MB,
  FILEGROWTH = 5MB)
  TO FILEGROUP [FG_2013]  
GO 
ALTER DATABASE DBADB001
  ADD FILE
  (NAME = N'DataFG_2014',
  FILENAME = N'D:\mssql\mssqlserver\mdf\DataFG_2014.ndf',
  SIZE = 50MB,
  MAXSIZE = 100MB,
  FILEGROWTH = 5MB)
  TO FILEGROUP [FG_2014]  
GO 
ALTER DATABASE DBADB001
  ADD FILE
  (NAME = N'DataFG_2015',
  FILENAME = N'D:\mssql\mssqlserver\mdf\DataFG_2015.ndf',
  SIZE = 50MB,
  MAXSIZE = 100MB,
  FILEGROWTH = 5MB)
  TO FILEGROUP [FG_2015]  
GO 
The CREATE TABLE statement normally specifies a particular filegroup on which the table is built. However, with SQL Server 2005 and later, you can reference the partition scheme as opposed to a filegroup, because you can spread each table across multiple filegroups. Partition functions commonly reference a column with DATETIME data type. This makes sense if you want to spread the data based on its creation timeframe. Data warehouse fact tables typically don't contain the DATETIME column. Instead, they normally include a foreign key referencing the date and time value in the time dimension (or, more accurately, the date dimension). Important point to keep in mind is that you're not limited to columns with DATETIME data type for partitioning keys. You could use the INTEGER data type key referencing the date dimension. However, if you use the integer values for partitioning and you want to partition based on date, then your application must ensure that records for a given week, month, quarter or year (depending on your implementation) must fall into certain ranges of integers.
3. Create the new table and will insert random data about 5000 rows.
use DBADB001
GO

CREATE TABLE [dbo].[TestTable] 
([pkcol] [int] NOT NULL,
 [Int1] [int] NULL,
 [Int2] [int] NULL,
 [TestName] [varchar](50) NULL,
 [partitioncol] DateTime)
GO

ALTER TABLE dbo.TestTable ADD CONSTRAINT PK_TestTable PRIMARY KEY CLUSTERED (pkcol) 
GO
CREATE NONCLUSTERED INDEX IX_TABLE1 ON dbo.TestTable (Int1,Int2)
  WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, 
        ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) 
  ON [PRIMARY]
GO
-- Populate table data
DECLARE @val INT
SELECT @val=1
WHILE @val < 5001
BEGIN  
   INSERT INTO dbo.TestTable(pkcol, Int1, Int2, TestName, partitioncol) 
      VALUES (@val,@val,@val,'TEST',getdate()-@val)
   SELECT @val=@val+1
END
GO

--To get table rows count
SELECT o.name objectname,i.name indexname, partition_id, partition_number, [rows]
FROM sys.partitions p
INNER JOIN sys.objects o ON o.object_id=p.object_id
INNER JOIN sys.indexes i ON i.object_id=p.object_id and p.index_id=i.index_id
WHERE o.name LIKE '%TestTable%'
4. Now Create a Partition Function:
CREATE PARTITION FUNCTION DateRangePFunc (DATETIME) AS
RANGE LEFT FOR VALUES 
('20121231 23:59:59.997', 
'20131231 23:59:59.997',
'20141231 23:59:59.997',
'20151231 23:59:59.997'
)
GO
5. Now Create Partition Scheme, which takes advantage of the partition function and maps each data range to different filegroups:
CREATE PARTITION SCHEME DateRangePScheme  AS
  PARTITION DateRangePFunc  TO 
  ([FG_2012],
  [FG_2013],
  [FG_2014],
  [FG_2015],
  [PRIMARY]  )
Using this partition scheme, all records with FullDate value prior to December 31, 2011 will be placed on [FG_2011] file group; values between January 1st, 2012 and December 31st, 2012 will be place on [FG_2012] and so forth. Any records for which FullDate column has a value after December 31st, 2015 will be placed on the PRIMARY file group.
Note that the "SaveAll" filegroup has to be specified. In this case, the PRIMARY file group for records that do not fall into any explicitly defined date ranges. Normally as a best practice, you should reserve the PRIMARY filegroup for system objects and have a separate SaveAll filegroup for user data.
Now that there is a partition function and scheme, you can create a partitioned table. The syntax is very similar to any other CREATE TABLE statement except it references the partition scheme instead of a referencing filegroup:
CREATE TABLE [dbo].[TestTable_PT] 
([pkcol] [int] NOT NULL,
 [Int1] [int] NULL,
 [Int2] [int] NULL,
 [TestName] [varchar](50) NULL,
 [partitioncol] DateTime)
  ON DateRangePScheme  (partitioncol)
GO
Now that the table has been created on a partition scheme, populate it based on the existing table "TestTable" and subsequently examine the row count in each partition:
/* normally you should avoid using "SELECT *" construct.
   But since this is an example of populating a sample table
   "SELECT *" won't cause any performance or usability issues.
   In production environments you will typically use BULK INSERT
   statement to populate partitioned tables based on flat files.  
*/
INSERT TestTable_PT
SELECT *
FROM   TestTable
       
/* next we can use $PARTITION function to retrieve row counts
  for each partition:  
*/
DECLARE @TableName NVARCHAR(200) = N'dbo.TestTable_PT'
 
SELECT SCHEMA_NAME(o.schema_id) + '.' + OBJECT_NAME(i.object_id) AS [object]
     , p.partition_number AS [p#]
     , fg.name AS [filegroup]
     , p.rows
     , au.total_pages AS pages
     , CASE boundary_value_on_right
       WHEN 1 THEN 'less than'
       ELSE 'less than or equal to' END as comparison
     , rv.value
     , CONVERT (VARCHAR(6), CONVERT (INT, SUBSTRING (au.first_page, 6, 1) +
       SUBSTRING (au.first_page, 5, 1))) + ':' + CONVERT (VARCHAR(20),
       CONVERT (INT, SUBSTRING (au.first_page, 4, 1) +
       SUBSTRING (au.first_page, 3, 1) + SUBSTRING (au.first_page, 2, 1) +
       SUBSTRING (au.first_page, 1, 1))) AS first_page
FROM sys.partitions p
INNER JOIN sys.indexes i
     ON p.object_id = i.object_id
AND p.index_id = i.index_id
INNER JOIN sys.objects o
     ON p.object_id = o.object_id
INNER JOIN sys.system_internals_allocation_units au
     ON p.partition_id = au.container_id
INNER JOIN sys.partition_schemes ps
     ON ps.data_space_id = i.data_space_id
INNER JOIN sys.partition_functions f
     ON f.function_id = ps.function_id
INNER JOIN sys.destination_data_spaces dds
     ON dds.partition_scheme_id = ps.data_space_id
     AND dds.destination_id = p.partition_number
INNER JOIN sys.filegroups fg
     ON dds.data_space_id = fg.data_space_id
LEFT OUTER JOIN sys.partition_range_values rv
     ON f.function_id = rv.function_id
     AND p.partition_number = rv.boundary_id
WHERE i.index_id < 2
     AND o.object_id = OBJECT_ID(@TableName);
Results:
PartitionID Row_count
1 1013000
2 2677000
3 24443000
4 32265000

Notice that partition identifiers start at 1. The "catchall" partition located on PRIMARY file group will have partition id equal to 5. This partition isn't retrieved by the above statement because it is empty - it doesn't have any records. You could retrieve all records for a particular partition identifier using the following syntax, again using $PARTITION function:
SELECT * FROM dbo.FactInternetSales_Partitioned
  WHERE $Partition.FullOrderDateRangePFN (FullDate) = 2
Now if you create an index on this table; by default the index will be partitioned using the same partition scheme as the table:
CREATE INDEX ix_FactInternetSales_Partitioned_cl
  ON FactInternetSales_Partitioned (   ProductKey,   FullDate)
  ON FullOrderDateRangePScheme  (FullDate)
It is possible to omit the partition scheme specification in this statement if you want the index to be aligned with the table. Note that an index doesn't have to use the same partition scheme and partition function to be aligned with the table. As long as an index is partitioned based on the same data type, has the same number of partitions as the table and each partition has the same data boundaries as the table, the index will be aligned.
Partitioning key of an index doesn't have to be part of the index key. So you could use the same partition scheme to create an index that does not reference FullDate as its index key. For example, the following statement is valid:
CREATE INDEX ix_FactInternetSales_Partitioned_ProductKey
  ON FactInternetSales_Partitioned (   ProductKey)
  ON FullOrderDateRangePScheme  (FullDate)
Use a partition strategy for an index which is completely different from the underlying table's partition scheme and partition function. Some examples of where this makes sense are:
The table isn't partitioned, but you wish to partition the index.
The table is partitioned but you would like to collocate the index data with other tables' indexes because these tables will be frequently joined.
You have a unique index you wish to partition and the index key isn't part of the table's partition function.
Table and index partition metadata can be retrieved for FactInternetSales_partitioned table using the following statement:
SELECT OBJECT_NAME([object_id]) AS table_name, *
  FROM sys.partitions
  WHERE [object_id] = OBJECT_ID('dbo.FactInternetSales_partitioned')
  ORDER BY index_id, partition_number
Results:
table_name partition_id object_id index_id partition_number hobt_id rows
FactInternetSales_Partitioned 72057594052411300 1294627655 0 1 72057594052411300 2026000
FactInternetSales_Partitioned 72057594052476900 1294627655 0 2 72057594052476900 5354000
FactInternetSales_Partitioned 72057594052542400 1294627655 0 3 72057594052542400 48886000
FactInternetSales_Partitioned 72057594052608000 1294627655 0 4 72057594052608000 64530000
FactInternetSales_Partitioned 72057594052673500 1294627655 0 5 72057594052673500 0
FactInternetSales_Partitioned 72057594052739000 1294627655 3 1 72057594052739000 2026000
FactInternetSales_Partitioned 72057594052804600 1294627655 3 2 72057594052804600 5354000
FactInternetSales_Partitioned 72057594052870100 1294627655 3 3 72057594052870100 48886000
FactInternetSales_Partitioned 72057594052935600 1294627655 3 4 72057594052935600 64530000
FactInternetSales_Partitioned 72057594053001200 1294627655 3 5 72057594053001200 0
It's important to note that in order to enable partition switching, all indexes on the table must be aligned. Partitioned indexes are normally implemented for query performance benefits, whereas partition switching yields great benefits in large table's manageability. So whether you align indexes with underlying tables depends on whether you are implementing table and index partitions primarily for performance tuning or for manageability.

No comments:

Post a Comment