Partitions in SQL Server: Creating a Partitioned Fact Table (by date)
I want to demonstrate a simple, yet common style partitioned fact table. I’ll also include here some handy bits of SQL to help along the way.
I started looking at partitions, and the syntax behind the commands in some previous posts. If you are new to partitioning in SQL Server,it’s worth checking them out. If you just want to create a new partitioned fact table, read ahead.
- Partitions in SQL Server: Creating a Partitioned Table
- Partitions in SQL Server: Function Definitions, LEFT vs RIGHT
Outline
The Fact Table
For this article, I’ll have a very generic and (for the purpose of demonstration) small fact table. It has a date (on which we want to partition the data) and some extra attributes and metrics. It represents a simple key value pair for attribute values for each day:
CREATE TABLE [dbo].[FactTable] ( [Date] DATE, [Attribute] VARCHAR(100), [Metric] BIGINT );
Date is used here to partition as it has the advantages of:
- Simplifying insertions (new data for each period is added to a new partition).
- Deletions (in the event of reprocessing) can be done easily/quickly. Especially post 2016, where a partition can be truncated.
- When querying fact tables, metrics are frequently rolled up for date periods, or only specific date ranges are required.
Components of a Partitioned Table
- Partition Function: The definition of how data is to be split. It includes the data type, and the value ranges to use in each partition.
- Partition Scheme: The definition of how a partition function is to be applied to data files. This allows DBAs to split data across logical storage locations if required, however in most modern environments with large SANs most SQL Server implementations and their DBAs will just use ‘primary’.
- Table & Index Definition: The last part is the definition of the table, including the partition scheme to use, and the index with which to apply it.
Creating The Partitioned Table
Partition Function
For this dataset, I’m going to add a new partition for each month (functions can be for any time frame you chose):
-------------------------------------------------------------------------------- -- Partition Function -- Using RIGHT (so the value given is the LOWER value) -------------------------------------------------------------------------------- CREATE PARTITION FUNCTION [FactTable_PF] (DATE) -- Datatype to use AS RANGE RIGHT -- See previous post FOR VALUES -- Values on which to 'split' the data ( N'2018-01-01T00:00:00', -- This will contain all data for 2018 Jan onwards (until the next partition). All data before then goes into a 'catch all' partition N'2018-02-01T00:00:00', -- This partition will contain all data from Feb 2018 onwards (unless a subsequent partition is added in the future) );
Partition Scheme
Next, we create the scheme to tell SQL Server how to organise the data on the disk(s):
-------------------------------------------------------------------------------- -- Partition Scheme -------------------------------------------------------------------------------- CREATE PARTITION SCHEME [FactTable_PS] AS PARTITION [FactTable_PF] ALL TO ([primary]) ;
Table & Index
A table should always (OK, so there might be exceptions) have a clustered index. Either as an explicit primary key, or just a plain clustered index. A fact table is no different. The clustered index is where the performance can be lifted with partitions.
Below I’m creating two variations on the fact table, one with a classic Primary Key, the other using a Clustered Index. Both will have the same result of partitioning the data using the scheme defined above.
-------------------------------------------------------------------------------- -- Fact Table -- Using PK -------------------------------------------------------------------------------- CREATE TABLE [dbo].[FactTable_1] ( [Date] DATE, [Attribute] VARCHAR(100), [Metric] BIGINT, CONSTRAINT [PK_FactTable] PRIMARY KEY ([Date], [Attribute]) -- optional PK Constraint ) ON [FactTable_PS] ([Date]) WITH -- Optional compression, worth it on large tables ( DATA_COMPRESSION = PAGE ) ; -------------------------------------------------------------------------------- -- Fact Table -- Using CLUSTERED INDEX -------------------------------------------------------------------------------- CREATE TABLE [dbo].[FactTable_2] ( [Date] DATE, [Attribute] VARCHAR(100), [Metric] BIGINT ) ON [FactTable_PS] ([Date]) WITH -- Optional compression, worth it on large tables ( DATA_COMPRESSION = PAGE ) ; CREATE CLUSTERED INDEX [CIDX_FactTable_2] ON [dbo].[FactTable_2] ([Date]) ON [FactTable_PS] ([Date]) ;
Adding More Partitions
So, from the definition of the function (and therefore scheme) above, we can see that only two partitions have been declared. As we are using the same time frame for each partition (not a requirement, and there are arguments for having different durations for different partitions), We can programmatically add new partitions:
-------------------------------------------------------------------------------- -- Add New Partitions (for date range) -- From last date in table, up to latest -------------------------------------------------------------------------------- DECLARE @CurrentDate DATE= GetDate(); DECLARE @ExistingMax DATE=(SELECT MAX(CONVERT(DATE,Value)) FROM sys.partition_functions f INNER JOIN sys.partition_range_values r ON f.function_id = r.function_id WHERE f.name = 'FactTable_PF' -- Function Name ); PRINT ' Date Now: ' + CONVERT(VARCHAR(10), @CurrentDate, 120); PRINT 'Max Partition Date: ' + CONVERT(VARCHAR(10), @ExistingMax, 120); IF @ExistingMax < @CurrentDate BEGIN DECLARE @lastDate DATE; DECLARE @processingDate DATE = @ExistingMax; WHILE @processingDate <= @CurrentDate BEGIN SET @lastDate = @processingDate; SET @processingDate = DateAdd(MONTH, 1, DateAdd(month, DateDiff(month, 0, @processingDate), 0)) -- Note: See date processing PRINT 'Adding partition for: ' + CONVERT(VARCHAR(10), @processingDate, 120); BEGIN TRANSACTION; ALTER PARTITION SCHEME FactTable_PS NEXT USED [PRIMARY] ALTER PARTITION FUNCTION FactTable_PF() SPLIT RANGE (@processingDate) COMMIT TRANSACTION; END; END;
Testing Our Fact Table
To test our fact table(s), we first generate some dummy data, insert it to te tables, then view the metadata for the partitions.
Dummy Data
-------------------------------------------------------------------------------- -- Create Date List -------------------------------------------------------------------------------- DECLARE @DateStart DATE = '01-JAN-17'; DECLARE @DateEnd DATE = GetDate(); WITH number_tbl AS ( SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9 ), number_sequences AS ( SELECT (t3.a + (10 * t2.a) + (100 * t1.a)) AS concatenate_calculation FROM number_tbl t1 CROSS JOIN number_tbl t2 CROSS JOIN number_tbl t3 ), date_sequence AS ( SELECT CAST(DateAdd(DAY, concatenate_calculation, @DateStart) AS DATE) AS [TheDate] FROM number_sequences WHERE CAST(DateAdd(DAY, concatenate_calculation, @DateStart) AS DATE) <= @DateEnd ) SELECT [TheDate] INTO #DateList FROM date_sequence ORDER BY [TheDate] ; CREATE UNIQUE CLUSTERED INDEX PK_DateList ON #DateList ([TheDate]) ; -------------------------------------------------------------------------------- -- Create Dummy Fact table (we will keep coming back to this) -- Using cartesian join, giving 4 rows for each date -------------------------------------------------------------------------------- WITH s_data AS ( SELECT 'Data 1' AS [c_attribute], 100 AS [c_metric] UNION ALL SELECT 'Data 2' AS [c_attribute], 200 AS [c_metric] UNION ALL SELECT 'Data 3' AS [c_attribute], 300 AS [c_metric] UNION ALL SELECT 'Data 4' AS [c_attribute], 400 AS [c_metric] ) SELECT [TheDate], [c_attribute], [c_metric] INTO #FactTable FROM #DateList, s_data ; CREATE CLUSTERED INDEX SK_DateList ON #FactTable ([TheDate]) ; -------------------------------------------------------------------------------- -- Data Inserts -------------------------------------------------------------------------------- INSERT INTO [dbo].[FactTable_1] SELECT * FROM #FactTable ; INSERT INTO [dbo].[FactTable_2] SELECT * FROM #FactTable ;
View Partitions & Usage
This is a handy SQL statement to view lots of useful information about partitions:
-------------------------------------------------------------------------------- -- View Partitions and usage -------------------------------------------------------------------------------- SELECT OBJECT_NAME(p.[object_id]) AS [Table_Name], --p.[object_id], --p.[partition_id], p.[partition_number], p.[rows] AS [Partition_rows], p.[data_compression_desc] AS [Partition_data_compression], --i.[index_id], -- Linked by [object_id] i.[name] AS [Index_Name], --i.[type], i.[type_desc] AS [Index_Type], --i.[is_unique], --i.[is_primary_key], --i.[is_unique_constraint], --i.[data_space_id], -- Link to scheme s.[name] AS [Scheme_Name], --f.[function_id], f.[name] AS [Function_Name], --f.[create_date] AS [Function_create_date], --f.[modify_date] AS [Function_modify_date], --rv.[boundary_id], -- Link to partition_id lv.[value] AS [Lower_Value], rv.[value] AS [Upper_Value] FROM sys.partitions p INNER JOIN sys.objects o ON o.[object_id] = p.[object_id] --INNER JOIN sys.allocation_units a -- ON p.[hobt_id] = a.[container_id] LEFT JOIN sys.indexes i ON i.[object_id] = p.[object_id] AND i.[index_id] = p.[index_id] LEFT JOIN sys.partition_schemes s ON s.[data_space_id] = i.[data_space_id] LEFT JOIN sys.partition_functions f ON f.[function_id] = s.[function_id] LEFT JOIN sys.partition_range_values rv ON rv.[function_id] = f.[function_id] AND rv.[boundary_id] = p.[partition_number] LEFT JOIN sys.partition_range_values lv ON lv.[function_id] = f.[function_id] AND lv.[boundary_id] = p.[partition_number] - 1 WHERE 1=1 AND( p.[object_id] = OBJECT_ID('dbo.FactTable_1') -- Filter for an object OR p.[object_id] = OBJECT_ID('dbo.FactTable_2') ) --AND o.[name] LIKE 'FactTable%' -- Alternative filter ORDER BY o.[name], i.[name], p.[partition_number] ;
2 Replies to “Partitions in SQL Server: Creating a Partitioned Fact Table (by date)”