Partitions in SQL Server: Creating a Partitioned Fact Table (by date)

Partitions in SQL Server: Creating a Partitioned Fact Table (by date)

I want to demonstrate a simple, yet common style partitioned fact table. I’ll also include here some handy bits of SQL to help along the way.

I started looking at partitions, and the syntax behind the commands in some previous posts. If you are new to partitioning in SQL Server,it’s worth checking them out. If you just want to create a new partitioned fact table, read ahead.

Outline

The Fact Table

For this article, I’ll have a very generic and (for the purpose of demonstration) small fact table. It has a date (on which we want to partition the data) and some extra attributes and metrics. It represents a simple key value pair for attribute values for each day:

CREATE TABLE [dbo].[FactTable]
(
    [Date]      DATE,
    [Attribute] VARCHAR(100),
    [Metric]    BIGINT
);

Date is used here to partition as it has the advantages of:

  1. Simplifying insertions (new data for each period is added to a new partition).
  2. Deletions (in the event of reprocessing) can be done easily/quickly. Especially post 2016, where a partition can be truncated.
  3. When querying fact tables, metrics are frequently rolled up for date periods, or only specific date ranges are required.

Components of a Partitioned Table

  1. Partition Function: The definition of how data is to be split. It includes the data type, and the value ranges to use in each partition.
  2. Partition Scheme: The definition of how a partition function is to be applied to data files. This allows DBAs to split data across logical storage locations if required, however in most modern environments with large SANs most SQL Server implementations and their DBAs will just use ‘primary’.
  3. Table & Index Definition: The last part is the definition of the table, including the partition scheme to use, and the index with which to apply it.

Creating The Partitioned Table

Partition Function

For this dataset, I’m going to add a new partition for each month (functions can be for any time frame you chose):

--------------------------------------------------------------------------------
-- Partition Function
-- Using RIGHT (so the value given is the LOWER value)
--------------------------------------------------------------------------------
CREATE PARTITION FUNCTION [FactTable_PF]
      (DATE) -- Datatype to use
AS
RANGE RIGHT  -- See previous post
 FOR VALUES  -- Values on which to 'split' the data
(
    N'2018-01-01T00:00:00', -- This will contain all data for 2018 Jan onwards (until the next partition). All data before then goes into a 'catch all' partition
    N'2018-02-01T00:00:00', -- This partition will contain all data from Feb 2018 onwards (unless a subsequent partition is added in the future)
);

Partition Scheme

Next, we create the scheme to tell SQL Server how to organise the data on the disk(s):

--------------------------------------------------------------------------------
-- Partition Scheme
--------------------------------------------------------------------------------
CREATE PARTITION SCHEME [FactTable_PS]
    AS PARTITION [FactTable_PF]
ALL TO ([primary])
;

Table & Index

A table should always (OK, so there might be exceptions) have a clustered index. Either as an explicit primary key, or just a plain clustered index. A fact table is no different. The clustered index is where the performance can be lifted with partitions.

Below I’m creating two variations on the fact table, one with a classic Primary Key, the other using a Clustered Index. Both will have the same result of partitioning the data using the scheme defined above.

--------------------------------------------------------------------------------
-- Fact Table
-- Using PK
--------------------------------------------------------------------------------
CREATE TABLE [dbo].[FactTable_1]
(
    [Date]      DATE,
    [Attribute] VARCHAR(100),
    [Metric]    BIGINT,
    CONSTRAINT [PK_FactTable] PRIMARY KEY ([Date], [Attribute]) -- optional PK Constraint
)
ON [FactTable_PS] ([Date])
WITH -- Optional compression, worth it on large tables
( DATA_COMPRESSION = PAGE
)
;

--------------------------------------------------------------------------------
-- Fact Table
-- Using CLUSTERED INDEX
--------------------------------------------------------------------------------
CREATE TABLE [dbo].[FactTable_2]
(
    [Date]      DATE,
    [Attribute] VARCHAR(100),
    [Metric]    BIGINT
)
ON [FactTable_PS] ([Date])
WITH -- Optional compression, worth it on large tables
( DATA_COMPRESSION = PAGE
)
;

CREATE CLUSTERED
 INDEX [CIDX_FactTable_2]
    ON [dbo].[FactTable_2]
      ([Date])
    ON [FactTable_PS] ([Date])
;

Adding More Partitions

So, from the definition of the function (and therefore scheme) above, we can see that only two partitions have been declared. As we are using the same time frame for each partition (not a requirement, and there are arguments for having different durations for different partitions), We can programmatically add new partitions:

--------------------------------------------------------------------------------
-- Add New Partitions (for date range)
-- From last date in table, up to latest
--------------------------------------------------------------------------------
DECLARE @CurrentDate DATE= GetDate();
DECLARE @ExistingMax DATE=(SELECT MAX(CONVERT(DATE,Value))
                             FROM sys.partition_functions f
                               INNER JOIN sys.partition_range_values r   
                                  ON f.function_id = r.function_id 
                              WHERE f.name = 'FactTable_PF' -- Function Name
                          );

PRINT '          Date Now: ' + CONVERT(VARCHAR(10), @CurrentDate, 120);
PRINT 'Max Partition Date: ' + CONVERT(VARCHAR(10), @ExistingMax, 120);


IF @ExistingMax < @CurrentDate
BEGIN
    DECLARE @lastDate       DATE;
    DECLARE @processingDate DATE = @ExistingMax;

    WHILE @processingDate <= @CurrentDate
    BEGIN
        SET @lastDate       = @processingDate;
        SET @processingDate = DateAdd(MONTH, 1, DateAdd(month, DateDiff(month, 0, @processingDate), 0)) -- Note: See date processing

        PRINT 'Adding partition for: ' + CONVERT(VARCHAR(10), @processingDate, 120);

        BEGIN TRANSACTION;

            ALTER PARTITION SCHEME FactTable_PS
            NEXT USED [PRIMARY]

            ALTER PARTITION FUNCTION FactTable_PF()
            SPLIT RANGE (@processingDate)

        COMMIT TRANSACTION;
    END;

END;

Testing Our Fact Table

To test our fact table(s), we first generate some dummy data, insert it to te tables, then view the metadata for the partitions.

Dummy Data

--------------------------------------------------------------------------------
-- Create Date List
--------------------------------------------------------------------------------
DECLARE @DateStart DATE = '01-JAN-17';
DECLARE @DateEnd   DATE = GetDate();

WITH number_tbl AS (
SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL
SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL
SELECT 8 UNION ALL SELECT 9
), number_sequences AS (
SELECT (t3.a + (10 * t2.a) + (100 * t1.a)) AS concatenate_calculation
  FROM number_tbl t1
  CROSS JOIN number_tbl t2
  CROSS JOIN number_tbl t3
), date_sequence AS (
SELECT CAST(DateAdd(DAY, concatenate_calculation, @DateStart) AS DATE) AS [TheDate]
  FROM number_sequences
 WHERE CAST(DateAdd(DAY, concatenate_calculation, @DateStart) AS DATE) <= @DateEnd
)
SELECT [TheDate]
  INTO #DateList
  FROM date_sequence
 ORDER BY [TheDate]
;

CREATE UNIQUE CLUSTERED
 INDEX PK_DateList
    ON #DateList
      ([TheDate])
;

--------------------------------------------------------------------------------
-- Create Dummy Fact table (we will keep coming back to this)
-- Using cartesian join, giving 4 rows for each date
--------------------------------------------------------------------------------
WITH s_data AS (
SELECT 'Data 1' AS [c_attribute], 100 AS [c_metric] UNION ALL
SELECT 'Data 2' AS [c_attribute], 200 AS [c_metric] UNION ALL
SELECT 'Data 3' AS [c_attribute], 300 AS [c_metric] UNION ALL
SELECT 'Data 4' AS [c_attribute], 400 AS [c_metric]
)
SELECT [TheDate],
       [c_attribute],
       [c_metric]
  INTO #FactTable
  FROM #DateList,
       s_data
;

CREATE CLUSTERED
 INDEX SK_DateList
    ON #FactTable
      ([TheDate])
;

--------------------------------------------------------------------------------
-- Data Inserts
--------------------------------------------------------------------------------
INSERT
  INTO [dbo].[FactTable_1]
SELECT *
  FROM #FactTable
;

INSERT
  INTO [dbo].[FactTable_2]
SELECT *
  FROM #FactTable
;

View Partitions & Usage

This is a handy SQL statement to view lots of useful information about partitions:

--------------------------------------------------------------------------------
-- View Partitions and usage
--------------------------------------------------------------------------------
SELECT OBJECT_NAME(p.[object_id]) AS [Table_Name],
     --p.[object_id],
     --p.[partition_id],
       p.[partition_number],
       p.[rows]                   AS [Partition_rows],
       p.[data_compression_desc]  AS [Partition_data_compression],
     --i.[index_id],         -- Linked by [object_id]
       i.[name]                   AS [Index_Name],
     --i.[type],
       i.[type_desc]              AS [Index_Type],
     --i.[is_unique],
     --i.[is_primary_key],
     --i.[is_unique_constraint],
     --i.[data_space_id],    -- Link to scheme
       s.[name]                   AS [Scheme_Name],
     --f.[function_id],
       f.[name]                   AS [Function_Name],
     --f.[create_date]            AS [Function_create_date],
     --f.[modify_date]            AS [Function_modify_date],
     --rv.[boundary_id],     -- Link to partition_id
       lv.[value]                 AS [Lower_Value],
       rv.[value]                 AS [Upper_Value]
  FROM sys.partitions p
    INNER JOIN sys.objects o
       ON o.[object_id] = p.[object_id]
  --INNER JOIN sys.allocation_units a
  --   ON p.[hobt_id] = a.[container_id]
     LEFT JOIN sys.indexes i
       ON i.[object_id] = p.[object_id]
      AND i.[index_id]  = p.[index_id]
     LEFT JOIN sys.partition_schemes s
       ON s.[data_space_id] = i.[data_space_id]
     LEFT JOIN sys.partition_functions f
       ON f.[function_id] = s.[function_id]
     LEFT JOIN sys.partition_range_values rv
       ON rv.[function_id] = f.[function_id]
      AND rv.[boundary_id] = p.[partition_number]
     LEFT JOIN sys.partition_range_values lv
       ON lv.[function_id] = f.[function_id]
      AND lv.[boundary_id] = p.[partition_number] - 1
 WHERE 1=1
   AND(   p.[object_id] = OBJECT_ID('dbo.FactTable_1') -- Filter for an object
       OR p.[object_id] = OBJECT_ID('dbo.FactTable_2')
      )
 --AND o.[name] LIKE 'FactTable%' -- Alternative filter
 ORDER BY o.[name],
       i.[name],
       p.[partition_number]
;

 

2 Replies to “Partitions in SQL Server: Creating a Partitioned Fact Table (by date)”

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.