Partitions in SQL Server: Creating a Partitioned Table

Partitions in SQL Server: Creating a Partitioned Table

This is going to be the first in a series of posts about partitions. I’ve found that there are a number of common partitioning scenarios in data warehousing, and when I started writing about them I realise there was a lot to cover.

Partitions are now a common feature in most DBMS environments, although the syntax and functionality varies between implementations. Here I’m looking at the fundamentals of partitioned tables in SQL Server.

Partitions allow database administrators to specify how data for a table is stored/grouped on the disk. Data is logically split by the value of a column (or combination of columns). This splitting of the data gives a number of potential performance/maintenance benefits:

  • Individual partitions can be truncated/dropped (leaving the rest of the data untouched).
  • Some reads can be at the partition level.
  • Some locks can be at the partition level.

A common partition scenario (at least in data warehousing) is to partition data by an activity date. This data can be partitioned at any granularity that the data type would allow in a query.

What makes a partitioned table in SQL Server?

In SQL Server, to partition a table you first need to define a function, and then a scheme.

Partition Function: The definition of how data is to be split. It includes the data type and the value ranges to use in each partition.

Partition Scheme: The definition of how a function is to be applied to data files. This allows DBAs to split data across logical storage locations if required, however in most modern environments with large SANs most SQL Server implementations and their DBAs will just use ‘primary’.

A partition function can be used in one or more schemes, and a scheme in one or more tables. There can be organisational advantages to sharing a scheme/function across tables (update one, and you update everything in kind). However, in my experience most cases DBAs prefer to have one function and scheme combination for each table.

Partition Function

The function specifies the data type to use (in most of my cases, it’s a date), and a range of values to apply (to that data type) for partitions. The range is defined as being either LEFT or RIGHT. Generally, most people use RIGHT.

See the “Example Partitioned Table” section for an example.

Left vs Right. This isn’t a foray into politics or a recap on they syntax for a join. This keyword defines if the data goes to the left (less than or equal to) or the right (greater than or equal to) the value specified.

If we partition a table on year, specifying the years 2016, 2017 and 2018. What happens to the record for ’01-JUN-2017′?

  • RANGE RIGHT: It is placed into the partition starting 2017.
  • RANGE LEFT: It is placed into the partition ending 2018.

I’ll demonstrate this in a future article.

Partition Scheme

The scheme defines where (which filegroup) to store the data for individual partitions. For the sake of this article I’m keeping everything on primary.

See the “Example Partitioned Table” section for an example.

Partitioned Table

Lastly a table needs to be defined (as normal), with two additional requirements:

  1. The storage location is given as the partition scheme (with the name of the column to be used for partitioning).
  2. The table must have a clustered index (usually the primary key) which includes the column to be used for partitioning.

See the “Example Partitioned Table” section for an example.

Example Partitioned Table

Sample Function, Scheme and Table

Note: although we specify 3 partitions for the table, because this is a RIGHT table, an extra partition will automatically be added to the start of the table for all data values less than the lowest specified partition value.

--------------------------------------------------------------------------------
-- Partition Function
-- Using RIGHT (so the value given is the LOWER value)
--------------------------------------------------------------------------------
CREATE PARTITION FUNCTION [FactTable_PF]
      (DATE) -- Datatype to use
AS
RANGE RIGHT  -- LEFT or RIGHT see above
 FOR VALUES  -- Values on which to 'split' the data
(
    N'2016-01-01T00:00:00', -- This partition will contain 2 years data (2016 and 2017)
    N'2018-01-01T00:00:00', -- This partition will only contain 1 year of data (2018)
    N'2019-01-01T00:00:00'  -- This partition will contain all data from 2019 onwards (unless a subsequent partition is added in the future)
);

--------------------------------------------------------------------------------
-- Partition Scheme (using the above function)
--------------------------------------------------------------------------------
CREATE PARTITION SCHEME [FactTable_PS]
    AS PARTITION [FactTable_PF] -- This is where we specify the partition function
ALL TO([primary])
;

--------------------------------------------------------------------------------
-- Create Table (using the above scheme)
--------------------------------------------------------------------------------
CREATE TABLE [dbo].[FactTable_partitioned]
(
    [TheDate]     DATE,
    [c_attribute] VARCHAR(100),
    [c_metric]    BIGINT,
    CONSTRAINT [PK_FactTable_partitioned] PRIMARY KEY ([TheDate], [c_attribute])
)
ON [FactTable_PS]  -- This is where we specify the partition scheme
  ([TheDate])      -- This is where we specify the column to apply the function within the scheme to
;

Sample Data

--------------------------------------------------------------------------------
-- Create Date List
--------------------------------------------------------------------------------
DECLARE @DateStart DATE = '01-JAN-17';
DECLARE @DateEnd   DATE = '01-JAN-19';

WITH number_tbl AS (
SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL
SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL
SELECT 8 UNION ALL SELECT 9
), number_sequences AS (
SELECT (t3.a + (10 * t2.a) + (100 * t1.a)) AS concatenate_calculation
  FROM number_tbl t1
  CROSS JOIN number_tbl t2
  CROSS JOIN number_tbl t3
), date_sequence AS (
SELECT CAST(DateAdd(DAY, concatenate_calculation, @DateStart) AS DATE) AS [TheDate]
  FROM number_sequences
 WHERE CAST(DateAdd(DAY, concatenate_calculation, @DateStart) AS DATE) <= @DateEnd
)
SELECT [TheDate]
  INTO #DateList
  FROM date_sequence
 ORDER BY [TheDate]
;

CREATE UNIQUE CLUSTERED
 INDEX PK_DateList
    ON #DateList
      ([TheDate])
;

--------------------------------------------------------------------------------
-- Create Dummy Fact table (we will keep coming back to this)
-- Using cartesian join, giving 4 rows for each date
--------------------------------------------------------------------------------
WITH s_data AS (
SELECT 'Data 1' AS [c_attribute], 100 AS [c_metric] UNION ALL
SELECT 'Data 2' AS [c_attribute], 200 AS [c_metric] UNION ALL
SELECT 'Data 3' AS [c_attribute], 300 AS [c_metric] UNION ALL
SELECT 'Data 4' AS [c_attribute], 400 AS [c_metric]
)
SELECT [TheDate],
       [c_attribute],
       [c_metric]
  INTO #FactTable
  FROM #DateList,
       s_data
;

CREATE CLUSTERED
 INDEX SK_DateList
    ON #FactTable
      ([TheDate])
;

--------------------------------------------------------------------------------
-- Insert to partitioned table
--------------------------------------------------------------------------------
INSERT
  INTO [dbo].[FactTable_partitioned]
SELECT *
  FROM #FactTable
;

Useful Views & Notes

View Detailed Partition Usage

--------------------------------------------------------------------------------
-- Detailed View of Partitions and their usage
--------------------------------------------------------------------------------
SELECT OBJECT_NAME(p.[object_id]) AS [Table_Name],
     --p.[object_id],
     --p.[partition_id],
       p.[partition_number],
       p.[rows]                   AS [Partition_rows],
       p.[data_compression_desc]  AS [Partition_data_compression],
     --i.[index_id],         -- Linked by [object_id]
       i.[name]                   AS [Index_Name],
     --i.[type],
       i.[type_desc]              AS [Index_Type],
     --i.[is_unique],
     --i.[is_primary_key],
     --i.[is_unique_constraint],
     --i.[data_space_id],    -- Link to scheme
       s.[name]                   AS [Scheme_Name],
     --f.[function_id],
       f.[name]                   AS [Function_Name],
     --f.[create_date]            AS [Function_create_date],
     --f.[modify_date]            AS [Function_modify_date],
     --rv.[boundary_id],     -- Link to partition_id
       lv.[value]                 AS [Lower_Value],
       rv.[value]                 AS [Upper_Value]
  FROM sys.partitions p
    INNER JOIN sys.objects o
       ON o.[object_id] = p.[object_id]
  --INNER JOIN sys.allocation_units a
  --   ON p.[hobt_id] = a.[container_id]
     LEFT JOIN sys.indexes i
       ON i.[object_id] = p.[object_id]
      AND i.[index_id]  = p.[index_id]
     LEFT JOIN sys.partition_schemes s
       ON s.[data_space_id] = i.[data_space_id]
     LEFT JOIN sys.partition_functions f
       ON f.[function_id] = s.[function_id]
     LEFT JOIN sys.partition_range_values rv
       ON rv.[function_id] = f.[function_id]
      AND rv.[boundary_id] = p.[partition_number]
     LEFT JOIN sys.partition_range_values lv
       ON lv.[function_id] = f.[function_id]
      AND lv.[boundary_id] = p.[partition_number] - 1
 WHERE 1=1
 --AND p.[object_id] = OBJECT_ID('FactTable') -- Filter for one object
   AND o.[name] LIKE 'FactTable%'
 ORDER BY o.[name],
       i.[name],
       p.[partition_number]
;

View Partitions

--------------------------------------------------------------------------------
-- View Partitions
--------------------------------------------------------------------------------
-- Partitions by table
SELECT o.[name]           AS [objectname],
       i.[name]           AS [indexname],
       p.[partition_id],
       p.[partition_number],
       p.[rows]
  FROM sys.partitions p
    INNER JOIN sys.objects o ON o.object_id=p.object_id
    INNER JOIN sys.indexes i ON i.object_id=p.object_id AND p.index_id=i.index_id
 WHERE o.[name] LIKE '%FactTable%' -- Filter for our table name(s)
   AND o.[type] = N'U'             -- User defined table
 ORDER BY o.[name],
       [partition_number]
;

View Function Configuration

--------------------------------------------------------------------------------
-- Partition Function Configuration(s)
--------------------------------------------------------------------------------
SELECT * --CONVERT(DATETIME,Value)
  FROM sys.partition_functions f
    INNER JOIN sys.partition_range_values r   
       ON f.function_id = r.function_id 
 WHERE f.name LIKE '%FactTable%' -- Filter for our partition name(s)
;

 

2 Replies to “Partitions in SQL Server: Creating a Partitioned Table”

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.