Partitions in SQL Server: Creating a Partitioned Table
This is going to be the first in a series of posts about partitions. I’ve found that there are a number of common partitioning scenarios in data warehousing, and when I started writing about them I realise there was a lot to cover.
Partitions are now a common feature in most DBMS environments, although the syntax and functionality varies between implementations. Here I’m looking at the fundamentals of partitioned tables in SQL Server.
Partitions allow database administrators to specify how data for a table is stored/grouped on the disk. Data is logically split by the value of a column (or combination of columns). This splitting of the data gives a number of potential performance/maintenance benefits:
- Individual partitions can be truncated/dropped (leaving the rest of the data untouched).
- Some reads can be at the partition level.
- Some locks can be at the partition level.
A common partition scenario (at least in data warehousing) is to partition data by an activity date. This data can be partitioned at any granularity that the data type would allow in a query.
What makes a partitioned table in SQL Server?
In SQL Server, to partition a table you first need to define a function, and then a scheme.
Partition Function: The definition of how data is to be split. It includes the data type and the value ranges to use in each partition.
Partition Scheme: The definition of how a function is to be applied to data files. This allows DBAs to split data across logical storage locations if required, however in most modern environments with large SANs most SQL Server implementations and their DBAs will just use ‘primary’.
A partition function can be used in one or more schemes, and a scheme in one or more tables. There can be organisational advantages to sharing a scheme/function across tables (update one, and you update everything in kind). However, in my experience most cases DBAs prefer to have one function and scheme combination for each table.
Partition Function
The function specifies the data type to use (in most of my cases, it’s a date), and a range of values to apply (to that data type) for partitions. The range is defined as being either LEFT
or RIGHT
. Generally, most people use RIGHT
.
See the “Example Partitioned Table” section for an example.
Left vs Right. This isn’t a foray into politics or a recap on they syntax for a join. This keyword defines if the data goes to the left (less than or equal to) or the right (greater than or equal to) the value specified.
If we partition a table on year, specifying the years 2016, 2017 and 2018. What happens to the record for ’01-JUN-2017′?
- RANGE RIGHT: It is placed into the partition starting 2017.
- RANGE LEFT: It is placed into the partition ending 2018.
I’ll demonstrate this in a future article.
Partition Scheme
The scheme defines where (which filegroup) to store the data for individual partitions. For the sake of this article I’m keeping everything on primary.
See the “Example Partitioned Table” section for an example.
Partitioned Table
Lastly a table needs to be defined (as normal), with two additional requirements:
- The storage location is given as the partition scheme (with the name of the column to be used for partitioning).
- The table must have a clustered index (usually the primary key) which includes the column to be used for partitioning.
See the “Example Partitioned Table” section for an example.
Example Partitioned Table
Sample Function, Scheme and Table
Note: although we specify 3 partitions for the table, because this is a RIGHT table, an extra partition will automatically be added to the start of the table for all data values less than the lowest specified partition value.
-------------------------------------------------------------------------------- -- Partition Function -- Using RIGHT (so the value given is the LOWER value) -------------------------------------------------------------------------------- CREATE PARTITION FUNCTION [FactTable_PF] (DATE) -- Datatype to use AS RANGE RIGHT -- LEFT or RIGHT see above FOR VALUES -- Values on which to 'split' the data ( N'2016-01-01T00:00:00', -- This partition will contain 2 years data (2016 and 2017) N'2018-01-01T00:00:00', -- This partition will only contain 1 year of data (2018) N'2019-01-01T00:00:00' -- This partition will contain all data from 2019 onwards (unless a subsequent partition is added in the future) ); -------------------------------------------------------------------------------- -- Partition Scheme (using the above function) -------------------------------------------------------------------------------- CREATE PARTITION SCHEME [FactTable_PS] AS PARTITION [FactTable_PF] -- This is where we specify the partition function ALL TO([primary]) ; -------------------------------------------------------------------------------- -- Create Table (using the above scheme) -------------------------------------------------------------------------------- CREATE TABLE [dbo].[FactTable_partitioned] ( [TheDate] DATE, [c_attribute] VARCHAR(100), [c_metric] BIGINT, CONSTRAINT [PK_FactTable_partitioned] PRIMARY KEY ([TheDate], [c_attribute]) ) ON [FactTable_PS] -- This is where we specify the partition scheme ([TheDate]) -- This is where we specify the column to apply the function within the scheme to ;
Sample Data
-------------------------------------------------------------------------------- -- Create Date List -------------------------------------------------------------------------------- DECLARE @DateStart DATE = '01-JAN-17'; DECLARE @DateEnd DATE = '01-JAN-19'; WITH number_tbl AS ( SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9 ), number_sequences AS ( SELECT (t3.a + (10 * t2.a) + (100 * t1.a)) AS concatenate_calculation FROM number_tbl t1 CROSS JOIN number_tbl t2 CROSS JOIN number_tbl t3 ), date_sequence AS ( SELECT CAST(DateAdd(DAY, concatenate_calculation, @DateStart) AS DATE) AS [TheDate] FROM number_sequences WHERE CAST(DateAdd(DAY, concatenate_calculation, @DateStart) AS DATE) <= @DateEnd ) SELECT [TheDate] INTO #DateList FROM date_sequence ORDER BY [TheDate] ; CREATE UNIQUE CLUSTERED INDEX PK_DateList ON #DateList ([TheDate]) ; -------------------------------------------------------------------------------- -- Create Dummy Fact table (we will keep coming back to this) -- Using cartesian join, giving 4 rows for each date -------------------------------------------------------------------------------- WITH s_data AS ( SELECT 'Data 1' AS [c_attribute], 100 AS [c_metric] UNION ALL SELECT 'Data 2' AS [c_attribute], 200 AS [c_metric] UNION ALL SELECT 'Data 3' AS [c_attribute], 300 AS [c_metric] UNION ALL SELECT 'Data 4' AS [c_attribute], 400 AS [c_metric] ) SELECT [TheDate], [c_attribute], [c_metric] INTO #FactTable FROM #DateList, s_data ; CREATE CLUSTERED INDEX SK_DateList ON #FactTable ([TheDate]) ; -------------------------------------------------------------------------------- -- Insert to partitioned table -------------------------------------------------------------------------------- INSERT INTO [dbo].[FactTable_partitioned] SELECT * FROM #FactTable ;
Useful Views & Notes
View Detailed Partition Usage
-------------------------------------------------------------------------------- -- Detailed View of Partitions and their usage -------------------------------------------------------------------------------- SELECT OBJECT_NAME(p.[object_id]) AS [Table_Name], --p.[object_id], --p.[partition_id], p.[partition_number], p.[rows] AS [Partition_rows], p.[data_compression_desc] AS [Partition_data_compression], --i.[index_id], -- Linked by [object_id] i.[name] AS [Index_Name], --i.[type], i.[type_desc] AS [Index_Type], --i.[is_unique], --i.[is_primary_key], --i.[is_unique_constraint], --i.[data_space_id], -- Link to scheme s.[name] AS [Scheme_Name], --f.[function_id], f.[name] AS [Function_Name], --f.[create_date] AS [Function_create_date], --f.[modify_date] AS [Function_modify_date], --rv.[boundary_id], -- Link to partition_id lv.[value] AS [Lower_Value], rv.[value] AS [Upper_Value] FROM sys.partitions p INNER JOIN sys.objects o ON o.[object_id] = p.[object_id] --INNER JOIN sys.allocation_units a -- ON p.[hobt_id] = a.[container_id] LEFT JOIN sys.indexes i ON i.[object_id] = p.[object_id] AND i.[index_id] = p.[index_id] LEFT JOIN sys.partition_schemes s ON s.[data_space_id] = i.[data_space_id] LEFT JOIN sys.partition_functions f ON f.[function_id] = s.[function_id] LEFT JOIN sys.partition_range_values rv ON rv.[function_id] = f.[function_id] AND rv.[boundary_id] = p.[partition_number] LEFT JOIN sys.partition_range_values lv ON lv.[function_id] = f.[function_id] AND lv.[boundary_id] = p.[partition_number] - 1 WHERE 1=1 --AND p.[object_id] = OBJECT_ID('FactTable') -- Filter for one object AND o.[name] LIKE 'FactTable%' ORDER BY o.[name], i.[name], p.[partition_number] ;
View Partitions
-------------------------------------------------------------------------------- -- View Partitions -------------------------------------------------------------------------------- -- Partitions by table SELECT o.[name] AS [objectname], i.[name] AS [indexname], p.[partition_id], p.[partition_number], p.[rows] FROM sys.partitions p INNER JOIN sys.objects o ON o.object_id=p.object_id INNER JOIN sys.indexes i ON i.object_id=p.object_id AND p.index_id=i.index_id WHERE o.[name] LIKE '%FactTable%' -- Filter for our table name(s) AND o.[type] = N'U' -- User defined table ORDER BY o.[name], [partition_number] ;
View Function Configuration
-------------------------------------------------------------------------------- -- Partition Function Configuration(s) -------------------------------------------------------------------------------- SELECT * --CONVERT(DATETIME,Value) FROM sys.partition_functions f INNER JOIN sys.partition_range_values r ON f.function_id = r.function_id WHERE f.name LIKE '%FactTable%' -- Filter for our partition name(s) ;
2 Replies to “Partitions in SQL Server: Creating a Partitioned Table”