Managing update differences – NULL and NOT NULL
When are two values the same but different? When they are both NULL. Here I compare equality checks, including my datatype neutral method.
The Problem
Equality (or lack thereof) between values is an easy check. Is 1 = 3
, or is 4 <> 5
? Many people (and processes) get unexpected results when they try to compare NULL
with NULL
.
If we consider the below dataset. It contains rows without NULLs, and rows with one or two NULLs:
-------------------------------------------------------------------------------- -- Create Dataset -------------------------------------------------------------------------------- CREATE TABLE [value_list] ( [ID] INT NOT NULL PRIMARY KEY, [val_1] INT NULL, [val_2] INT NULL ); INSERT INTO [value_list] ([ID], [val_1], [val_2]) VALUES( 1, 1, 1), ( 2, 2, 2), ( 3, 3, NULL), ( 4, NULL, NULL), ( 5, NULL, 5), ( 6, 6, 7) ;
ID | val_1 | val_2 |
---|---|---|
1 | 1 | 1 |
2 | 2 | 2 |
3 | 3 | NULL |
4 | NULL | NULL |
5 | NULL | 5 |
6 | 6 | 7 |
Equality (or not)
Rows 1 & 2 will be returned for a test of equality:
SELECT [ID], [val_1], [val_2] FROM [value_list] WHERE [val_1] = [val_2] ; /* ID , val_1 , val_2 1 , 1 , 1 2 , 2 , 2 */
Similarly row 6 will be returned for a test of inequality:
SELECT [ID], [val_1], [val_2] FROM [value_list] WHERE [val_1] <> [val_2] ; /* ID , val_1 , val_2 6 , 6 , 7 */
But what about the NULLs?
NULL always has been, and always will be special. We need it for many good reasons. But that doesn’t stop it being a potential oversight when we consider the two scenarios:
- For the equality check, what if we want to include the NULL values? Afterall, if both val_1 and val_2 are NULL, aren’t they the same?
- And for the difference check, isn’t NULL a different value to 3 (as in row 3) or 5 (as in row 5)?
The solutions
Removing NULLs from the data
Simply make the columns NOT NULL, and this problem won’t happen. If you don’t have a value, then stick something in there for UNKNOWN (-1 for integer references, an empty string etc…).
While this can be done, it’s not best practice, and can have it’s own issues if the predefined value suddenly has a meaning in the data.
COALESCE to an arbitrary value
This is the easiest way to perform the check, modify both sides of the check to have a default (arbitrary) value to replace NULL with. When there is a NULL, it will be substituted:
SELECT [ID], [val_1], [val_2] FROM [value_list] WHERE COALESCE([val_1], -1) <> COALESCE([val_2], -1) ; /* ID , val_1 , val_2 3 , 3 , NULL 5 , NULL , 5 6 , 6 , 7 */
This gets round the issue of NULL comparissons, but it has two drawbacks:
- One has to think of a value which won’t appear in the data (with numbers a negative can be used, but not always).
- The value has to change depending on the datatype.
Multiple Checks
SELECT [ID], [val_1], [val_2] FROM [value_list] WHERE( [val_1] <> [val_2] OR([val_1] IS NULL AND [val_2] IS NOT NULL) OR([val_1] IS NOT NULL AND [val_2] IS NULL) ) ; /* ID , val_1 , val_2 3 , 3 , NULL 5 , NULL , 5 6 , 6 , 7 */
The above is my preferred method, it’s longer in code than using COALESCE but it’s the same code for all datatypes and explicit in all cases. It also avoids the challenge of thinking of a value that is (in theory) impossible to appear in the actual data.
See working solution on rextester.com
The Dream Solution
My dream (which doesn’t exist) is a modifier for statements, which would allow NULL to behave like a distinct value:
-- Different values SELECT * FROM t_1 WHERE c_1 <x> c_2 ; -- Equality SELECT * FROM t_1 WHERE c_1 x= c_2 ;
2 Replies to “Managing update differences – NULL and NOT NULL”