Managing update differences – NULL and NOT NULL

2018-08-11

When are two values the same but different? When they are both NULL. Here I compare equality checks, including my datatype neutral method.

The Problem

Equality (or lack thereof) between values is an easy check. Is 1 = 3, or is 4 <> 5? Many people (and processes) get unexpected results when they try to compare NULL with NULL.

If we consider the below dataset. It contains rows without NULLs, and rows with one or two NULLs:

--------------------------------------------------------------------------------
-- Create Dataset
--------------------------------------------------------------------------------
CREATE TABLE [value_list]
(
    [ID]     INT NOT NULL PRIMARY KEY,
    [val_1]  INT NULL,
    [val_2]  INT NULL
);

INSERT
  INTO [value_list]
      ([ID], [val_1], [val_2])
VALUES(   1,       1,       1),
      (   2,       2,       2),
      (   3,       3,    NULL),
      (   4,    NULL,    NULL),
      (   5,    NULL,       5),
      (   6,       6,       7)
;

ID	val_1	val_2
1	1	1
2	2	2
3	3	NULL
4	NULL	NULL
5	NULL	5
6	6	7

Equality (or not)

Rows 1 & 2 will be returned for a test of equality:

SELECT [ID],
       [val_1],
       [val_2]
  FROM [value_list]
 WHERE [val_1] = [val_2]
;

/*
ID , val_1 , val_2
1 , 1 , 1
2 , 2 , 2
*/

Similarly row 6 will be returned for a test of inequality:

SELECT [ID],
       [val_1],
       [val_2]
  FROM [value_list]
 WHERE [val_1] <> [val_2]
;

/*
ID  , val_1 , val_2
6   , 6     , 7
*/

But what about the NULLs?

NULL always has been, and always will be special. We need it for many good reasons. But that doesn’t stop it being a potential oversight when we consider the two scenarios:

For the equality check, what if we want to include the NULL values? Afterall, if both val_1 and val_2 are NULL, aren’t they the same?
And for the difference check, isn’t NULL a different value to 3 (as in row 3) or 5 (as in row 5)?

The solutions

Removing NULLs from the data

Simply make the columns NOT NULL, and this problem won’t happen. If you don’t have a value, then stick something in there for UNKNOWN (-1 for integer references, an empty string etc…).

While this can be done, it’s not best practice, and can have it’s own issues if the predefined value suddenly has a meaning in the data.

COALESCE to an arbitrary value

This is the easiest way to perform the check, modify both sides of the check to have a default (arbitrary) value to replace NULL with. When there is a NULL, it will be substituted:

SELECT [ID],
       [val_1],
       [val_2]
  FROM [value_list]
 WHERE COALESCE([val_1], -1) <> COALESCE([val_2], -1)
;

/*
ID  , val_1 , val_2
3   , 3     , NULL
5   , NULL  , 5
6   , 6     , 7
*/

This gets round the issue of NULL comparissons, but it has two drawbacks:

One has to think of a value which won’t appear in the data (with numbers a negative can be used, but not always).
The value has to change depending on the datatype.

Multiple Checks

SELECT [ID],
       [val_1],
       [val_2]
  FROM [value_list]
 WHERE(   [val_1] <> [val_2]
       OR([val_1] IS     NULL AND [val_2] IS NOT NULL)
       OR([val_1] IS NOT NULL AND [val_2] IS     NULL)
      )
;

/*
ID  , val_1 , val_2
3   , 3     , NULL
5   , NULL  , 5
6   , 6     , 7
*/

The above is my preferred method, it’s longer in code than using COALESCE but it’s the same code for all datatypes and explicit in all cases. It also avoids the challenge of thinking of a value that is (in theory) impossible to appear in the actual data.

See working solution on rextester.com

The Dream Solution

My dream (which doesn’t exist) is a modifier for statements, which would allow NULL to behave like a distinct value:

-- Different values
SELECT *
  FROM t_1
 WHERE c_1 <x> c_2
;


-- Equality
SELECT *
  FROM t_1
 WHERE c_1 x= c_2
;

2 Replies to “Managing update differences – NULL and NOT NULL”

Pingback: Warning: Null value is eliminated by an aggregate or other SET operation - Rows Across The Lake
Pingback: NULL: When a value is neither IN or NOT IN a list. - Rows Across The Lake