Warning: Null value is eliminated by an aggregate or other SET operation
In data warehousing where aggregation (GROUP BY) is one of the most common things done in queries, we often get the warning:
Warning: Null value is eliminated by an aggregate or other SET operation
In most cases what is happening is harmless. But what is happening?
NULL in aggregation
First we have to better understand NULL
, I’ve already looked at Managing update differences: NULL and NOT NULL. NULL
is both amazingly useful, and from time to time the thorn in the side of anyone working with databases.
Have a look at the below example:
WITH data AS ( SELECT CAST(1 AS INT) AS c1, CAST(1 AS INT) AS c2 UNION ALL SELECT CAST(1 AS INT) AS c1, CAST(2 AS INT) AS c2 UNION ALL SELECT CAST(1 AS INT) AS c1, CAST(NULL AS INT) AS c2 ) SELECT SUM(c1) AS [SUM_c1], COUNT(c1) AS [COUNT_c1], SUM(c2) AS [SUM_c2], -- Note: This causes the "Warning" COUNT(c2) AS [COUNT_c2], -- Note: This causes the "Warning" COUNT(*) AS [COUNT_all] -- Note: This gives a different result to above FROM data ;
SUM_c1 | COUNT_c1 | SUM_c2 | COUNT_c2 | COUNT_all |
3 | 3 | 3 | 2 | 3 |
Note: the warning comes from this line: SUM(c2)
The warning is telling us that the NULL
value is being eliminated (thrown out) by the aggregation. So in this instance the value might as well be 0
.
The danger of NULL in aggregation
NULL in COUNT
In the above example, the more interesting threat to processing and results is the COUNT(c2)
as when specifying a column with a NULL
, the value is discarded and no count is returned.
Note that COUNT(*)
is safe when trying to get a count of rows in an aggregation.
NULL in addition
NULL
in addition (as part of a row level expression) when added to another number returns NULL
, see this example:
SELECT 1 + 1 AS [c1], 1 + NULL AS [c2], -- The problem 1 + COALESCE(NULL, 0) AS [c3] -- The simple solution ;
c1 | c2 | c3 |
2 | NULL | 1 |
On a row by row basis we will see it quickly. But as shown with the SUM
above (generating the warning) when combined with an aggregation that NULL
can get easily lost:
WITH data AS ( SELECT 1 + 1 AS [c1], 1 + NULL AS [c2], -- The problem 1 + COALESCE(NULL, 0) AS [c3] -- The simple solution UNION ALL SELECT 1 AS [c1], 1 AS [c2], 1 AS [c3] ) SELECT SUM([c1]) AS [c1_sum], SUM([c2]) AS [c2_sum], -- This is incorrect SUM([c3]) AS [c3_sum] -- This is what we want FROM data ;
c1_sum | c2_sum | c3_sum |
3 | 1 | 2 |
In the above example, the value for c2_sum
is probably incorrect for what most people would want/expect as it’s not including the first value being added.
For c3
(and c3_sum
) the use of COALESCE
replaces the NULL
with 0
and returns the expected result.