COUNT_DISTINCT Function
count_distinct(expr)
Description
The COUNT_DISTINCT function computes the number of distinct (deduplicated) values in a specified column. This is an aggregate function equivalent to COUNT(DISTINCT expr), though the optimizer may apply special optimizations in certain scenarios.
Parameters
expr: An expression of any type whose distinct values are to be counted.
Return Type
- Returns a
BIGINT value (non-null).
- Returns the number of distinct values.
Notes
- During computation,
NULL values are ignored and not included in the result.
- If all values are
NULL, returns 0.
- If the input is an empty set, returns 0.
COUNT_DISTINCT needs to maintain the set of all distinct values, which can incur significant memory overhead. For large datasets, if only an approximate value is needed, consider using APPROX_COUNT_DISTINCT.
Examples
- Basic usage: count the number of distinct values
SELECT count_distinct(city)
FROM VALUES ('Beijing'), ('Shanghai'), ('Beijing'), ('Shenzhen') AS t(city);
+------------------------+
| count_distinct(city) |
+------------------------+
| 3 |
+------------------------+
- Equivalent to COUNT(DISTINCT ...)
SELECT count_distinct(city), COUNT(DISTINCT city)
FROM VALUES ('Beijing'), ('Shanghai'), ('Beijing'), ('Shenzhen') AS t(city);
+------------------------+----------------------+
| count_distinct(city) | COUNT(DISTINCT city) |
+------------------------+----------------------+
| 3 | 3 |
+------------------------+----------------------+
- NULL values are ignored
SELECT count_distinct(value)
FROM VALUES (1), (2), (NULL), (1), (NULL), (3) AS t(value);
+-------------------------+
| count_distinct(value) |
+-------------------------+
| 3 |
+-------------------------+
- Returns 0 when all values are NULL
SELECT count_distinct(value)
FROM VALUES (NULL), (NULL), (NULL) AS t(value);
+-------------------------+
| count_distinct(value) |
+-------------------------+
| 0 |
+-------------------------+
- Count distinct values by group
SELECT dept, count_distinct(name) as unique_employees
FROM VALUES
('Sales', 'Alice'),
('Sales', 'Bob'),
('Sales', 'Alice'),
('IT', 'Charlie'),
('IT', 'David'),
('IT', 'Charlie')
AS employees(dept, name)
GROUP BY dept;
+-------+------------------+
| dept | unique_employees |
+-------+------------------+
| Sales | 2 |
| IT | 2 |
+-------+------------------+