Collect Set Function: COLLECT_SET
Description
The collect_set function collects and returns a set of unique elements from a given set of input data. This function ensures that the returned array contains no duplicate elements, making the result more concise and clear. When the DISTINCT keyword is specified, the function performs deduplication on the results. However, note that even without specifying DISTINCT, the collect_set function still performs deduplication, so the DISTINCT keyword does not affect the result in this scenario.
Parameters
expr: An expression of any data type.limit: Optional parameter, integer type, specifying the maximum number of elements to collect. If not specified, all elements are collected.
Return Results
- Returns an array (
ARRAY), where the element type matches the input parameter type. - Using the
DISTINCTkeyword does not affect the deduplication result. - If the
limitparameter is specified, the returned array contains at mostlimitelements. - The order of elements in the returned array is not guaranteed to match the input order.
- Input
NULLvalues are not included in the result array.
Examples
Example 1: Basic usage
Example 2: Deduplication
Example 3: Conditionally collect a set using the FILTER clause
Example 4: Combine FILTER clause and DISTINCT to collect a conditional set
Example 5: Use the limit parameter to restrict the number of returned elements
Notes
- When processing large amounts of data, note that the
collect_setfunction may consume significant memory resources. - Since the order of elements in the returned array is not guaranteed to match the input order, use other functions (such as
ARRAY_SORT) to sort the results if ordering is needed.
