Singdata - Documents

MONOTONICALLY_INCREASING_ID

monotonically_increasing_id()

Description

Generates a monotonically increasing ID. This function generates a unique, monotonically increasing 64-bit integer ID for each row. In a distributed environment, the ID ranges generated for each partition do not overlap, ensuring global uniqueness.

Parameters

None

Returns

bigint type
Returns a monotonically increasing unique ID
IDs start from 0 and increase
In a distributed environment, IDs from different partitions do not overlap

Examples

SELECT a, monotonically_increasing_id() as id FROM VALUES (1), (2), (3), (4), (5) AS t(a); -- Result: -- 1 0 -- 2 1 -- 3 2 -- 4 3 -- 5 4

SELECT a, monotonically_increasing_id() as id FROM VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10) AS t(a) DISTRIBUTE BY a; -- Result (IDs may be non-contiguous in a distributed environment; example for reference only): -- 1 8589934592 -- 2 17179869184 -- 3 0 -- ...

SELECT monotonically_increasing_id() as row_id, * FROM large_table;

SELECT monotonically_increasing_id() as seq, name, age FROM users ORDER BY age;

Notes

The IDs generated by monotonically_increasing_id have the following characteristics:
- Uniqueness: Each ID is unique within the result set
- Monotonicity: ID values are monotonically increasing (but not necessarily contiguous)
- Determinism: Under the same data distribution, the generated IDs are the same
In a distributed environment, ID generation works as follows:
- Each partition is assigned an ID range
- IDs within a partition increase contiguously
- ID ranges of different partitions do not overlap
- The high bits of the ID identify the partition, and the low bits identify the row number within the partition
ID composition (64-bit):
- High 33 bits: partition ID
- Low 31 bits: row number within the partition
- Therefore, each partition can have at most 231 - 1 rows
Differences from row_number():
- monotonically_increasing_id() is more efficient in a distributed environment
- row_number() requires global ordering, which has higher overhead
- monotonically_increasing_id() does not guarantee contiguity
Important notes:
- IDs may not start from 0 (in a distributed environment)
- There may be gaps between IDs
- Do not rely on the specific values of IDs; only rely on their uniqueness and monotonicity
- The same query may generate different IDs across different executions (if the data distribution changes)
Common use cases for this function:
- Generating unique identifiers for rows in a table
- Adding sequence numbers during data export
- Primary key generation for temporary tables
- Data partitioning and bucketing
If you need contiguous row numbers, use the row_number() window function instead