MONOTONICALLY_INCREASING_ID

monotonically_increasing_id()

Description

Generates a monotonically increasing ID. This function generates a unique, monotonically increasing 64-bit integer ID for each row. In a distributed environment, the ID ranges generated for each partition do not overlap, ensuring global uniqueness.

Parameters

  • None

Returns

  • bigint type
  • Returns a monotonically increasing unique ID
  • IDs start from 0 and increase
  • In a distributed environment, IDs from different partitions do not overlap

Examples

SELECT a, monotonically_increasing_id() as id FROM VALUES (1), (2), (3), (4), (5) AS t(a); -- Result: -- 1 0 -- 2 1 -- 3 2 -- 4 3 -- 5 4

SELECT a, monotonically_increasing_id() as id FROM VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10) AS t(a) DISTRIBUTE BY a; -- Result (IDs may be non-contiguous in a distributed environment; example for reference only): -- 1 8589934592 -- 2 17179869184 -- 3 0 -- ...

SELECT monotonically_increasing_id() as row_id, * FROM large_table;

SELECT monotonically_increasing_id() as seq, name, age FROM users ORDER BY age;

Notes

  • The IDs generated by monotonically_increasing_id have the following characteristics:
    • Uniqueness: Each ID is unique within the result set
    • Monotonicity: ID values are monotonically increasing (but not necessarily contiguous)
    • Determinism: Under the same data distribution, the generated IDs are the same
  • In a distributed environment, ID generation works as follows:
    • Each partition is assigned an ID range
    • IDs within a partition increase contiguously
    • ID ranges of different partitions do not overlap
    • The high bits of the ID identify the partition, and the low bits identify the row number within the partition
  • ID composition (64-bit):
    • High 33 bits: partition ID
    • Low 31 bits: row number within the partition
    • Therefore, each partition can have at most 231 - 1 rows
  • Differences from row_number():
    • monotonically_increasing_id() is more efficient in a distributed environment
    • row_number() requires global ordering, which has higher overhead
    • monotonically_increasing_id() does not guarantee contiguity
  • Important notes:
    • IDs may not start from 0 (in a distributed environment)
    • There may be gaps between IDs
    • Do not rely on the specific values of IDs; only rely on their uniqueness and monotonicity
    • The same query may generate different IDs across different executions (if the data distribution changes)
  • Common use cases for this function:
    • Generating unique identifiers for rows in a table
    • Adding sequence numbers during data export
    • Primary key generation for temporary tables
    • Data partitioning and bucketing
  • If you need contiguous row numbers, use the row_number() window function instead