Data Types

Lakehouse supports numeric, string, time, boolean, binary, and complex types (ARRAY, MAP, STRUCT, JSON, VECTOR, BITMAP).

Type Quick Reference

Numeric Types

TypeStorageRange / PrecisionTypical Usage
TINYINT1 byte-128 ~ 127Status codes, enum values
SMALLINT2 bytes-32,768 ~ 32,767Small-range integers
INT4 bytes-2,147,483,648 ~ 2,147,483,647General integer IDs
BIGINT8 bytes-9.2×10¹⁸ ~ 9.2×10¹⁸Large integers, millisecond timestamps
FLOAT4 bytes~7 significant digitsApproximate floating point
DOUBLE8 bytes~15 significant digitsHigh-precision floating point
DECIMAL(p, s)VariableMax precision p=38, scale sMonetary amounts, exact calculations, e.g. DECIMAL(18, 4)

String Types

TypeMax LengthDescription
STRING16 MBNo length limit; recommended as the default
VARCHAR(n)n ≤ 1,048,576 charactersHas a length constraint; silently truncated on overflow
CHAR(n)n ≤ 255 charactersFixed length; no space padding

Time Types

TypeFormatDescription
DATEYYYY-MM-DDDate only, no time
TIMESTAMPYYYY-MM-DD HH:MM:SSWith local timezone (equivalent to TIMESTAMP_LTZ)
TIMESTAMP_NTZYYYY-MM-DD HH:MM:SSNo timezone information; stores the raw time value

Boolean and Binary Types

TypeDescription
BOOLEANtrue / false
BINARYFixed-length binary string

Complex Types

TypeSyntax ExampleDescription
ARRAYARRAY<INT>, ARRAY<STRING>Ordered collection of same-type elements
MAPMAP<STRING, INT>Key-value pairs; key type must be consistent
STRUCTSTRUCT<name:STRING, age:INT>Multi-field record; field types can differ
JSONJSONStores any JSON structure; supports dynamic field access
VECTORVECTOR(FLOAT, 1024)Fixed-dimension numeric vector; used for AI embedding storage
BITMAPBITMAPBitmap for efficient cardinality counting (UV calculation, etc.)

Complex type example:

CREATE TABLE example ( tags ARRAY<STRING>, attributes MAP<STRING, STRING>, address STRUCT<city:STRING, zip:STRING>, metadata JSON, embedding VECTOR(FLOAT, 1536), user_bitmap BITMAP );


Lakehouse supports multiple type aliases, primarily for compatibility with DDL and SQL scripts migrated from other databases (MySQL, PostgreSQL, Hive, etc.), so they can run directly without manually replacing type names.

AliasCanonical Type
BYTETINYINT
SHORTSMALLINT
INTEGERINT
LONGBIGINT
REALFLOAT
NUMERIC(p, s)DECIMAL(p, s)
TEXTSTRING
TIMESTAMPTIMESTAMP_LTZ

When aliases are converted and their visibility

Aliases are converted to canonical types immediately during SQL parsing; the system does not retain the original alias. Once converted, aliases and canonical types are indistinguishable everywhere:

  • DESCRIBE shows the canonical type name
  • SHOW CREATE TABLE outputs the canonical type name
  • typeof() returns the canonical type name
  • No system view or metadata records the original alias

This means: a column built with REAL shows float in DESCRIBE; a column built with INTEGER outputs int in SHOW CREATE TABLE. This is expected behavior — the type has not been changed.

Example: what the system actually shows after creating a table with aliases

Running a DDL migrated from MySQL directly:

CREATE TABLE orders ( id INTEGER, amount NUMERIC(18, 4), status TEXT, created TIMESTAMP );

DESCRIBE shows canonical type names, not the aliases used at creation:

DESCRIBE orders;

+---------+------------------+ | column | data_type | +---------+------------------+ | id | int | | amount | decimal(18,4) | | status | string | | created | timestamp_ltz | +---------+------------------+

SHOW CREATE TABLE also outputs canonical names:

SHOW CREATE TABLE orders; -- CREATE TABLE orders (id int, amount decimal(18,4), status string, created timestamp_ltz) ...

The table behaves exactly the same as if it had been created with canonical type names — the types have not been modified.

VARCHAR/CHAR and STRING Relationship

Lakehouse supports VARCHAR(n), CHAR(n), and STRING as three independent string types — they are not aliases of each other:

TypeCharacteristicstypeof() Returns
STRINGNo length limit, max 16 MBstring
VARCHAR(n)Max n characters (n ≤ 1048576)varchar(n)
CHAR(n)Max n characters (n ≤ 255), no space paddingchar(n)

When migrating from MySQL, VARCHAR(n) and CHAR(n) DDL can be used directly. Key behavioral differences from MySQL:

  • All three types do not pad with spaces: storing 'abc' in CHAR(10) returns 3 for LENGTH.
  • Strings exceeding the declared length are silently truncated — no error is raised.
  • Comparisons are based on actual content; there is no trailing-space-ignore logic.

If you do not need to preserve length constraints during migration, you can replace VARCHAR(n) with STRING uniformly to reduce truncation risk.

TIMESTAMP Has Timezone by Default

Lakehouse's TIMESTAMP is equivalent to TIMESTAMP_LTZ (local timezone). This is similar to MySQL's TIMESTAMP behavior, but differs from PostgreSQL's TIMESTAMP (no timezone).

When migrating from PostgreSQL, note:

SourceCorresponding Lakehouse Type
PostgreSQL TIMESTAMP (no timezone)TIMESTAMP_NTZ
PostgreSQL TIMESTAMPTZTIMESTAMP (i.e., TIMESTAMP_LTZ)
MySQL TIMESTAMPTIMESTAMP (i.e., TIMESTAMP_LTZ)
MySQL DATETIMETIMESTAMP_NTZ

Migrating PostgreSQL TIMESTAMP DDL directly to Lakehouse will result in columns becoming timezone-aware TIMESTAMP_LTZ. Write and read results will be affected by the session timezone, which is inconsistent with the original database behavior.

Lenient Mode: Type Errors Return NULL by Default

Lakehouse runs in lenient mode by default. Type conversion failures, numeric overflows, and similar errors do not throw exceptions — they return NULL. This differs from the default behavior of most databases:

ScenarioMost DatabasesLakehouse Default
CAST('abc' AS INT)ErrorNULL
CAST(200 AS TINYINT) (overflow)ErrorNULL
CAST('2023-02-29' AS DATE) (invalid date)ErrorNULL
Mixed-type conversion failure in UNION ALLErrorNULL

The benefit of lenient mode is that ETL pipelines are not interrupted by a single bad row; the trade-off is that errors are silently swallowed, making data quality issues harder to detect.

You can adjust this behavior:

-- Enable strict mode: conversion failures raise an error immediately SET cz.sql.cast.mode = strict; -- Or use TRY_CAST: returns NULL on failure (same as lenient mode, but intent is explicit) SELECT TRY_CAST('abc' AS INT);

When migrating from another database, if the original SQL relied on type errors to surface data issues, add explicit NULL checks at key ETL nodes after migrating to Lakehouse, or enable strict mode.

Type Reference

DocumentDescription
Data Type ConversionImplicit and explicit conversion rules between types
Type Conversion FunctionsCAST, TRY_CAST, and type conversion function reference
TIMESTAMP_NTZDetailed explanation and use cases for timezone-free timestamps
VECTOR TypeVector type syntax, dimension configuration, and usage examples
BITMAP TypeBitmap type use cases and function reference
Data Type Conversion GuideType mapping when migrating from MySQL / PostgreSQL / Hive
CREATE TABLEFull syntax for using data types when creating tables

Organizing Data Efficiently

DocumentDescription
Partitioning and BucketingPartition by time or business fields to improve query pruning efficiency
Bucketing (CLUSTERED BY)Hash-bucket by field to improve JOIN and aggregation performance
Primary KeysPrimary key table design; supports CDC real-time write deduplication
Recommended Sort ColumnsAccelerate range queries and filters via sort columns
Bloom Filter IndexAccelerate equality queries by skipping data files that don't contain the target value
Inverted IndexAccelerate full-text search; supports Chinese and English tokenization
Vector IndexANN approximate nearest neighbor search; suited for semantic search and RAG
Table Design Best PracticesComprehensive selection guidance for partitioning, bucketing, and indexing