Compute Cluster Cache
Lakehouse uses caching to accelerate queries. The platform provides three types of cache:
- Query Result Cache (ResultCache)
- Metadata Cache (MetadataCache)
- Compute Cluster Local Disk Cache (Virtual Cluster Local Disk Cache)

Among them:
- Metadata Cache and Query Result Cache belong to the service layer and are shared within the workspace.
- Compute Cluster Local Disk Cache is stored on the local nodes of a cluster and is available only to queries that run on that Virtual Cluster.
In the storage-compute separation architecture of Lakehouse, data is stored in object storage. To address network request latency and improve response speed in analytical scenarios, we adopt caching strategies. Compute Cluster Cache stores frequently accessed data on local nodes, thereby accelerating queries.
Compute Cluster Cache is divided into two types:
- Active Cache: Tables are cached to the compute cluster manually through commands. Each time the compute cluster starts, the latest data or partitions of these pre-cached tables are loaded automatically. Currently, only AP type clusters are supported. This mode suits scenarios such as BI report queries, where it can significantly reduce query latency.
- Passive Cache: On the first query, Lakehouse automatically caches the files it reads to the compute cluster. The second and subsequent queries that involve the same table files hit the cache directly, which speeds up the query. Both GP and AP type clusters are supported.
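The passive caching behavior can be illustrated with a toy read-through cache. This is only a conceptual sketch, not Lakehouse's implementation: the class name `PassiveFileCache`, the dictionary standing in for object storage, and the file path are all hypothetical.

```python
class PassiveFileCache:
    """Toy read-through cache; a dict stands in for remote object storage."""

    def __init__(self, object_store):
        self.object_store = object_store  # path -> bytes (simulated remote storage)
        self.local = {}                   # files kept on the cluster's local disk
        self.hits = 0
        self.misses = 0

    def read(self, path):
        """Return file contents, caching them locally on the first read."""
        if path in self.local:
            self.hits += 1
            return self.local[path]
        # Cache miss: fetch from remote storage and keep a local copy.
        data = self.object_store[path]
        self.local[path] = data
        self.misses += 1
        return data

    def release(self):
        """Drop all local copies, as when the cluster stops."""
        self.local.clear()


store = {"sales/orders/part-0.parquet": b"rows"}  # hypothetical file
cache = PassiveFileCache(store)
cache.read("sales/orders/part-0.parquet")  # first query: read from object storage
cache.read("sales/orders/part-0.parquet")  # later query: served from the local cache
```

In this model the first read counts as a miss and every later read of the same file counts as a hit, until `release()` empties the local copies.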
Usage
- Active Caching Method:
If you need to add a new table to the cache, you must include all previously configured tables. Otherwise the new setting overwrites the original list and the omitted tables are no longer preloaded.
PRELOAD_TABLES supports wildcards. Use schema_name.* to cache all tables under a schema.
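Because the setting is overwritten rather than appended to, it can help to assemble the complete list before issuing the command. The following is a minimal sketch under stated assumptions: the helper names `merge_preload_tables` and `is_preloaded` and the table names are hypothetical, and the wildcard is assumed to behave like a simple `*` glob on `schema.table` names. The exact statement syntax and list format should be taken from the Lakehouse SQL reference; this code only builds and checks the list.

```python
from fnmatch import fnmatchcase


def merge_preload_tables(existing, new):
    """Return existing entries followed by new ones, without duplicates."""
    merged = []
    for entry in list(existing) + list(new):
        entry = entry.strip()
        if entry and entry not in merged:
            merged.append(entry)
    return merged


def is_preloaded(table, entries):
    """True if `table` (schema.table) matches any entry; * is assumed to be a glob."""
    return any(fnmatchcase(table, entry) for entry in entries)


# Hypothetical table names: keep the configured tables and add a whole schema.
current = ["sales.orders", "sales.customers"]
entries = merge_preload_tables(current, ["reports.*"])
print(entries)
print(is_preloaded("reports.daily", entries))
```

Passing the merged list, rather than only the new table, keeps the previously configured tables from being dropped.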
Viewing Cache Status:
After tables are loaded into the compute cluster with the ALTER..PRELOAD_TABLES command, the cache status displayed by SHOW PRELOAD may lag behind, even though the cached tables are already effective. Under normal circumstances, this delay is approximately 10 minutes.
The following preload status views are available:
- Display the preload table/partition status of the current Virtual Cluster
- Display the preload table/partition status of the specified Virtual Cluster
- Filter preload status information by table name
- Display the preloaded cache summary information of the Virtual Cluster
Notes
- Active caching (PRELOAD_TABLES) is only supported on Analytics (AP) clusters. General Purpose (GP) clusters do not support this feature.
- PRELOAD_TABLES is an overwrite operation. When adding a new table, you must include all previously configured tables, otherwise they will be removed from preloading.
- The cluster supports automatic start and stop. When the cluster stops, the local cache is automatically released. When an AP cluster restarts, only the most recently written data or partitions are cached.
- After executing the cache command, only newly written data will be cached.
- SHOW PRELOAD status updates may have approximately a 10-minute delay, but the cache is already effective.
