Data Lake Storage Management: Volume

Overview

A Lakehouse Volume is an object in Singdata Lakehouse that represents an object storage location. It lets you access, store, manage, and organize files in object storage, and can hold files in various formats, including structured, semi-structured, and unstructured data. Like tables, views, and other objects, Volumes are organized and managed under a Lakehouse Schema.

Using the Volume feature brings the following benefits:

  • Unified Data Analysis: AI workloads in Singdata Lakehouse can be called to process images, PDFs, and other specially formatted unstructured data in object storage, so that this data is processed and analyzed together with structured data on the platform.
  • Unified Permission Management: Supports using the Singdata Lakehouse platform's permission system for unified permission management of databases, tables, and files in object storage.
  • Unified Data Governance: Data in object storage is uniformly managed and governed by the Singdata Lakehouse platform.

Volume Types

Lakehouse Volumes are classified into the following four types by creation method:

| Type | Creation Method | Storage Location | Description |
|---|---|---|---|
| User Volume | Auto-created | Internal storage | A user-specific personal storage space; each user has one by default |
| Table Volume | Auto-created | Internal storage | The file storage area associated with each table by default; permissions are consistent with the table |
| External Volume | CREATE EXTERNAL VOLUME | External storage (OSS/COS/S3) | Mounts external object storage, treating object storage as a data lake |
| Named Volume | CREATE VOLUME | Internal or external storage | A Volume explicitly created by the user for cross-team resource sharing |

Type Comparison

| Feature | User Volume | Table Volume | External Volume | Named Volume |
|---|---|---|---|---|
| Creation Method | Auto-created | Auto-created (one per table) | CREATE EXTERNAL VOLUME | CREATE VOLUME |
| Storage Location | Internal storage | Internal storage | External storage (OSS/COS/S3) | Internal or external storage |
| Permission Management | User owns by default | Consistent with table permissions | Requires separate authorization | Requires separate authorization |
| Storage Cost | Lakehouse storage billing | Lakehouse storage billing | Cloud provider storage billing | Lakehouse or cloud provider billing |
| Typical Scenario | Upload local files, RAG knowledge base | Table-associated ETL files, batch import/export | Mount existing object storage | Cross-team resource sharing |

Data Operation Protocols

Volumes support three address formats for referencing files in different scenarios:

| Protocol Type | Address Format | Typical Scenario |
|---|---|---|
| External/Named Volume | volume://volume_name/path_to_file | Cross-team resource sharing |
| User Volume | volume:user://~/path_to_file | User's personal space |
| Table Volume | volume:table://table_name/path_to_file | Table-associated ETL files |

Address Format Details

External/Named Volume Format: volume://volume_name/upper.jar

  • volume_name: The name of the created Volume
  • upper.jar: The target file name

User Volume Format: volume:user://~/upper.jar

  • user: Indicates use of the User Volume protocol
  • ~: Represents the current user, a fixed value
  • upper.jar: The target file name

Table Volume Format: volume:table://table_name/upper.jar

  • table: Indicates use of the Table Volume protocol
  • table_name: The name of the table that owns the Table Volume; replace it with your actual table name
  • upper.jar: The target file name
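
The three address formats above can be assembled mechanically. The following Python sketch is for illustration only: the helper name `volume_uri` and its `kind` labels are hypothetical and not part of any Singdata SDK. Only the resulting address strings follow the formats described in this section.

```python
def volume_uri(kind, path, name=""):
    """Build a Volume file address in one of the three documented formats.

    kind: "user", "table", "external", or "named" (hypothetical labels).
    path: file path inside the Volume, for example "upper.jar".
    name: Volume name (external/named) or table name (table); unused for user.
    """
    path = path.lstrip("/")  # avoid a doubled slash after the authority part
    if kind == "user":
        # "~" is the fixed placeholder for the current user.
        return f"volume:user://~/{path}"
    if kind == "table":
        if not name:
            raise ValueError("a table name is required for a Table Volume address")
        return f"volume:table://{name}/{path}"
    if kind in ("external", "named"):
        if not name:
            raise ValueError("a Volume name is required for an External/Named Volume address")
        return f"volume://{name}/{path}"
    raise ValueError(f"unknown volume kind: {kind!r}")


if __name__ == "__main__":
    # "my_volume" and "orders" are made-up names used only for this demo.
    print(volume_uri("named", "upper.jar", name="my_volume"))
    print(volume_uri("user", "upper.jar"))
    print(volume_uri("table", "upper.jar", name="orders"))
```

Keeping the address construction in one function makes it harder to mix up the `volume://` prefix with the `volume:user://` and `volume:table://` prefixes.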

DDL Operations

Supported commands for different Volume types:

| Command | Description | User Volume | Table Volume | External/Named Volume |
|---|---|---|---|---|
| CREATE VOLUME | Create a Named Volume | No | No | Yes |
| CREATE EXTERNAL VOLUME | Create an External Volume | No | No | Yes |
| DROP VOLUME | Drop a Volume | No | No | Yes |
| DESC VOLUME | Describe Volume properties | No | No | Yes |
| SHOW VOLUMES | List created Volumes | No | No | Yes |
| SHOW USER VOLUME DIRECTORY | List User Volume files | Yes | No | No |
| SHOW TABLE VOLUME DIRECTORY | List Table Volume files | No | Yes | No |
| SHOW VOLUME DIRECTORY | List External/Named Volume files | No | No | Yes |
| REMOVE | Delete files from a Volume | Yes | Yes | Yes |
| PUT | Upload files to a Volume | Yes | Yes | Yes |
| GET | Download files from a Volume | Yes | Yes | Yes |
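
The support matrix above can also be expressed as a lookup, which is handy when a script needs to validate a command before sending it. This is a minimal Python sketch that transcribes the table; the names `SUPPORT`, `supports`, and `commands_for`, and the category labels `user`, `table`, and `external_named`, are hypothetical.

```python
# Support matrix transcribed from the table above.
# Keys are command names; values are the Volume categories that support them.
ALL_KINDS = {"user", "table", "external_named"}

SUPPORT = {
    "CREATE VOLUME": {"external_named"},
    "CREATE EXTERNAL VOLUME": {"external_named"},
    "DROP VOLUME": {"external_named"},
    "DESC VOLUME": {"external_named"},
    "SHOW VOLUMES": {"external_named"},
    "SHOW USER VOLUME DIRECTORY": {"user"},
    "SHOW TABLE VOLUME DIRECTORY": {"table"},
    "SHOW VOLUME DIRECTORY": {"external_named"},
    "REMOVE": ALL_KINDS,
    "PUT": ALL_KINDS,
    "GET": ALL_KINDS,
}


def supports(command, volume_kind):
    """Return True if the command is listed as supported for the Volume category."""
    if volume_kind not in ALL_KINDS:
        raise ValueError(f"unknown volume kind: {volume_kind!r}")
    return volume_kind in SUPPORT[command.upper()]


def commands_for(volume_kind):
    """Return the alphabetically sorted commands supported by a Volume category."""
    return sorted(cmd for cmd in SUPPORT if supports(cmd, volume_kind))


if __name__ == "__main__":
    print(commands_for("user"))
```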

Permissions

| Permission | Description |
|---|---|
| READ METADATA | Permission to view Volume object metadata |
| READ VOLUME | Permission to read files and directories under the Volume object. Required when viewing the file list under a Volume, reading Volume files via SQL, and downloading files via the GET command |
| WRITE VOLUME | Permission to write data to a Volume. Required when uploading files via the PUT command and deleting files via the REMOVE command |
| ALTER VOLUME | Permission required for the ALTER VOLUME command. For example: ALTER VOLUME <volume_name> REFRESH to refresh the file metadata information under the Volume (External Volume only) |
| ALL | All permissions for the Volume object |
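
The mapping from operations to required permissions can be summarized in a small Python sketch. The operation labels (`list_files`, `read_via_sql`, and so on) and the function `is_allowed` are hypothetical names chosen for this example; only the permission names and the operation-to-permission pairing come from the table above. The sketch does not query the platform; it only mirrors the documented rules.

```python
# Operation -> permission required, as described in the permissions table.
# The operation labels on the left are made up for this illustration.
REQUIRED_PERMISSION = {
    "view_metadata": "READ METADATA",
    "list_files": "READ VOLUME",        # viewing the file list under a Volume
    "read_via_sql": "READ VOLUME",      # reading Volume files via SQL
    "get": "READ VOLUME",               # downloading files with GET
    "put": "WRITE VOLUME",              # uploading files with PUT
    "remove": "WRITE VOLUME",           # deleting files with REMOVE
    "refresh_metadata": "ALTER VOLUME", # ALTER VOLUME <volume_name> REFRESH
}


def is_allowed(operation, granted):
    """Return True if the granted permission names cover the operation.

    granted: a set of permission names such as {"READ VOLUME"}.
    The ALL permission covers every operation on the Volume object.
    """
    if "ALL" in granted:
        return True
    return REQUIRED_PERMISSION[operation] in granted


if __name__ == "__main__":
    grants = {"READ VOLUME"}
    for op in ("get", "put"):
        print(op, is_allowed(op, grants))
```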

Cost

  • External Volume: No additional storage cost on the Lakehouse side; storage costs are charged according to the cloud provider's standard rates.
  • Named Volume (internal storage): Lakehouse storage fees are charged based on the actual storage size.
  • User Volume / Table Volume: Lakehouse storage fees are charged based on the actual storage size.

Constraints and Limitations

  • The size of a single uploaded file must not exceed 5 GB.
  • The local PUT and GET interfaces require JDBC driver version 1.4.4 or later.
  • An External Volume cannot be created across cloud providers: Alibaba Cloud instances can only create OSS Connections, and Tencent Cloud instances can only create COS Connections.
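
The first two limits above are easy to check on the client before an upload is attempted. The following Python sketch shows one way to do that; the function names are hypothetical, the driver version is assumed to be a purely numeric dotted string, and 5 GB is interpreted as 5 × 1024³ bytes, which is an assumption because the source does not say whether the limit is decimal or binary.

```python
# Assumption: "5 GB" is read as 5 * 1024**3 bytes; the source does not specify.
MAX_UPLOAD_BYTES = 5 * 1024**3
MIN_JDBC_VERSION = (1, 4, 4)


def parse_version(text):
    """Turn a dotted numeric version such as "1.4.4" into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def preflight(size_bytes, driver_version):
    """Return a list of problems that would block a local PUT; empty if none."""
    problems = []
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file exceeds the 5 GB single-file upload limit")
    if parse_version(driver_version) < MIN_JDBC_VERSION:
        problems.append("JDBC driver is older than 1.4.4")
    return problems


if __name__ == "__main__":
    # Hypothetical inputs: a 6 GiB file and an outdated driver.
    for issue in preflight(6 * 1024**3, "1.4.3"):
        print(issue)
```

Comparing versions as integer tuples rather than strings avoids ordering mistakes such as treating "1.10.0" as older than "1.4.4".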