Overview

Singdata Lakehouse is a next-generation cloud lakehouse independently developed by Singdata. Built on an incremental computing engine, it delivers up to 10x the performance of traditional open-source architectures such as Spark and enables low-cost, real-time processing of massive data across the entire data chain. The platform integrates, stores, and computes all types of data, providing a solid data foundation for AI innovation and helping enterprises move from traditional Spark-based systems into the AI era.

For enterprises with existing data lakes (OSS / S3 / COS), Singdata Lakehouse can mount the existing object storage directly and run federated queries over data in Hive, Iceberg, Delta Lake, and other formats through an External Catalog, delivering high-performance SQL analytics without any data migration. This is the lowest-cost path from a data lake to a unified lakehouse.

Singdata Lakehouse supports seven global cloud providers, is already available in multiple Asia-Pacific regions, and also supports private deployment. Infrastructure costs drop to 1/5 to 1/3 of those of traditional solutions, with near-zero operational overhead.


First Time Here?

Create Your Account
5 minutes

Register an account, activate a service instance, and complete initial setup

Quick Start Experience
30 minutes

Walk through data ingestion, SQL querying, and Dynamic Table incremental computing

Go Deeper by Role
On demand

Dedicated paths for data engineers, analysts, AI engineers, and administrators


Who Are You and What Do You Want to Do?

Role / Scenario | Recommended Starting Point
Data Integration / Data Sync (data ingestion, CDC sync, file import, streaming writes) | Studio Data Integration (visual configuration for 40+ data sources) · Real-Time Sync Tasks (MySQL / PG / Oracle full-database CDC) · Offline Sync Tasks (scheduled batch sync) · Pipe Continuous Ingestion (automatic writes from object storage / Kafka) · COPY INTO (one-time file import) · Complete Data Ingestion Guide
Data Engineer (build data pipelines, ETL processing, manage data warehouse layers) | Dynamic Table Incremental Computing · Dynamic Table Overview · Real-Time Data Pipeline · Studio Task Development & Scheduling · DDL Syntax Reference · SQL Reference · cz-cli Command Line Tool · Data Engineering Agent · TPC-DS Benchmark
Data Analyst (SQL queries, BI connections, ad-hoc analysis) | Run Your First SQL Query · Connect BI Tools · Data Analytics Agent (natural language queries) · Semantic Views · SSB Benchmark · TPC-H Benchmark
AI / ML Engineer (vector search, RAG, AI functions, model invocation) | AI Data Preparation · Vector Search · AI Functions (AI_COMPLETE / AI_EMBEDDING) · AI Gateway · Python SDK · ZettaPark (DataFrame API)
Platform Administrator (user management, permissions, compute clusters, cost control) | Account and Service Instance Setup · User and Permission Management · Compute Cluster Management
AI Agent / Automation (deterministic API calls, semantic layer queries, automated data pipelines) | cz-cli Command Line Tool (deterministic interface, suitable for Agent calls) · Semantic Views (business semantic layer) · Python SDK · ZettaPark · Data Analytics Agent · Data Engineering Agent · Singclaw

Core Capabilities

Data Ingestion

40+ data sources ready out of the box: MySQL / PG / Oracle full-database CDC real-time sync, Kafka streaming writes, OSS / S3 / COS continuous file ingestion, COPY INTO one-time batch import.

Data Ingestion Guide · Studio Data Integration · Pipe · COPY INTO

Unified Lakehouse

No migration is needed for existing data lakes (OSS / S3 / COS). Mount existing object storage directly and run federated queries over Hive, Iceberg, and Delta Lake data through an External Catalog to gain high-performance SQL analytics.

External Catalog · External Volume · On-Site Acceleration Guide

Incremental Computing

Define transformation logic in standard SQL. A Dynamic Table automatically detects upstream changes and refreshes incrementally, replacing hand-written scheduling scripts and making it straightforward to build low-latency data pipelines.

Incremental Computing · Dynamic Table Overview · Real-Time Data Pipeline
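Conceptually, an incremental refresh applies only the rows that changed upstream to a previously computed result instead of recomputing it from scratch. The following Python sketch illustrates that idea for a grouped SUM. It is not the Dynamic Table API or Singdata's implementation; `Change`, `IncrementalSum`, and `full_refresh` are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """Hypothetical upstream change record (not a Lakehouse type)."""
    op: str      # "+" for an inserted row, "-" for a deleted row
    key: str     # grouping key, e.g. a region
    amount: int  # value being summed


def full_refresh(rows):
    """Baseline: recompute SUM(amount) GROUP BY key over all current rows."""
    totals = defaultdict(int)
    for key, amount in rows:
        totals[key] += amount
    return dict(totals)


class IncrementalSum:
    """Hypothetical sketch: maintain SUM(amount) GROUP BY key from deltas only."""

    def __init__(self):
        self.totals = defaultdict(int)
        self.counts = defaultdict(int)  # rows per group, to know when a group disappears

    def apply(self, changes):
        for c in changes:
            sign = 1 if c.op == "+" else -1
            self.totals[c.key] += sign * c.amount
            self.counts[c.key] += sign
            if self.counts[c.key] == 0:
                # The group has no rows left; drop it so the result matches a full refresh.
                del self.totals[c.key]
                del self.counts[c.key]

    def result(self):
        return dict(self.totals)


view = IncrementalSum()
view.apply([Change("+", "east", 10), Change("+", "west", 5)])
# Second batch: one insert and one delete; only these two changes are processed.
view.apply([Change("+", "east", 3), Change("-", "west", 5)])
print(view.result())  # {'east': 13}
```

The incremental result equals what `full_refresh([("east", 10), ("east", 3)])` would produce, but the work per refresh is proportional to the number of changes rather than the size of the table, which is the motivation for incremental computing.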

High-Performance SQL Analytics

A vectorized execution engine delivers industry-leading performance on the TPC-DS, TPC-H, and SSB benchmarks. It supports OLAP multi-dimensional analysis and ad-hoc queries and runs up to 10x faster than traditional Spark architectures.

TPC Benchmark Reports · SQL Usage Guide

AI Native

Vector indexing, full-text search, AI functions (AI_COMPLETE / AI_EMBEDDING), and semantic views are built into the data platform, so you can build RAG knowledge bases and AI-enhanced analytics without external services. Data Analytics Agent lets users query data conversationally in natural language; Data Engineering Agent supports ETL development in natural language.

Lakehouse AI Overview · Vector Search · AI Functions · Semantic Views · Data Analytics Agent · Data Engineering Agent

Studio & AI Agent Integration

Studio is a one-stop data development platform with a built-in IDE, task scheduling, data integration, data quality, and operational monitoring. cz-cli provides a deterministic command interface and semantic views provide a business semantic layer, so AI Agents can call data capabilities directly.

Studio User Manual · cz-cli Installation and Usage · Semantic Views


What's New

Product Updates


In This Section

Page | Description
Before You Begin | Ways to access Lakehouse: Studio, CLI, drivers and connectors
Account Signup and Setup | Register an account, activate a service instance, and complete initialization
Cloud Services and Regions | Supported cloud providers and available regions