Overview

Singdata Lakehouse is a next-generation cloud lakehouse developed in-house by Singdata. Built on an incremental computing engine, it delivers up to 10x the performance of traditional open-source architectures such as Spark, enabling low-cost, real-time processing of massive data from end to end. The platform integrates, stores, and computes over all data types, providing a solid data infrastructure for AI innovation and helping enterprises move from traditional Spark-based systems into the AI era.

For enterprises with existing data lakes (OSS / S3 / COS), Singdata Lakehouse can mount the existing object storage directly and run federated queries over data in Hive, Iceberg, Delta Lake, and other formats via External Catalog, so high-performance SQL analytics requires no data migration. This is the lowest-cost path from a data lake to a unified lakehouse.

Singdata Lakehouse supports seven global clouds and is already available in multiple Asia-Pacific regions; private deployment is also supported. Infrastructure costs fall to between one-fifth and one-third of those of traditional solutions, with near-zero operational overhead.

Migrate from Spark / Databricks
Migration Guide · Migration Best Practices · SQL Syntax Comparison · Spark Connector · Performance Benchmarks

Accelerate on Existing Data Lake
On-Site Acceleration Implementation Guide · External Catalog Federation Query · External Tables · Object Storage Mount · Performance Benchmarks

AI Data Infrastructure
Lakehouse AI Overview · AI Data Preparation · Vector Search · AI Gateway · Data Analytics Agent · Data Engineering Agent

Cloud Platforms & Deployment
Supported Cloud Platforms and Regions

First Time Here?

① Create Your Account (5 minutes)
Register an account, activate a service instance, and complete initial setup.
Get Started →

② Quick Start Experience (30 minutes)
Walk through data ingestion, SQL querying, and Dynamic Table incremental computing.
Start Exploring →

③ Go Deeper by Role (on demand)
Dedicated paths for data engineers, analysts, AI engineers, and administrators.
Choose Your Path →

Who Are You and What Do You Want to Do?

Role / Scenario	Typical Tasks	Recommended Starting Point
Data Integration / Data Sync	Data ingestion, CDC sync, file import, streaming writes	Studio Data Integration (visual configuration for 40+ data sources) · Real-Time Sync Tasks (MySQL / PostgreSQL / Oracle full-database CDC) · Offline Sync Tasks (scheduled batch sync) · Pipe Continuous Ingestion (object storage / Kafka auto-write) · COPY INTO (one-time file import) · Complete Data Ingestion Guide
Data Engineer	Build data pipelines, ETL processing, manage data warehouse layers	Dynamic Table Incremental Computing · Dynamic Table Overview · Real-Time Data Pipeline · Studio Task Development & Scheduling · DDL Syntax Reference · SQL Reference · cz-cli Command Line Tool · Data Engineering Agent · TPC-DS Benchmark
Data Analyst	SQL queries, BI connections, ad-hoc analysis	Run Your First SQL Query · Connect BI Tools · Data Analytics Agent (natural language queries) · Semantic Views · SSB Benchmark · TPC-H Benchmark
AI / ML Engineer	Vector search, RAG, AI functions, model invocation	AI Data Preparation · Vector Search · AI Functions (AI_COMPLETE / AI_EMBEDDING) · AI Gateway · Python SDK · ZettaPark (DataFrame API)
Platform Administrator	User management, permissions, compute clusters, cost control	Account and Service Instance Setup · User and Permission Management · Compute Cluster Management
AI Agent / Automation	Deterministic API calls, semantic layer queries, automated data pipelines	cz-cli Command Line Tool (deterministic interface, suitable for Agent calls) · Semantic Views (business semantic layer) · Python SDK · ZettaPark · Data Analytics Agent · Data Engineering Agent · Singclaw

Core Capabilities

Data Ingestion

More than 40 data sources work out of the box, including full-database real-time CDC sync from MySQL, PostgreSQL, and Oracle; streaming writes from Kafka; continuous file ingestion from OSS / S3 / COS; and one-time batch import with COPY INTO.

Data Ingestion Guide · Studio Data Integration · Pipe · COPY INTO

Unified Lakehouse

Existing data lakes (OSS / S3 / COS) need no migration. Mount the existing object storage directly and run federated queries over Hive, Iceberg, and Delta Lake data via External Catalog to gain high-performance SQL analytics.

External Catalog · External Volume · On-Site Acceleration Guide

Incremental Computing

Define transformation logic in standard SQL. A Dynamic Table automatically detects upstream changes and refreshes incrementally, replacing hand-maintained scheduling scripts and enabling low-latency data pipelines.

Incremental Computing · Dynamic Table Overview · Real-Time Data Pipeline
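
The idea behind incremental refresh can be pictured with a small, self-contained Python model. This is only a conceptual sketch, not the Lakehouse API: real Dynamic Tables are defined in SQL, and the names `ToyDynamicTable`, `refresh`, and `orders` are hypothetical. It assumes an append-only source and a simple `SUM ... GROUP BY` transformation, and shows that each refresh reads only the rows added since the previous one while producing the same result as a full recomputation.

```python
from collections import defaultdict


class ToyDynamicTable:
    """Toy model (hypothetical): maintains SUM(amount) GROUP BY region
    over an append-only source list of (region, amount) rows."""

    def __init__(self, source):
        self.source = source            # upstream rows, append-only by assumption
        self.offset = 0                 # number of source rows already applied
        self.totals = defaultdict(int)  # materialized aggregate

    def refresh(self):
        """Apply only the rows appended since the last refresh.

        Returns the number of source rows read during this refresh."""
        new_rows = self.source[self.offset:]
        for region, amount in new_rows:
            self.totals[region] += amount
        self.offset = len(self.source)
        return len(new_rows)


def full_recompute(source):
    """Reference result: recompute the aggregate from scratch."""
    totals = defaultdict(int)
    for region, amount in source:
        totals[region] += amount
    return dict(totals)


orders = [("east", 10), ("west", 5)]
table = ToyDynamicTable(orders)

first = table.refresh()        # initial refresh reads both existing rows
orders.append(("east", 7))     # an upstream change arrives
second = table.refresh()       # incremental refresh reads only the new row

print(dict(table.totals), first, second)
```

The second refresh touches one row instead of three, yet the materialized totals match `full_recompute(orders)`. Updates, deletes, joins, and change detection are outside the scope of this sketch.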

High-Performance SQL Analytics

The vectorized execution engine posts industry-leading results on the TPC-DS, TPC-H, and SSB benchmarks. It supports OLAP multi-dimensional analysis and ad-hoc queries, running up to 10x faster than traditional Spark architectures.

TPC Benchmark Reports · SQL Usage Guide

AI Native

Vector indexing, full-text search, AI functions (AI_COMPLETE / AI_EMBEDDING), and semantic views are built into the data platform, so RAG knowledge bases and AI-enhanced analytics can be built without external services. Data Analytics Agent supports conversational data queries in natural language; Data Engineering Agent supports ETL development in natural language.

Lakehouse AI Overview · Vector Search · AI Functions · Semantic Views · Data Analytics Agent · Data Engineering Agent

Studio & AI Agent Integration

Studio is a one-stop data development platform with a built-in IDE, task scheduling, data integration, data quality, and operational monitoring. cz-cli provides a deterministic command interface and semantic views provide a business semantic layer, so AI Agents can call data capabilities directly.

Studio User Manual · cz-cli Installation and Usage · Semantic Views

What's New

→ Product Updates

In This Section

Page	Description
Before You Begin	Ways to access Lakehouse: Studio, CLI, drivers and connectors
Account Signup and Setup	Register an account, activate a service instance, and complete initialization
Cloud Services and Regions	Supported cloud providers and available regions