Migration Guide

This section collects hands-on guides for migrating existing data systems to Singdata Lakehouse, covering the most common migration paths: Spark/PySpark, Snowflake, SQL syntax, and more.


Migration Path Overview

| Source System | Recommended Path | Documentation |
| --- | --- | --- |
| Databricks / PySpark | ZettaPark DataFrame API replacement | PySpark → ZettaPark Migration in Practice |
| PySpark RDD (legacy code) | RDD → declarative DataFrame/SQL | RDD → ZettaPark Migration in Practice |
| Spark SQL | SQL syntax comparison migration | Spark SQL Syntax Migration Guide |
| Spark data engineering projects | Architecture migration best practices | Spark Data Engineering Migration Best Practices |
| Spark jobs (production) | Smooth migration with minimal changes | Spark Job Smooth Migration Guide |
| Snowflake | ETL pipeline migration | Snowflake Real-Time ETL Migration |
| Build Medallion from scratch | Bronze → Silver → Gold modeling | Building a Medallion Lakehouse from Scratch |

Choosing a Migration Path

You have existing PySpark code and want to migrate it directly

Use the ZettaPark DataFrame API. About 90% of your code can be reused as-is; the changes are concentrated in four areas: import paths, Session creation, .collect() calls, and file paths. See PySpark → ZettaPark Migration in Practice for complete before/after code comparisons and four migration notes.
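Before starting, it can help to locate where those four change areas occur in an existing codebase. The following is a minimal sketch of such an audit, not part of ZettaPark or any official tooling: the `PATTERNS` table, the `find_touchpoints` helper, and the sample source are all hypothetical, and the regular expressions are rough heuristics that will need tuning for a real project.

```python
import re

# Hypothetical heuristics for the four change areas; extend as needed.
PATTERNS = {
    "import paths": re.compile(r"^\s*(from|import)\s+pyspark\b"),
    "session creation": re.compile(r"\bSparkSession\b.*\bbuilder\b"),
    "collect() calls": re.compile(r"\.collect\(\)"),
    "file paths": re.compile(r"""["'](s3a?|hdfs|dbfs|file|abfss?|gs)://?"""),
}


def find_touchpoints(source: str) -> dict[str, list[int]]:
    """Return 1-based line numbers per change area for a PySpark source string."""
    hits = {area: [] for area in PATTERNS}
    for lineno, line in enumerate(source.splitlines(), start=1):
        for area, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[area].append(lineno)
    return hits


# Illustrative PySpark snippet, held as a string so nothing here needs Spark.
SAMPLE = '''from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders").getOrCreate()
df = spark.read.parquet("s3://bucket/orders/")
rows = df.groupBy("region").agg(F.avg("amount")).collect()
'''

for area, lines in find_touchpoints(SAMPLE).items():
    print(f"{area}: lines {lines}")
```

Lines that match none of the patterns are candidates for reuse as-is; the matched lines are where the migration notes in the linked guide apply.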

You have RDD code (Spark 1.x legacy project) and want to migrate to Lakehouse

See RDD → ZettaPark Migration in Practice. The core change is moving from imperative RDD operations (map/reduceByKey/aggregateByKey) to declarative ones (group_by/agg/F.avg()), which results in less code and better execution efficiency. The largest reduction in code volume comes from replacing aggregateByKey with F.avg().
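To illustrate why the declarative form is shorter, the sketch below computes a per-key average two ways. It does not use Spark or ZettaPark: the imperative version mimics the (sum, count) accumulator that an aggregateByKey job has to carry by hand, and the declarative version uses Python's built-in `sqlite3` as a stand-in engine. The `readings` data and both function names are invented for the example.

```python
import sqlite3
from functools import reduce

readings = [("a", 10.0), ("b", 4.0), ("a", 20.0), ("b", 6.0), ("a", 30.0)]


def imperative_avg(pairs):
    """aggregateByKey-style: carry a (sum, count) accumulator per key, then divide."""
    def seq_op(acc, pair):
        key, value = pair
        total, count = acc.get(key, (0.0, 0))
        acc[key] = (total + value, count + 1)
        return acc

    sums = reduce(seq_op, pairs, {})
    return {key: total / count for key, (total, count) in sums.items()}


def declarative_avg(pairs):
    """Declarative: state the grouping and the aggregate, let the engine plan it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", pairs)
    rows = conn.execute(
        "SELECT sensor, AVG(value) FROM readings GROUP BY sensor"
    ).fetchall()
    conn.close()
    return dict(rows)


print(imperative_avg(readings))
print(declarative_avg(readings))
```

Both functions return the same averages, but only the first has to spell out how the accumulator is built and merged. In a declarative DataFrame or SQL pipeline that bookkeeping moves into the engine, which is also what lets the engine optimize it.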

You are starting from scratch and want to build a Medallion architecture on Lakehouse

See Building a Medallion Lakehouse from Scratch. The guide covers Bronze raw ingestion, Silver cleansing and deduplication, Gold dimensional modeling (including surrogate key generation), and a complete implementation with 22 automated validations.
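The layering can be sketched end to end on a toy dataset. The example below uses Python's built-in `sqlite3` rather than Lakehouse, and the table names, columns, cleansing rules, and the four checks are assumptions made for illustration; the linked guide's actual schema and its 22 validations are not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Bronze: raw events exactly as ingested, duplicates and all.
CREATE TABLE bronze_orders (order_id INTEGER, customer TEXT, amount REAL, ingested_at TEXT);
INSERT INTO bronze_orders VALUES
  (1, ' alice ', 10.0, '2024-01-01'),
  (1, 'alice',   12.0, '2024-01-02'),
  (2, 'bob',      5.0, '2024-01-01'),
  (3, NULL,       7.0, '2024-01-01');

-- Silver: drop rows without a customer, trim text, keep the latest row per order_id.
CREATE TABLE silver_orders AS
SELECT order_id, TRIM(customer) AS customer, amount
FROM bronze_orders AS b
WHERE customer IS NOT NULL
  AND ingested_at = (SELECT MAX(ingested_at) FROM bronze_orders WHERE order_id = b.order_id);

-- Gold: a customer dimension with a generated surrogate key, plus a fact table.
CREATE TABLE dim_customer (customer_sk INTEGER PRIMARY KEY AUTOINCREMENT, customer TEXT UNIQUE);
INSERT INTO dim_customer (customer) SELECT DISTINCT customer FROM silver_orders ORDER BY customer;

CREATE TABLE fact_orders AS
SELECT s.order_id, d.customer_sk, s.amount
FROM silver_orders AS s JOIN dim_customer AS d USING (customer);
""")


def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]


# A few automated validations in the spirit of the guide (these four are illustrative).
checks = {
    "silver has one row per order": scalar(
        "SELECT COUNT(*) = COUNT(DISTINCT order_id) FROM silver_orders") == 1,
    "silver has no null customers": scalar(
        "SELECT COUNT(*) FROM silver_orders WHERE customer IS NULL") == 0,
    "every fact row has a surrogate key": scalar(
        "SELECT COUNT(*) FROM fact_orders WHERE customer_sk IS NULL") == 0,
    "gold keeps every silver order": scalar(
        "SELECT COUNT(*) FROM fact_orders") == scalar("SELECT COUNT(*) FROM silver_orders"),
}
for name, passed in checks.items():
    print(("PASS" if passed else "FAIL"), name)
```

Keeping the validations as named boolean checks makes it straightforward to grow the list and to fail a pipeline run when any of them is false.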

You are migrating only SQL and not touching the compute layer

See Spark SQL Syntax Migration Guide and Data Type Compatibility Reference.

You have production Spark jobs that require minimal downtime

See Spark Job Smooth Migration Guide, which covers dual-write validation, gradual traffic cutover, and other production migration strategies.
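As a rough illustration of dual-write validation, the sketch below compares the output of a legacy job with the output of its migrated counterpart, row by row. It is an assumption about how such a check might look, not the procedure from the linked guide: `row_digest`, `compare_outputs`, and the sample rows are hypothetical, and real outputs would be read from both systems rather than defined inline.

```python
import hashlib


def row_digest(row: dict) -> str:
    """Fingerprint one row; columns are sorted so dict ordering does not matter."""
    canonical = "|".join(f"{col}={row[col]!r}" for col in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def compare_outputs(legacy_rows, new_rows, key):
    """Compare two result sets keyed by `key`; report missing, extra, and changed rows."""
    legacy = {row[key]: row_digest(row) for row in legacy_rows}
    new = {row[key]: row_digest(row) for row in new_rows}
    return {
        "missing_in_new": sorted(legacy.keys() - new.keys()),
        "extra_in_new": sorted(new.keys() - legacy.keys()),
        "changed": sorted(k for k in legacy.keys() & new.keys() if legacy[k] != new[k]),
    }


# Illustrative outputs from the legacy job and the migrated job for the same input.
legacy_output = [{"id": 1, "total": 10.0}, {"id": 2, "total": 5.0}, {"id": 3, "total": 7.5}]
new_output = [{"id": 1, "total": 10.0}, {"id": 2, "total": 5.5}, {"id": 4, "total": 1.0}]

report = compare_outputs(legacy_output, new_output, key="id")
print(report)
```

An empty report across several consecutive runs is one reasonable signal that traffic can start shifting to the new job. Exact digest comparison is strict, so floating-point columns may need rounding before hashing.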