A Comprehensive Guide to Importing Data into Singdata Lakehouse

Data Ingestion: Loading Data via Zettapark using SAVE_AS_TABLE

Overview

Use Cases

Loading data via Zettapark with SQL INSERT requires you to create the target table manually. The SAVE_AS_TABLE method (save_as_table in the Python API) creates the table automatically, which simplifies the load. SAVE_AS_TABLE also optimizes the underlying INSERT INTO by inserting multiple records at once instead of one at a time.
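To make the contrast concrete, here is a minimal sketch of row-at-a-time versus batched inserts. It uses Python's built-in sqlite3 module as a local stand-in, not Singdata Lakehouse, and it does not show how save_as_table is implemented internally. The table name, column names, and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample records (not taken from the guide's data file).
rows = [("a1", "Alpine Ridge", 2), ("b2", "Snow Peak", 1), ("c3", "Alpine Ridge", 3)]

conn = sqlite3.connect(":memory:")

# The manual step that the SQL INSERT route requires: create the table first.
conn.execute("CREATE TABLE lift_tickets (txid TEXT, resort TEXT, days INTEGER)")

# Row-at-a-time loading: one INSERT statement per record.
for row in rows:
    conn.execute("INSERT INTO lift_tickets VALUES (?, ?, ?)", row)

# Batched loading: a single multi-row INSERT carrying all records.
conn.execute("DELETE FROM lift_tickets")
placeholders = ", ".join(["(?, ?, ?)"] * len(rows))
flat_values = [value for row in rows for value in row]
conn.execute(f"INSERT INTO lift_tickets VALUES {placeholders}", flat_values)

row_count = conn.execute("SELECT COUNT(*) FROM lift_tickets").fetchone()[0]
print(row_count)
conn.close()
```

Both approaches end with the same table contents. The batched form sends one statement instead of one per record, which is the kind of saving the paragraph above describes.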

Implementation Steps

Open VS Code, create a file named py_zettapark_save_as_table.py, and copy the following code into it.

import json
import gzip
from clickzetta.zettapark.session import Session
from datetime import datetime

# Read parameters from the configuration file
with open('config-ingest.json', 'r') as config_file:
    config = json.load(config_file)

print("Connecting to Singdata Lakehouse.....\n")

# Create session
session = Session.builder.configs(config).create()
print("Connection successful!...\n")

target_table_name = "lift_tuckets_import_by_py_save_as_table"


def save_as_table_to_clickzetta(session, schema, data):
    print('Saving data to Singdata Lakehouse')
    # Convert data to dataframe
    df = session.create_dataframe(data, schema=schema)
    # Save dataframe as table
    df.write.save_as_table(target_table_name, mode="overwrite", table_type="transient")
    print(f"Data saved to table {target_table_name}")


if __name__ == "__main__":
    schema = None
    data = []
    # Open the compressed JSON file and read the content
    with gzip.open('lift_tickets_data.json.gz', 'rt', encoding='utf-8') as file:
        for message in file:
            if message.strip():  # Ensure it's not an empty line
                record = json.loads(message)
                if 'schema' in record:
                    schema = record['schema']
                else:
                    data.append(record)
    save_as_table_to_clickzetta(session, schema, data)
    session.close()
    print("Ingest complete")
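The script expects lift_tickets_data.json.gz to hold one JSON object per line, where a line containing a schema key supplies the schema and every other non-empty line is a data record. The following sketch isolates that parsing step so it can be tried without a Lakehouse connection. The helper name read_schema_and_records, the sample field names, and the shape of the schema value are assumptions made for illustration; the guide does not show the real file's contents.

```python
import gzip
import json
import os
import tempfile


def read_schema_and_records(path):
    """Split a gzipped JSON-lines file into (schema, records)."""
    schema = None
    records = []
    with gzip.open(path, "rt", encoding="utf-8") as file:
        for line in file:
            if not line.strip():
                continue  # skip blank lines, as the script above does
            record = json.loads(line)
            if "schema" in record:
                schema = record["schema"]
            else:
                records.append(record)
    return schema, records


# Hypothetical sample content: one schema line followed by two records.
sample_lines = [
    {"schema": ["txid", "resort", "days"]},
    {"txid": "a1", "resort": "Alpine Ridge", "days": 2},
    {"txid": "b2", "resort": "Snow Peak", "days": 1},
]

# Write the sample to a temporary gzipped file, one JSON object per line.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.json.gz")
with gzip.open(sample_path, "wt", encoding="utf-8") as out:
    for item in sample_lines:
        out.write(json.dumps(item) + "\n")
    out.write("\n")  # a trailing blank line, which the reader skips

schema, records = read_schema_and_records(sample_path)
print(schema)
print(len(records))
```

If the file has no schema line, schema stays None, and the script then passes schema=None to create_dataframe.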

In VS Code, open a new "Terminal" and run the following command to activate the Python environment created in the "Environment Setup" step. If you are already in the cz-ingest-examples environment, please skip this step.

conda activate cz-ingest-examples

Then run the following command in the same terminal:

python py_zettapark_save_as_table.py

Next Steps Recommendations

Resources

Zettapark Quick Start