Data Transfer from Azure Blob to Teradata Vantage Using dagster-teradata
Overview
This document provides instructions and guidance for transferring data in CSV, JSON and Parquet formats from Microsoft Azure Blob Storage to Teradata Vantage using dagster-teradata. It outlines the setup, configuration and execution steps required to establish a seamless data transfer pipeline between these platforms.
Prerequisites
- Access to a Teradata Vantage instance.
  Note: If you need a test instance of Vantage, you can provision one for free at https://clearscape.teradata.com
- Python 3.9 or higher; Python 3.12 is recommended.
- pip
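As a quick check of the interpreter requirement, the running Python version can be compared against the minimum listed above. This is a minimal sketch; the helper name meets_minimum is hypothetical and is not part of Dagster or dagster-teradata.

```python
import sys

# Minimum and recommended versions taken from the prerequisites above.
MINIMUM = (3, 9)
RECOMMENDED = (3, 12)


def meets_minimum(version=None):
    """Return True if a (major, minor, ...) version tuple is at least 3.9.

    meets_minimum is a hypothetical helper name used for illustration.
    """
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= MINIMUM


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if not meets_minimum():
        print(f"Python {current} is too old; 3.9 or higher is required.")
    elif tuple(sys.version_info[:2]) < RECOMMENDED:
        print(f"Python {current} works, but 3.12 is recommended.")
    else:
        print(f"Python {current} meets the recommendation.")
```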
Setting Up a Virtual Environment
A virtual environment is recommended to isolate project dependencies and avoid conflicts with system-wide Python packages. Here’s how to set it up:
The commands differ slightly between Windows (run them in PowerShell), macOS, and Linux.
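As a platform-independent sketch, the standard-library venv module can create the environment from Python itself, which avoids per-shell differences in the creation step. The directory name .venv and the helper names create_environment and activation_hint are assumptions made for illustration; activation still has to happen in your own shell.

```python
import os
import venv
from pathlib import Path


def create_environment(target=".venv", with_pip=True):
    """Create a virtual environment and return its directory as a Path.

    The default directory name ".venv" is an assumption for illustration.
    """
    env_dir = Path(target).resolve()
    # with_pip=True bootstraps pip inside the new environment via ensurepip.
    venv.create(env_dir, with_pip=with_pip)
    return env_dir


def activation_hint(env_dir):
    """Return the usual activation command for the current platform."""
    env_dir = Path(env_dir)
    if os.name == "nt":
        # Windows: run this script from PowerShell.
        return str(env_dir / "Scripts" / "Activate.ps1")
    # macOS and Linux: source the script from a POSIX shell.
    return f"source {env_dir / 'bin' / 'activate'}"
```

Calling create_environment() once and then running the command returned by activation_hint() in a terminal leaves you with an active environment for the remaining steps.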
Install dagster and dagster-teradata
With your virtual environment active, the next step is to install dagster and the Teradata provider package (dagster-teradata) to interact with Teradata Vantage.
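- Install the Required Packages: use pip to install dagster and dagster-teradata into the active virtual environment.
- Verify the Installation: to confirm that Dagster is correctly installed, query its version from the command line. If the installation succeeded, the installed Dagster version is displayed.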
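The same verification can be done from Python with the standard-library importlib.metadata module, which reads the version of any installed distribution. This is a minimal sketch: the helper names installed_version and report are hypothetical, and the package list only includes the two packages named in this section.

```python
from importlib import metadata

# Distribution names as they appear in this section.
PACKAGES = ("dagster", "dagster-teradata")


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report(packages=PACKAGES):
    """Map each package name to its installed version (or None)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, version in report().items():
        print(f"{name}: {version or 'not installed'}")
```

Run it with the virtual environment active; a package reported as not installed usually means pip ran against a different interpreter than the one in the environment.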
Initialize a Dagster Project
Now that you have the necessary packages installed, the next step is to create a new Dagster project.
Scaffold a New Dagster Project
Run the following command:
This command will create a new project named dagster-teradata-azure. It will automatically generate the following directory structure:
Refer to the Dagster documentation to learn more about this directory structure.
You need to modify the definitions.py file inside the dagster-teradata-azure/dagster-teradata-azure directory.
Step 1: Open definitions.py in the dagster-teradata-azure/dagster-teradata-azure Directory
Locate and open the file where Dagster job definitions are configured.
This file manages resources, jobs, and assets needed for the Dagster project.