Execute dbt-teradata transformation jobs in Apache Airflow using the Astronomer Cosmos library
Overview
This tutorial demonstrates how to install Apache Airflow on a local machine, configure a workflow that runs dbt transformations through the dbt-teradata adapter and the Astronomer Cosmos library, and execute it against a Teradata Vantage database. Apache Airflow is a task-scheduling tool typically used to build data pipelines that process and load data. The Astronomer Cosmos library simplifies orchestrating dbt data transformations in Apache Airflow: with Cosmos, you can run dbt Core projects as Airflow DAGs and Task Groups with just a few lines of code. In this example, we use Cosmos to run dbt transformations in Airflow against a Teradata Vantage database.
Note: Use the Windows Subsystem for Linux (WSL) on Windows to try this quickstart example.
Prerequisites
- Access to a Teradata Vantage instance, version 17.10 or higher.
If you need a test instance of Vantage, you can provision one for free at https://clearscape.teradata.com
- Python 3.8, 3.9, 3.10, or 3.11, with the python3-venv and python3-pip packages installed.
- A Linux, WSL, or macOS environment.

To set up WSL on Windows, run in PowerShell:
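A minimal sketch (on recent Windows builds, `wsl --install` enables WSL and installs a default Ubuntu distribution):

```powershell
# Enable WSL and install the default Linux distribution; a restart may be required
wsl --install
```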
Refer to the Installation Guide if you face any issues.
Install Apache Airflow and Astronomer Cosmos
- Create a new Python environment to manage Airflow and its dependencies. Activate the environment:
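A sketch of the commands, assuming a virtual environment named `airflow_env` (the name is illustrative) and that your Cosmos version provides the `dbt-teradata` extra:

```bash
# Create and activate a dedicated virtual environment (name is illustrative)
python3 -m venv airflow_env
source airflow_env/bin/activate

# Install Astronomer Cosmos with the dbt-teradata extra
pip install "astronomer-cosmos[dbt-teradata]"
```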
Note: This will install Apache Airflow as well.
- Install the Apache Airflow Teradata provider:
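The provider is published on PyPI as `apache-airflow-providers-teradata`:

```bash
# Adds the Teradata connection type, hook, and operators to Airflow
pip install apache-airflow-providers-teradata
```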
- Set the AIRFLOW_HOME environment variable:
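For example, using Airflow's conventional default location (any writable path works):

```bash
# Airflow keeps its configuration, logs, and dags/ folder under this directory
export AIRFLOW_HOME=~/airflow
```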
Install dbt
- Create a new Python environment to manage dbt and its dependencies. Activate the environment:
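A sketch, assuming an environment named `dbt_env` (illustrative); deactivate the Airflow environment first if it is still active:

```bash
# Leave the previously activated environment, if any
deactivate

# Create and activate a separate environment for dbt (name is illustrative)
python3 -m venv dbt_env
source dbt_env/bin/activate
```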
- Install the dbt-teradata and dbt-core modules:
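Both are published on PyPI; dbt-teradata is the Teradata adapter and pulls in a compatible dbt-core, but installing both explicitly keeps the versions visible:

```bash
# dbt-core is the dbt engine; dbt-teradata is the Teradata adapter
pip install dbt-core dbt-teradata
```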
Set up the dbt project
- Clone the jaffle_shop repository and cd into the project directory:
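For example, using dbt's demo jaffle_shop project (the exact repository URL is an assumption; the quickstart may point at a Teradata-specific fork):

```bash
# Clone the demo dbt project and enter it (repository URL is an assumption)
git clone https://github.com/dbt-labs/jaffle_shop.git
cd jaffle_shop
```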
- Make a new folder named dbt inside the $AIRFLOW_HOME/dags folder. Then copy the jaffle_shop dbt project into the $AIRFLOW_HOME/dags/dbt directory:
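A sketch, run from the directory that contains the cloned jaffle_shop folder (adjust the source path to wherever you cloned it):

```bash
# Create the target folder inside Airflow's dags directory
mkdir -p $AIRFLOW_HOME/dags/dbt

# Copy the dbt project into it (assumes jaffle_shop is in the current directory)
cp -r jaffle_shop $AIRFLOW_HOME/dags/dbt/
```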