Execute Airflow workflows that use dbt with Teradata Vantage
Overview
This tutorial demonstrates how to install Airflow on an AWS EC2 VM, configure the workflow to use dbt, and run it against a Teradata Vantage database. Airflow is a task scheduling tool typically used to build data pipelines that process and load data. In this example, we go through the Airflow installation process, which creates a Docker-based Airflow environment. Once Airflow is installed, we run several example Airflow DAGs (Directed Acyclic Graphs, or simply workflows) that load data into a Teradata Vantage database.
Prerequisites
- Access to AWS (Amazon Web Services) with permissions to create a VM.
This tutorial can be adapted to other compute platforms, or even to a bare-metal machine, as long as the machine has compute and storage capacity comparable to the one used in this document (a t2.2xlarge EC2 instance on AWS with approximately 100GB of storage) and is connected to the internet. If you use a different compute platform, some steps in the tutorial will need to be adjusted.
- An SSH client.
- Access to a Teradata Vantage database. If you don't have access to Teradata Vantage, explore Vantage Express, a free edition for developers.
Install and execute Airflow
Create a VM
- Go to the AWS EC2 console and click on `Launch instance`.
- Select `Red Hat` for the OS image.
- Select `t2.2xlarge` for the instance type.
- Create a new key pair or use an existing one.
- Apply network settings that allow you to SSH to the server and give the server outbound connectivity to the Internet. The default settings are usually sufficient.
- Assign 100GB of storage.
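The console steps above can also be scripted. The following is a sketch using the AWS CLI; the AMI ID, key pair name, and security group are placeholders you must replace with your own values:

```shell
# Launch a RHEL t2.2xlarge instance with a 100GB root volume.
# <rhel-ami-id>, <your-key-pair>, and <sg-id> are placeholders.
aws ec2 run-instances \
  --image-id <rhel-ami-id> \
  --instance-type t2.2xlarge \
  --key-name <your-key-pair> \
  --security-group-ids <sg-id> \
  --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":100}}]'
```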
Install Python
- SSH to the machine as the `ec2-user` user.
- Check if Python is installed (it should be Python 3.7 or higher). Type `python` or `python3` on the command line.
- If Python is not installed (you get a `command not found` message), run the commands below to install it. The commands may require you to confirm the installation by typing `y` and pressing Enter.
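The check-and-install sequence can be sketched as follows. The `python3` package name is an assumption that holds on RHEL 8 and 9; adjust it for other releases.

```shell
# Check whether a suitable Python is already present (Airflow needs 3.7+).
if command -v python3 >/dev/null 2>&1; then
  python3 --version
else
  # Install Python 3 from the RHEL repositories; the -y flag answers
  # the confirmation prompt automatically.
  sudo yum install -y python3
  python3 --version
fi
```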
Create an Airflow environment
- Create the Airflow directory structure (from the `ec2-user` home directory `/home/ec2-user`).
- Use your preferred file transfer tool (`scp`, `PuTTY`, `MobaXterm`, or similar) to upload the `airflow.cfg` file to the `airflow/config` directory.
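The two steps above can be sketched as follows. The subdirectory names (`dags`, `logs`, `plugins`, `config`) are the ones a Docker-based Airflow setup conventionally expects:

```shell
# Run from the ec2-user home directory (/home/ec2-user).
# Create the Airflow directory tree, including the config folder
# that will hold the uploaded airflow.cfg.
mkdir -p airflow/dags airflow/logs airflow/plugins airflow/config

# Then, from your LOCAL machine, upload airflow.cfg into airflow/config.
# Replace the key file and host name with your own values, e.g.:
#   scp -i my-key.pem airflow.cfg ec2-user@<ec2-public-dns>:~/airflow/config/
```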
Install Docker
Docker is a containerization tool that allows us to install Airflow in a containerized environment.
The following steps must be executed in the `airflow` directory.
- Uninstall podman (the RHEL containerization tool), which conflicts with Docker.
- Install the yum utilities.
- Add the Docker repository to yum.
- Install Docker.
- Start Docker as a service. The first command makes the Docker service start automatically at the next system boot; the second command starts Docker now.
- Check if Docker is installed correctly. This command should return an empty list of containers (since we have not started any container yet):
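The steps above can be sketched as the following command sequence. The repository URL and package names follow Docker's standard CentOS/RHEL installation procedure and should be treated as assumptions; check Docker's documentation for your exact RHEL release.

```shell
# Remove podman and related tools that conflict with Docker on RHEL.
sudo yum remove -y podman buildah

# Install yum utilities and add Docker's package repository.
sudo yum install -y yum-utils
sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo

# Install Docker Engine, the CLI, and containerd.
sudo yum install -y docker-ce docker-ce-cli containerd.io

# Enable Docker at boot, then start it now.
sudo systemctl enable docker
sudo systemctl start docker

# Verify the install: this should print an empty container list.
sudo docker ps
```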