
Manage ELT pipelines as code with Terraform and Airbyte on Teradata Vantage

Overview

This quickstart explains how to use Terraform to manage Airbyte data pipelines as code. Instead of configuring Airbyte manually through its web UI, we'll use code to create and manage Airbyte resources. The example builds a basic ELT pipeline from Google Sheets to Teradata Vantage using Airbyte's Terraform provider.

The Airbyte Terraform provider is available for users on Airbyte Cloud, OSS & Self-Managed Enterprise.


Introduction

Terraform is a leading open-source tool in the Infrastructure as Code (IaC) space. It enables the automated provisioning and management of infrastructure, cloud platforms, and services via configuration files, instead of manual setup. Terraform uses plugins, known as Terraform providers, to communicate with infrastructure hosts, cloud providers, APIs, and SaaS platforms.

Airbyte, the data integration platform, has a Terraform provider that communicates directly with Airbyte's API. This allows data engineers to manage Airbyte configurations, enforce version control, and apply good data engineering practices within their ELT pipelines.

Prerequisites

  • A Teradata Vantage instance. You will need its database host, username, and password for Airbyte's Terraform configuration.

Install Terraform

  • Install Terraform using the commands for your operating system. The steps below use Homebrew on macOS; find other installation options on the Terraform site.

First, add the HashiCorp tap, HashiCorp's repository of Homebrew packages, with brew tap hashicorp/tap.

Next, install Terraform from the tap with brew install hashicorp/tap/terraform.

Environment preparation

Prepare the environment by creating a directory for the Terraform configuration and, inside it, two empty files: main.tf and variables.tf.
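A minimal sketch of this setup from a shell follows. The directory name terraform_airbyte is a hypothetical choice, and any empty directory works.

```shell
# Create a directory for the configuration (name is hypothetical) and enter it.
mkdir -p terraform_airbyte
cd terraform_airbyte

# Create the two empty configuration files.
touch main.tf variables.tf
```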

Define a data pipeline

Define the data source, destination and connection within the main.tf file. Open the newly created main.tf file in Visual Studio Code or any preferred code editor.

Terraform Extensions on Visual Studio Code

Populate the main.tf file with the template provided.

Note that this example uses a cron expression to schedule the data transfer to run at 15 minutes past every hour.

The main.tf file references variables defined in variables.tf, including the Airbyte API key, workspace ID, Google Sheet ID, Google private key, and Teradata Vantage credentials. Copy the following template into variables.tf and set each variable's default attribute to the appropriate value.

Configuring the variables.tf file
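One way to create a variables.tf skeleton is with a shell heredoc, sketched below. The variable names (api_key, workspace_id, spreadsheet_id, google_private_key, host, username, password) are assumptions for illustration. They must match the var.<name> references your main.tf actually uses. The sensitive = true argument is standard Terraform; it redacts a variable's value from plan and apply output.

```shell
# Write a variables.tf skeleton. Variable names are hypothetical and must
# match the var.<name> references in main.tf. Fill in each empty default.
cat > variables.tf <<'EOF'
variable "api_key" {
  description = "Airbyte API key"
  type        = string
  sensitive   = true
  default     = ""
}

variable "workspace_id" {
  description = "Airbyte workspace ID"
  type        = string
  default     = ""
}

variable "spreadsheet_id" {
  description = "ID of the source Google Sheet"
  type        = string
  default     = ""
}

variable "google_private_key" {
  description = "Google private key"
  type        = string
  sensitive   = true
  default     = ""
}

variable "host" {
  description = "Teradata Vantage host"
  type        = string
  default     = ""
}

variable "username" {
  description = "Teradata Vantage username"
  type        = string
  default     = ""
}

variable "password" {
  description = "Teradata Vantage password"
  type        = string
  sensitive   = true
  default     = ""
}
EOF
```

Terraform can also read a variable's value from an environment variable named TF_VAR_ followed by the variable name, for example TF_VAR_api_key. This keeps secrets out of files committed to version control.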

Execution Commands

Run terraform init to download the provider plugin from the Terraform Registry and initialize the working directory.

Run this command first after writing a new Terraform configuration or cloning an existing one from version control. It is safe to run it again later.

Initializing Terraform with the terraform init command

Run terraform plan to display the execution plan, which shows the resources Terraform will create and the changes it will make to existing infrastructure.

For this example, the plan contains three new resources:

  • Connection: # airbyte_connection.googlesheets_teradata will be created
  • Destination: # airbyte_destination_teradata.<resource name> will be created
  • Source: # airbyte_source_google_sheets.my_source_gsheets will be created

Viewing the Terraform execution plan with the terraform plan command

Run terraform apply and type yes at the confirmation prompt. Terraform generates a plan and then carries it out.

Applying the Terraform plan with the terraform apply command

Terraform creates the terraform.tfstate file the first time you run terraform apply. This file tracks the state of every source, destination, and connection that Terraform manages. On each later terraform apply, Terraform compares the configuration in main.tf with the resource state recorded in terraform.tfstate. If you add, change, or remove resources in main.tf, Terraform updates both the Airbyte deployment and terraform.tfstate to match. Do not modify this file by hand.

You now have a source, a destination, and a connection on Airbyte Cloud, all created and managed with Terraform.

Airbyte Connection in Airbyte Cloud UI
