Select the right data ingestion solution for Teradata Vantage
Overview
This article outlines different use cases involving data ingestion. It lists available solutions and recommends the optimal solution for each use case.
High-volume ingestion, including streaming
Available solutions:
- Use the Teradata Parallel Transporter API (TPT API).
- Stream data to object storage and then ingest using Teradata Native Object Store (NOS).
- Use the Teradata Parallel Transporter (TPT) command line utility.
- Use Teradata Query Service - REST API to execute SQL statements in the database.
- Use Teradata database drivers such as JDBC (Java), teradatasql (Python), the Node.js driver, ODBC, or the .NET Data Provider.
The Teradata Parallel Transporter API is usually the most performant solution, offering high throughput and minimal latency. Use it if you need to ingest tens of thousands of rows per second and you are comfortable programming in C.
Use the Teradata database drivers when the event rate is in the thousands per second. Consider using the FastLoad protocol, which is available in the most popular drivers, such as the JDBC and Python drivers.
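As a rough illustration of the driver-based path, the sketch below batches rows and sends them through a DB-API connection such as one opened with the teradatasql driver. It prefixes the INSERT with the driver's `{fn teradata_try_fastload}` escape function, which asks the driver to use FastLoad when the statement qualifies. The function names, the table name, and the batch size are hypothetical. The sketch also assumes that the target table already exists and is empty, and that autocommit has been turned off on the connection.

```python
from typing import Iterable, Iterator, List, Sequence

# teradatasql escape function: try FastLoad, fall back to a regular INSERT
# when the statement does not qualify.
FASTLOAD_PREFIX = "{fn teradata_try_fastload}"


def batched(rows: Iterable[Sequence], size: int) -> Iterator[List[Sequence]]:
    """Yield lists of at most `size` rows."""
    batch: List[Sequence] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def fastload_insert(conn, table: str, column_count: int,
                    rows: Iterable[Sequence], batch_size: int = 10000) -> int:
    """Insert rows in batches through a DB-API connection; return rows sent.

    `table` is a hypothetical, trusted identifier (it is not escaped here).
    """
    placeholders = ", ".join(["?"] * column_count)
    sql = f"{FASTLOAD_PREFIX}INSERT INTO {table} VALUES ({placeholders})"
    sent = 0
    cur = conn.cursor()
    try:
        for batch in batched(rows, batch_size):
            cur.executemany(sql, batch)
            sent += len(batch)
        conn.commit()  # the load is finalized on commit
    finally:
        cur.close()
    return sent


# Usage sketch (needs the teradatasql package and a reachable database):
#   import teradatasql
#   with teradatasql.connect(host="<host>", user="<user>", password="<password>") as conn:
#       with conn.cursor() as cur:
#           cur.execute("{fn teradata_nativesql}{fn teradata_autocommit_off}")
#       fastload_insert(conn, "events", 3, rows)
```

Larger batches generally mean fewer round trips, so the batch size is worth tuning against your own row width and memory limits.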
If you don't want to manage the dependency on the driver libraries, use Query Service. Because Query Service uses the regular driver protocol to communicate with the database, its throughput is similar to that of database drivers such as JDBC. If you are a vendor looking to integrate your product with Teradata, be aware that not all Teradata customers have Query Service enabled at their sites.
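To show what a driver-free call could look like, the sketch below builds (but does not send) an HTTP request that submits one SQL statement to Query Service, using only the Python standard library. The `/systems/<system>/queries` path, the `query` and `format` JSON fields, and the use of Basic authentication are assumptions that should be checked against the Query Service documentation for your deployment. The host, system name, and function name are hypothetical.

```python
import base64
import json
import urllib.request


def build_query_request(base_url: str, system: str, user: str,
                        password: str, sql: str) -> urllib.request.Request:
    """Build a POST request that submits `sql` to Query Service.

    Assumed, not verified here: the endpoint path and the JSON field names.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    body = json.dumps({"query": sql, "format": "OBJECT"}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/systems/{system}/queries",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Usage sketch (hypothetical host and system name):
#   req = build_query_request("https://qs.example.com", "mysystem",
#                             "user", "password",
#                             "INSERT INTO events VALUES (1, 'click')")
#   with urllib.request.urlopen(req) as resp:
#       print(resp.status)
```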
If your solution can accept higher latency, a good option is to stream events to object storage and then read the data using NOS. This solution usually requires the least amount of effort.
Ingest data from object storage
Available solutions:
- Flow (VantageCloud Lake only)
- Teradata Native Object Store (NOS)
- Teradata Parallel Transporter (TPT)
Flow is the recommended ingestion mechanism to bring data from object storage to VantageCloud Lake. For all other Teradata Vantage editions, Teradata NOS is the recommended option. NOS can leverage all Teradata nodes to perform ingestion. Teradata Parallel Transporter (TPT) runs on the client side. It can be used when there is no connectivity from NOS to object storage.
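For the NOS path, ingestion can be expressed in plain SQL: expose the object storage location as a foreign table, then copy its rows into a regular table. The helper below is a minimal sketch that only generates the two statements. The table names and bucket path are hypothetical, and it assumes a publicly readable location. A private bucket would additionally need an authorization clause that is not shown here.

```python
from typing import List


def nos_ingest_statements(foreign_table: str, target_table: str,
                          location: str) -> List[str]:
    """Return SQL that maps an object-store path to a foreign table
    and then materializes its rows in a regular table.

    Table names are hypothetical, trusted identifiers (not escaped here).
    """
    safe_location = location.replace("'", "''")  # escape single quotes
    create_foreign = (
        f"CREATE FOREIGN TABLE {foreign_table}\n"
        f"USING (LOCATION('{safe_location}'))"
    )
    copy_rows = (
        f"CREATE TABLE {target_table} AS (\n"
        f"  SELECT * FROM {foreign_table}\n"
        f") WITH DATA"
    )
    return [create_foreign, copy_rows]


# Usage sketch: print the statements, or run them through any database driver.
if __name__ == "__main__":
    for stmt in nos_ingest_statements(
        "events_ft", "events",
        "/s3/example-bucket.s3.amazonaws.com/events/",  # hypothetical bucket
    ):
        print(stmt + ";")
```

Because the foreign table is read by the database itself, the copy step runs inside Vantage rather than on the client.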
Ingest data from local files
Available solutions:
- Teradata Parallel Transporter (TPT)
- BTEQ
TPT is the recommended option for loading data from local files. It is optimized for scalability and parallelism, so it offers the best throughput of the available options. BTEQ can be used when an ingestion process requires scripting. It also makes sense to continue using BTEQ if all your other ingestion pipelines run in BTEQ.
Ingest data from SaaS applications
Available solutions:
- Third-party tools such as Airbyte, Precog, Nexla, and Fivetran.
- Export from SaaS apps to local files and then ingest using Teradata Parallel Transporter (TPT).
- Export from SaaS apps to object storage and then ingest using Teradata Native Object Store (NOS).
Third-party tools are usually the better option for moving data from SaaS applications to Teradata Vantage. They offer broad support for data sources and eliminate the need to manage intermediate steps such as exporting and storing exported datasets.
Use data stored in other databases for unified query processing
Available solutions:
- Teradata QueryGrid
- Export from other databases to local files and then ingest using Teradata Parallel Transporter (TPT).
- Export from other databases to object storage and then ingest using Teradata Native Object Store (NOS).
QueryGrid is the recommended option for moving limited quantities of data between different systems and platforms. This includes movement between Vantage instances, Apache Spark, Oracle, Presto, and other systems. It is especially well suited to situations where the data to be synced is described by complex conditions that can be expressed in SQL.
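As an illustration of a condition-driven sync, the sketch below generates a single statement that copies only the matching rows from a remote table into a local one. It assumes a QueryGrid foreign server has already been configured and that remote tables are addressed as `table@server`. The table names, server name, condition, and function name are all hypothetical.

```python
def querygrid_sync_sql(target_table: str, remote_table: str,
                       foreign_server: str, condition: str) -> str:
    """Return an INSERT ... SELECT that pulls filtered rows from a remote
    system through a QueryGrid foreign server.

    All arguments are hypothetical, trusted SQL fragments (not escaped here).
    """
    return (
        f"INSERT INTO {target_table}\n"
        f"SELECT * FROM {remote_table}@{foreign_server}\n"
        f"WHERE {condition}"
    )


# Usage sketch: copy only recent, high-value orders from a remote system.
if __name__ == "__main__":
    print(querygrid_sync_sql(
        "sales.orders_recent",
        "orders",
        "remote_server",  # hypothetical foreign server name
        "order_date >= DATE '2024-01-01' AND total_amount > 1000",
    ))
```

Keeping the filter in the WHERE clause means the selection logic stays in SQL, which is the situation the paragraph above describes.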
Summary
In this article, we explored various data ingestion use cases, provided a list of available tools for each use case, and identified the recommended options for different scenarios.