Select the right data ingestion solution for Teradata Vantage
Overview
This article outlines common data ingestion use cases. For each use case, it lists the available solutions and recommends the best option.
High-volume ingestion, including streaming
Available solutions:
- Use the Teradata Parallel Transporter (TPT) API.
- Stream data to object storage and then ingest it using Teradata Native Object Store (NOS).
- Use the Teradata Parallel Transporter (TPT) command-line utility.
- Use Teradata Query Service, a REST API, to execute SQL statements in the database.
- Use Teradata database drivers such as JDBC (Java), teradatasql (Python), the Node.js driver, ODBC, or the .NET Data Provider.
The Teradata Parallel Transporter API is usually the most performant solution, offering high throughput and minimal latency. Use it if you need to ingest tens of thousands of rows per second and you are comfortable programming in C.
Use the Teradata database drivers when you need to ingest thousands of events per second. Consider using the FastLoad protocol, which is available in the most popular drivers, such as JDBC and teradatasql (Python).
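A minimal sketch of the driver approach with FastLoad, written in Python for teradatasql. It assumes a DB-API cursor obtained from `teradatasql.connect(...)`; the function names `batched`, `build_insert`, and `load_events`, the table name, and the batch size are hypothetical. The `{fn teradata_try_fastload}` escape asks teradatasql to use FastLoad for the INSERT and fall back to regular inserts when it cannot. FastLoad has further requirements (for example around autocommit and the target table), so check the teradatasql documentation before relying on this.

```python
from typing import Any, Iterable, Iterator, List, Sequence

# teradatasql escape function: try FastLoad for the INSERT that follows,
# falling back to regular SQL inserts if FastLoad cannot be used.
FASTLOAD_ESCAPE = "{fn teradata_try_fastload}"


def batched(rows: Iterable[Sequence[Any]], size: int) -> Iterator[List[Sequence[Any]]]:
    """Group rows into lists of at most `size` rows (hypothetical helper)."""
    if size <= 0:
        raise ValueError("size must be positive")
    batch: List[Sequence[Any]] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def build_insert(table: str, column_count: int) -> str:
    """Build a parameterized INSERT prefixed with the FastLoad escape."""
    placeholders = ", ".join("?" for _ in range(column_count))
    return f"{FASTLOAD_ESCAPE}INSERT INTO {table} VALUES ({placeholders})"


def load_events(cursor: Any, table: str, rows: Iterable[Sequence[Any]],
                column_count: int, batch_size: int = 10_000) -> int:
    """Send rows in batches through cursor.executemany; return the number of rows sent."""
    sql = build_insert(table, column_count)
    sent = 0
    for batch in batched(rows, batch_size):
        cursor.executemany(sql, batch)
        sent += len(batch)
    return sent
```

Keeping the cursor as a parameter means the batching logic does not depend on a live connection; in production you would pass a cursor opened on a teradatasql connection and commit once the load finishes.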
If you don't want to manage a dependency on driver libraries, use Query Service. Because Query Service uses the regular driver protocol to communicate with the database, its throughput is similar to that of database drivers such as JDBC. If you are a vendor looking to integrate your product with Teradata, be aware that not all Teradata customers have Query Service enabled at their sites.
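Because Query Service is a REST API, a client needs only an HTTP library and no Teradata driver. The sketch below builds, but does not send, a request using only the Python standard library. The endpoint path (`/systems/<system>/queries`), the default port 1443, the JSON fields `query` and `format`, and Basic authentication are assumptions to verify against the Query Service API reference for your site; the host, system, and credential values are hypothetical.

```python
import base64
import json
import urllib.request


def build_query_request(host: str, system: str, user: str, password: str,
                        sql: str, port: int = 1443) -> urllib.request.Request:
    """Build (but do not send) a Query Service request that runs one SQL statement.

    Assumed endpoint and body shape; confirm against the Query Service docs.
    """
    url = f"https://{host}:{port}/systems/{system}/queries"
    body = json.dumps({"query": sql, "format": "OBJECT"}).encode("utf-8")
    # Basic authentication: base64("user:password").
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Basic {token}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Sending it is a matter of `urllib.request.urlopen(request)`; since every call carries one SQL statement over HTTPS, batching many rows into each INSERT matters more here than with a persistent driver connection.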
If your solution can tolerate higher latency, a good option is to stream events to object storage and then read the data using NOS. This option usually requires the least effort.
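The streaming half of this pattern usually means buffering events and writing them out as files in micro-batches. A minimal sketch, assuming CSV output and a date-partitioned key layout; the class name `MicroBatcher`, the `dt=YYYY-MM-DD` key scheme, and the default batch size are hypothetical. Uploading each returned object is left to your storage SDK (S3, Azure Blob Storage, or Google Cloud Storage).

```python
import csv
import io
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional, Sequence, Tuple


class MicroBatcher:
    """Buffer events and emit them as CSV objects once max_rows is reached.

    Each flush returns (object_key, csv_bytes) for the caller to upload.
    """

    def __init__(self, columns: Sequence[str], prefix: str, max_rows: int = 50_000):
        self.columns = list(columns)
        self.prefix = prefix.rstrip("/")
        self.max_rows = max_rows
        self.buffer: List[Dict[str, Any]] = []
        self.seq = 0

    def add(self, event: Dict[str, Any]) -> Optional[Tuple[str, bytes]]:
        """Buffer one event; return a finished object when the batch is full."""
        self.buffer.append(event)
        if len(self.buffer) >= self.max_rows:
            return self.flush()
        return None

    def flush(self, now: Optional[datetime] = None) -> Optional[Tuple[str, bytes]]:
        """Serialize buffered events to CSV with a header row; None if empty."""
        if not self.buffer:
            return None
        now = now or datetime.now(timezone.utc)
        out = io.StringIO()
        # Fields not listed in `columns` are dropped rather than raising.
        writer = csv.DictWriter(out, fieldnames=self.columns, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(self.buffer)
        # One folder per day, so a NOS LOCATION path can target a single day.
        key = f"{self.prefix}/dt={now:%Y-%m-%d}/batch-{self.seq:06d}.csv"
        self.seq += 1
        self.buffer = []
        return key, out.getvalue().encode("utf-8")
```

Larger batches mean fewer, bigger objects, which object storage and NOS generally handle better than many tiny files, at the cost of higher end-to-end latency. Call `flush()` on a timer as well, so a quiet stream still reaches storage.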
Ingest data from object storage
Available solutions:
- Flow (VantageCloud Lake only)
- Teradata Native Object Store (NOS)
- Teradata Parallel Transporter (TPT)
Flow is the recommended mechanism for ingesting data from object storage into VantageCloud Lake. For all other Teradata Vantage editions, Teradata NOS is the recommended option, because NOS can use all Teradata nodes to perform the ingestion. Teradata Parallel Transporter (TPT) runs on the client side; use it when NOS has no connectivity to the object storage.
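With NOS, ingestion is a single SQL statement that reads from object storage and writes into a native table, so the work runs inside the database. The sketch below builds such a statement in Python. It assumes the `FROM (LOCATION='...') AS d` shorthand that recent Vantage releases accept for NOS reads; older releases may require the `READ_NOS` table operator or a foreign table instead. The function name `nos_ingest_sql`, the identifier check, and the example path are hypothetical, and private buckets additionally need an authorization object, which is not shown.

```python
import re
from typing import Sequence

# Conservative identifier check, because names are interpolated into SQL text.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def nos_ingest_sql(target_table: str, columns: Sequence[str], location: str) -> str:
    """Return a CREATE TABLE ... AS (SELECT ... FROM <NOS location>) WITH DATA statement."""
    for name in [target_table, *columns]:
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if "'" in location:
        raise ValueError("location must not contain single quotes")
    cols = ", ".join(columns)
    return (
        f"CREATE MULTISET TABLE {target_table} ({cols}) AS (\n"
        f"  SELECT {cols} FROM (\n"
        f"    LOCATION='{location}'\n"
        f"  ) AS d\n"
        f") WITH DATA\n"
        f"NO PRIMARY INDEX;"
    )
```

The generated statement can be executed through any of the clients above (a driver, BTEQ, or Query Service); only the SQL text travels to the database, while the data itself moves directly from object storage to the Vantage nodes.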
Ingest data from local files
Available solutions:
- Teradata Parallel Transporter (TPT)
- BTEQ
TPT is the recommended option for loading data from local files. TPT is optimized for scalability and parallelism, so it offers the best throughput of all the available options. Use BTEQ when an ingestion process requires scripting. It also makes sense to keep using BTEQ if all your other ingestion pipelines run in BTEQ.
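A TPT load is typically driven by a job script plus a job variables file passed to `tbuild`. The sketch below renders a job variables file and the `tbuild` argument list from Python, assuming an existing job script (hypothetical file name `file_load.txt`). The variable names in the usage note, such as `TargetTdpId`, `FileReaderFileName`, and `LoadTargetTable`, are assumed to match TPT's operator templates and should be checked against the TPT reference; all values are hypothetical.

```python
from typing import Dict, List, Union

# Job variable values are either quoted strings or bare integers.
JobVarValue = Union[str, int]


def render_jobvars(variables: Dict[str, JobVarValue]) -> str:
    """Render TPT job variables as comma-separated `Name = value` lines."""
    lines = []
    for name, value in variables.items():
        if isinstance(value, bool):
            raise TypeError(f"{name}: booleans are not supported")
        if isinstance(value, int):
            rendered = str(value)
        else:
            if "'" in value:
                # Keep the sketch simple: refuse values that would need escaping.
                raise ValueError(f"{name}: value must not contain single quotes")
            rendered = f"'{value}'"
        lines.append(f"{name} = {rendered}")
    return ",\n".join(lines) + "\n"


def tbuild_command(script: str, jobvars_file: str, job_name: str) -> List[str]:
    """Argument list for running a TPT job script with a job variables file."""
    return ["tbuild", "-f", script, "-v", jobvars_file, "-j", job_name]
```

For example, write `render_jobvars({"TargetTdpId": "my-host", "FileReaderFileName": "data.csv", "FileReaderFormat": "Delimited", "FileReaderTextDelimiter": ",", "FileReaderSkipRows": 1, "LoadTargetTable": "my_table"})` to `jobvars.txt`, then run `subprocess.run(tbuild_command("file_load.txt", "jobvars.txt", "file_load"), check=True)`. Keeping file names and targets in the variables file lets one job script serve many loads.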
Ingest data from SaaS applications
Available solutions:
- Third-party tools such as Airbyte, Precog, Nexla, and Fivetran.
- Export from SaaS apps to local files and then ingest using Teradata Parallel Transporter (TPT).
- Export from SaaS apps to object storage and then ingest using Teradata Native Object Store (NOS).
Third-party tools are usually the better option for moving data from SaaS applications to Teradata Vantage. They support a broad range of data sources and eliminate the need to manage intermediate steps such as exporting the data and storing the exported datasets.