Skip to main content
Join our research panel and help shape the future of Teradata.Sign up.

Query data stored in object storage

Overview

Native Object Storage (NOS) is a Teradata feature that allows you to query data stored in files in object storage such as AWS S3, Google GCS, Azure Blob or on-prem implementations.

Prerequisites

You need access to a Teradata instance. NOS is enabled in all Teradata editions from Vantage Express to Teradata Cloud starting from version 17.10.

Note

If you need a test instance of Teradata, you can provision one for free at https://www.teradata.com/try

Explore data with NOS

Note

Currently, NOS supports CSV, JSON (as array or new-line delimited), and Parquet data formats.

Let's say you have a dataset stored as CSV files in an S3 bucket. You want to explore the dataset before you decide if you want to bring it into Teradata. For this scenario, we are going to use a public dataset published by Teradata that contains river flow data collected by the U.S. Geological Survey. The bucket is at https://td-usgs-public.s3.amazonaws.com/.

Note

NOS queries scan object storage remotely — the first query over a large bucket can take a minute or more.

Let's first have a look at sample CSV data. We take the first 10 rows that Teradata will fetch from the bucket:

Here is what we've got:

NOS always prepends a Location column showing the source file path from which each row was read.

We have got plenty of numbers, but what do they mean? To answer this question, we will ask Teradata to detect the schema of the CSV files:

Teradata will now fetch a data sample to analyze the schema and return results:

We see that the CSV files have 6 data columns. For each column, we get its position, name, datatype and the file coordinates that were used to infer the schema.

Query data with NOS

Now that we know the schema, we can work with the dataset as if it was a regular SQL table. To prove the point, let's try to do some data aggregation. Let's get an average flow rate per site for sites that measure flow.

Result:

To register your ad hoc exploratory activity as a permanent source, create it as a foreign table:

Result:

This time, the SELECT statement looks like a regular select against an in-database table. If you require subsecond response time when querying the data, there is an easy way to bring the CSV data into Teradata to speed things up. Read on to find out how.

Load data from NOS into Teradata

Querying object storage takes time. What if you decided that the data looks interesting and you want to do some more analysis with a solution that will give you quicker answers? The good news is that data returned with NOS can be used as a source for CREATE TABLE statements. Assuming you have CREATE TABLE privilege, you will be able to run:

important

This query assumes you created database Riverflow and a foreign table called riverflow in the previous step.

Result:

This time, the SELECT query returned in less than a second. Teradata didn't have to fetch the data from NOS. Instead, it answered using data that was already on its nodes.

Access private buckets

So far, we have used a public bucket. What if you have a private bucket? How do you tell Teradata what credentials it should use?

It is possible to inline your credentials directly into your query:

Entering these credentials all the time can be tedious and less secure. In Teradata, you can create an authorization object that will serve as a container for your credentials:

You can then reference your authorization object when you create a foreign table. First, drop the riverflow table created in the previous section, then recreate it with the authorization attached:

Export data from Teradata to object storage

So far, we have talked about reading and importing data from object storage. Wouldn't it be nice if we had a way to use SQL to export data from Teradata to object storage? This is exactly what WRITE_NOS function is for. Let's say we want to export data from riverflow_native table to object storage. You can do so with the following query:

The LOCATION value uses Teradata's NOS URI scheme: /s3/ for AWS S3, /gcs/ for Google Cloud Storage, and /az/ for Azure Blob Storage, followed by the bucket hostname and path. Replace your-bucket.s3.amazonaws.com/output-path/ with your actual bucket and destination prefix.

Here, we instruct Teradata to take data from riverflow_native and save it in the specified bucket using parquet format. The data will be split into files by site_no attribute. The files will be compressed.

Summary

In this quick start we have learned how to read data from object storage using Native Object Storage (NOS) functionality in Teradata. NOS supports reading and importing data stored in CSV, JSON and Parquet formats. NOS can also export data from Teradata to object storage.

Further reading