Query data stored in object storage
Overview
Native Object Storage (NOS) is a Vantage feature that allows you to query data stored in files in object storage such as AWS S3, Google GCS, Azure Blob or on-prem implementations. It's useful in scenarios where you want to explore data without building a data pipeline to bring it into Vantage.
Prerequisites
You need access to a Teradata Vantage instance. NOS is enabled in all Vantage editions from Vantage Express through Developer, DYI to Vantage as a Service starting from version 17.10.
If you need a test instance of Vantage, you can provision one for free at https://clearscape.teradata.com
Explore data with NOS
Currently, NOS supports CSV, JSON (as array or new-line delimited), and Parquet data formats.
Let's say you have a dataset stored as CSV files in an S3 bucket. You want to explore the dataset before you decide if you want to bring it into Vantage. For this scenario, we are going to use a public dataset published by Teradata that contains river flow data collected by the U.S. Geological Survey. The bucket is at https://td-usgs-public.s3.amazonaws.com/.
Let's first have a look at sample CSV data. We take the first 10 rows that Vantage will fetch from the bucket:
Here is what I've got:
We have got plenty of numbers, but what do they mean? To answer this question, we will ask Vantage to detect the schema of the CSV files:
Vantage will now fetch a data sample to analyze the schema and return results:
We see that the CSV files have 6 columns. For each column, we get the name, the datatype and the file coordinates that were used to infer the schema.
Query data with NOS
Now that we know the schema, we can work with the dataset as if it was a regular SQL table. To prove the point, let's try to do some data aggregation. Let's get an average temperature per site for sites that collect temperatures.
Result:
To register your ad hoc exploratory activity as a permanent source, create it as a foreign table:
Result:
This time, the SELECT
statement looks like a regular select against an in-database table. If you require subsecond response time when querying the data, there is an easy way to bring the CSV data into Vantage to speed things up. Read on to find out how.
Load data from NOS into Vantage
Querying object storage takes time. What if you decided that the data looks interesting and you want to do some more analysis with a solution that will you quicker answers? The good news is that data returned with NOS can be used as a source for CREATE TABLE
statements. Assuming you have CREATE TABLE
privilege, you will be able to run:
IMPORTANT: This query assumes you created database Riverflow
and a foreign table called riverflow
in the previous step.
Result:
This time, the SELECT
query returned in less than a second. Vantage didn't have to fetch the data from NOS. Instead, it answered using data that was already on its nodes.
Access private buckets
So far, we have used a public bucket. What if you have a private bucket? How do you tell Vantage what credentials it should use?
It is possible to inline your credentials directly into your query:
Entering these credentials all the time can be tedious and less secure. In Vantage, you can create an authorization object that will serve as a container for your credentials:
You can then reference your authorization object when you create a foreign table:
Export data from Vantage to object storage
So far, we have talked about reading and importing data from object storage. Wouldn't it be nice if we had a way to use SQL to export data from Vantage to object storage? This is exactly what WRITE_NOS
function is for. Let's say we want to export data from riverflow_native
table to object storage. You can do so with the following query:
Here, we instruct Vantage to take data from riverflow_native
and save it in YOUR-OBJECT-STORE-URI
bucket using parquet
format. The data will be split into files by site_no
attribute. The files will be compressed.
Summary
In this quick start we have learned how to read data from object storage using Native Object Storage (NOS) functionality in Vantage. NOS supports reading and importing data stored in CSV, JSON and Parquet formats. NOS can also export data from Vantage to object storage.
Further reading
If you have any questions or need further assistance, please visit our community forum where you can get support and interact with other community members.