Aller au contenu principal

Query data stored in object storage

Overview

Native Object Storage (NOS) is a Vantage feature that allows you to query data stored in files in object storage such as AWS S3, Google GCS, Azure Blob or on-prem implementations. It's useful in scenarios where you want to explore data without building a data pipeline to bring it into Vantage.

Prerequisites

You need access to a Teradata Vantage instance. NOS is enabled in all Vantage editions from Vantage Express through Developer, DYI to Vantage as a Service starting from version 17.10.

Remarque

If you need a test instance of Vantage, you can provision one for free at https://clearscape.teradata.com

Explore data with NOS

Remarque

Currently, NOS supports CSV, JSON (as array or new-line delimited), and Parquet data formats.

Let's say you have a dataset stored as CSV files in an S3 bucket. You want to explore the dataset before you decide if you want to bring it into Vantage. For this scenario, we are going to use a public dataset published by Teradata that contains river flow data collected by the U.S. Geological Survey. The bucket is at https://td-usgs-public.s3.amazonaws.com/.

Let's first have a look at sample CSV data. We take the first 10 rows that Vantage will fetch from the bucket:

Here is what I've got:

We have got plenty of numbers, but what do they mean? To answer this question, we will ask Vantage to detect the schema of the CSV files:

Vantage will now fetch a data sample to analyze the schema and return results:

We see that the CSV files have 6 columns. For each column, we get the name, the datatype and the file coordinates that were used to infer the schema.

Query data with NOS

Now that we know the schema, we can work with the dataset as if it was a regular SQL table. To prove the point, let's try to do some data aggregation. Let's get an average temperature per site for sites that collect temperatures.

Result:

To register your ad hoc exploratory activity as a permanent source, create it as a foreign table:

Result:

This time, the SELECT statement looks like a regular select against an in-database table. If you require subsecond response time when querying the data, there is an easy way to bring the CSV data into Vantage to speed things up. Read on to find out how.

Load data from NOS into Vantage

Querying object storage takes time. What if you decided that the data looks interesting and you want to do some more analysis with a solution that will you quicker answers? The good news is that data returned with NOS can be used as a source for CREATE TABLE statements. Assuming you have CREATE TABLE privilege, you will be able to run:

IMPORTANT: This query assumes you created database Riverflow and a foreign table called riverflow in the previous step.

Result:

This time, the SELECT query returned in less than a second. Vantage didn't have to fetch the data from NOS. Instead, it answered using data that was already on its nodes.

Access private buckets

So far, we have used a public bucket. What if you have a private bucket? How do you tell Vantage what credentials it should use?

It is possible to inline your credentials directly into your query:

Entering these credentials all the time can be tedious and less secure. In Vantage, you can create an authorization object that will serve as a container for your credentials:

You can then reference your authorization object when you create a foreign table:

Export data from Vantage to object storage

So far, we have talked about reading and importing data from object storage. Wouldn't it be nice if we had a way to use SQL to export data from Vantage to object storage? This is exactly what WRITE_NOS function is for. Let's say we want to export data from riverflow_native table to object storage. You can do so with the following query:

Here, we instruct Vantage to take data from riverflow_native and save it in YOUR-OBJECT-STORE-URI bucket using parquet format. The data will be split into files by site_no attribute. The files will be compressed.

Summary

In this quick start we have learned how to read data from object storage using Native Object Storage (NOS) functionality in Vantage. NOS supports reading and importing data stored in CSV, JSON and Parquet formats. NOS can also export data from Vantage to object storage.

Further reading

Remarque

If you have any questions or need further assistance, please visit our community forum where you can get support and interact with other community members.

Également intéressant