Create Parquet files in object storage
Overview
Native Object Storage (NOS) is a Vantage feature that allows you to query data stored in files such as CSV, JSON, and Parquet format datasets. These datasets are located on external S3-compatible object storage such as AWS S3, Google GCS, Azure Blob or on-prem implementations. It's useful in scenarios where you want to explore data without building a data pipeline to bring it into Vantage. This tutorial demonstrates how to export data from Vantage to object storage using the Parquet file format.
Prerequisites
You need access to a Teradata Vantage instance. NOS is enabled in all Vantage editions from Vantage Express through Developer, DYI to Vantage as a Service starting from version 17.10.
This tutorial is based on s3 aws object storage. You will need your own s3 bucket with write permissions to complete the tutorial.
You can now get a hosted instance of Vantage for free at https://clearscape.teradata.com.
Create a Parquet file with WRITE_NOS function
WRITE_NOS
allows you to extract selected or all columns from a database table or from derived results and write to external object storage, such as Amazon S3, Azure Blob storage, Azure Data Lake Storage Gen2, and Google Cloud Storage. This functionality stores data in Parquet format.
You can find more documentation about WRITE_NOS
functionality in the NOS documentation.
You will need access to a database where you can execute WRITE_NOS
function. If you don't have such a database, run the following commands:
If you would like to learn more about setting up users and their privileges, checkout the NOS documentation.
- Let's first create a table on your Teradata Vantage instance:
- Populate your table with example data:
Your table should now look like this:
- Create the parquet file with
WRITE_NOS
. Don't forget to replace<BUCKET_NAME>
with the name of your s3 bucket. Also,replace<YOUR-ACCESS-KEY-ID>
and<YOUR-SECRET-ACCESS-KEY>
with your access key and secret.
Check your cloud provider docs how to create credentials to access object storage. For example, for AWS check out How do I create an AWS access key?
Now you have created a parquet file in your object storage bucket. Now to easily query your file you need to follow step number 4.
- Create a NOS-backed foreign table. Don't forget to replace
<BUCKET_NAME>
with the name of your s3 bucket. Also,replace<YOUR-ACCESS-KEY-ID>
and<YOUR-SECRET-ACCESS-KEY>
with your access key and secret:
- Now you are ready to Query your parquet file on NOS, let's try the following query:
The data returned from the query should look something like this:
Summary
In this tutorial we have learned how to export data from Vantage to a parquet file on object storage using Native Object Storage (NOS). NOS supports reading and importing data stored in CSV, JSON and Parquet formats. NOS can also export data from Vantage to object storage.
Further reading
If you have any questions or need further assistance, please visit our community forum where you can get support and interact with other community members.