Aller au contenu principal

Run scripts on Vantage

Overview

Sometimes, you need to apply complex logic to your data that can't be easily expressed in SQL. One option is to wrap your logic in a User Defined Function (UDF). What if you already have this logic coded in a language that is not supported by UDF? Script Table Operator is a Vantage feature that allows you to bring your logic to the data and run it on Vantage. The advantage of this approach is that you don't have to retrieve data from Vantage to operate on it. Also, by running your data applications on Vantage, you leverage its parallel nature. You don't have to think how your applications will scale. You can let Vantage take care of it.

Prerequisites

You need access to a Teradata Vantage instance.

Remarque

If you need a test instance of Vantage, you can provision one for free at https://clearscape.teradata.com

Hello World

Let's start with something simple. What if you wanted the database to print "Hello World"?

Here is what I've got:

Let's analyze what just happened here. The SQL includes echo Hello World!. This is a Bash command. OK, so now we know how to run Bash commands. But why did we get 2 rows and not one? That's because our simple script was run once on each AMP and I happen to have 2 AMPs:

Returns:

This simple script demonstrates the idea behind the Script Table Operator (STO). You provide your script and the database runs it in parallel, once for each AMP. This is an attractive model in case you have transformation logic in a script and a lot of data to process. Normally, you would need to build concurrency into your application. By letting STO do it, you let Teradata select the right concurrency level for your data.

Supported languages

OK, so we did echo in Bash but Bash is hardly a productive environment to express complex logic. What other languages are supported then? The good news is that any binary that can run on Vantage nodes can be used in STO. Remember, that the binary and all its dependencies need to be installed on all your Vantage nodes. In practice, it means that your options will be limited to what your administrator is willing and able to maintain on your servers. Python is a very popular choice.

Uploading scripts

Ok, Hello World is super exciting, but what if we have existing logic in a large file. Surely, you don't want to paste your entire script and escape quotes in an SQL query. We solve the script upload issue with the User Installed Files (UIF) feature.

Say you have helloworld.py script with the following content:

Let's assume the script is on your local machine at /tmp/helloworld.py.

First, we need to setup permissions in Vantage. We are going to do this using a new database to keep it clean.

You can upload the script to Vantage using the following procedure call:

Now that the script has been uploaded, you can call it like this:

The last call should return:

That was a lot of work and we are still at Hello World. Let's try to pass some data into SCRIPT.

Passing data stored in Vantage to SCRIPT

So far, we have been using SCRIPT operator to run standalone scripts. But the main purpose to run scripts on Vantage is to process data that is in Vantage. Let's see how we can retrieve data from Vantage and pass it to SCRIPT.

We will start with creating a table with a few rows.

We will use the following script to parse out query parameters:

Note, how the scripts assumes that urls will be fed into stdin one by one, line by line. Also, note how it prints results line by line, using the tab character as a delimiter between values.

Let's install the script. Here, we assume that the script file is at /tmp/urlparser.py on our local machine:

With the script installed, we will now retrieve data from urls table and feed it into the script to retrieve query parameters:

As a result, we get query params and their values. There are as many rows as key/value pairs. Also, since we inserted a tab between the key and the value output in the script, we get 2 columns from STO.

Inserting SCRIPT output into a table

We have learned how to take data from Vantage, pass it to a script and get output. Is there an easy way to store this output in a table? Sure, there is. We can combine the select above with CREATE TABLE statement:

Now, let's inspect the contents of url_params table:

You should see the following output:

Summary

In this quick start we have learned how to run scripts against data in Vantage. We ran scripts using Script Table Operator (STO). The operator allows us to bring logic to the data. It offloads concurrency considerations to the database by running our scripts in parallel, one per AMP. All you need to do is provide a script and the database will execute it in parallel.

Further reading

Remarque

If you have any questions or need further assistance, please visit our community forum where you can get support and interact with other community members.

Également intéressant