Skip to main content
Join our research panel and help shape the future of Teradata.Sign up.

Perform time series analysis using Teradata

Overview

Time series is series of data points indexed in time order. It is data continuously produced and collected by a wide range of applications and devices including but not limited to Internet of Things. Teradata offers various functionalities to simplify time series data analysis.

Prerequisites

You need access to a Teradata instance. Time series functionalities and NOS are enabled in Teradata starting from version 17.10.

Note

If you need a test instance of Teradata, you can provision one for free at https://www.teradata.com/try

Import data sets from GCS using Native Object Store (NOS)

Our sample data set is available in a Google Cloud Storage bucket and can be accessed from Teradata directly using Native Object Store (NOS). The data is in CSV format; let's ingest it into Teradata for our time series analysis.

First, create an authorization object (an empty one is sufficient for public buckets):

Let's have a look at the data first. The query below fetches 10 rows from the GCS bucket.

Here is what we've got:

Let's extract the complete data and bring it into Teradata for further analysis. Because the CSV stores timestamps without a leading zero for single-digit hours, we use a staging table and an explicit CAST to load correctly typed data.

Result:

Teradata has fetched the data from GCS and loaded it into the trip table we just created.

Basic time series operations

Now that we are familiar with the data set, we can use Teradata capabilities to quickly analyse the data set. First, let's identify how many passengers are being picked up by hour in the month of November.

Result:

Yes, this can also be achieved by extracting the hour from time and then aggregating - it's additional code/work, but can be done without timeseries specific functionality.

But, now let's go a step further to identify how many passengers are being picked up and what is the average trip duration by vendor every 15 minutes in November.

Result:

This is the power of Teradata time series functionality. Without needing complicated, cumbersome logic we are able to find average trip duration by vendor every 15 minutes just by modifying the group by time clause. Let's now look at how simple it is to build moving averages based on this. First, let's start by creating a view as below.

Let's calculate a 2 hours moving average on our 15-minutes time series. 2 hour is 8 * 15 minutes periods.

Result:

Note

In addition to above time series operations, Teradata also provides special time series tables with Primary Time Index (PTI). These are regular Teradata tables with PTI defined rather than a Primary Index (PI). Though tables with PTI are not mandatory for time series functionality/operations, PTI optimizes how the time series data is stored physically and hence improves performance considerably compared to regular tables.

Summary

In this quick start we have learnt how easy it is to analyse time series datasets using Teradata's time series capabilities.

Further reading