Perform time series analysis using Teradata Vantage
Overview
A time series is a series of data points indexed in time order. Such data is continuously produced and collected by a wide range of applications and devices, including but not limited to the Internet of Things (IoT). Teradata Vantage offers various functionalities to simplify time series data analysis.
Prerequisites
You need access to a Teradata Vantage instance. Time series functionalities and NOS (Native Object Store) are enabled in all Vantage editions, from Vantage Express through Developer and DIY to Vantage as a Service, starting from version 17.10.
If you need a test instance of Vantage, you can provision one for free at https://clearscape.teradata.com
Import data sets from AWS S3 using Vantage NOS
Our sample data sets are available in an S3 bucket and can be accessed directly from Vantage using Vantage NOS. The data is in CSV format; let's ingest it into Vantage for our time series analysis.
Let's have a look at the data first. The query below fetches 10 rows from the S3 bucket.
Here is what we've got:
Let's extract the complete data and bring it into Vantage for further analysis.
Result:
Vantage will now fetch the data from S3 and insert it into the trip table we just created.
Basic time series operations
Now that we are familiar with the data set, we can use Vantage capabilities to quickly analyse the data set. First, let's identify how many passengers are being picked up by hour in the month of November.
For further reading on GROUP BY TIME, see Teradata Vantage™ - Time Series Tables and Operations, listed under Further reading.
Result:
Admittedly, this can also be achieved by extracting the hour from the timestamp and then aggregating. That takes additional code, but it can be done without time series specific functionality.
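To make the manual approach concrete, here is a minimal Python sketch of "truncate to the hour, then aggregate". It is independent of Vantage and only illustrates the idea; the sample rows, the `trips` list, and the `passengers_per_hour` helper are hypothetical and not part of the data set used in this quick start.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows: (pickup timestamp, passenger count).
trips = [
    (datetime(2023, 11, 1, 0, 5), 2),
    (datetime(2023, 11, 1, 0, 40), 1),
    (datetime(2023, 11, 1, 1, 15), 3),
    (datetime(2023, 10, 31, 23, 50), 4),  # October, filtered out below
]


def passengers_per_hour(rows, month=11):
    """Sum passenger counts per one-hour bucket for the given month."""
    totals = defaultdict(int)
    for pickup, passengers in rows:
        if pickup.month != month:
            continue
        # Truncate the timestamp to the start of its hour.
        bucket = pickup.replace(minute=0, second=0, microsecond=0)
        totals[bucket] += passengers
    return dict(sorted(totals.items()))


hourly = passengers_per_hour(trips)
for bucket, total in hourly.items():
    print(bucket.isoformat(), total)
```

The truncation step is the extra work mentioned above; with GROUP BY TIME, the bucketing is declared in the query instead of being coded by hand.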
Now let's go a step further and identify how many passengers are picked up, and what the average trip duration is, by vendor every 15 minutes in November.
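As an illustration of what this grouping computes, the following Python sketch buckets trips into 15-minute intervals per vendor and derives the passenger total and mean trip duration for each bucket. It is a sketch under stated assumptions, not the Vantage query: the vendor codes, the sample rows, and the helper names `floor_to_bucket` and `summarize` are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical rows: (vendor id, pickup time, dropoff time, passenger count).
sample_trips = [
    ("CMT", datetime(2023, 11, 1, 8, 2), datetime(2023, 11, 1, 8, 12), 1),
    ("CMT", datetime(2023, 11, 1, 8, 14), datetime(2023, 11, 1, 8, 34), 2),
    ("VTS", datetime(2023, 11, 1, 8, 5), datetime(2023, 11, 1, 8, 20), 3),
    ("CMT", datetime(2023, 11, 1, 8, 16), datetime(2023, 11, 1, 8, 46), 1),
]


def floor_to_bucket(ts, minutes=15):
    """Return the start of the bucket containing ts (minutes must divide 60)."""
    return ts.replace(minute=ts.minute - ts.minute % minutes,
                      second=0, microsecond=0)


def summarize(rows, minutes=15):
    """Per (vendor, bucket): total passengers and mean trip duration in minutes."""
    # Accumulator per key: [passengers, total duration in minutes, trip count].
    acc = defaultdict(lambda: [0, 0.0, 0])
    for vendor, pickup, dropoff, passengers in rows:
        key = (vendor, floor_to_bucket(pickup, minutes))
        acc[key][0] += passengers
        acc[key][1] += (dropoff - pickup).total_seconds() / 60
        acc[key][2] += 1
    return {k: (p, total / n) for k, (p, total, n) in sorted(acc.items())}


summary = summarize(sample_trips)
for (vendor, bucket), (passengers, avg_minutes) in summary.items():
    print(vendor, bucket.isoformat(), passengers, avg_minutes)
```

In Vantage, the same grouping is expressed declaratively in the query rather than with hand-written bucketing code.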
Result:
This is the power of Vantage time series functionality. Without complicated, cumbersome logic, we are able to find the average trip duration by vendor every 15 minutes just by modifying the GROUP BY TIME clause. Let's now look at how simple it is to build moving averages based on this. First, let's start by creating a view as below.
Let's calculate a 2-hour moving average on our 15-minute time series. Two hours corresponds to 8 periods of 15 minutes each.
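The window arithmetic can be checked with a small Python sketch: 120 minutes / 15 minutes = 8 values per window. The sketch below computes a trailing mean over the current value and up to 7 preceding values. The input series and the `moving_average` helper are hypothetical, and averaging over fewer values at the start of the series is an assumption of this sketch; the Vantage query may handle the first rows differently.

```python
from collections import deque

WINDOW = (2 * 60) // 15  # 2 hours of 15-minute periods = 8


def moving_average(values, window=WINDOW):
    """Trailing mean over the current value and up to window - 1 preceding ones."""
    recent = deque(maxlen=window)
    running_sum = 0.0
    averages = []
    for value in values:
        if len(recent) == window:
            # The oldest value is about to be evicted from the window.
            running_sum -= recent[0]
        recent.append(value)
        running_sum += value
        averages.append(running_sum / len(recent))
    return averages


# Hypothetical average trip durations (minutes), one per 15-minute bucket.
durations = [10, 12, 14, 16, 18, 20, 22, 24, 26]
smoothed = moving_average(durations)
print(smoothed)
```

Only from the eighth bucket onward does each average cover a full two hours of data. In Vantage, the moving average is computed in the query itself.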
Result:
In addition to the time series operations above, Vantage also provides special time series tables with a Primary Time Index (PTI). These are regular Vantage tables with a PTI defined rather than a Primary Index (PI). Although tables with a PTI are not mandatory for time series operations, a PTI optimizes how time series data is stored physically and can therefore improve performance considerably compared to regular tables.
Summary
In this quick start we have learnt how easy it is to analyse time series datasets using Vantage's time series capabilities.
Further reading
- Teradata Vantage™ - Time Series Tables and Operations
- Query data stored in object storage
- Teradata Vantage™ - Native Object Store Getting Started Guide
If you have any questions or need further assistance, please visit our community forum where you can get support and interact with other community members.