Zum Hauptinhalt springen

Use Vantage from a Jupyter notebook

note

This how-to shows you how to add Teradata Extensions to a Jupyter Notebooks environment. A hosted version of Jupyter Notebooks integrated with Teradata Extensions and analytics tools is available for functional testing for free at https://clearscape.teradata.com.

Overview

In this how-to we will go through the steps for connecting to Teradata Vantage from a Jupyter notebook.

note

If you need a test instance of Vantage, you can provision one for free at https://clearscape.teradata.com

Options

There are a couple of ways to connect to Vantage from a Jupyter Notebook:

  1. Use python or R libraries in a regular Python/R kernel notebook - this option works well when you are in a restricted environment that doesn't allow you to spawn your own Docker images. Also, it's useful in traditional datascience scenarios when you have to mix SQL and Python/R in a notebook. If you are proficient with Jupyter and have your own set of preferred libraries and extensions, start with this option.
  2. Use the Teradata Jupyter Docker image - the Teradata Jupyter Docker image bundles the Teradata SQL kernel (more on this later), teradataml and tdplyr libraries, python and R drivers. It also contains Jupyter extensions that allow you to manage Teradata connections, explore objects in Vantage database. It's convenient when you work a lot with SQL or would find a visual Navigator helpful. If you are new to Jupyter or if you prefer to get a currated assembly of libraries and extensions, start with this option.

Teradata libraries

This option uses a regular Jupyter Lab notebook. We will see how to load the Teradata Python driver and use it from Python code. We will also examine ipython-sql extension that adds support for SQL-only cells.

  1. We start with a plain Jupyter Lab notebook. Here, I'm using docker but any method of starting a notebook, including Jupyter Hub, Google Cloud AI Platform Notebooks, AWS SageMaker Notebooks, Azure ML Notebooks will do.
  1. Docker logs will display the url that you need to go to:
  1. We will open a new notebook and create a cell to install the required libraries:
note
  1. Now, we will import Pandas and define the connection string to connect to Teradata. Since I'm running my notebook in Docker on my local machine and I want to connect to a local Vantage Express VM, I'm using host.docker.internal DNS name provided by Docker to reference the IP of my machine.
  1. I can now call Pandas to query Vantage and move the result to a Pandas dataframe:
  1. The syntax above is concise but it can get tedious if all you need is to explore data in Vantage. We will use ipython-sql and its %%sql magic to create SQL-only cells. We start with importing the required libraries.
  1. We load ipython-sql and define the db connection string:
  1. We can now use %sql and %%sql magic. Let's say we want to explore data in a table. We can create a cell that says:
  1. If we want to move the data to a Pandas frame, we can say:

There are many other features that ipython-sql provides, including variable substitution, plotting with matplotlib, writting results to a local csv file or back to the database. See the demo notebook for examples and ipython-sql github repo for a complete reference.

Teradata Jupyter Docker image

The Teradata Jupyter Docker image builds on jupyter/datascience-notebook Docker image. It adds the Teradata SQL kernel, Teradata Python and R libraries, Jupyter extensions to make you productive while interacting with Teradata Vantage. The image also contains sample notebooks that demonstrate how to use the SQL kernel and Teradata libraries.

The SQL kernel and Teradata Jupyter extensions are useful for people that spend a lot of time with the SQL interface. Think about it as a notebook experience that, in many cases, is more convenient than using Teradata Studio. The Teradata Jupyter Docker image doesn't try to replace Teradata Studio. It doesn't have all the features. It's designed for people who need a lightweight, web-based interface and enjoy the notebook UI.

The Teradata Jupyter Docker image can be used when you want to run Jupyter locally or you have a place where you can run custom Jupyter docker images. The steps below demonstrate how to use the image locally.

  1. Run the image:
note

By passing -e "accept_license=Y you accept the license agreement for Teradata Jupyter Extensions.

  1. Docker logs will display the url that you need to go to. For example, this is what I've got:
  1. Open up the URL and use the file explorer to open the following notebook: jupyterextensions -> notebooks -> sql -> GettingStartedDemo.ipynb.
  2. Go through the demo of the Teradata SQL Kernel:

GettingStartedDemo.ipynb

Summary

This quick start covered different options to connect to Teradata Vantage from a Jupyter Notebook. We learned about the Teradata Jupyter Docker image that bundles multiple Teradata Python and R libraries. It also provides an SQL kernel, database object explorer and connection management. These features are useful when you spend a lot of time with the SQL interface. For more traditional data science scenarios, we explored the standalone Teradata Python driver and integration through the ipython sql extension.

Further reading

note

If you have any questions or need further assistance, please visit our community forum where you can get support and interact with other community members.

Also of interest