Use geographic reference data with Vantage
Overview
This post demonstrates how you can leverage any geographic dataset in GeoJson format and use it for geospatial analytics in Teradata Vantage, with just a few lines of code.
Today we be gathering reference geographical data (official maps, points of interest, etc...) form public sources and use it to support our day to day analytics.
You will learn two methods to get your GeoJson data into Teradata Vantage:
- Load it as a single document and use native ClearScape analytics functions to parse it into a table usable for analytics.
- Lightly transform it in native Python as we load it into Vantage to produce an analytics ready dataset.
The first method is a straig forward ELT pattern for semi-structured format processing in Vantage with a single SQL statement, the second one involves some lightweight preparation in (pure) Python and may allow more flexibility (for example to add early quality checks or optimize the load of large documents).
Prerequisites
You will need:
- Access to a Teradata Vantage instance.
Note
If you need a test instance of Teradata, you can provision one for free at https://www.teradata.com/try
- A Python 3 interpreter
- A SQL Client
Option 1: Load a GeoJson document into Vantage
Here we will load a GeoJson document as a single Character Large OBject (CLOB) into the Vantage Data Store and use a single SQL statement, backed by ClearScape Analytics native functions, to parse this document into a usable structure for geospatial analytics.
Get and load the GeoJson document
The http://geojson.xyz/ website is a fantastic source for open geographical data in GeoJson format. We will load the "Populated Places" dataset that provides with a list of over 1000 significant world cities.
Open you favourite Python 3 interpreter and make sure you have the following packages installed:
- wget
- teradatasql
- getpass
Download and read the cities dataset:
Load the GeoJson document in Vantage
Modify this code as needed with your Vantage host name, user name and specify an advanced login mechanism if any (eg. LDAP, Kerberos...) with the logmech parameter if you need to. All the connection parameters are documented on the teradatasql PyPi page there: https://pypi.org/project/teradatasql/
The code below simply creates a Vantage connection, and opens a cursor creating a table and loading it with our file.
Use the map from Vantage
Now open your favourite SQL client and connect to your Vantage system.
We will use ClearScape analytics JSON functions to parse our GeoJson document and extract the most relevant properties and the geometry itself (the coordinates of the city) for each feature (each feature representing a city in this example). We then use the GeomFromGeoJSON function to cast our geometry as a native Vantage geometry data type (ST_GEOMETRY).
For user convenience, will wrap all this SQL code in a view:
That's all, you can now view the prepared geometry data as a table, ready to enrich your analytics:
Result:
| city_name | country_name | region_name | code_country_isoa3 | city_coord |
|---|---|---|---|---|
| Potenza | Italy | Basilicata | ITA | POINT (15.798996495640267 40.642002130098206) |
| Mariehamn | Finland | Finström | ALD | POINT (19.949004471869102 60.096996184895431) |
| Ramallah | Indeterminate | PSE | POINT (35.206209378189556 31.902944751424059) | |
| Poitier | French Republic | Poitou-Charentes | FRA | POINT (0.333276528534554 46.583292255736581) |
| Clermont-Ferrand | French Republic | Auvergne | FRA | POINT (3.080008095928406 45.779982115759424) |
Calculate the distance between two cities:
Result:
| city_coord.ST_SPHERICALDISTANCE(city_coord) |
|---|
| 1.9265006861079421e+06 |