Spatial Data Science with PostgreSQL/PostGIS

A practical guide to getting started with PostgreSQL and Python for Spatial Data.

 

Spatial is not special! If you ever had tried using spatial data with plain SQL, dealing geometries might not be straightforward. However, with Spatial SQL, it feels like dealing just another column in your table.

In this guide, we will go through how to set up and getting started with Spatial SQL. We will install g PostgreSQL, activate PostGIS extension and perform some simple Spatial SQL operations. Later in the article, we will learn how to incorporate Spatial SQL into your Python workflows.

Spatial SQL (PostGIS)

Photo by Dennis Kummer on Unsplash

Spatial SQL is processing, leveraging and performing spatial operations on spatial data. A Spatial Database is optimized for storing, manipulating and querying geographic data with defined geometries.

Although there are multiple spatial database providers out there, I usually prefer PostgresSQL/PostGIS. PostGIS is also widely used standing in the Geospatial database world — For example, Big Querry adopts it.

If you are new to the world of Database, you might ask the difference between PostgreSQL and PostGIS. We can think of them this way: Postgresql is the DatabaseDatabase while PostGIS is an extension that provides spatial functionality on top of PostgreSQL.

Installing PostgreSQL + PostGIS

Installing PostgreSQL is straightforward. You need to go ahead and select the Binary package for your distribution: Windows, macOS or Linux.

 

Once you download Download the Interactive installer by EDB, run the .exe file or follow the instructions given for your distribution. I have included (GIF instructions) here how to add PostGIS extension to your PostgreSQL. We are also installing PgAdmin4 as GUI interface for PostgreSQL.

 

Make sure you check the PgAdmin4 and Stackbuilder while following the interactive installation guide. This step will allow us later to use the PG Admin 4 User interface. Make sure also to check the PostGIS extension, which enables us to perform spatial operations.

There is a guide for the Interactive installer if you want step by step guide for your operating system.

PgAdmin 4 GUI Interface

Let us get familiar with PgAdmin GUI interface which we will use in this tutorial. The following GIF shows the interface. We are also creating a PostGIS extension, which brings forth the spatial functions we will use later.

Photo by Dennis Kummer on Unsplash

We can simply create a PostGIS extension with the following command implementing in PgAdmin Query tool.

CREATE EXTENSION postgis;

After running this command, we have plenty of new spatial functions available to our disposal.

Let us also create a database using PgAdmin4. The following GIF instructions show you how to create a new database.

Loading Data

Loading Geospatial data is different than loading CSV files into PostgreSQL. We need to preserve the Geometry column. There is, however, an easy way to upload spatial data using Shapefile Loader.

The following GIF instructions show how to load shapefiles to a PostgreSQL database.

Before uploading the data, you need to find out what is the Coordinate reference system (CRS) of the data. In this case, I already looked at the data with the help of QGIS, and I knew the projection code. But if you are not unsure, you can input 4326 which is the commonly used WGS 84 code.

After we load the data, we are set up to perform spatial data analysis with PostGIS. I also upload some other files into the DatabaseDatabase: streets, boundaries and London underground stations. We will be using them as

Basic Spatial Operations

Spatial SQL can make processing and analyzing geospatial data easy and can speed up also specific processes. There are hundreds of spatial functions readily available at your disposal.

Let us start with some simple examples. In a later post, I will delve deeper into Spatial SQL functions and processes. I have uploaded s

Example 1: Calculate the Area of buildings

SELECT objectid, 
       shapestare, 
       ROUND(ST_Area(geom)::numeric,2) AS AREA
FROM public.buildings

This simple SQL code returns all the buildings with new area column with two decimal precision. The Total query runtime: 659 msec. With 176023 rows affected.

We are using here theST_Area() PostGIS function.

Example 2: Calculate the Length of each Street type

SELECT streettype, Sum(ST_Length(geom)) AS length
FROM public.streets
GROUP BY 1
ORDER BY 2 DESC;

The above SQL code groups each street type and calculates the length of each type. We are using here theST_Length() PostGIS function.

Access PostGIS with Python

The good thing is that you can connect your PostGIS database easily in Python. Geopandas, which is the most used geospatial python library, can ingest and read SQL statements from PostGIS. Here is how you can connect your DatabaseDatabase and run Spatial SQL statements in Python.

import geopandas as gpd
import sqlalchemy as db
# Create a database connection
engine = db.create_engine(‘postgresql+psycopg2://postgres:postgres@localhost:5432/postgis_database’)
con = engine.connect()
# Read PostGIS database with Geopandas.
sql = “SELECT * FROM public.streets;”
data = gpd.read_postgis(sql=sql, con=con)

Bringing your PostGIS database into Python opens up other possibilities to explore the data and visualize it. Let us visualize the streets dataset we have read using Geopandas and Contextily base maps.

import geopandas as gpd
import matplotlib.pyplot as plt
import contextily as ctx
fig, ax = plt.subplots(figsize=(18, 16))
data.to_crs(epsg=3857).plot(ax = ax,
 figsize=(20,18),
 color=”gray”,
 alpha=0.5,
 );
ctx.add_basemap(ax, url=ctx.providers.OpenStreetMap.Mapnik);

Conclusion

In this guide, we have installed PostgreSQL and activated the PostGIS extension. We have seen how to create a database using PgAdmin and also upload spatial data into the DatabaseDatabase. We have also seen some simple examples of Spatial SQL, and finally, we have integrated the spatial DatabaseDatabase into Python.

I am planning to write spatial data processing with Spatial SQL articles next. So stay tuned.

Leave a Reply

Your email address will not be published. Required fields are marked *