GDAL/OGR - Automated Geodata Processing

GDAL/OGR - Automated Geodata Processing

This blogpost gives in an introduction to GDAL/OGR and explains how the various command line tools can be used.

/img/gdal_ogr/gdal_icon.png

GDAL/OGR is a library that can read many different vector and raster formats. It is written in C/C++ and is completely Open Source. It can also be accessed with other programming languages like Python or R. There are numerous programs for the command line which will be described in more detail in this blog post.

Most have already used GDAL/OGR many times without noticing it. Many OpenSource programs like QGIS, PostGIS or MapServer and also some proprietary programs use GDAL/OGR in the background to read and write geodata.

With GDAL/OGR it is also possible to do some spatial analyses. However, it is mainly used for the conversion of data formats. Internally GDAL/OGR converts the imported data set into its own data model and writes the desired format.

There are already some blog entries, videos and cheatsheets on this topic. These are listed below the links. In the following I present all commands that I have often used and show some features that I find exciting. On the one hand as a memory for myself and in the hope that others can also profit from it.

Virtual Raster

Raster data can quickly become very large. For this reason, large areas are often divided into several individual raster files. With the program gdalbuildvrt you can create a virtual raster which references all raster files contained in it. This is simply a text file and can be further processed like a normal raster file. This command creates a virtual raster from a folder of GeoTIFF files:

gdalbuildvrt \
  output.vrt \
  my_folder/*.tif

Raster Pyramids

Displaying large raster files can often be very slow, because in principle every single pixel has to be read and rendered. To speed up this process you can use gdaladdo (“GDAL add overview” ) to create overviews for lower zoom levels (also called pyramids). These can then be used directly by the displaying programs and the visualization is much faster. The original file becomes bigger because the additional overviews are saved in the file.

gdaladdo \
  -r average \
  input.tif

for older GDAL versions:

gdaladdo \
  -r average \
  input.tif \
  2 4 8 16
/img/gdal_ogr/build_pyramids.png

Reproject Raster

With gdalwarp you can project grids in different ways. In the documentation very detailed examples are described. So far I have only used this command:

gdalwarp \
  -t_srs "EPSG:4326" \
  input.tif \
  output.tif

MrSID Files

Often you get raster files in MrSID format. On Linux it seems to be possible to read them. But it is easier if you access MrSID with Windows. Then you can convert the file for example into a GeoTIFF and continue working on Linux.

Tiles

The program gdal2tiles.py converts a raster file into a folder full of tiles according to the TMS (Tile Map Service) standard. This is very similar to the XYZ standard. For TMS the tile number (0,0) is bottom left, for XYZ it is top left.

gdal2tiles.py \
  --zoom=2-5 \
  input.tif \
  output_folder

The folder structure then looks like this. For the sake of clarity, only zoom level 2 (and not 3, 4 and 5) is included:

output_folder
β”œβ”€β”€ 2
β”‚Β Β  β”œβ”€β”€ 0
β”‚Β Β  β”‚Β Β  β”œβ”€β”€ 0.png
β”‚Β Β  β”‚Β Β  β”œβ”€β”€ 1.png
β”‚Β Β  β”‚Β Β  β”œβ”€β”€ 2.png
β”‚Β Β  β”‚Β Β  └── 3.png
β”‚Β Β  β”œβ”€β”€ 1
β”‚Β Β  β”‚Β Β  β”œβ”€β”€ 0.png
β”‚Β Β  β”‚Β Β  β”œβ”€β”€ 1.png
β”‚Β Β  β”‚Β Β  β”œβ”€β”€ 2.png
β”‚Β Β  β”‚Β Β  └── 3.png
β”‚Β Β  β”œβ”€β”€ 2
β”‚Β Β  β”‚Β Β  β”œβ”€β”€ 0.png
β”‚Β Β  β”‚Β Β  β”œβ”€β”€ 1.png
β”‚Β Β  β”‚Β Β  β”œβ”€β”€ 2.png
β”‚Β Β  β”‚Β Β  └── 3.png
β”‚Β Β  └── 3
β”‚Β Β      β”œβ”€β”€ 0.png
β”‚Β Β      β”œβ”€β”€ 1.png
β”‚Β Β      β”œβ”€β”€ 2.png
β”‚Β Β      └── 3.png
β”œβ”€β”€ googlemaps.html
β”œβ”€β”€ leaflet.html
β”œβ”€β”€ openlayers.html
└── tilemapresource.xml

5 directories, 20 files

The target folder contains not only tiles but also HTML-based clients with Leaflet and OpenLayers. The tiles can then be placed in a web server (e.g. Apache HTTP Server) and called with any client. A typical URL will look like this: http://localhost/my_tiled_raster/{z}/{x}/{y}.png. For some clients like QGIS or OpenLayers the Y-digit has to be inverted, so: http://localhost/my_tiled_raster/{z}/{x}/{-y}.png (more info can be found in this QGIS issue ).

/img/gdal_ogr/gdal2tiles_leaflet_viewer.png

Leaflet based viewer for map tiles

ogrinfo

ogrinfo returns information about a spatial data source. In this example the GeoPackage of NaturalEarth is queried:

ogrinfo \
  natural_earth_vector.gpkg   

returns:

INFO: Open of `natural_earth_vector.gpkg'
      using driver `GPKG' successful.
1: ne_10m_admin_0_antarctic_claim_limit_lines (Line String)
2: ne_10m_admin_0_antarctic_claims (Polygon)
3: ne_10m_admin_0_boundary_lines_disputed_areas (Line String)
4: ne_10m_admin_0_boundary_lines_land (Line String)
5: ne_10m_admin_0_boundary_lines_map_units (Line String)
...

This is how you get information about a layer. The option -so stands for “summary only”, This causes that ogr2ogr doesn’t show every single feature:

ogrinfo \
  -so \
  natural_earth_vector.gpkg \
  ne_10m_admin_0_antarctic_claim_limit_lines

This returns important information about the layer like number of features, extent or projection.

INFO: Open of `natural_earth_vector.gpkg'
      using driver `GPKG' successful.

Layer name: ne_10m_admin_0_antarctic_claim_limit_lines
Geometry: Line String
Feature Count: 23
Extent: (-150.000000, -90.000000) - (160.100000, -60.000000)
Layer SRS WKT:
GEOGCS["WGS 84",
    DATUM["WGS_1984",
        SPHEROID["WGS 84",6378137,298.257223563,
            AUTHORITY["EPSG","7030"]],
        AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0,
        AUTHORITY["EPSG","8901"]],
    UNIT["degree",0.0174532925199433,
        AUTHORITY["EPSG","9122"]],
    AUTHORITY["EPSG","4326"]]
FID Column = fid
Geometry Column = geom
type: String (15.0)
scalerank: Integer (0.0)
featurecla: String (50.0)

ogr2ogr

The tool ogr2ogr can read, process and convert vector data into other formats. This will be explained in the following mainly using the example of GeoPackage. This is a promising candidate to replace the Shapefile as an exchange format for geodata.

ogr2ogr commands can quickly look very confusing, but are basically built according to the same pattern.

A simple copy:

ogr2ogr \
  -f GPKG output.gpkg \
  input.gpkg

Conversion of the projection to EPSG:3857.

ogr2ogr \
  -t_srs EPSG:3857 \
  -f GPKG output.gpkg \
  input.gpkg

However, the projection of the input data set must be known. Otherwise it can be specified with -s_srs:

ogr2ogr \
  -s_srs EPSG:4326 \
  -t_srs EPSG:3857 \
  -f GPKG output.gpkg \
  input.gpkg

GPX

GPS or GNSS devices usually provide their data as GPX files. This file can be opened in QGIS, but it always requires some effort. The conversion into a GeoPackage is as follows:

ogr2ogr \
  -f GPKG output.gpkg \
  input.gpx

In the GeoPackage you can find the following layers:

ogrinfo output.gpkg

returns

INFO: Open of `output.gpkg'
      using driver `GPKG' successful.
1: tracks (Multi Line String)
2: track_points (Point)
3: waypoints (Point)
4: routes (Line String)
5: route_points (Point)

If you need this operation more often, you can create a function from it:

function gpx2gpkg {
  input_gpx="$1"
  filename_without_suffix=${input_gpx%.gpx}
  ogr2ogr -f GPKG "$filename_without_suffix".gpkg $input_gpx
}

CSV

Very often you get data in the CSV-format. These can easily be converted into any other format. With the option -oo (“output options”) many settings can be made. In the example below the column names of the coordinates are given. It is important to assign the corresponding coordinate system of the data set using -a_srs.

ogr2ogr \
  -f GPKG output.gpkg \
  input.csv \
  -oo X_POSSIBLE_NAMES=longitude \
  -oo Y_POSSIBLE_NAMES=latitude \
  -a_srs 'EPSG:4326'

The newly created layer is extended by a geometry column. The original columns with the coordinates are retained. If you don’t want this you can prevent it with the option -oo KEEP_GEOM_COLUMNS=NO.

PostGIS

PostGIS is the spatial extension of the PostgreSQL database and is excellent for storing geodata (link to driver).

Export a table:

ogr2ogr \
  -f GPKG output.gpkg \
  PG:dbname="my_database" "my_table"

Export many tables:

ogr2ogr \
  -f GPKG output.gpkg \
  PG:'dbname=my_database tables=table_1,table_3'

Export the whole database to a GeoPackage:

ogr2ogr \
  -f GPKG output.gpkg \
  PG:dbname=my_database

Export the whole database to a folder of Shapefiles:

ogr2ogr \
  -f "ESRI Shapefile" output_folder \
  PG:dbname=my_database

Load data into PostGIS:

ogr2ogr \
  -f "PostgreSQL" PG:dbname="my_database" \
  input.gpkg \
  -nln "name_of_new_table"

During import, the names of the tables and columns are converted to lowercase. However, you can use the -lco LAUNDER=NO option to keep uppercase letters. However, it is recommended to write everything in lower case letters (unless you have good reasons against it).

ogr2ogr \
  -f "PostgreSQL" PG:dbname="my_database" \
  input.gpkg \
  -lco LAUNDER=NO

When reading multi-geometries the following error might occur: ERROR: Geometry type (MultiPolygon) does not match column type (Polygon) This can be solved with the option -nlt PROMOTE_TO_MULTI:

ogr2ogr \
  -f "PostgreSQL" PG:dbname="my_database" \
  input.gpkg \
  -nlt PROMOTE_TO_MULTI

If you have a whole folder full of GeoPackages (or Shapefiles, GeoJSONs, … ). You can load them into PostGIS with a for loop.

for file in *.gpkg;
  do ogr2ogr \
  -f "PostgreSQL" PG:dbname="my_database" \
  $file;
done

Existing tables can be overwritten with the option -lco OVERWRITE=YES:

ogr2ogr \
  -f "PostgreSQL" PG:dbname="my_database" \
  -lco OVERWRITE=YES \
  input.gpkg

It is even possible to write a Postgres Dump without even having PostgreSQL database.

Query and Filter Data

ogr2ogr can also filter and edit geodata. A common application is the limitation of a data set to a certain area. This can be created with the option -spat:

ogr2ogr \
  -spat -13.931 34.886 46.23 74.12 \
  -f GPKG output.gpkg \
  natural_earth_vector.gpkg

If you only want certain features of a layer, you can define them with -where. This example only returns countries with less than 1 million inhabitants:

ogr2ogr \
  -where "\"POP_EST\" < 1000000" \
  -f GPKG output.gpkg \
  natural_earth_vector.gpkg \
  ne_10m_admin_0_countries

The conditions following the -where must be in ". If there are more " in it, however, these must be escaped with a \. This is the result on the map:

/img/gdal_ogr/countries_pop_less_one_million.png

The two options -spat and -where can of course be combined. This then returns the countries in Europe under 1 million inhabitants:

ogr2ogr \
  -where "\"POP_EST\" < 1000000" \
  -spat -13.931 34.886 46.23 74.12 \
  -f GPKG output.gpkg \
  natural_earth_vector.gpkg \
  ne_10m_admin_0_countries

With the option -sql you can make queries using SQL. Although SQL is actually intended for databases, you can use ogr2ogr to apply it to any type of dataset.

The following example uses this CSV file as input:

ID,Name,latitude,longitude
1,Salzburg, 47.8, 13.0
2,Heidelberg, 49.4, 8.7
3,Trondheim, 63.4, 10.4

With this command only cities with a latitude of less than 50Β° are displayed:

ogr2ogr \
-f "GPKG" output.gpkg \
cities.csv \
-sql "SELECT * FROM cities WHERE latitude < 50" \
-oo X_POSSIBLE_NAMES=longitude \
-oo Y_POSSIBLE_NAMES=latitude \
-a_srs 'EPSG:4326'

There is not the “one” SQL, but there are different dialects. ogr2ogr can always take the dialect of the corresponding input data source. For example Postgres-SQL if it is a Postgres/PostGIS database. Alternatively also its own OGR SQL or SQLite SQL. This can be set with the option -dialect. A special feature of SQLite SQL is that you can even use it to geocode (or reverse-geocode) data (https://gdal.org/user/sql_sqlite_dialect.html#ogr-geocoding-functions).

With SQL you can create geometries from coordinates. In the following example, a non-spatial SQLite table is converted into a spatial GeoPackage. The function MakePoint() creates a valid geometry from the coordinates.

ogr2ogr \
  -f "GPKG" output.gpkg \
  input.sqlite \
  -sql \
  "SELECT
     *,
     MakePoint(longitude, latitude, 4326) AS geometry
   FROM
     my_table" \
  -nln "location" \
  -s_srs "EPSG:4326"

More Functions

With -append you can merge different layers. However, these layers must have the same structure. The following example takes all GeoPackages in one folder and merges them:

for geopackage in *gpkg; do
  ogr2ogr \
  -f "GPKG" combined.gpkg \
  -append \
  $geopackage
done

With the option -progress you can see how much percent of the conversion has already been done. This makes sense especially for operations that take a long time.

The –debug ON option displays a lot of additional information when the command is executed. This is especially useful if the command does not work and you want to find the error.

Python

There is a gdal package for Python. However, the syntax of the functions is very close to the C++ API and therefore rather difficult. For raster data there is the wrapper rasterio. For vector data there is the wrapper fiona. The documentation describes for which cases fiona is suitable.

You can also access ogr2ogr from the command line inside Python. To simplify this, you can use the script ogr2ogr.py (link). This is called as follows:

import ogr2ogr

ogr2ogr.main([
  'ogr2ogr',
  '-f', 'GPKG', 'output.gpkg' ,
  'input.gpkg'
])

Thanks to the Developers

As already mentioned GDAL/OGR is free and OpenSource. At this point I would like to thank Frank Warmerdam who started the project and Even Rouault who meanwhile became the main developer. And of course the many other contriubutors who are developing the project further and keep it running.

CheatSheets

This post was originally written in German. It was translated with www.DeepL.com/Translator and manually revised by the author himself.