Update an OSM database with osm2pgsql
last updated 2023-01-28
With osm2pgsql it is possible to load data from OpenStreetMap into a PostGIS database. The data can then be processed and analyzed with the full power of spatial SQL. Programs like QGIS, GeoServer or MapServer can read that database and render it on a map. A very impressive feature of osm2pgsql is to keep the database up-to-date with the latest changes from OpenStreetMap. This is done by the commandline tool osm2pgsql-replication (see docs). It checks from which source your initial data comes from and downloads the respective changesets.
Scripts
This scripts below (Python or shell version) can be used to init the database and to update it regularly. The examples use the extract from Germany, Austria and Switzerland (also known als “DACH”).
Shell version
#!/bin/bash
export PGHOST=localhost
export PGPORT=5432
export PGUSER=postgres
export PGPASSWORD=postgres
export PGDATABASE=osm
DB_SCHEMA=dach
PREFIX=dach
OSM_FILE=/home/jakob/osm-data/dach.osm.pbf
STYLE_FILE=/home/jakob/osm-data/flex-config/dach.lua
psql -c "CREATE SCHEMA IF NOT EXISTS ${DB_SCHEMA}";
EXTRA_ATTRIBUTES="\
  --style ${STYLE_FILE}\
  --output flex \
  --middle-schema=${DB_SCHEMA}"
# INIT THE DATABASE AND THE REPLICATION
# THIS SHOULD BE DONE ONLY ONCE, AFTERWARDS THESE COMMANDS CAN BE COMMENTED (DEACTIVATED)
# osm2pgsql \
# --create \
# --slim \
# --verbose \
# --prefix=${PREFIX} \
# ${EXTRA_ATTRIBUTES} \
# ${OSM_FILE}
# osm2pgsql-replication init \
#  --verbose \
#  --prefix=${PREFIX} \
#  --osm-file=${OSM_FILE}
# Download the latest OSM data and load it into PostGIS
# SHOULD BE RUN REGULARLY
osm2pgsql-replication update \
  --verbose \
  --prefix=${PREFIX} -- ${EXTRA_ATTRIBUTES}
Python version
#!/usr/bin/env python3
import subprocess
import os
import argparse
try:
    import psycopg2 as psycopg
    from psycopg2 import sql
except ImportError:
    import psycopg
    from psycopg import sql
def main():
    parser = argparse.ArgumentParser(
        description="A tool to create and update an OpenStreetMap database using osm2pgsql.",
        epilog="""
        The PostgreSQL connection must be defined with environment variables:
        https://www.postgresql.org/docs/current/libpq-envars.html.
        """,
    )
    parser.add_argument(
        "--osm-file",
        type=str,
        help="path to osm file. Only needed for creation of the OSM database.",
    )
    parser.add_argument(
        "--style-file", type=str, help="path to the Lua style file", required=True
    )
    parser.add_argument(
        "--prefix",
        type=str,
        help="prefix of the created database tables",
        required=True,
    )
    parser.add_argument(
        "--status-table-schema",
        type=str,
        help="database schema to store the status table in",
        required=True,
    )
    args = parser.parse_args()
    create_and_update_osm_db(
        args.osm_file, args.style_file, args.prefix, args.status_table_schema
    )
def create_and_update_osm_db(osm_file, style_file, prefix, schema_status_table):
    """
    Create and update an OpenStreetMap database using the osm2pgsql command-line utility.
    Parameters:
    - osm_file (str): The path to the OSM data file.
    - style_file (str): The path to the style file.
    - prefix (str): The prefix to use for the database tables.
    - schema_status_table (str): The database schema to store the status table in
    """
    ensure_file_exists(osm_file, "OSM file")
    # check input
    ensure_file_exists(style_file, "style file")
    # db connection
    conn = psycopg.connect("")
    # create schema if not present
    with conn.cursor() as cur:
        cur.execute(
            sql.SQL("CREATE SCHEMA IF NOT EXISTS {db_schema}").format(
                db_schema=sql.Identifier(schema_status_table)
            )
        )
    conn.commit()
    extra_attributes = [
        "--style={style_file}".format(style_file=style_file),
        "--output=flex",
        "--middle-schema={db_schema}".format(db_schema=schema_status_table),
    ]
    status_table_name = "{prefix}_replication_status".format(prefix=prefix)
    # execute subprocess
    if table_exists(conn, status_table_name, schema_status_table):
        # update existing OSM tables
        update_command = [
            "osm2pgsql-replication",
            "update",
            "--middle-schema={db_schema}".format(db_schema=schema_status_table),
            "--verbose",
            "--prefix={prefix}".format(prefix=prefix),
            "--",
        ] + extra_attributes
        subprocess.run(update_command)
    else:
        # OSM database does not exist yet
        # we will create it
        ensure_file_exists(osm_file, "OSM file")
        create_command = (
            [
                "osm2pgsql",
                "--create",
                "--slim",
                "--verbose",
                "--prefix={prefix}".format(prefix=prefix),
            ]
            + extra_attributes
            + [osm_file]
        )
        subprocess.run(create_command)
        init_command = [
            "osm2pgsql-replication",
            "init",
            "--verbose",
            "--middle-schema={db_schema}".format(db_schema=schema_status_table),
            "--prefix={prefix}".format(prefix=prefix),
            "--osm-file={osm_file}".format(osm_file=osm_file),
        ]
        subprocess.run(init_command)
def ensure_file_exists(file_path, display_name):
    """
    Checks if a file exists in the file system and exists otherwise with a warning.
    Parameters:
       file_path (str): The path to the file
       display_name (str): The name of the file for displaying in the warning message
    """
    file_exists = bool(file_path) and os.path.exists(file_path)
    if not file_exists:
        print("{} does not exist at path: {}".format(display_name, file_path))
        exit(1)
def table_exists(conn, table_name, schema_name=None):
    """
    Check if a table with the specified name exists in a PostgreSQL database.
    Taken from https://github.com/openstreetmap/osm2pgsql/blob/80835ba/scripts/osm2pgsql-replication
    Parameters:
        conn (psycopg2.extensions.connection): A connection to a PostgreSQL database.
        table_name (str): The name of the table to check for.
        schema_name (str, optional): The name of the schema to search for the table in. If not provided, the table is searched for in the public schema.
    Returns:
        bool: True if the table exists, False otherwise.
    """
    with conn.cursor() as cur:
        if schema_name is not None:
            cur.execute(
                "SELECT * FROM pg_tables where \
                    tablename = %s and schemaname = %s ",
                (table_name, schema_name),
            )
        else:
            cur.execute("SELECT * FROM pg_tables where tablename = %s", (table_name,))
        return cur.rowcount > 0
if __name__ == "__main__":
    main()
This shell script calls the Python script:
#!/bin/bash
# ensure env vars are forwarded to sub processes
set -a
source /home/jakob/scripts/env-vars.sh
set +a
/home/jakob/scripts/create_and_update_osm_db.py \
    --osm-file /home/jakob/osm-data/dach.osm.pbf \
    --style-file /home/jakob/osm-data/flex-config/dach.lua \
    --prefix dach \
    --status-table-schema dach
Data Download and Structuring
The OpenStreetMap extract can be downloaded from providers like GeoFabrik or openstreetmap.fr.
The import requires a Lua script to define the structure. It is adapted from a Lua script example from the osm2pgsql source code repository. In a real world scenario you would probably use a more detailed Lua script with more tables and settings.
IMPORTANT: Make sure the schema name (in this example dach) matches in the script above and in the Lua file.
local dtable = osm2pgsql.define_way_table('data', {
        { column = 'tags',  type = 'jsonb' },
        { column = 'geom',  type = 'linestring' },
    }, { schema = 'dach' })
function osm2pgsql.process_way(object)
    dtable:insert({
        tags = object.tags,
        geom = object:as_linestring()
    })
end
Automation
In my setup I use a desktop computer with Debian to store my data. If the computer is switched on permanently the update script can be run regularly using cron. In my case the computer is only switched on from time to time. For this scenario anacron is the better fit. It ensures the script is run within a specific time period. On Debian/Ubuntu-based systems it can be installed with apt install anacron.
First make your update script executable with chmod +x /path/to/script. Then, the update script needs to be referenced in the file /etc/anacrontab. In my case it looks like this:
# /etc/anacrontab: configuration file for anacron
# See anacron(8) and anacrontab(5) for details.
SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
HOME=/root
LOGNAME=root
# These replace cron's entries
1       5       cron.daily      run-parts --report /etc/cron.daily
7       10      cron.weekly     run-parts --report /etc/cron.weekly
@monthly        15      cron.monthly    run-parts --report /etc/cron.monthly
1 5       dach_osm /home/jakob/scripts/osm2pgsql_dach.sh >> /home/jakob/logs/osm2pgsql_dach.log 2>&1
This configuration runs the script once per day. After booting it waits for 5 minutes to actually start the script.