11.3.2020

PostgreSQL revisited

I still remember when I heard PostgreSQL for the first time. It was around 2001, when my former room mate mentioned that he was a maintainer for PostgreSQL on Debian. At that time, I was still studying electrical engineering, so database systems were no concern to me. But I knew something was there and even used PostgreSQL for some projects, professional and private. Today I held a presentation for my fellow developers at wer denkt was, comparing PostgreSQL with MySQL.

Over the last years, both systems have added lots of functionality and became much closer with respect to features. The Dumper Blog did a comparison of MySQL 8 and PostgreSQL 10 in 2018. While MySQL added common table expressions and window functions and improved JSON support and geospatial features, PostgreSQL introduced logical replication, semi-synchronous replication and declarative partitioning.

In this blog, I will highlight the remaining differences between the two most widely used open-source SQL implementations.

Table inheritance

One thing only PostgreSQL can do is table inheritance, like in object-oriented programming languages.

-- normal city
CREATE TABLE city (
  id SERIAL PRIMARY KEY,
  name text,
  population float,
  area float
);

-- capital, inherits from city table
CREATE TABLE capital (
  country_id REFERENCES country(id)
) INHERITS (city);

Materialized views

A view in SQL is a saved query which can be used like a table.

-- normal view
CREATE VIEW all_city_view AS SELECT * FROM city;

-- materialized view
CREATE MATERIALIZED VIEW all_city_matview AS SELECT * FROM city;

-- refresh materialized view
REFRESH MATERIALIZED VIEW all_city_matview;

Full outer joins

A full outer join can also be useful for some special kind of queries. MySQL does not support this, while PostgreSQL does.

-- first table
CREATE TABLE departments (
  department_id serial PRIMARY KEY,
  department_name varchar(255) NOT NULL
);

-- second table
CREATE TABLE employees (
  employee_id serial PRIMARY KEY,
  employee_name varchar(255),
  department_id integer
);

-- add data to tables
INSERT INTO
departments (department_name)
VALUES
('Sales'),
('Marketing'),
('HR'),
('IT'),
('Production');

INSERT INTO
employees (employee_name, department_id)
VALUES
('Bette Nicholson', 1),
('Christian Gable', 1),
('Joe Swank', 2),
('Fred Costner', 3),
('Sandra Kilmer', 4),
('Julia Mcqueen', NULL);

-- perform full outer join
SELECT
  employee_name,
  department_name
FROM employees e
FULL OUTER JOIN departments d ON d.department_id = e.department_id;

Data types

One major difference is the naming of data types. To convert safely, refer to MySQL to PostgreSQL Types Mapping. This is a major obstacle when porting SQL scripts, better use a tool like pgloader which has built-in conversion logic for a variety of input formats.

Geospatial data

This is a crucial functionality for the AppJobber application, which allows people to see all available micro jobs in their selected map range. So this runs down to selecting entries from a large table within a bounding box. While MySQL has built-in support for geospatial data types, PostgreSQL uses the PostGIS extension. The Boston Geographic Information Systems did a detailed comparison of geospatial features between MS SQL Server, MySQL and PostgreSQL/PostGIS, which seems a bit outdated.

So I was running a sample bounding-box query on all available jobs, using a bounding box around Darmstadt. It took MySQL roughly a second to complete the query, PostgreSQL did it in less then 400 ms. However, the two queries were a bit different, as MySQL had location data readily stored as blob and PostgreSQL used a converted geometric point type.

Performance

There is ample discussion about performance, most favoring MySQL for read-heavy loads and PostgreSQL for read/write-heavy applications. So this depends very much on the application in question.

Conclusion

After all, I hoped to show that PostgreSQL is a viable alternative to MySQL with more advanced features.