Skip to main content

Fun with Postgresql and ZFS

I will show how to use ZFS instant snapshotting and cloning functionality to effortlessly clone a running postgres database regardless of its size.

Setup

Install your Linux OS of choice then ZFS and Postgres. I use Centos 7 but most commands used in this post are distro-indipendent.

Create a zfs pool called tank or use whatever name suits you. In the pool create a filesystem called pgdata. For the sake of following a minimalist ZFS best practice apply the following settings:

zfs set compression=lz4 tank/pgdata
zfs set xattr=sa tank/pgdata



Now move the postgresql directory to /tank/pgdata and symlink its original location to the new one. If you have selinux enabled and are in a hurry set it to permissive or just disable it. Otherwise relabel the new location under /tank/pgdata.

In the case of Centos7 I would have initialized the database with postgresql-setup initdb then moved /var/lib/pgsql to /tank/pgdata/pgsql.

For the purpose of this post it is not necessary to configure postgres to enable archive logs. If you do, and you should, everything will work the same.

Start postgres using the command available to your system. On Centos7 it is:

systemctl start postgres

Let's load some data (around 5 million rows) with pgbench (run psql and pgbench as the postgres user):

psql -c "create database sample"
pgbench -i -s 50 -n

While pgbench is running take a snaphost of the pgdata filesystem:

zfs snapshot tank/pgdata@duringpgbench

If pgbench finishes before you can take the snapshot simply run it again. If pgbench takes a long time you should try and take another snapshot or as many as you want. Notice how the snapshot is instantaneous and does not affect pgbench at all.
In real life you would probably want to use the ZFS autosnapshot tool to manage snapshots for you.

Now, suppose we need to inspect the database at a certain point in time, like while pgbench was running. All we need to do is:
  1. find the snapshot
  2. clone it
  3. run postgres from the clone
Note that the clone will not use any more space than that needed by postgres to recover the database. An unwritten to clone will use zero disk space, regardless of the filesystem size and usage.

In our case the snapshot is called tank/pgdata@duringpgbench and can be cloned with the following command:

zfs clone tank/pgdata@duringpgbench tank/duringpgbench_clone

We can now run postgres on the clone (remove the pid file or postgres will refuse to start):

cd /tank/duringpgbench_clone/pgsql
rm data/*.pid
postgres -D data -p 1234

We have now postgres running on port 1234 and we can login using psql or whatever sql tool.

When we are done it is sufficient to CTRL^C the postgres process and destroy the clone:

zfs destroy tank/duringpgbench_clone

Before destroy the clone check your ZFS filesystems (zfs list) and marvel at how little space they are using. If you are so inclined also check compression factor.

Comments

Popular posts from this blog

Mirth: recover space when mirthdb grows out of control

I was recently asked to recover a mirth instance whose embedded database had grown to fill all available space so this is just a note-to-self kind of post. Btw: the recovery, depending on db size and disk speed, is going to take long. The problem A 1.8 Mirth Connect instance was started, then forgotten (well neglected, actually). The user also forgot to setup pruning so the messages filled the embedded Derby database until it grew to fill all the available space on the disk. The SO is linux. The solution First of all: free some disk space so that the database can be started in embedded mode from the cli. You can also copy the whole mirth install to another server if you cannot free space. Depending on db size you will need a corresponding amount of space: in my case a 5GB db required around 2GB to start, process logs and then store the temp files during shrinking. Then open a shell as the user that mirth runs as (you're not running it as root, are you?) and cd in

From 0 to ZFS replication in 5m with syncoid

The ZFS filesystem has many features that once you try them you can never go back. One of the lesser known is probably the support for replicating a zfs filesystem by sending the changes over the network with zfs send/receive. Technically the filesystem changes don't even need to be sent over a network: you could as well dump them on a removable disk, then receive  from the same removable disk.

How to automatically import a ZFS pool built on top of iSCSI devices with systemd

When using ZFS on top of iSCSI devices one needs to deal with the fact that iSCSI devices usually appear late in the boot process. ZFS on the other hand is loaded early and the iSCSI devices are not present at the time ZFS scans available devices for pools to import. This means that not all ZFS pools might be imported after the system has completed boot, even if the underlying devices are present and functional. A quick and dirty solution would be to run  zpool import <poolname> after boot, either manually or from cron. A better, more elegant solution is instead to hook into systemd events and trigger zpool import as soon as the devices are created.