Monitoring WAL lag in PostgreSQL 9.x

Monitoring WAL lag in PostgreSQL 9.x

When using streaming replication in PostgreSQL 9, it's important to know what the latency is between the master and slaves, especially when deploying on cloud based instances. Ideally, we'd like to know by how many bytes the WAL logs are lagging.  PostgreSQL offers a neat way to check just that between a given slave and its master via the pg_current_xlog_location() and pg_last_xlog_replay_location() functions.  However, the output seems cryptic.

postgres=# select * from pg_current_xlog_location();
(1 row)

OK, so what does that mean?  Looking at the WAL files in the pg_xlog/ directory, we see this file that appears to be related.


Since WAL files are fixed chunks, we'd expect that the internal xlog pointer will have a byte level position in some future WAL file that has yet to be written. We'd like to decipher the output of the functions and determine what the current byte is.

One of the great things about using open source is that you can quickly drill down into the source code and find out why things are the way they are.  PostgreSQL has converted to using GIT, so let's check it out:

git clone git://

After searching a bit, we can find the relevant piece of code in bufpage.h, which defines a struct responsible for tracking the current byte position of the PostgreSQL log,

This tells us that the output format of these functions is "${xlogid}/${xrecoff}" and that (xlogid << 32 | xrecoff) is the 64 bit number that represents the current byte position of the xlog.

From here, it's straightforward to create something that can poll both functions on master and slave, takes the difference, and submits it to some monitoring system.

Here's a simple ruby script that does this and posts the delta to graphite.




  1. Actually, it looks like xlogid is actually multiplied by 0xff00000, due to a 4080 MB limit per xlog file. See:


  2. It seems your ruby program has a bug where it says:
    xlog_location.gsub("/", "").to_i(16)

    The problem is, if the lower part has less than 8 digits, you end up incorrectly concatenating the upper digits to the lower ones with out zero padding:
    AB/123456 gets turned into AB123456
    But it should be turned into AB00123456

    So what you need is to split the string on "/", and then convert each part separately from hex string to a number, then combine the numbers with (upper << 32) | lower


Leave a Reply

Your email address will not be published. Required fields are marked *