Keith's Ramblings…

WARNING: If accidentally read, induce vomiting

PostgreSQL Job Logging & Monitor Extension – The Logger


Update 2012/06/05: So of course, right after I publish this blog post, I discover a major shortcoming in 0.3.0: it requires superuser privileges to actually run. Teach me to test with only my own account in the database. I’ll be pushing 0.3.1 out shortly with a fix to allow non-superusers to use these logging functions. It just requires a little bit of configuration, which I’ve added to that section below and to the README.md file.

Job logging and monitoring in PostgreSQL can be an invaluable tool for tracking down issues when important database tasks are automated. OmniTI has some great job logging systems in place for the databases we support, but there was no consistent monitoring specifically built with that logging in mind. I thought this would be a great opportunity to take advantage of PostgreSQL 9.1's new extensions capability to bring a consistent logging and monitoring tool into use that would be easily maintained and updatable. This blog post will concentrate on the logging portion. I’ll do a follow-up post about the monitor, hopefully next month.

https://github.com/omniti-labs/pg_jobmon

The first, and easier, piece to pull into an extension format was the logging. Easier mostly because previous developers here had done a good job of getting logging working. Since Postgres’s functions aren’t autonomous, true job logging can be difficult: if a function fails, everything it was doing, including inserts to a logging table, gets rolled back. Using dblink to have Postgres connect back to itself is a neat trick to get autonomous functions working, so that intermittent commits actually persist despite a job failure. The details of doing this aren’t really the main point of this blog post, though, so if you’d like to see how it works, the source code of the extension is freely available.
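As a tiny illustration of the dblink trick (this is a hypothetical demo, not code from pg_jobmon; the my_log table is a throwaway, and the connection string may need host/user/password details depending on your authentication setup):

```sql
-- Hypothetical demonstration of the dblink autonomous-commit trick.
CREATE EXTENSION IF NOT EXISTS dblink;
CREATE TABLE my_log (msg text, logged_at timestamptz DEFAULT now());

BEGIN;
-- The INSERT runs over a second connection back to this same database,
-- so it commits in its own, separate transaction...
SELECT dblink_exec('dbname=' || current_database(),
    $q$INSERT INTO my_log (msg) VALUES ('this row survives')$q$);
ROLLBACK;

-- ...and the row is still there even though the outer transaction
-- rolled back:
SELECT msg FROM my_log;
```

This is exactly why a failed job can still leave behind its log entries: each logging call commits on its own connection, independent of the calling transaction.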

Installation

First off, the new extension system makes the install (and future updates) a breeze. I set the extension up to be compatible with the PostgreSQL Extension Network, so you can use make/make install to get the files in the correct location. It is available on pgxn, but it is marked testing, so it won’t show up in searches yet. I’d like to get some feedback from others before marking it v1.0 stable – pgxn.org/dist/pg_jobmon. After that, it’s just a matter of using the new 9.1 extension management commands. I’d recommend creating a schema instead of installing it to public. I’m assuming the schema jobmon in all examples, but you can choose whichever one you want. Note that the schema cannot be changed after install without reinstalling the extension (it is not relocatable). This extension has table data as well, so re-installation can result in the loss of configuration and logging data if precautions aren’t taken.

CREATE SCHEMA jobmon;
CREATE EXTENSION pg_jobmon SCHEMA jobmon;

How to Log Steps

pg_jobmon has some pretty basic functions for doing job logging, so getting it up and working isn’t very difficult at all. To start off, any function that requires logging will need some variables defined to hold the job and step IDs:

v_job_id   bigint;
v_step_id  bigint;

To start your job logging for a function, call the add_job(text) function, storing the ID it generates in a variable. What you enter for the job name will be used in the jobmon.job_log table as the job_name column. All values for the job name are automatically capitalized for consistency and to make searching a little more predictable. I’ll be using a new, specialized replication extension I’m working on, mimeo, to show examples of how pg_jobmon is used – specifically, the snap replication, which does a full copy of a table from one PostgreSQL instance to another.

v_job_id := jobmon.add_job('REFRESH SNAP: KEITH.TEST_TABLE');

This creates an entry in the jobmon.job_log table. pg_jobmon has a set of “show” functions that can make reading the job logs easier. I’ll talk more about them at the end, but I’ll be using them in these examples as well.

flpg=# select * from jobmon.show_job('REFRESH SNAP: KEITH.TEST_TABLE');
-[ RECORD 1 ]--------------------------------------
job_id     | 13
owner      | keith
job_name   | REFRESH SNAP: KEITH.TEST_TABLE
start_time | 2012-06-03 18:52:10.686897-04
end_time   |
status     |
pid        | 13805

From here on out, job logging is primarily done with two functions: add_step(bigint, text) and update_step(bigint, text, text). add_step() is used to start tracking the progress of each step of the function that you’d like to monitor. The first argument is the job_id generated from add_job(). The text should be a short description of the step. Note that this description stays constant and is not changed as part of the update. add_step() returns a step_id that should be stored.

v_step_id := jobmon.add_step(v_job_id,'Grabbing Mapping, Building SQL');

This creates the first step for our job and is logged to the jobmon.job_detail table under the given job_id.

testdb=# select * from jobmon.show_detail(13);
-[ RECORD 1 ]+--------------------------------------------
job_id       | 13
step_id      | 21
action       | Grabbing Mapping, Building SQL
start_time   | 2012-06-03 18:52:10.722388-04
end_time     |
elapsed_time |
status       |
message      |

Later in the function, after this step should have been completed, we update that step with the current status using update_step(). The first argument is the step_id that is to be updated. The next argument is a simple status. In this case, everything should be OK if it reached this point, so it’s set to ‘OK’. (You can set whatever status message you wish here; I’d recommend being consistent with the config table mentioned later.) The final argument is a message that can give more details as to the results of this step. This was a fairly simple step, so just saying ‘Done’ is good enough. You can update a status or message several times for a single step if needed, showing the current progress of longer running steps.

PERFORM jobmon.update_step(v_step_id, 'OK','Done');
 
-[ RECORD 1 ]+--------------------------------------------
job_id       | 13
step_id      | 21
action       | Grabbing Mapping, Building SQL
start_time   | 2012-06-03 18:52:10.722388-04
end_time     | 2012-06-03 18:52:10.791287-04
elapsed_time | 0.068899
status       | OK
message      | Done

When you’re done with this step and want to move on to the next, just call the add_step() function again to create a new step ID.

v_step_id := jobmon.add_step(v_job_id,'Inserting records into local table');

For this step we’ll provide a little more information to the message log. I’ve added a GET DIAGNOSTICS call to the portion where the INSERT is done to get the row count of this snap job. This is added to the message log so we can make sure rows were actually copied into the snap table.

EXECUTE v_insert_sql;
GET DIAGNOSTICS v_rowcount = ROW_COUNT;
PERFORM jobmon.update_step(v_step_id, 'OK','Inserted '||v_rowcount||' records');
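Putting the pieces together, a logged function ends up following a consistent pattern. Here’s a minimal sketch of what that looks like. The function name, tables, and step bodies are hypothetical, and I’m assuming here that the job also gets closed out on success or marked failed on error (check the extension’s README for the exact finishing calls and the status values your configuration expects):

```sql
-- Hypothetical skeleton of a function logged with pg_jobmon.
-- my_copy_job, source_table and dest_table are illustrative names only.
CREATE OR REPLACE FUNCTION my_copy_job() RETURNS void AS $body$
DECLARE
    v_job_id    bigint;
    v_step_id   bigint;
    v_rowcount  bigint;
BEGIN
    -- Create the job_log entry (name is auto-capitalized)
    v_job_id := jobmon.add_job('MY COPY JOB');

    -- Log one step, do the work, then record the result
    v_step_id := jobmon.add_step(v_job_id, 'Inserting records into local table');
    INSERT INTO dest_table SELECT * FROM source_table;
    GET DIAGNOSTICS v_rowcount = ROW_COUNT;
    PERFORM jobmon.update_step(v_step_id, 'OK',
        'Inserted '||v_rowcount||' records');

    -- Close the job out as finished (assumed finishing call)
    PERFORM jobmon.close_job(v_job_id);
EXCEPTION
    WHEN OTHERS THEN
        -- Because logging commits over dblink, these entries persist
        -- even though this transaction is rolling back
        PERFORM jobmon.update_step(v_step_id, 'BAD', 'ERROR: '||SQLERRM);
        PERFORM jobmon.fail_job(v_job_id);
        RAISE;
END
$body$ LANGUAGE plpgsql;
```

The exception block is where the dblink trick really pays off: the failure itself gets logged and survives the rollback, so the monitor has something to find.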
