Chapter 10. Handling repository events with hooks

Table of Contents

An overview of hooks in Mercurial

Hooks and security

Hooks are run with your privileges
Hooks do not propagate
Hooks can be overridden
Ensuring that critical hooks are run

A short tutorial on using hooks

Performing multiple actions per event
Controlling whether an activity can proceed

Writing your own hooks

Choosing how your hook should run
Hook parameters
Hook return values and activity control
Writing an external hook
Telling Mercurial to use an in-process hook
Writing an in-process hook

Some hook examples

Writing meaningful commit messages
Checking for trailing whitespace

Bundled hooks

acl—access control for parts of a repository

Configuring the acl hook
Testing and troubleshooting

bugzilla—integration with Bugzilla

Configuring the bugzilla hook
Mapping committer names to Bugzilla user names
Configuring the text that gets added to a bug
Testing and troubleshooting

notify—send email notifications

Configuring the notify hook
Testing and troubleshooting

Information for writers of hooks

In-process hook execution

External hook execution

Finding out where changesets come from

Sources of changesets
Where changes are going—remote repository URLs

Hook reference

changegroup—after remote changesets added
commit—after a new changeset is created
incoming—after one remote changeset is added
outgoing—after changesets are propagated
prechangegroup—before starting to add remote changesets
precommit—before starting to commit a changeset
preoutgoing—before starting to propagate changesets
pretag—before tagging a changeset
pretxnchangegroup—before completing addition of remote changesets
pretxncommit—before completing commit of new changeset
preupdate—before updating or merging working directory
tag—after tagging a changeset
update—after updating or merging working directory

Mercurial offers a powerful mechanism to let you perform automated actions in response to events that occur in a repository. In some cases, you can even control Mercurial's response to those events.

The name Mercurial uses for one of these actions is a hook. Hooks are called “triggers” in some revision control systems, but the two names refer to the same idea.

An overview of hooks in Mercurial

Here is a brief list of the hooks that Mercurial supports. We will revisit each of these hooks in more detail later, in the section called “Information for writers of hooks”.

Each of the hooks whose description begins with the word “Controlling” has the ability to determine whether an activity can proceed. If the hook succeeds, the activity may proceed; if it fails, the activity is either not permitted or undone, depending on the hook.

changegroup: This is run after a group of changesets has been brought into the repository from elsewhere.
commit: This is run after a new changeset has been created in the local repository.
incoming: This is run once for each new changeset that is brought into the repository from elsewhere. Notice the difference from changegroup, which is run once per group of changesets brought in.
outgoing: This is run after a group of changesets has been transmitted from this repository.
prechangegroup: This is run before starting to bring a group of changesets into the repository.
precommit: Controlling. This is run before starting a commit.
preoutgoing: Controlling. This is run before starting to transmit a group of changesets from this repository.
pretag: Controlling. This is run before creating a tag.
pretxnchangegroup: Controlling. This is run after a group of changesets has been brought into the local repository from another, but before the transaction completes that will make the changes permanent in the repository.
pretxncommit: Controlling. This is run after a new changeset has been created in the local repository, but before the transaction completes that will make it permanent.
preupdate: Controlling. This is run before starting an update or merge of the working directory.
tag: This is run after a tag is created.
update: This is run after an update or merge of the working directory has finished.

Hooks and security

Hooks are run with your privileges

When you run a Mercurial command in a repository, and the command causes a hook to run, that hook runs on your system, under your user account, with your privilege level. Since hooks are arbitrary pieces of executable code, you should treat them with an appropriate level of suspicion. Do not install a hook unless you are confident that you know who created it and what it does.

In some cases, you may be exposed to hooks that you did not install yourself. If you work with Mercurial on an unfamiliar system, Mercurial will run hooks defined in that system's global ~/.hgrc file.

If you are working with a repository owned by another user, Mercurial can run hooks defined in that user's repository, but it will still run them as “you”. For example, if you hg pull from that repository, and its .hg/hgrc defines a local outgoing hook, that hook will run under your user account, even though you don't own that repository.

	Note
	This only applies if you are pulling from a repository on a local or network filesystem. If you're pulling over http or ssh, any `outgoing` hook will run under whatever account is executing the server process, on the server.

To see what hooks are defined in a repository, use the hg showconfig hooks command. If you are working in one repository, but talking to another that you do not own (e.g. using hg pull or hg incoming), remember that it is the other repository's hooks you should be checking, not your own.

Hooks do not propagate

In Mercurial, hooks are not revision controlled, and do not propagate when you clone, or pull from, a repository. The reason for this is simple: a hook is a completely arbitrary piece of executable code. It runs under your user identity, with your privilege level, on your machine.

It would be extremely reckless for any distributed revision control system to implement revision-controlled hooks, as this would offer an easily exploitable way to subvert the accounts of users of the revision control system.

Since Mercurial does not propagate hooks, if you are collaborating with other people on a common project, you should not assume that they are using the same Mercurial hooks as you are, or that theirs are correctly configured. You should document the hooks you expect people to use.

In a corporate intranet, this is somewhat easier to control, as you can for example provide a “standard” installation of Mercurial on an NFS filesystem, and use a site-wide ~/.hgrc file to define hooks that all users will see. However, this too has its limits; see below.

Hooks can be overridden

Mercurial allows you to override a hook definition by redefining the hook. You can disable it by setting its value to the empty string, or change its behavior as you wish.

If you deploy a system- or site-wide ~/.hgrc file that defines some hooks, you should thus understand that your users can disable or override those hooks.
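
One way to picture overriding is as a stack of dictionaries, where a later definition replaces an earlier one. The sketch below is only a model, not Mercurial's configuration code: the function name effective_hooks, the sample hook commands, and the assumption that a per-user file is read after a site-wide one are all made up for illustration.

```python
def effective_hooks(*layers):
    """Merge [hooks] sections; later layers override earlier ones.

    Each layer is a dict mapping a hook name to its command string
    (a hypothetical model of an hgrc [hooks] section). A hook whose
    final value is the empty string is treated as disabled.
    """
    merged = {}
    for layer in layers:
        merged.update(layer)
    return {name: cmd for name, cmd in merged.items() if cmd != ""}


# Hypothetical site-wide policy, and a user who works around it.
site_wide = {"pretxncommit.tests": "run-tests", "commit.notify": "send-mail"}
per_user = {"pretxncommit.tests": "", "commit.notify": "echo done"}

print(effective_hooks(site_wide, per_user))
```

In this model the user's file silently removes the test hook and replaces the notification hook, which is why a site-wide file alone cannot enforce a policy.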

Ensuring that critical hooks are run

Sometimes you may want to enforce a policy that you do not want others to be able to work around. For example, you may have a requirement that every changeset must pass a rigorous set of tests. Defining this requirement via a hook in a site-wide ~/.hgrc won't work for remote users on laptops, and of course local users can subvert it at will by overriding the hook.

Instead, you can set up your policies for use of Mercurial so that people are expected to propagate changes through a well-known “canonical” server that you have locked down and configured appropriately.

One way to do this is via a combination of social engineering and technology. Set up a restricted-access account; users can push changes over the network to repositories managed by this account, but they cannot log into the account and run normal shell commands. In this scenario, a user can commit a changeset that contains any old garbage they want.

When someone pushes a changeset to the server that everyone pulls from, the server will test the changeset before it accepts it as permanent, and reject it if it fails to pass the test suite. If people only pull changes from this filtering server, it will serve to ensure that all changes that people pull have been automatically vetted.

A short tutorial on using hooks

It is easy to write a Mercurial hook. Let's start with a hook that runs when you finish a hg commit, and simply prints the hash of the changeset you just created. The hook is called commit.

All hooks follow the pattern in this example.

$ hg init hook-test
$ cd hook-test
$ echo '[hooks]' >> .hg/hgrc
$ echo 'commit = echo committed $HG_NODE' >> .hg/hgrc
$ cat .hg/hgrc
[hooks]
commit = echo committed $HG_NODE
$ echo a > a
$ hg add a
$ hg commit -m 'testing commit hook'
committed 8cc1c7ddb98a9b6d7687a1faa62cf7b32dbe9e03

You add an entry to the hooks section of an hgrc file; this example uses the repository's own .hg/hgrc, but the same entry works in your ~/.hgrc. On the left is the name of the event to trigger on; on the right is the action to take. As you can see, you can run an arbitrary shell command in a hook. Mercurial passes extra information to the hook using environment variables (look for HG_NODE in the example).

Performing multiple actions per event

Quite often, you will want to define more than one hook for a particular kind of event, as shown below.

$ echo 'commit.when = echo -n "date of commit: "; date' >> .hg/hgrc
$ echo a >> a
$ hg commit -m 'i have two hooks'
committed 669d3ac83eae08ced498b61829441d922eb5cf1d
date of commit: Tue May  5 06:55:36 GMT 2009

Mercurial lets you do this by adding an extension to the end of a hook's name. You extend a hook's name by giving the name of the hook, followed by a full stop (the “.” character), followed by some more text of your choosing. For example, Mercurial will run both commit.foo and commit.bar when the commit event occurs.

To give a well-defined order of execution when there are multiple hooks defined for an event, Mercurial sorts hooks by extension, and executes the hook commands in this sorted order. In the above example, it will execute commit.bar before commit.foo, and commit before both.
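
The ordering described above behaves like a plain string sort of the hook names. The following is a small illustration of that rule only, using a hypothetical helper named execution_order; it is not how Mercurial itself is implemented.

```python
def execution_order(hook_names):
    """Return hook names in the sorted order the text describes."""
    return sorted(hook_names)


print(execution_order(["commit.foo", "commit", "commit.bar"]))
# ['commit', 'commit.bar', 'commit.foo']
```

The bare name sorts first because it is a prefix of the extended names.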

It is a good idea to use a somewhat descriptive extension when you define a new hook. This will help you to remember what the hook was for. If the hook fails, you'll get an error message that contains the hook name and extension, so using a descriptive extension could give you an immediate hint as to why the hook failed (see the section called “Controlling whether an activity can proceed” for an example).

Controlling whether an activity can proceed

In our earlier examples, we used the commit hook, which is run after a commit has completed. This is one of several Mercurial hooks that run after an activity finishes. Such hooks have no way of influencing the activity itself.

Mercurial defines a number of events that occur before an activity starts; or after it starts, but before it finishes. Hooks that trigger on these events have the added ability to choose whether the activity can continue, or will abort.

The pretxncommit hook runs after a commit has all but completed. In other words, the metadata representing the changeset has been written out to disk, but the transaction has not yet been allowed to complete. The pretxncommit hook has the ability to decide whether the transaction can complete, or must be rolled back.

If the pretxncommit hook exits with a status code of zero, the transaction is allowed to complete; the commit finishes; and the commit hook is run. If the pretxncommit hook exits with a non-zero status code, the transaction is rolled back; the metadata representing the changeset is erased; and the commit hook is not run.

$ cat check_bug_id
#!/bin/sh
# check that a commit comment mentions a numeric bug id
hg log -r $1 --template {desc} | grep -q "\<bug *[0-9]"
$ echo 'pretxncommit.bug_id_required = ./check_bug_id $HG_NODE' >> .hg/hgrc
$ echo a >> a
$ hg commit -m 'i am not mentioning a bug id'
transaction abort!
rollback completed
abort: pretxncommit.bug_id_required hook exited with status 1
$ hg commit -m 'i refer you to bug 666'
committed 052ec7f13869f36df5932e83e2a24164d8040aab
date of commit: Tue May  5 06:55:36 GMT 2009

The hook in the example above checks that a commit comment contains a bug ID. If it does, the commit can complete. If not, the commit is rolled back.
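
If you would rather express the same test in Python, the check could look roughly like the sketch below. The function name mentions_bug_id is invented here, and the regular expression is only an approximation of the grep pattern used in check_bug_id.

```python
import re

# Approximates the grep pattern "\<bug *[0-9]": the word "bug" at the
# start of a word, optional spaces, then at least one digit.
BUG_ID = re.compile(r"\bbug *[0-9]")


def mentions_bug_id(description):
    """Return True if a commit comment appears to cite a numeric bug ID."""
    return BUG_ID.search(description) is not None


print(mentions_bug_id("i am not mentioning a bug id"))  # False
print(mentions_bug_id("i refer you to bug 666"))        # True
```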

Writing your own hooks

When you are writing a hook, you might find it useful to run Mercurial either with the -v option, or the verbose config item set to “true”. When you do so, Mercurial will print a message before it calls each hook.

Choosing how your hook should run

You can write a hook either as a normal program—typically a shell script—or as a Python function that is executed within the Mercurial process.

Writing a hook as an external program has the advantage that it requires no knowledge of Mercurial's internals. You can call normal Mercurial commands to get any added information you need. The trade-off is that external hooks are slower than in-process hooks.

An in-process Python hook has complete access to the Mercurial API, and does not “shell out” to another process, so it is inherently faster than an external hook. It is also easier to obtain much of the information that a hook requires by using the Mercurial API than by running Mercurial commands.

If you are comfortable with Python, or require high performance, writing your hooks in Python may be a good choice. However, when you have a straightforward hook to write and you don't need to care about performance (probably the majority of hooks), a shell script is perfectly fine.

Hook parameters

Mercurial calls each hook with a set of well-defined parameters. In Python, a parameter is passed as a keyword argument to your hook function. For an external program, a parameter is passed as an environment variable.

Whether your hook is written in Python or as a shell script, the hook-specific parameter names and values will be the same. A boolean parameter will be represented as a boolean value in Python, but as the number 1 (for “true”) or 0 (for “false”) as an environment variable for an external hook. If a hook parameter is named foo, the keyword argument for a Python hook will also be named foo, while the environment variable for an external hook will be named HG_FOO.
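
The naming rule is mechanical enough to write down. This sketch uses a hypothetical helper, hook_environment, to show how a set of Python-style parameters would appear to an external hook under the rules just described; it is an illustration, not code taken from Mercurial.

```python
def hook_environment(params):
    """Map hook parameters to the environment an external hook would see."""
    env = {}
    for name, value in params.items():
        if isinstance(value, bool):
            # booleans become "1" (true) or "0" (false)
            value = "1" if value else "0"
        # the name is upper-cased and prefixed with HG_
        env["HG_" + name.upper()] = str(value)
    return env


print(hook_environment({"node": "8cc1c7ddb98a", "foo": True}))
```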

Hook return values and activity control

A hook that executes successfully must exit with a status of zero if external, or return boolean “false” if in-process. Failure is indicated with a non-zero exit status from an external hook, or an in-process hook returning boolean “true”. If an in-process hook raises an exception, the hook is considered to have failed.

For a hook that controls whether an activity can proceed, zero/false means “allow”, while non-zero/true/exception means “deny”.
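
Because "false means success" is easy to get backwards, here is the rule spelled out as code. Both function names are hypothetical, and this is a sketch of the convention rather than Mercurial's own dispatch logic.

```python
def in_process_hook_failed(hook, *args, **kwargs):
    """Apply the in-process convention: true or an exception means failure."""
    try:
        result = hook(*args, **kwargs)
    except Exception:
        return True
    # None, False and 0 all count as success
    return bool(result)


def external_hook_failed(exit_status):
    """Apply the external convention: any non-zero exit status is failure."""
    return exit_status != 0


def allow(**kwargs):
    return False  # success: the activity may proceed


def deny(**kwargs):
    return True   # failure: the activity is refused or rolled back


print(in_process_hook_failed(allow), in_process_hook_failed(deny))
```

Note that a Python hook that simply falls off the end returns None, which counts as success.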

Writing an external hook

When you define an external hook in your ~/.hgrc and the hook is run, its value is passed to your shell, which interprets it. This means that you can use normal shell constructs in the body of the hook.

An executable hook is always run with its current directory set to a repository's root directory.

Each hook parameter is passed in as an environment variable; the name is upper-cased, and prefixed with the string “HG_”.

With the exception of hook parameters, Mercurial does not set or modify any environment variables when running a hook. This is useful to remember if you are writing a site-wide hook that may be run by a number of different users with differing environment variables set. In multi-user situations, you should not rely on environment variables being set to the values you have in your environment when testing the hook.
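
An external hook need not be a shell script. As a rough sketch, a Python program used as a commit hook might read its parameter like this; the function name main and the messages are illustrative, and the only environment variable it relies on is HG_NODE.

```python
import os
import sys


def main(environ=os.environ):
    """Print the changeset ID passed in HG_NODE; return an exit status."""
    node = environ.get("HG_NODE")
    if node is None:
        # Rely only on HG_* variables, not on the caller's environment.
        sys.stderr.write("HG_NODE is not set; was this run as a hook?\n")
        return 1
    sys.stdout.write("committed %s\n" % node)
    return 0


# Try it with a fake environment; a real script would instead end
# with: sys.exit(main())
status = main({"HG_NODE": "8cc1c7ddb98a"})
```

Passing the environment in as an argument makes the hook easy to exercise without running Mercurial at all.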

Telling Mercurial to use an in-process hook

The ~/.hgrc syntax for defining an in-process hook is slightly different than for an executable hook. The value of the hook must start with the text “python:”, and continue with the fully-qualified name of a callable object to use as the hook's value.

The module in which a hook lives is automatically imported when a hook is run. So long as you have the module name and PYTHONPATH right, it should “just work”.

The following ~/.hgrc example snippet illustrates the syntax and meaning of the notions we just described.

[hooks]
commit.example = python:mymodule.submodule.myhook

When Mercurial runs the commit.example hook, it imports mymodule.submodule, looks for the callable object named myhook, and calls it.
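
To make the lookup concrete, the sketch below resolves a "python:" value using the standard importlib module. It is an illustration of the idea under simplifying assumptions, not Mercurial's actual loader, and resolve_hook is a name invented for this example.

```python
import importlib


def resolve_hook(value):
    """Turn 'python:package.module.name' into the callable it names."""
    prefix = "python:"
    if not value.startswith(prefix):
        raise ValueError("not an in-process hook: %r" % value)
    # Everything before the last dot is the module; the rest is the callable.
    module_name, _, attr = value[len(prefix):].rpartition(".")
    module = importlib.import_module(module_name)
    hook = getattr(module, attr)
    if not callable(hook):
        raise TypeError("%s is not callable" % value)
    return hook


# A standard library function stands in for a real hook here.
print(resolve_hook("python:os.path.join"))
```

If the import fails in a sketch like this, the usual culprit is a module that is not on PYTHONPATH.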

Writing an in-process hook

The simplest in-process hook does nothing, but illustrates the basic shape of the hook API:

def myhook(ui, repo, **kwargs):
    pass

The first argument to a Python hook is always a ui object. The second is a repository object; at the moment, it is always an instance of localrepository. Following these two arguments are other keyword arguments. Which ones are passed in depends on the hook being called, but a hook can ignore arguments it doesn't care about by dropping them into a keyword argument dict, as with **kwargs above.
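
A slightly less trivial hook might pick out one keyword argument and ignore the rest. The sketch below assumes the hook is attached to an event that supplies a node parameter, and it calls the ui object's status method to report it. FakeUi is a hypothetical stand-in so the hook can be tried without Mercurial. Releases of Mercurial that run on Python 3 work with byte strings rather than text, so treat the string handling here as an assumption to check against your version.

```python
def myhook(ui, repo, node=None, **kwargs):
    """Report the changeset ID that triggered the hook."""
    ui.status("hook saw changeset %s\n" % node)
    return False  # false means the hook succeeded


class FakeUi(object):
    """Hypothetical stand-in for Mercurial's ui object, for testing only."""

    def __init__(self):
        self.messages = []

    def status(self, msg):
        self.messages.append(msg)


ui = FakeUi()
myhook(ui, None, node="8cc1c7ddb98a", source="example")
print(ui.messages)
```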

Some hook examples

Writing meaningful commit messages

It's hard to imagine a useful commit message being very short. The simple pretxncommit hook of the example below will prevent you from committing a changeset with a message that is less than ten bytes long.

$ cat .hg/hgrc
[hooks]
pretxncommit.msglen = test `hg tip --template {desc} | wc -c` -ge 10
$ echo a > a
$ hg add a
$ hg commit -A -m 'too short'
transaction abort!
rollback completed
abort: pretxncommit.msglen hook exited with status 1
$ hg commit -A -m 'long enough'

Checking for trailing whitespace

An interesting use of a commit-related hook is to help you to write cleaner code. A simple example of “cleaner code” is the dictum that a change should not add any new lines of text that contain “trailing whitespace”. Trailing whitespace is a series of space and tab characters at the end of a line of text. In most cases, trailing whitespace is unnecessary, invisible noise, but it is occasionally problematic, and people often prefer to get rid of it.

You can use either the precommit or pretxncommit hook to tell whether you have a trailing whitespace problem. If you use the precommit hook, the hook will not know which files you are committing, so it will have to check every modified file in the repository for trailing whitespace. If you want to commit a change to just the file foo, but the file bar contains trailing whitespace, doing a check in the precommit hook will prevent you from committing foo due to the problem with bar. This doesn't seem right.

Should you choose the pretxncommit hook, the check won't occur until just before the transaction for the commit completes. This will allow you to check for problems in only the exact files that are being committed. However, if you entered the commit message interactively and the hook fails, the transaction will roll back; you'll have to re-enter the commit message after you fix the trailing whitespace and run hg commit again.

$ cat .hg/hgrc
[hooks]
pretxncommit.whitespace = hg export tip | (! egrep -q '^\+.*[[:blank:]]$')
$ echo 'a ' > a
$ hg commit -A -m 'test with trailing whitespace'
adding a
transaction abort!
rollback completed
abort: pretxncommit.whitespace hook exited with status 1
$ echo 'a' > a
$ hg commit -A -m 'drop trailing whitespace and try again'

In this example, we introduce a simple pretxncommit hook that checks for trailing whitespace. This hook is short, but not very helpful. It exits with an error status if a change adds a line with trailing whitespace to any file, but does not print any information that might help us to identify the offending file or line. It also has the nice property of not paying attention to unmodified lines; only lines that introduce new trailing whitespace cause problems.

#!/usr/bin/env python
#
# save as .hg/check_whitespace.py and make executable

import re

def trailing_whitespace(difflines):
    # Yield (filename, line number) for each added line in a unified
    # diff that ends in a space or tab.
    linenum, header = 0, False

    for line in difflines:
        if header:
            # remember the name of the file that this diff affects
            m = re.match(r'(?:---|\+\+\+) ([^\t]+)', line)
            if m and m.group(1) != '/dev/null':
                filename = m.group(1).split('/', 1)[-1]
            if line.startswith('+++ '):
                header = False
            continue
        if line.startswith('diff '):
            header = True
            continue
        # hunk header - save the line number
        m = re.match(r'@@ -\d+,\d+ \+(\d+),', line)
        if m:
            linenum = int(m.group(1))
            continue
        # hunk body - check for an added line with trailing whitespace;
        # use [ \t] rather than \s, because \s also matches the newline
        # that ends every line and would flag every added line
        m = re.match(r'\+.*[ \t]$', line)
        if m:
            yield filename, linenum
        if line and line[0] in ' +':
            linenum += 1

if __name__ == '__main__':
    import os, sys

    added = 0
    for filename, linenum in trailing_whitespace(os.popen('hg export tip')):
        # sys.stderr.write works under both Python 2 and Python 3
        sys.stderr.write('%s, line %d: trailing whitespace added\n' %
                         (filename, linenum))
        added += 1
    if added:
        # save the commit message so we don't need to retype it
        os.system('hg tip --template "{desc}" > .hg/commit.save')
        sys.stderr.write('commit message saved to .hg/commit.save\n')
        sys.exit(1)

The above version is much more complex, but also more useful. It parses a unified diff to see if any lines add trailing whitespace, and prints the name of the file and the line number of each such occurrence. Even better, if the change adds trailing whitespace, this hook saves the commit comment and prints the name of the save file before exiting and telling Mercurial to roll the transaction back, so you can use the -l filename option to hg commit to reuse the saved commit message once you've corrected the problem.

$ cat .hg/hgrc
[hooks]
pretxncommit.whitespace = .hg/check_whitespace.py
$ echo 'a ' >> a
$ hg commit -A -m 'add new line with trailing whitespace'
a, line 2: trailing whitespace added
commit message saved to .hg/commit.save
transaction abort!
rollback completed
abort: pretxncommit.whitespace hook exited with status 1
$ sed -i 's, *$,,' a
$ hg commit -A -m 'trimmed trailing whitespace'

As a final aside, note in the example above the use of sed's in-place editing feature to get rid of trailing whitespace from a file. This is concise and useful enough that I will reproduce it here (using perl for good measure).

perl -pi -e 's,[ \t]+$,,' filename
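
The same cleanup can be sketched in Python. The function name strip_trailing_whitespace is made up for this example; the pattern matches only spaces and tabs, so the newline at the end of each line is left in place.

```python
import re


def strip_trailing_whitespace(text):
    """Remove spaces and tabs before each line ending, keeping the newlines."""
    return re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)


print(repr(strip_trailing_whitespace("a \nb\t \nc\n")))
# 'a\nb\nc\n'
```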

Bundled hooks

Mercurial ships with several bundled hooks. You can find them in the hgext directory of a Mercurial source tree. If you are using a Mercurial binary package, the hooks will be located in the hgext directory of wherever your package installer put Mercurial.

`acl`—access control for parts of a repository

The acl extension lets you control which remote users are allowed to push changesets to a networked server. You can protect any portion of a repository (including the entire repo), so that a given remote user can push only changes that do not affect the protected portion.

This extension implements access control based on the identity of the user performing a push, not on who committed the changesets they're pushing. It makes sense to use this hook only if you have a locked-down server environment that authenticates remote users, and you want to be sure that only specific users are allowed to push changes to that server.

Configuring the `acl` hook

In order to manage incoming changesets, the acl hook must be used as a pretxnchangegroup hook. This lets it see which files are modified by each incoming changeset, and roll back a group of changesets if they modify “forbidden” files. Example:

[hooks]
pretxnchangegroup.acl = python:hgext.acl.hook

The acl extension is configured using three sections.

The acl section has only one entry, sources, which lists the sources of incoming changesets that the hook should pay attention to. You don't normally need to configure this section.

serve: Control incoming changesets that are arriving from a remote repository over http or ssh. This is the default value of sources, and usually the only setting you'll need for this configuration item.
pull: Control incoming changesets that are arriving via a pull from a local repository.
push: Control incoming changesets that are arriving via a push from a local repository.
bundle: Control incoming changesets that are arriving from another repository via a bundle.

The acl.allow section controls the users that are allowed to add changesets to the repository. If this section is not present, all users that are not explicitly denied are allowed. If this section is present, all users that are not explicitly allowed are denied (so an empty section means that all users are denied).

The acl.deny section determines which users are denied from adding changesets to the repository. If this section is not present or is empty, no users are denied.

The syntaxes for the acl.allow and acl.deny sections are identical. On the left of each entry is a glob pattern that matches files or directories, relative to the root of the repository; on the right, a user name.

In the following example, the user docwriter can only push changes to the docs subtree of the repository, while intern can push changes to any file or directory except source/sensitive.

[acl.allow]
docs/** = docwriter
[acl.deny]
source/sensitive/** = intern
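
The allow and deny rules above can be modelled in a few lines. The sketch below is not the acl extension's code: may_push is a hypothetical function, fnmatchcase only approximates Mercurial's glob matching (here both "*" and "**" cross directory boundaries), and checking acl.deny before acl.allow is an assumption of this model.

```python
from fnmatch import fnmatchcase


def may_push(user, path, allow=None, deny=None):
    """Decide whether `user` may push a change touching `path`.

    `allow` and `deny` are lists of (glob pattern, user name) pairs,
    or None when the corresponding section is absent.
    """
    # Assumption: deny entries are consulted first.
    for pattern, name in (deny or []):
        if name == user and fnmatchcase(path, pattern):
            return False
    # No acl.allow section: everyone not denied is allowed.
    if allow is None:
        return True
    # acl.allow present: only explicitly allowed users get through.
    return any(name == user and fnmatchcase(path, pattern)
               for pattern, name in allow)


allow = [("docs/**", "docwriter")]
deny = [("source/sensitive/**", "intern")]

print(may_push("docwriter", "docs/guide/intro.txt", allow=allow))
print(may_push("intern", "source/sensitive/keys.txt", deny=deny))
```

Each section is exercised on its own here, which keeps the effect of each rule easy to see.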

Testing and troubleshooting

If you want to test the acl hook, run it with Mercurial's debugging output enabled. Since you'll probably be running it on a server where it's not convenient (or sometimes possible) to pass in the --debug option, don't forget that you can enable debugging output in your ~/.hgrc:

[ui]
debug = true

With this enabled, the acl hook will print enough information to let you figure out why it is allowing or forbidding pushes from specific users.

`bugzilla`—integration with Bugzilla

The bugzilla extension adds a comment to a Bugzilla bug whenever it finds a reference to that bug ID in a commit comment. You can install this hook on a shared server, so that any time a remote user pushes changes to this server, the hook gets run.

It adds a comment to the bug that looks like this (you can configure the contents of the comment—see below):

Changeset aad8b264143a, made by Joe User
	<joe.user@domain.com> in the frobnitz repository, refers
	to this bug. For complete details, see
	hg.domain.com/frobnitz?cmd=changeset;node=aad8b264143a
	Changeset description: Fix bug 10483 by guarding against some
	NULL pointers

The value of this hook is that it automates the process of updating a bug any time a changeset refers to it. If you configure the hook properly, it makes it easy for people to browse straight from a Bugzilla bug to a changeset that refers to that bug.
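
The heart of such a hook is spotting bug references in a commit comment. The sketch below shows one way to do that; the regular expression is a guess made for illustration, not the pattern the bundled hook uses, and referenced_bugs is a hypothetical name.

```python
import re

# Illustrative pattern only: "bug", optional "#", then the bug number.
BUG_REF = re.compile(r"\bbug\s*#?\s*(\d+)", re.IGNORECASE)


def referenced_bugs(description):
    """Return the bug IDs mentioned in a commit comment, in order."""
    return [int(number) for number in BUG_REF.findall(description)]


print(referenced_bugs("Fix bug 10483 by guarding against some NULL pointers"))
# [10483]
```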

You can use the code in this hook as a starting point for some more

Mercurial: The Definitive Guide, by Bryan O'Sullivan