The Cluster Guy

ǝɹǝɥ ʇxǝʇ lnɟʇɥƃısuı

Archive

Mon, Feb 18th

TheClusterGuy blog is moving

Nothing against Tumblr, but with Posterous being shut down (even though I’m not using it here), I’ve decided to take back control of my content.

As a result, I’ve started using Octopress to publish to blog.clusterlabs.org.

Octopress is pretty nifty. It generates a static site (good for performance) that can either be hosted at GitHub or (if GitHub ever goes dark) anywhere Apache can run. I was even able to easily import my old posts!

For now I’m taking the GitHub path with a custom domain name (not the same as this one so that the old links still work).

See you on the other side

Wed, Feb 13th

Pacemaker 1.0.13 now available

Thanks once again to the efforts of the fine folks from NTT, the latest bug fixes have been back-ported from 1.1 and another instalment of the Pacemaker 1.0 release series is now ready for general consumption.

Changesets: 129
Diff: 173 files changed, 12206 insertions(+), 767 deletions(-)

Important changes since Pacemaker-1.0.12 include:

  • cib: Don’t halt disk writes if the previous digest is missing
  • cib: Fix coverity RESOURCE_LEAK defect
  • Core: Avoid assertion error when underflowing days of the month in iso8601 date code
  • Core: Correctly determine when an XML file should be decompressed
  • Core: Ensure signals are handled eventually in the absence of timer sources or IPC messages
  • Core: Strip text nodes from on disk xml files
  • crmd: cl#5051 - Fixes file leak in pe ipc connection initialization.
  • crmd: cl#5057 - Restart sub-systems correctly (bnc#755671)
  • crmd: Fast-track shutdown if we couldn’t request it via attrd
  • crmd: Leave it up to the PE to decide which ops can/cannot be reloaded
  • crmd: Prevent use-of-NULL when free’ing empty hashtables
  • crmd: Supply format arguments in the correct order
  • Fix memory leak in cib when writing the cib contents.
  • legacy: Set to the minimum scheduling priority when using SCHED_RR policy (bnc#779259)
  • pengine: Bug #5007, Fixes use of colocation constraints with multi-state resources
  • pengine: Bug cl#5038 - Prevent restart of anonymous clones when clone-max decreases
  • pengine: Bug cl#5101 - Ensure stop order is preserved for partially active groups
  • pengine: cl#5069 - Honor ‘on-fail=ignore’ even when operation is disabled.
  • pengine: cl#5072 - Fixes monitor op stopping after rsc promotion.
  • pengine: Ensure post-migration stop actions occur before node shutdown
  • pengine: Fix coverity REVERSE_INULL defects
  • pengine: Fix use-after-free errors detected by coverity
  • pengine: Prevent segfault when ensuring unmanaged resources don’t prevent shutdown
  • pengine: Reload of a resource no longer causes a restart of dependent resources
  • RA: controld - use the correct dlm_controld when membership comes from corosync directly
  • tools: crm_resource - Fix coverity FORWARD_NULL defect
  • Tools: crm_shadow - Bug cl#5062 - Correctly set argv[0] when forking a shell process

You can also see the full changelog.

The next 1.0.x release will occur if and when needed (but probably not before mid-2013).

The source tarball is also available directly from GitHub.

Users of most distributions are encouraged to use the latest 1.1.x release - either from the 1.1 Build Area or from the distribution directly.

General installation instructions are available from the ClusterLabs wiki.

Tue, Oct 30th

Can Pacemaker 1.1.8 be used with…

Short answer: yes

Longer answer: seriously, yes :-)

Pacemaker 1.1.8 should be fully functional with all three current corosync release series (1.2.x, 1.4.x and 2.0.x) as well as Heartbeat.

We have not removed support for anything, so if something is not working for you, please let us know on the mailing list.

Pacemaker and Cluster Filesystems

There is some confusion out there on how to use Pacemaker with the OCFS2 and GFS2 cluster filesystems.

Sections 8.1 and 8.2 of Clusters from Scratch mention some of the issues involved in the context of CMAN, but the principles are generally applicable.

The most important take-away is that all parts of the stack must make decisions based on the same membership and quorum data.

There were/are three options to achieve this:

  1. have everyone talk to pacemaker
  2. have everyone talk to cman
  3. have everyone talk to corosync

Option 1 - Everyone Talks to Pacemaker

This option was written for and is maintained/supported by SUSE but didn’t really gain much traction outside of SLES. It also relies on a pacemaker plugin that gets loaded into corosync/openais, something that is no longer possible with corosync 2.x. It briefly appeared upstream, but once option 2 became possible, option 1 was removed (not by me).

Anyone not paying for OCFS2 on SLES is probably best advised to move to option 2 or 3.

Requirements:

  • Filesystems supported: OCFS2
  • Corosync: 1.x
  • Pacemaker: any
  • Other: openais

Option 2 - Everyone Talks to CMAN

This is what works on most distros (except openSUSE/SLES) today. By virtue of being part of RHCS and its age, cman is available on most of today’s enterprise distros and is supported by OCFS2 and GFS2.

By modifying Pacemaker to support it, we gained the ability to use GFS2 and OCFS2 “for free” - without the need for custom dlm, gfs and ocfs controld’s.

Requirements:

  • Filesystems supported: GFS2, OCFS2
  • Corosync: 1.x
  • Pacemaker: 1.1.6 or later
  • Other: cman, openais

Option 3 - Everyone Talks to Corosync 2.0

With RHEL6 to be the last hurrah for CMAN, this is where things are headed upstream. However, the only distro that ships this solution today is Fedora-17 (and shortly 18).

In this scenario, all components obtain membership and quorum directly from corosync. So far OCFS2 is the only component that hasn’t been updated to support this - they appear content to continue using their own messaging and membership layer.

Requirements:

  • Filesystems supported: GFS2
  • Corosync: 2.x
  • Pacemaker: 1.1.7 or later
  • Other: none

Which Is The Best Option For Me?

If you’re a SLES customer looking to use OCFS2, absolutely take the Option 1 route. For everyone else, although Option 3 is architecturally superior, Option 2 is likely to be the safest approach for the next couple of years.
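
The requirements listed under the three options can be condensed into a small lookup. What follows is only a sketch in Python: the `StackOption` class and the `candidate_options` function are names made up for illustration, the data is copied from the requirement lists above, and the "Other" requirements (openais, cman) as well as distro support are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackOption:
    """One row of the requirement lists (hypothetical structure)."""
    name: str
    filesystems: frozenset
    corosync_major: int
    min_pacemaker: tuple  # empty tuple means "any"


OPTIONS = [
    StackOption("Option 1 - Everyone Talks to Pacemaker",
                frozenset({"OCFS2"}), 1, ()),
    StackOption("Option 2 - Everyone Talks to CMAN",
                frozenset({"GFS2", "OCFS2"}), 1, (1, 1, 6)),
    StackOption("Option 3 - Everyone Talks to Corosync 2.0",
                frozenset({"GFS2"}), 2, (1, 1, 7)),
]


def parse_version(text):
    """Turn a dotted version such as '1.1.8' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def candidate_options(filesystem, corosync, pacemaker):
    """Return the names of the options whose listed requirements are met."""
    corosync_major = parse_version(corosync)[0]
    pacemaker_version = parse_version(pacemaker)
    return [
        option.name
        for option in OPTIONS
        if filesystem in option.filesystems
        and option.corosync_major == corosync_major
        and pacemaker_version >= option.min_pacemaker
    ]


if __name__ == "__main__":
    print(candidate_options("GFS2", "2.0.1", "1.1.8"))
    print(candidate_options("OCFS2", "1.4.1", "1.1.8"))
```

A lookup like this only narrows the field; which of the remaining options is actually supported on your distro is still the deciding factor.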

Thu, Mar 29th

Pacemaker 1.1.7 Now Available

After much hard work, the latest installment of the Pacemaker 1.1 release series is now ready for general consumption.

Changesets: 513
Diff: 1171 files changed, 90472 insertions, 19368 deletions

As well as the usual round of bug fixes (see the full changelog), this new release brings:

  • Support for Corosync 2.0
  • Logging optimisations (less of it and less work performed for logs that won’t be printed)
  • The ability to specify that A starts after ( B or C or D )
  • Support for advanced fencing topologies, e.g. kdump || (network && disk) || power
  • Resource templates and tickets have been promoted to the stable schema
  • Support for gracefully giving up resources depending on a ticket
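
To make the two boolean expressions in the list above concrete, here is a rough Python sketch of the logic they describe. It is not how Pacemaker implements either feature: `fence_node`, `may_start` and the `attempt` callback are hypothetical names, and the only things taken from the list are the device names and the shape of the expressions. Each fencing level is a group of devices that must all succeed, and levels are tried in order until one does.

```python
from typing import Callable, Iterable, Sequence

# kdump || (network && disk) || power, written as ordered levels.
TOPOLOGY = [["kdump"], ["network", "disk"], ["power"]]


def fence_node(levels: Sequence[Sequence[str]],
               attempt: Callable[[str], bool]) -> bool:
    """Try each level in order; a level succeeds only if all its devices do."""
    for devices in levels:
        # all() stops at the first failing device, so later devices in a
        # failed level are never attempted. Empty levels are skipped.
        if devices and all(attempt(device) for device in devices):
            return True
    return False


def may_start(started: Iterable[str], alternatives: Sequence[str]) -> bool:
    """A starts after (B or C or D): one started alternative is enough."""
    started = set(started)
    return any(resource in started for resource in alternatives)


if __name__ == "__main__":
    # Hypothetical device results, purely for demonstration.
    outcomes = {"kdump": False, "network": True, "disk": False, "power": True}
    print(fence_node(TOPOLOGY, lambda device: outcomes[device]))
    print(may_start({"C"}, ["B", "C", "D"]))
```

Keeping the levels as an ordered list of lists mirrors the expression directly: the outer list is the "or", each inner list is an "and".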

As per our release calendar, the next 1.1 release is planned for mid-July.

Packages for all current editions of Fedora have been built and will be appearing shortly in the update channels. Other distributions will follow when their schedules allow it.

The source tarball (tar.gz) is also available directly from GitHub.

General installation instructions are available from the ClusterLabs wiki.

Thu, Nov 24th

Pacemaker 1.0.12 Released

Thanks once again to the efforts of Keisuke MORI from NTT, the latest bug fixes have been back-ported from 1.1 and another instalment of the Pacemaker 1.0 release series is now ready for general consumption.

Changesets: 96
Diff: 121 files changed, 8617 insertions(+), 988 deletions(-)

Important changes since Pacemaker-1.0.11 include:

  • cib: Call gnutls_bye() and shutdown() when disconnecting from remote TLS connections
  • cib: Remove disconnected remote connections from mainloop
  • crmd: Cancel timers for actions that were pending on dead nodes
  • crmd: Do not wait for actions that were pending on dead nodes
  • crmd: Ensure we do not attempt to perform action on failed nodes
  • PE: Correctly recognise which recurring operations are currently active
  • PE: Demote from Master does not clear previous errors
  • PE: Ensure restarts due to definition changes cause the start action to be re-issued not probes
  • PE: Ensure role is preserved for unmanaged resources
  • PE: Ensure unmanaged resources have the correct role set so the correct monitor operation is chosen
  • PE: Move master based on failure of colocated group
  • pengine: Correctly determine the state of multi-state resources with a partial operation history
  • PE: Only allocate master/slave resources once
  • Shell: implement -w, --wait option to wait for the transition to finish
  • Shell: repair template list command

You can also see the full changelog.

I have updated the release calendar and the next 1.0.x release is planned for mid-May 2012.

The source tarball is also available directly from GitHub.

Pre-built packages for Pacemaker are available immediately for current openSUSE (12.1, 11.4, 11.3) and Fedora (16, 15, 14) releases as well as EPEL-5 from the ClusterLabs Build Area.

Users of most distributions are encouraged to use the latest 1.1.x release - either from the 1.1 Build Area or from the distribution directly.

General installation instructions are available from the ClusterLabs wiki.

Thu, Oct 13th

New Version Control System

Since September, Pacemaker has been using Git for the 1.1 and devel trees.

There were some minor technical advantages over Mercurial (which I still personally prefer), but mostly the decision was driven by the pain associated with switching between SCMs multiple times a day.

The majority of development now happens on GitHub, which has some great features for reviewing patches and general collaboration.

The Pacemaker tree is also periodically sync’d to the Cluster Labs server in case GitHub is unavailable for any reason.

For those new to Git, GitHub has many tips for setting up Git, creating a local copy of the Pacemaker repo to work in, submitting your changes upstream (we use the Fork + Pull Model), and other assorted resources.

Be sure to configure email and user information so you get credit for your hard work too!

New Issue Tracker

Since it’s clearly not acceptable for our issue tracker to be offline for months at a time, it is time to replace the Bugzilla instance hosted by the Linux Foundation with something else.

One candidate that came close was the GitHub issue tracker, but alas it doesn’t support attachments. The end result is that we now have an instance of Bugzilla v4 at:

bugs.clusterlabs.org

Bug numbers start at 5000. This avoids clashing with older ones and may enable us to import the old bugs if the old tracker ever comes back up again. I would advise people to assume this won’t happen and to re-create any unresolved issues.
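
Because the new numbers start at 5000, a bare bug number is enough to tell which tracker it most likely came from. The Python sketch below illustrates that idea. The function names are made up, and reading the `lf#`, `cl#` and `bnc#` prefixes seen in the changelog entries in this archive as three separate trackers is my assumption, not something stated here.

```python
import re

FIRST_NEW_ID = 5000  # "Bug numbers start at 5000"

# Prefixes as they appear in changelog entries, e.g. "cl#5057", "lf#2509".
BUG_REF = re.compile(r"\b(cl|lf|bnc)#(\d+)")


def bug_refs(line):
    """Extract (prefix, number) pairs from one changelog line."""
    return [(prefix, int(number)) for prefix, number in BUG_REF.findall(line)]


def tracker_for(number):
    """Guess the tracker for a bare bug number using the 5000 cut-off."""
    if number >= FIRST_NEW_ID:
        return "bugs.clusterlabs.org"
    return "old Linux Foundation Bugzilla"


if __name__ == "__main__":
    line = "crmd: cl#5057 - Restart sub-systems correctly (bnc#755671)"
    print(bug_refs(line))
    print(tracker_for(5007))
    print(tracker_for(2509))
```

The cut-off only works for bare numbers that belong to one of these two trackers; numbers carrying another prefix need the prefix to be interpreted.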

Mon, May 2nd

Pacemaker 1.0.11 Released

The latest installment of the Pacemaker 1.0 release series is now ready for general consumption.

Changesets: 85
Diff: 500 files changed, 69642 insertions(+), 58270 deletions(-)

Thanks once again to the efforts of Keisuke MORI and NTT, the latest bug fixes have been back-ported from 1.1.

Important changes since Pacemaker-1.0.10 include:

  • cib: Repair the processing of updates sent from peer nodes
  • crmd: All pending operations should be recorded, even recurring ones with high start delays
  • crmd: Bug lf#2509 - Watch for config option changes from the CIB even if we’re not the DC
  • crmd: Bug lf#2528 - Introduce a slight delay when creating a transition to allow attrd time to perform its updates
  • crmd: Bug lf#2545 - Ensure notify variables are accurate for stop operations
  • crmd: Bug lf#2559 - Fail actions that were scheduled for a failed/fenced node
  • crmd: Cancel recurring operations while we’re still connected to the lrmd
  • crmd: Don’t abort transitions when probes are completed on a node
  • crmd: Ensure the CIB is always writable on the DC by removing a timing hole
  • crmd: Update failcount for failed promote and demote operations
  • PE: Bug lf#2495 - Prevent segfault by validating the contents of ordering sets
  • PE: Bug lf#2508 - Correctly reconstruct the status of anonymous cloned groups
  • PE: Bug lf#2544 - Prevent unstable clone placement by factoring in the current node’s score before all others
  • PE: Bug lf#2554 - target-role alone is not sufficient to promote resources
  • PE: Ensure fencing of the DC precedes the STONITH_DONE operation
  • PE: Ensure that fencing has completed for stop actions on stonith-dependent resources (lf#2551)
  • PE: Prevent clones from being stopped because resources colocated with them cannot be active
  • PE: Prevent use-after-free resulting from unintended recursion when choosing a node to promote master/slave resources
  • Shell: don’t create empty optional sections (bnc#665131)
  • Tools: Bug lf#2528 - Make progress when attrd_updater is called repeatedly within the dampen interval but with the same value
  • Tools: Prevent crm_resource commands from being lost due to the use of cib_scope_local

You can also see the full changelog.

As per our release calendar, the next 1.0.x release is planned for mid-September.

The source tarball is also available directly from Mercurial.

Pre-built packages for Pacemaker and its immediate dependencies are available immediately for openSUSE 11.2, 11.3, Fedora-13 and EPEL-5 from the ClusterLabs Build Area.

Users of more recent distributions are encouraged to use the latest 1.1.x - either from the 1.1 Build Area or the distribution directly.

General installation instructions are available from the ClusterLabs wiki.

Wed, Feb 23rd

Pacemaker 1.1.5 Released

The latest installment of the Pacemaker 1.1 release series is now ready for general consumption.

Changesets: 184
Diff: 605 files changed, 46103 insertions(+), 26417 deletions(-)

As well as the usual round of bug fixes (see the full changelog), SUSE has implemented support for ACLs. This means that you can now delegate permission to control parts of the cluster (as defined by you) to non-root users.

ACLs are still disabled by default, but you can read their documentation, provide feedback and decide if it’s something you want to use.

As per our release calendar, the next 1.1 release is planned for mid-April and 1.0.11 should be available in March depending on how quickly we can get the bugfixes from 1.1 backported.

Pre-built packages for Pacemaker and its immediate dependencies are available immediately for openSUSE 11.3, Fedora-14 and EPEL-5 from the ClusterLabs Build Area.

The source tarball is also available directly from Mercurial.

General installation instructions are available from the ClusterLabs wiki.
