Opened 14 years ago

Closed 13 years ago

#174 closed task (invalid)

Nagios plugin to check for different generation ID error message

Reported by: ezyang
Owned by:
Priority: normal
Milestone:
Component: internals
Keywords:
Cc:

Description (last modified by ezyang)

Check for the error message:

[23/Sep/2010:15:18:56 -0400] NSMMReplicationPlugin - agmt="cn="GSSAPI Replication to whole-enchilada.mit.edu"" (whole-enchilada:389): Replica has a different generation ID than the local data.

which indicates something has gone wrong with replication. This message does not appear to be picked up by our existing MMR checks.
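A minimal sketch of such a check, written in Python. It assumes the message appears on a single line in the directory server's error log and that the log path is passed as the only argument; the script name and the function names check_lines and main are hypothetical. The exit codes follow the usual Nagios plugin convention (0 OK, 2 CRITICAL, 3 UNKNOWN).

```python
import re
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# The message text quoted in this ticket.
PATTERN = re.compile(r"Replica has a different generation ID than the local data")


def check_lines(lines):
    """Return (exit_code, status_text) for an iterable of log lines."""
    hits = [line.rstrip("\n") for line in lines if PATTERN.search(line)]
    if hits:
        return CRITICAL, "CRITICAL: %d generation ID mismatch message(s); last: %s" % (
            len(hits),
            hits[-1],
        )
    return OK, "OK: no generation ID mismatch messages"


def main(argv):
    """Scan the log file named in argv[1] and print one Nagios status line."""
    if len(argv) != 2:
        print("UNKNOWN: usage: check_generation_id.py ERROR_LOG")
        return UNKNOWN
    try:
        # errors="replace" keeps the scan going if the log has stray bytes.
        with open(argv[1], errors="replace") as log:
            code, text = check_lines(log)
    except OSError as exc:
        print("UNKNOWN: cannot read %s: %s" % (argv[1], exc))
        return UNKNOWN
    print(text)
    return code

# When installed as a plugin script, finish with: sys.exit(main(sys.argv))
```

This version scans the whole file on every run, so an old message keeps the check critical until the log is rotated. A real plugin would probably want to remember its last read position or only consider recent timestamps.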

Old text: In the nsds50ruv attribute, replication agreements record a "generation ID", which prevents two masters initialized from different sources from overwriting each other and causing a mess. Unfortunately, when this safety mechanism kicks in, replication stops working. This results in a lot of spew in the LDAP error log, but it does not show up in the usual status field. We should check that the replication generation IDs on all of our servers are consistent.
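The consistency check proposed in the old text could be sketched roughly as follows. This assumes the nsds50ruv values have already been fetched from each server (for example with an LDAP search, not shown here) and that the generation appears as a value of the form "{replicageneration} <id>". The function names and the server names in the usage note are hypothetical.

```python
import re

# Matches the nsds50ruv value that carries the generation ID.
GENERATION_RE = re.compile(r"^\{replicageneration\}\s+(\S+)$")


def generation_id(ruv_values):
    """Return the generation ID found in a list of nsds50ruv values, or None."""
    for value in ruv_values:
        match = GENERATION_RE.match(value.strip())
        if match:
            return match.group(1)
    return None


def check_generations(ruv_by_server):
    """Compare generation IDs across servers.

    ruv_by_server maps a server name to its list of nsds50ruv values.
    Returns (exit_code, status_text) using Nagios exit codes.
    """
    if not ruv_by_server:
        return 3, "UNKNOWN: no servers given"
    generations = {
        server: generation_id(values) for server, values in ruv_by_server.items()
    }
    missing = sorted(s for s, g in generations.items() if g is None)
    if missing:
        return 3, "UNKNOWN: no generation ID for " + ", ".join(missing)
    if len(set(generations.values())) > 1:
        detail = ", ".join("%s=%s" % (s, g) for s, g in sorted(generations.items()))
        return 2, "CRITICAL: generation IDs differ: " + detail
    return 0, "OK: %d server(s) share one generation ID" % len(generations)
```

For example, calling check_generations({"server-a": [...], "server-b": [...]}) with the values read from two servers returns a CRITICAL status as soon as the two generation IDs differ.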

Change History (2)

comment:1 Changed 14 years ago by ezyang

  • Description modified (diff)
  • Summary changed from "Nagios plugin to ensure replication generations are correct" to "Nagios plugin to check for different generation ID error message"

Rewrote proposal based on conversation with Mitch.

comment:2 Changed 13 years ago by mitchb

  • Resolution set to invalid
  • Status changed from new to closed

The generation ID is actually based on the original replica that the replicated suffix traces its heritage back to, and is not related to where this server was initialized from. A different generation ID error naturally appears when a server is first installed, before it has been initialized from an existing master. The message described above is not a problem: it is the server reporting that it's going to punt the incorrect changelog. It will then use the correct changelog, or create it if it doesn't exist. There was a bug that prevented the bogus changelog from actually being deleted, which I've corrected in r1751 and will send upstream. This bug caused the message to show up on each restart, but it was always harmless.
