Opened 11 years ago

Closed 9 years ago

#142 closed enhancement (fixed)

Add monitoring for failure of the backend network

Reported by: mitchb
Owned by:
Priority: minor
Milestone:
Component: internals
Keywords: sipb-noc
Cc:

Description

We don't presently have a Nagios test that will alert us if there's a failure of the backend network switch, or the backend interface on an individual server. All the probes for sql.mit.edu will still pass because they run over the public network.

We should use some plugin to run a 'select 1;' or something similarly trivial on each scripts server.
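As a rough illustration of the kind of probe described above, here is a minimal sketch of a Nagios-style check that runs 'select 1;' and maps the outcome to the standard plugin exit codes. It is not the check that was eventually committed. It assumes the mysql command-line client is installed and that credentials come from an option file; the function names (run_select_one, evaluate, check) are hypothetical.

```python
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def run_select_one(host, timeout=10):
    """Run 'select 1;' against `host` with the mysql command-line client.

    Assumption (not from the ticket): the client is installed and reads
    its credentials from an option file such as ~/.my.cnf.
    """
    result = subprocess.run(
        [
            "mysql",
            f"--host={host}",
            "--batch",               # tab-separated output, no table borders
            "--skip-column-names",   # print only the value itself
            f"--connect-timeout={timeout}",
            "--execute=select 1;",
        ],
        capture_output=True,
        text=True,
        timeout=timeout + 5,
    )
    return result.returncode, result.stdout.strip(), result.stderr.strip()


def evaluate(returncode, stdout, stderr):
    """Map the query outcome to a (status, message) pair."""
    if returncode != 0:
        detail = stderr or "no error output"
        return CRITICAL, f"SQL CRITICAL - query failed: {detail}"
    if stdout != "1":
        return CRITICAL, f"SQL CRITICAL - unexpected result {stdout!r}"
    return OK, "SQL OK - select 1 returned 1"


def check(host, runner=run_select_one):
    """Run the probe; report UNKNOWN if the client cannot be run at all."""
    try:
        return evaluate(*runner(host))
    except (OSError, subprocess.TimeoutExpired) as exc:
        return UNKNOWN, f"SQL UNKNOWN - could not run check: {exc}"
```

A wrapper script would call check() with the hostname to probe, print the message, and exit with the returned status. The runner parameter exists only so the mapping logic can be exercised without a database.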

Change History (4)

comment:1 Changed 10 years ago by adehnert

  • Keywords sipb-noc added

comment:2 Changed 9 years ago by adehnert

  • Resolution set to fixed
  • Status changed from new to closed

Fixed (see sipb-nagios commit 7d9206eae4e48824e0203d1ce19c4563f9bb664b and scripts r2190).

comment:3 Changed 9 years ago by quentin

  • Resolution fixed deleted
  • Status changed from closed to reopened

This isn't good enough; if the routes over the backend interface disappear, we will happily talk to sql over the frontend network and not notice the outage.

Unfortunately, it doesn't look like check_ping supports specifying an interface to check from. I guess we could pretend and ping the backend IP of sql instead.
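For illustration only, a sketch of what pinging the backend IP from a custom check might look like. It assumes the Linux iputils ping, whose -I option selects the outgoing interface (something check_ping does not expose), and its documented exit codes (0 for a reply, 1 for no reply, 2 for other errors). The function names are hypothetical, and any address or interface name passed in is a placeholder; this is not necessarily what r2192 does.

```python
import subprocess

# Standard Nagios plugin exit codes used by this sketch.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def build_ping_command(target_ip, interface=None, count=3, wait_seconds=2):
    """Build an iputils ping command line for the backend address.

    `interface` is optional; when given, ping is told to send from it.
    """
    cmd = ["ping", "-n", "-c", str(count), "-W", str(wait_seconds)]
    if interface is not None:
        cmd += ["-I", interface]
    cmd.append(target_ip)
    return cmd


def status_from_ping(returncode):
    """Map the ping exit code to a (status, message) pair."""
    if returncode == 0:
        return OK, "PING OK - backend address answered"
    if returncode == 1:
        return CRITICAL, "PING CRITICAL - no reply from backend address"
    return UNKNOWN, f"PING UNKNOWN - ping exited with {returncode}"


def check_backend(target_ip, interface=None, run=subprocess.run):
    """Ping the backend address and translate the result for Nagios."""
    try:
        completed = run(
            build_ping_command(target_ip, interface),
            capture_output=True,
            text=True,
        )
    except OSError as exc:
        return UNKNOWN, f"PING UNKNOWN - could not run ping: {exc}"
    return status_from_ping(completed.returncode)
```

Pinging the backend address without -I only detects the outage if that address is unreachable over the frontend network; binding to the interface removes that dependency, at the cost of a custom check.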

comment:4 Changed 9 years ago by adehnert

  • Resolution set to fixed
  • Status changed from reopened to closed

Fixed in r2192.
