Opened 14 years ago

Closed 12 years ago

#20 closed defect (fixed)

scripts LVS design issues

Reported by: andersk
Owned by:
Priority: minor
Milestone:
Component: web
Keywords:
Cc:

Description (last modified by andersk)

(Imported from help.mit.edu #431727.)

Now that Nagios doesn't suck, we can actually see the scripts outage caused by the AFS server restart every Sunday morning. This made me realize a few things:

  • Our fallback to hodge-podge isn't just an exceptional condition; it happens every week, which makes it an even worse idea than I thought it was. Viewers will get confused, and search engines may remove pages from their indexes if they happen to get a 404 error from hodge-podge at the wrong moment.

  • Since the heartbeat script is in the scripts locker, the AFS server that serves it (aegisthus) is a single point of failure. Ideally LVS would check multiple heartbeat scripts in lockers on several different AFS servers, and continue routing connections to a backend as long as at least one of them responds.
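
A minimal sketch of the "any heartbeat responds" check proposed in the second bullet, in Python. The function names, the 5-second timeout, and the idea of passing in a list of heartbeat URLs are assumptions for illustration; this is not the actual scripts LVS health-check code.

```python
import http.client
import urllib.error
import urllib.request
from typing import Callable, Iterable


def probe_http(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with HTTP 200 within the timeout.

    Any network error, timeout, or non-200 status counts as a failure.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return False


def backend_is_healthy(
    heartbeat_urls: Iterable[str],
    probe: Callable[[str], bool] = probe_http,
) -> bool:
    """Treat the backend as healthy if ANY heartbeat script responds.

    Each URL is assumed to point at a heartbeat script in a locker on a
    different AFS server, so one server restarting does not fail the check.
    any() stops at the first success, so a healthy backend is cheap to confirm.
    """
    return any(probe(url) for url in heartbeat_urls)
```

The probe is passed in as a parameter so the "any of them" logic can be exercised without a network. An empty URL list is reported as unhealthy, which errs on the side of taking the backend out of rotation. How the result is fed to the LVS director depends on the monitoring daemon in use.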

Change History (3)

comment:1 Changed 14 years ago by andersk

  • Description modified (diff)

comment:2 Changed 14 years ago by andersk

  • Description modified (diff)

comment:3 Changed 12 years ago by quentin

  • Resolution set to fixed
  • Status changed from new to closed

The LVS directors now run a local sorry-server that responds to all requests with a 500 error.
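
For illustration, a sorry-server of this kind can be sketched in a few lines of Python. The ticket does not say how the real one is implemented, so the handler name, the response body, and the default port below are all assumptions.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical response body; the real sorry-server's content is not given.
BODY = b"The service is temporarily unavailable. Please try again later.\n"


class SorryHandler(BaseHTTPRequestHandler):
    """Answer every request, whatever the path or method, with a 500."""

    def _sorry(self):
        self.send_response(500)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(BODY)))
        self.send_header("Connection", "close")
        self.end_headers()
        # HEAD responses carry headers only.
        if self.command != "HEAD":
            self.wfile.write(BODY)

    # Route the common methods to the same handler.
    do_GET = do_HEAD = do_POST = do_PUT = do_DELETE = _sorry

    def log_message(self, fmt, *args):
        # Stay quiet; a director would log elsewhere (assumption).
        pass


def make_sorry_server(host: str = "127.0.0.1", port: int = 8080) -> HTTPServer:
    """Build the server; call .serve_forever() on the result to run it."""
    return HTTPServer((host, port), SorryHandler)
```

Running `make_sorry_server().serve_forever()` starts it on the assumed default port. Unlike the 404 from the old fallback, a 5xx status tells clients that the failure is on the server side rather than that the page does not exist.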
