Opened 12 years ago

Last modified 9 years ago

#84 reopened enhancement

Automatically notice when installed software is out of sync on the servers

Reported by: mitchb
Owned by:
Priority: minor
Milestone:
Component: internals
Keywords:
Cc:

Description

As reported by xavid in RT#805191:

We might want to come up with a better system for noticing when machines get out-of-sync in terms of what's installed, though.

Change History (5)

comment:1 Changed 10 years ago by ezyang

  • Resolution set to fixed
  • sensitive set to 0
  • Status changed from new to closed

Fixed by achernya; we now get emailed when we're out of sync. The script lives in locker/sbin/rpm-*.sh

comment:2 Changed 10 years ago by ezyang

  • Resolution fixed deleted
  • Status changed from closed to reopened

comment:3 Changed 9 years ago by ezyang

Apparently reopened because it wasn't really what Mitch wanted.

comment:4 Changed 9 years ago by mitchb

From zephyr today, here's a description of what I originally envisioned:

-> scripts / consistency-checker / mitchb

The overall idea I had for how this would work was that we'd have a global (as in on all servers, not through hacron or cron_scripts) cron job that once a day collects a pile of information about the current setup of a server. We'll cover what it includes later. This job would run at, say, X:YYam. The results of this information gathering would be deposited in some appropriate area of the scripts locker, much like machines go poke at their cron_status flags there.

We'd then have an hacron or cron_scripts job in the scripts account that runs at, say, X+1:YYam. Its function would be to process all the individual per-machine reports, determine what they have in common, and mail us a report of what's different, if anything. It would perhaps also zephyr a highly-redacted count of the number of things it found out of alignment. This avoids the hokey hostbased ssh kludge, allows machines to work in parallel, and lets them potentially do privileged things (we might need the info gatherer to have some sudo permissions, for example, to look at a couple files in /etc).

I hadn't really decided whether using a database to process this info was worthwhile over just parsing flat files into data structures.
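
A minimal sketch of the flat-file variant of the aggregation step described above, in Python. Everything here is an assumption made for illustration: the one-line-per-item "name<TAB>version" report format, the function names, and the hostnames in the usage note are hypothetical and not part of any existing scripts tooling.

```python
from typing import Dict, Optional

# One machine's report: item name -> version string.
Report = Dict[str, str]


def parse_report(text: str) -> Report:
    """Parse a hypothetical flat-file report with one 'name<TAB>version' per line."""
    report: Report = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        name, version = line.split("\t", 1)
        report[name] = version
    return report


def find_differences(
    reports: Dict[str, Report],
) -> Dict[str, Dict[str, Optional[str]]]:
    """Return only the items that are not identical on every reporting host.

    The result maps item name -> {host: version}, with None for a host
    on which the item is absent.
    """
    all_items = set()
    for report in reports.values():
        all_items.update(report)

    differences: Dict[str, Dict[str, Optional[str]]] = {}
    for item in sorted(all_items):
        per_host = {host: report.get(item) for host, report in reports.items()}
        if len(set(per_host.values())) > 1:
            differences[item] = per_host
    return differences
```

Given reports from three hypothetical hosts where only one package differs, `find_differences` returns a single entry for that package. Anything identical everywhere is dropped, which keeps the mailed report short.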

scripts / consistency-checker / mitchb

As for the stuff I wanted it to encompass: it should certainly deal with rpms, but also keep track of their EVR.arch (epoch, version, release, and architecture), not just their presence. That difference isn't caught by our old install procedure or by the current checker, and it can leave a bug that we thought an update had fixed still present on a subset of the servers, which looks like an intermittent bug and is hard to hunt down. I also wanted this to check on pear, pecl, gem, easy_install, CPAN stuff, etc. via those package systems and do the same thing for them.
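
A rough sketch of how the per-machine gatherer might record EVR.arch for rpms, in Python. It relies on the standard `rpm -qa --queryformat` tags `NAME`, `EPOCH`, `VERSION`, `RELEASE` and `ARCH`; the tab-separated layout and the function names are assumptions for illustration only. rpm prints `(none)` for a package with no epoch, so that case is handled explicitly, and a name maps to a list because several versions or architectures of one package can be installed at once.

```python
import subprocess
from typing import Dict, List

# Tab-separated fields, one installed package per line.
QUERY_FORMAT = "%{NAME}\t%{EPOCH}\t%{VERSION}\t%{RELEASE}\t%{ARCH}\n"


def parse_rpm_listing(text: str) -> Dict[str, List[str]]:
    """Map package name -> sorted list of 'EVR.arch' strings."""
    packages: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, epoch, version, release, arch = line.split("\t")
        # Omit the epoch prefix when the package has no epoch set.
        if epoch == "(none)":
            evr = f"{version}-{release}"
        else:
            evr = f"{epoch}:{version}-{release}"
        packages.setdefault(name, []).append(f"{evr}.{arch}")
    for versions in packages.values():
        versions.sort()
    return packages


def collect_rpms() -> Dict[str, List[str]]:
    """Query the local rpm database (requires rpm to be installed)."""
    output = subprocess.run(
        ["rpm", "-qa", "--queryformat", QUERY_FORMAT],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return parse_rpm_listing(output)
```

Two servers that both have a package installed, but at different releases, then produce different strings for it, which a presence-only check would miss.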

And I wanted it to do essentially 'svn up -nq' or whatever the command is in our checkouts, which is a major thing we don't check for at all now.

That's the general gist. The "hard" problems include figuring out what to do about a server that hasn't "reported in", and how to determine which machine is "right" if a package is present on a couple machines (but at different versions) and missing on others. Really, the tool doesn't so much need to figure out what "right" is; the part I had trouble thinking about is how to present this information in a layout that's easy for us to digest and that scales to an arbitrary number of machines.
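
One possible take on the layout question, sketched in Python: rather than one column per machine, group the machines by the version they report, so each item takes one line per distinct version no matter how many servers exist. This is only an illustration. The function names, the `(missing)` label, and the input shape (a dict of host to version, with `None` meaning the item is absent) are all assumptions. A separate helper lists servers that never reported in, given a hypothetical list of expected hosts.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Optional

MISSING = "(missing)"  # label for hosts where the item is absent


def group_by_version(per_host: Mapping[str, Optional[str]]) -> Dict[str, List[str]]:
    """Invert {host: version} into {version: sorted list of hosts}."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for host, version in per_host.items():
        groups[MISSING if version is None else version].append(host)
    return {version: sorted(hosts) for version, hosts in groups.items()}


def format_item(item: str, per_host: Mapping[str, Optional[str]]) -> str:
    """Render one item, largest group of hosts first."""
    groups = group_by_version(per_host)
    lines = [item]
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for version, hosts in ordered:
        lines.append(f"  {version}: {', '.join(hosts)}")
    return "\n".join(lines)


def unreported(expected_hosts: Iterable[str], reports: Mapping[str, object]) -> List[str]:
    """Hosts that were expected to deposit a report but did not."""
    return sorted(set(expected_hosts) - set(reports))
```

Listing the largest group first does not decide which machine is "right", but it makes the odd ones out easy to spot. Servers returned by `unreported` can be listed at the top of the mail so that a silent machine is not mistaken for a consistent one.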

comment:5 Changed 9 years ago by ezyang

  • Type changed from defect to enhancement