The Git Autoinstaller

TODO NOW:

- Symlinked rerere to get awesomeness.  Problems:
    - Permissions
    - Might not make a huge difference; how does it handle the empty-file
      and removed-file cases?
    - Need to manually run `git rerere` subsequently to reap the benefits
    - The majority of resolutions have to happen pre-merge (see below)
- Consider this workflow: run a wizard mass-upgrade, and then begin
  resolving working copies one by one.  Each time we resolve a copy, it
  should cause other copies to start magically resolving.  So the ordering
  should be:
    1. Perform the merge.
    2. If it fails, merge the rr-cache with the central rr-cache (this
       operation needs to be atomic) and replace it with a symlink.  File
       permissions should preferably be made correct, but don't have to
       be, since only root will be touching them subsequently.  If the
       hash already exists, don't do anything (maybe record this for the
       benefit of Mister Kite, a.k.a. so we don't have to do a full
       traversal; this optimization might be essential).
    3. When a human is resolving the merges, they are "low concurrency",
       i.e. only one commit recording a rerere resolution will happen at a
       time.  This means that the rr-cache does not need to be
       concurrency-safe.  Some number of hashes in the rr-cache will start
       having postimages; we'll use a full scan to figure that out, then
       cross-reference those with the recorded pending resolutions and
       figure out which checkouts we can run rerere on (this gets tricky,
       permissions-wise).  Alternative plan: require the user to manually
       run some sort of retry command that does this as root; presumably
       they'd run it every ten installs or so.  A user can run
       `git rerere` to get a resolution early.

  This requires some new data structures:
    - Besides the merge.txt file (which should never, ever change), we
      should have an outstanding.txt file which gets modified as our
      scripts do resolutions behind our back.
      Those modifications might be a little annoying for a human to keep
      up with, so we should recommend something like
      `watch -n2 "head file"`.
    - We need to keep track of the hashes and the cross-referencing.  A
      very small SQLite database might be a good idea here, although the
      type of information we're interested in makes for a somewhat
      unnatural query.  Alternatively, we could just use a very simple
      text file.
- Make it possible to say that certain classes of missing files are OK.
- Wizard needs a correct arch/ setup.
- The wizard command, when not on scripts, should automatically SSH to
  scripts and start executing there?
- Write the code to make WordPress figure out its URL from the database.
- Remerges aren't reflected in the parent files, so `git diff` output is
  spurious.  Not sure how to fix this without tree hackery.
- Sometimes users remove files.  If those files later change, they
  automatically get marked as conflicted.  Maybe we should say for certain
  files, "if they're gone, they're gone forever"?  What is the proper
  resolution?
- Parse output HTML for class="error" and give those errors back to the
  user (done), then boot them back into configure so they can enter
  something different.
- Replace gaierror with a more descriptive name (it is a DNS error).
- Pre-emptively check whether daemon.scripts-security-upd is missing from
  the scripts-security-upd list (/mit/moira/bin/blanche).
- If you try to do an install on scripts without SQL, it will sign you up
  but fail to write the sql.cnf file.  This sucks.
- A web application for installing autoinstalls has a hard problem with
  credentials (as do installations that are not conducted on an Athena
  machine).  We have some crazy ideas involving a signed Java applet that
  uses JSch to SSH into athena.dialup and perform operations.
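The rr-cache merging in step 2 above could be sketched roughly as follows. This is only a sketch: the paths (`example-checkout`, `central-rr-cache`) and the toy conflict entry are fabricated so it runs standalone, and it ignores the permissions question entirely.

```shell
# Sketch only: toy paths, not the real wizard layout.
set -e
WC=example-checkout                 # hypothetical working copy
CENTRAL="$PWD/central-rr-cache"     # hypothetical shared cache (absolute path)

# Fabricate one rr-cache conflict entry so the sketch is self-contained.
mkdir -p "$WC/.git/rr-cache/0123abc" "$CENTRAL"
echo "conflict preimage" > "$WC/.git/rr-cache/0123abc/preimage"

# Merge each conflict hash into the central cache, skipping hashes that
# already exist there; staging into a temp dir and renaming keeps the
# publish step atomic.
for entry in "$WC"/.git/rr-cache/*/; do
    hash=$(basename "$entry")
    if [ ! -d "$CENTRAL/$hash" ]; then
        tmp=$(mktemp -d "$CENTRAL/tmp.XXXXXX")
        cp -p "$entry"/* "$tmp/"
        mv "$tmp" "$CENTRAL/$hash" 2>/dev/null || rm -rf "$tmp"
    fi
done

# Replace the per-copy cache with a symlink to the central one, so future
# resolutions recorded anywhere become visible to this copy.
rm -rf "$WC/.git/rr-cache"
ln -s "$CENTRAL" "$WC/.git/rr-cache"
```

The rename-into-place trick is what makes the "needs to be atomic" requirement plausible: a concurrent reader either sees the hash directory completely or not at all.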
- Pay back code debt:
    - Tidy up the common code in callAsUser and drop_priviledges in shell;
      namely, cooking up the sudo and environment variable lines.
    - The summary script should be more machine-friendly, and should not
      output summary charts when I increase specificity.
    - The report code in wizard/command/__init__.py is ugly as sin.  Also,
      the Report object should operate at a higher level of abstraction so
      we don't have to manually increment fails (in fact, that should
      probably be called something different).  The by-percent errors
      should also be automated.
    - Move the resolutions in mediawiki.py to a text file?  (The parsing
      overhead may not be worth it.)
    - PHP allows the semicolon to be omitted at end of file, which can
      result in a parse error if merge resolutions aren't careful.
      `php -l` can be a quick stopgap.
- Other stuff:
    - Figure out why Sphinx sometimes fails to crossref :func: but will
      crossref :meth:, even though the destination is very clearly a
      function.  Example: :func:`wizard.app.php.re_var`
    - The todo extension for Sphinx doesn't properly force a full rebuild.
    - Code annotation!
    - Make single-user mass-migrate work when not logged in as root.  The
      primary difficulty is making the parallel-find information easily
      accessible to individual users; perhaps we can do a single-user
      parallel-find on the fly.
    - Don't use the scripts heuristics unless we're on scripts with the
      AFS patch.  Check with `fs sysname`.
    - Make `wizard summary` generate nice pretty graphs of installs by
      date (more histograms; this will need to check actual
      .scripts-version files).
    - It should be able to handle installs like Django, where one
      component gets installed in web_scripts and another directory gets
      installed in Scripts.
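The `php -l` stopgap mentioned above might look something like this. The demo repository and file names are fabricated so the sketch runs standalone, and the lint step simply counts as a pass when PHP isn't on the PATH.

```shell
# Sketch only: lint the .php files staged by a merge resolution before
# committing it.
set -e
DEMO=php-lint-demo                  # fabricated repo for the demonstration
mkdir "$DEMO"
git -C "$DEMO" init -q
printf '<?php\n$greeting = "hi";\n' > "$DEMO/config.php"
git -C "$DEMO" add config.php
git -C "$DEMO" -c user.email=demo@example.com -c user.name=Demo \
    commit -qm "base"

# Simulate a merge resolution touching config.php.
printf '<?php\n$greeting = "hello";\n' > "$DEMO/config.php"
git -C "$DEMO" add config.php

# Lint every staged PHP file; a file only fails if PHP is present and
# `php -l` reports a parse error (a missing PHP binary counts as a pass).
: > lint-report.txt
for f in $(git -C "$DEMO" diff --cached --name-only HEAD -- '*.php'); do
    if command -v php >/dev/null 2>&1 && ! php -l "$DEMO/$f" >/dev/null 2>&1; then
        echo "PARSE ERROR $f" >> lint-report.txt
    else
        echo "OK $f" >> lint-report.txt
    fi
done
cat lint-report.txt
```

A real version would presumably refuse to record the resolution (or at least warn loudly) when any file lands in the PARSE ERROR bucket.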
- ACLs are a starting point for sending mail to users, but they have
  several failure modes:
    - Old maintainers who don't care but are still on the ACL
    - Private AFS groups that aren't mailing lists and that we can't get
      to

  A question is whether or not sending mail actually helps us: many users
  will probably have to come back to us for help; many other users won't
  care.

PULLING OUT CONFIGURATION FILES IN AN AUTOMATED MANNER

    advancedpoll: Template file to fill out
    django:       Noodles of template files
    gallery2:     Multistage install process
    joomla:       Template file
    mediawiki:    One-step install process
    phpbb:        Multistage install process
    phpical:      Template file
    trac:         NFC
    turbogears:   NFC
    wordpress:    Multistage install process

COMMIT MESSAGE FIELDS:

    Installed-by: username@hostname
    Pre-commit-by: Real Name
    Upgraded-by: Real Name
    Migrated-by: Real Name
    Wizard-revision: abcdef1234567890
    Wizard-args: /wizard/bin/wizard foo bar baz

GIT COMMIT FIELDS:

    Committer: Real Name
    Author: lockername locker

NOTES:

- It is neither required nor expected that update scripts exist for all
  intervening versions that were present pre-migration; they only need to
  work on the most recent migration.
- Currently all repositories are initialized with --shared, which means
  they have basically no space footprint.  However, it also means that
  /mit/scripts/wizard/srv MUST NOT lose revs after deployment.

OVERALL PLAN:

* Some parts of the infrastructure will not be touched, although I plan on
  documenting them.  Specifically, we will be keeping:
    - parallel-find.pl, and the resulting
      /mit/scripts/.htaccess/scripts/sec-tools/store/scriptslist
* The new procedure for generating an update is as follows (check out the
  mass-migration instructions for something in this spirit, although
  uglier in some ways; an A indicates the step /should/ be automated):

    0. SSH into not-backward, temporarily give daemon.scripts-security-upd
       bits by blanching it on system:scripts-security-upd, and run
       parallel-find.pl.
    1.
       [ see doc/upgrade.rst ]

[ENTER HERE FROM CREATING A NEW REPO]

    9. Push all of your changes to a public place, and encourage others to
       test, using --srv-path and a full path.

[ XXX: doc/deploy.rst ]

GET APPROVAL BEFORE PROCEEDING ANY FURTHER; THIS IS PUSHING THE CHANGES TO
THE PUBLIC

NOTE: The following commands are to be run on not-backward.mit.edu.
You'll need to add daemon.scripts-security-upd to scripts-security-upd to
get bits to do this.  Make sure you remove these bits when you're done.

    10. Run `wizard research appname`, which uses Git commands to check
        how many working copies apply the change cleanly, and writes out a
        logfile listing the working copies that don't apply cleanly.  It
        also tells us about "corrupt" working copies, i.e. working copies
        that have over a certain threshold of changes.
    11. Run `wizard mass-upgrade appname`, which applies the update to all
        working copies possible.
    12. Run parallel-find.pl to update our inventory.

[ XXX: doc/upgrade.rst ]

* For mass importing into the repository, there are a few extra things:
    * When mass-producing updates, if the patch has changed you will have
      to follow a special procedure for your merge:

        git checkout pristine
        # NOTE: Now, the tricky part (this is different from a real update)
        git symbolic-ref HEAD refs/heads/master
        # NOTE: Now we think we're on the master branch, but we have the
        # pristine copy checked out
        # NOTE: -p0 might need to be twiddled
        patch -p0 < ../app-1.2.3/app-1.2.3.patch
        git add .
        # reconstitute the .scripts directory
        git checkout v1.2.2-scripts -- .scripts
        git add .scripts
        # NOTE: Fake the merge
        git rev-parse pristine > .git/MERGE_HEAD

      You could also just try your luck with a manual merge, using the
      patch as your guide.

[ XXX: doc/layout.rst ]

* The repository for a given application will contain the following files:
    - The actual application's files, as from the official tarball
    - A .scripts directory, with the intent of holding Scripts-specific
      files if they become necessary.
    - .scripts/dsn, overriding the database source name.
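Step 10's "does the change apply cleanly?" probe could plausibly be built on `git apply --check`, which reports success or failure without touching the working tree. The working copy, file, and patch below are fabricated for illustration; they are not the real wizard layout.

```shell
# Sketch only: probe each working copy with `git apply --check` and log
# which copies would take the upgrade patch cleanly.
set -e
mkdir -p research-demo/copy1
git -C research-demo/copy1 init -q
printf 'hello\nworld\n' > research-demo/copy1/index.txt
git -C research-demo/copy1 add index.txt
git -C research-demo/copy1 -c user.email=d@example.com -c user.name=Demo \
    commit -qm "base install"

# A fabricated upstream patch (changes "world" to "planet").
cat > upgrade.patch <<'EOF'
diff --git a/index.txt b/index.txt
--- a/index.txt
+++ b/index.txt
@@ -1,2 +1,2 @@
 hello
-world
+planet
EOF

# --check validates the patch against each copy without applying it;
# copies that fail go in the log for human follow-up.
: > research.log
for wc in research-demo/*/; do
    if git -C "$wc" apply --check "$PWD/upgrade.patch" 2>/dev/null; then
        echo "clean: $wc" >> research.log
    else
        echo "conflict: $wc" >> research.log
    fi
done
cat research.log
```

A corruption heuristic along the lines described above could be layered on top by counting lines in `git -C "$wc" diff --stat` and flagging copies over some threshold.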