Suppress ^M characters from Git progress bars.

diff --git a/TODO b/TODO

index d448def81da98b1a641932183fba922050cac09a..93444a86edfd18d084b1dc650f6a971db74700bf 100644
--- a/TODO
+++ b/TODO
@@ -2,40 +2,228 @@ The Git Autoinstaller
  
  TODO NOW:
  
-- Something needs to be done if disk quota is exceeded:
-    - Catch the OSError and throw a domain-specific error
-      so massmigrate can deal gracefully
-    - Perform an added memory calculation, check this against
-      remaining quotai, and bail out if it's within some
-      percentage of their remaining quota
-    - Checks should also be performed against the partition
-- Check how many autoinstalls are missing w bits for
-  daemon.scripts
-- Whiteboard the flow for performing an upgrade on a single
-  install. How assisted does it need to be?
-- Conduct migration tool testing (check andersk, geofft for
-  sample MediaWikis)
-- Set up migration server
-- Run parallel-find.pl
-- Migrate all mediawikis
-- Wordpress needs to have a .scripts/update script written for
-  its latest version
+- Make it faster
+    - Wipe temp directories if the upgrade succeeds
+    - Put temp directories on tmpfs before merging, then move to disk
+      if it fails and needs resolution.  /var/run is a pretty good
+      choice if you're running as root, not so good if you're not.  Find
+      other common tmpfs locations (mount | grep tmpfs) and perhaps
+      check some common ones.
+    - Certain classes of error will continually fail, so they should
+      be put in a different "seen" file which also skips them, unless
+      we have some sort of gentle force
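
The tmpfs item above suggests looking for mounted tmpfs locations (`mount | grep tmpfs`). A minimal sketch of that lookup, assuming a Linux host where `/proc/mounts` is readable; the function names and the sample mount table are hypothetical and not part of Wizard:

```python
import os

# Hypothetical sample in /proc/mounts format:
# device mountpoint fstype options dump pass
SAMPLE = """\
/dev/sda1 / ext4 rw,relatime 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev 0 0
tmpfs /var/run tmpfs rw,nosuid,mode=755 0 0
"""


def parse_tmpfs_mounts(mounts_text):
    """Return the mount points whose filesystem type is tmpfs."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # The third field is the filesystem type.
        if len(fields) >= 3 and fields[2] == "tmpfs":
            points.append(fields[1])
    return points


def pick_writable_tmpfs(candidates, is_writable=None):
    """Return the first candidate directory we may write to, else None."""
    if is_writable is None:
        def is_writable(path):
            return os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)
    for path in candidates:
        if is_writable(path):
            return path
    return None


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as handle:
            text = handle.read()
    except OSError:
        text = SAMPLE  # fall back to the sample off Linux
    print(pick_writable_tmpfs(parse_tmpfs_mounts(text)))
```

Checking writability per candidate reflects the note that /var/run only works when running as root; a caller would fall back to an on-disk temp directory when the result is `None`.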
+
+- Keep my sanity when upgrading 1000 installs
+    - Distinguish between errors(?)
+    - Custom merge algo: absolute php.ini symlinks to relative symlinks (this
+      does not seem to have been a problem in practice)
+    - Custom merge algo: check if it's got extra \r's in the file,
+      and dos2unix it if it does, before performing the merge
+    - `vos exa` in order to check what a person's quota is.  We can
+      figure out roughly how big the upgrade is going to be by
+      doing a size comparison of the tars: `git pull` MUST NOT
+      fail, otherwise things are left conflicted, and not easy to fix.
+    - Prune -7 call errors and automatically reprocess them (with a
+      strike out counter of 3)
+    - Snap-in conflict resolution teaching:
+        1. View the merge conflicts after doing a short run
+        2. Identify common merge conflicts
+        3. Copypaste the conflict markers to the application.  Scrub
+           user-specific data; this may mean removing the entire
+           upper bit which is the user-version.
+        4. Specify which section to keep.  /Usually/ this means
+           punting the new change, but if the top was specified
+           it means we get a little more flexibility.  Try to
+           minimize wildcarding: those things need to be put into
+           subpatterns and then reconstituted into the output.
+      Example:
+            Input:
+                <<<<<<<
+                ***1***
+                =======
+                upstream
+                >>>>>>>
+            Output:
+                [1] # discard system string
+            Input:
+                <<<<<<<
+                old upstream
+                =======
+                new upstream
+                >>>>>>>
+            Output:
+                ['R'] # keep the new upstream string
+                # This would be useful if a particular upstream change
+                # is really close to where user changes are, so that
+                # the conflict pops up a lot and it's actually spurious
+            Input:
+                <<<<<<<
+                ***1***
+                old upstream
+                ***2***
+                old upstream
+                ***3***
+                =======
+                new upstream
+                >>>>>>>
+            Output:
+                ['R', 1, 2, 3] # should be evident
+                # it's not actually clear to me if this is useful
+        To resolve: do we need the power of regexes?  This might suck
+        because it means we need to implement escaping.  We might want
+        simple globbing to the end of line since that's common in
+        configuration files.
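
One possible reading of the Input/Output examples above, as a rough sketch rather than a settled design: `***N***` lines in the top half of a rule are wildcards that capture zero or more whole lines, the bottom half must match literally, and the output list names what to keep (`'R'` for the new upstream side, an integer for a captured group). All function names here are hypothetical. The sketch does use regexes internally, but only generates them, running literal lines through `re.escape` so rule authors never write escapes themselves.

```python
import re

WILDCARD = re.compile(r"^\*\*\*(\d+)\*\*\*$")


def split_conflict(text):
    """Split one conflict hunk into (top, bottom) lists of lines."""
    top, bottom, side = [], [], None
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            side = top
        elif line.startswith("=======") and side is top:
            side = bottom
        elif line.startswith(">>>>>>>"):
            side = None
        elif side is not None:
            side.append(line)
    return top, bottom


def compile_side(pattern_lines):
    """Turn literal lines and ***N*** wildcard lines into one regex."""
    parts = []
    for line in pattern_lines:
        match = WILDCARD.match(line.strip())
        if match:
            # A wildcard lazily swallows zero or more whole lines.
            parts.append(r"(?P<g%s>(?:.*\n)*?)" % match.group(1))
        else:
            parts.append(re.escape(line) + r"\n")
    return re.compile("".join(parts))


def resolve(conflict_text, rule_text, output):
    """Return the lines to keep, or None if the rule does not apply."""
    top, bottom = split_conflict(conflict_text)
    rule_top, rule_bottom = split_conflict(rule_text)
    if bottom != rule_bottom:
        return None
    match = compile_side(rule_top).fullmatch("".join(l + "\n" for l in top))
    if match is None:
        return None
    kept = []
    for item in output:
        if item == "R":
            kept.extend(bottom)  # the new upstream side
        else:
            kept.extend(match.group("g%d" % item).splitlines())
    return kept


RULE = """<<<<<<<
***1***
old upstream
***2***
=======
new upstream
>>>>>>>"""

CONFLICT = """<<<<<<< HEAD
$user = 'alice';
old upstream
$db = 'alice+wiki';
=======
new upstream
>>>>>>> v1.2.3-scripts"""

if __name__ == "__main__":
    print(resolve(CONFLICT, RULE, ["R", 1, 2]))
```

The sample user lines are made up for illustration. Globbing to end of line, which the note above mentions as the common configuration-file case, is not covered by this sketch.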
+
+- Distinguish between logging and reporting (so we can easily send mail
+  to users)
+    - Remove "already migrated" cruft that will accumulate if we do small
+      --limit and then increase.
+    - Logs aren't actually useful, /because/ most operations are idempotent.
+      Thus, scratch logfile and make our report files more useful: error.log
+      needs error information; we don't care too much about machinability.
+      All report files should be overwritten on the next run, since we like
+      using --limit to incrementally increase the number of things we run. Note
+      that if we add soft ignores, you /do/ lose information, so there needs
+      to be some way to also have the soft ignore report a "cached error"
+    - Report the identifier number at the beginning of all of the stdout logs
+    - Log files that already exist should be initialized with some sort
+      of separator THAT CONTAINS THE LOCATION OF THE INSTALL
+    - Don't really care about having the name in the logfile name, but
+      have a lookup txt file
+    - Figure out a way of collecting blacklist data from .scripts/blacklisted
+      and aggregate it together
+    - Failed migrations should be wired to have wizard commands in them
+      automatically log to the relevant file.  In addition, the seen file
+      should get updated when one of them gets fixed.
+    - Log files need to have dates, since it looks like upgrades will be
+      multi-day affairs
+    - Failed migration should report how many unmerged files there are
+      (so we can auto-punt if it's over a threshold)
+    - Verification failures should be written to a report file, possibly
+      with short HTML fingerprints so we can inspect them easily and
+      numbers to look at the log files
+
+- Let users use Wizard when ssh'ed into Scripts
+    - Make single user mass-migrate work when not logged in as root
+
+- Make the rest of the world use Wizard
+    - Make parallel-find.pl use `sudo -u username git describe --tags`
+      to determine version.  Make parallel-find.pl give this greater
+      precedence.  This also means, however, that we get
+      full mediawiki-1.2.3-2-abcdef names (Have patch, pending testing and commit)
+    - Make deployed installer use 'wizard install' /or/ do a migration
+      after doing a normal install (the latter makes it easier
+      for mass-rollbacks).
+
+- Pre-emptively check if daemon/scripts-security-upd
+  is not on scripts-security-upd list (/mit/moira/bin/blanche)
+
+- Redo Wordpress conversion, with an eye for automating everything
+  possible (such as downloading the tarball and unpacking)
+
+- Pay back code debt
+    - Genericize callAsUser and drop_priviledges in shell
+    - Summary script should be more machine friendly, and should not
+      output summary charts when I increase specificity
+    - Summary script should do something intelligent when distinguishing
+      between old-style and new-style installs
+
+- Other stuff
+    - Don't use the scripts heuristics unless we're on scripts with the
+      AFS patch.  Check with `fs sysname`
+    - Make 'wizard summary' generate nice pretty graphs of installs by date
+      (more histograms, will need to check actual .scripts-version files.)
+    - It should be able to handle installs like Django where there's a component
+      that gets installed in web_scripts and another directory that gets installed
+      in Scripts.
+    - ACLs are a starting point for sending mail to users, but they have
+      several failure modes:
+        - Old maintainers who don't care who are still on the ACL
+        - Private AFS groups that aren't mailing lists and that we
+          can't get to
+      A question is whether or not sending mail actually helps us:
+      many users will probably have to come back to us for help; many
+      other users won't care.
+
+PULLING OUT CONFIGURATION FILES IN AN AUTOMATED MANNER
+
+advancedpoll: Template file to fill out
+django: Noodles of template files
+gallery2: Multistage install process
+joomla: Template file
+mediawiki: One-step install process
+phpbb: Multistage install process
+phpical: Template file
+trac: NFC
+turbogears: NFC
+wordpress: Multistage install process
+
+PHILOSOPHY ABOUT LOGGING
+
+Logging is most useful when performing a mass run.  This
+includes things such as mass-migration as well as when running
+summary reports.  An interesting property about mass-migration
+or mass-upgrade, however, is that if they fail, they are
+idempotent, so an individual case can be debugged simply by running
+the single-install equivalent with --debug on.  (This, indeed,
+may be easier to do than sifting through a logfile).
+
+It is a different story when you are running a summary report:
+you are primarily bound by your AFS cache and how quickly you can
+iterate through all of the autoinstalls.  Checking if a file
+exists on a cold AFS cache may
+take several minutes to perform; on a hot cache the same report
+may take a mere 3 seconds.  When you get to more computationally
+expensive calculations, however, even having a hot AFS cache
+is not enough to cut down your runtime.
+
+There are certain calculations that someone may want to be
+able to perform on manipulated data.  As such, this data should
+be cached on disk, if the process for extracting this data takes
+a long time.  Also, for usability's sake, Wizard should generate
+the common case reports.
+
+Ensuring that machine parseable reports are made, and then making
+the machinery to reframe this data, increases complexity.  Therefore,
+the recommendation is to assume that if you need to run iteratively,
+you'll have a hot AFS cache at your fingertips, and if that's not
+fast enough, then cache the data.
+
+COMMIT MESSAGE FIELDS:
+
+Installed-by: username@hostname
+Pre-commit-by: Real Name <username@mit.edu>
+Upgraded-by: Real Name <username@mit.edu>
+Migrated-by: Real Name <username@mit.edu>
+Wizard-revision: abcdef1234567890
+Wizard-args: /wizard/bin/wizard foo bar baz
+
+GIT COMMIT FIELDS:
+
+Committer: Real Name <username@mit.edu>
+Author: lockername locker <lockername@scripts.mit.edu>
  
  NOTES:
  
-- A perfectly formed autoinstall with upgrade paths for all of
-  the intervening versions is not really feasible to implement.
-  As such, we want to migrate everything to -scripts, and then
-  generate a -scripts2 with the correct .scripts directory.
-  We will then nop update some installs, but this will prevent
-  us from having to migrate and update concurrently.
-
-- summary and info are still not using loggers. Maybe they should,
-  maybe they shouldn't
-
-- We should think about stewarding the amount of objects we use
-  by using some arcane Git flags and objects/alternates. Much
-  research is needed.
+- It is not expected or required for update scripts to exist for all
+  intervening versions that were present pre-migration; only for it
+  to work on the most recent migration.
+
+- Currently all repositories are initialized with --shared, which
+  means they have basically ~no space footprint.  However, it
+  also means that /mit/scripts/wizard/srv MUST NOT lose revs after
+  deployment.
+
+- Full fledged logging options. Namely:
+  x all loggers (delay implementing this until we actually have debug stmts)
+    - default is WARNING
+    - debug     => loglevel = DEBUG
+  x stdout logger
+    - default is WARNING (see below for exception)
+    - verbose   => loglevel = INFO
+  x file logger (creates a dir and lots of little logfiles)
+    - default is OFF
+    - log-file   => loglevel = INFO
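
The logging options listed above could be wired up roughly as follows with Python's standard `logging` module. This is a sketch under assumptions: the function name, the logger name `wizard`, and the rule that debug overrides the other flags are guesses, and a single log file stands in for the "dir and lots of little logfiles" the note describes.

```python
import logging
import sys


def setup_logging(debug=False, verbose=False, log_file=None, name="wizard"):
    """Configure stdout and optional file handlers per the listed defaults."""
    logger = logging.getLogger(name)
    # Drop handlers from earlier calls so they do not stack up.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    logger.propagate = False
    # Let everything through the logger; each handler filters on its own.
    logger.setLevel(logging.DEBUG)

    # stdout logger: WARNING by default, INFO with verbose, DEBUG with debug.
    stdout_level = logging.WARNING
    if verbose:
        stdout_level = logging.INFO
    if debug:
        stdout_level = logging.DEBUG
    stdout_handler = logging.StreamHandler(sys.stdout)
    stdout_handler.setLevel(stdout_level)
    logger.addHandler(stdout_handler)

    # file logger: off unless a path is given, then INFO (DEBUG with debug).
    if log_file is not None:
        # delay=True postpones opening the file until the first record.
        file_handler = logging.FileHandler(log_file, delay=True)
        file_handler.setLevel(logging.DEBUG if debug else logging.INFO)
        logger.addHandler(file_handler)
    return logger


if __name__ == "__main__":
    log = setup_logging(verbose=True)
    log.info("shown because verbose raises stdout to INFO")
    log.debug("hidden unless debug is set")
```

Setting levels on the handlers rather than on the logger is what lets the stdout and file outputs differ, which the separate defaults in the list seem to require.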
  
  OVERALL PLAN:
  
@@ -43,31 +231,33 @@ OVERALL PLAN:
    on documenting them.  Specifically, we will be keeping:
  
      - parallel-find.pl, and the resulting
-/mit/scripts/sec-tools/store/scriptslist
+      /mit/scripts/.htaccess/scripts/sec-tools/store/scriptslist
  
-    - The current install scripts will be kept in place, sans changes
-      necessary to make them use Git install of copying the script over.
-      Porting these scripts to Python and making them modular would be
-      nice, but is priority.  For the long term, seeing this scripts
-      be packaged with rest of our code would be optimal.
+* The new procedure for generating an update is as follows:
+  (check out the mass-migration instructions for something in this spirit,
+  although uglier in some ways; A indicates the step /should/ be automated)
  
-* The new procedure for generating an update is as follows (this is
-  also similar to procedure for creating these repositories):
+    0. ssh into not-backward, temporarily give the daemon.scripts-security-upd
+       bits by blanching it on system:scripts-security-upd, and run parallel-find.pl
  
      1. Have the Git repository and working copy for the project on hand.
  
-    2. Checkout the pristine branch
+/- wizard prepare-pristine --
+
+A   2. Checkout the pristine branch
+
+A   3. Remove all files from the working copy.  Use `wipe-working-dir`
  
-    3. Remove all files from the working copy (rm -Rf *, and then delete
-       any dot stragglers.  A script to do this would be handy)
+A   4. Download the new tarball
  
-    4. Download the new tarball
+A   5. Extract the tarball over the working copy (`cp -R a/. b` works well,
+       remember that the working copy is empty; this needs some intelligent
+       input)
  
-    5. Extract the tarball over the working copy (`cp -R a/. b` works well,
-       remember that the working copy is empty)
+A   6. Check for empty directories and add stub files as necessary.
+       Use `preserve-empty-dir`
  
-    6. Check for empty directories and add stub files as necessary
-       (use preserve-empty-dir)
+\---
  
      7. Git add it all, and then commit as a new pristine version (v1.2.3)
  
@@ -79,161 +269,96 @@ OVERALL PLAN:
         with --no-commit (otherwise, you want to git commit --amend
         to keep our history clean)
  
-       [FOR THE FIRST TIME]
-       Apply the scripts patch that was used for that version here
-       (usually patch -p1 < patch)
+       [FOR NEW REPOSITORIES]
+       Check if any patches are needed to make the application work
+       on Scripts (ideally, it shouldn't need any).
  
-   10. Check if there are any special update procedures, and update the
-       .scripts/update shell script as necessary (this means that any
-       application specific update logic will be kept with the actual
-       source code.  The language of this update script will vary
-       depending on context.)
+/- wizard prepare-new --
  
-   11. Commit your changes, and tag as v1.2.3-scripts
+    Currently not used for anything besides parallel-find.pl, but
+    we reserve the right to place files in here in the future.
  
-   If you're setting up a repository from scratch, stop here, and
-   repeat as necessary
+A       mkdir .scripts
+A       echo "Deny from all" > .scripts/.htaccess
  
-       XXX: Should we force people to push to the real repository at
-       this point, or just make the repository that the script pulls
-       stuff out of configurable? (Twiddling origin can get you a
-       devel setup with no code changes)
+\---
  
-   12. Run the "dry-run script", which uses Git commands to check how many
-       working copies apply the change cleanly, and writes out a logfile
-       with the working copies that don't apply cleanly.
+   10. Check if there are any special update procedures, and update
+       the wizard.app.APPNAME module accordingly (or create it, if
+       need be).
  
-   13. Run the "limited run" script, which applies the update to our
-       test-bed, and lets us check the basic functionality of the update.
-       This can include a script that lets us update a single directory
-       with verbose output.
+   11. Run 'wizard prepare-config' on a scripts server while in a checkout
+       of this newest version.  This will prepare a new version of the
+       configuration file based on the application's latest installer.
+       Manually merge back in any custom changes we may have made.
+       Check if any of the regular expressions need tweaking by inspecting
+       the configuration files for user-specific gunk, and modify
+       wizard.app.APPNAME accordingly.
  
-   14. Run the "deploy" script, which applies the update to all working
-       copies possible, and sends mail to users to whom the working copy
-       did not apply cleanly. It also frobs .scripts/version for successful
-       upgrades.
-
-   15. Run parallel-find.pl
-
-* For mass importing into the repository, the steps are:
-
-[TO SET IT UP]
-# let app-1.2.3 be the scripts folder originally in deploydev
-# let this folder be srv/
-# you can also do a git clone
-    mkdir app
-    cd app
-    git init
-    cd ..
-unfurl app-1.2.3 app
-# NOTE: contents of application are now in app directory
-cd app
-git add .
-git commit -s -m "App 1.2.3"
-git tag v1.2.3
-git branch pristine
-# NOTE: you're still on master branch
-# WARNING: the following operation might require -p1
-patch -p0 < ../app-1.2.3/app-1.2.3.patch
-# NOTE: please sanity check the patch!
-git add .
-# NOTE: -a flag is to handle if the patch deleted something
-git commit -as -m "App 1.2.3-scripts"
-git tag v1.2.3-scripts
-
-[TO ADD AN UPDATE]
-# let this folder be srv/app.git
-git checkout pristine
-# NOTE: this preserves your .git folder, but removes everything
-wipe-working-dir .
-cd ..
-unfurl app-1.2.3 app
-cd app
-# NOTE: please sanity check app directory
-git add .
-# NOTE: -a is to take care of deletions
-git commit -as -m "App 1.2.3"
-git tag v1.2.3
-[IF THE PATCH HAS CHANGED]
-    # You are on the pristine branch
-    # NOTE: Now, the tricky part (this is different from a real update)
-    git symbolic-ref HEAD refs/heads/master
-    # NOTE: Now, we think we're on the master branch, but we have
-    # pristine copy checked out
-    # NOTE: -p0 might need to be twiddled
-    patch -p0 < ../app-1.2.3/app-1.2.3.patch
-    git add .
-    # COMMENT: used to git checkout .scripts here
-    # then check if the directory needs an updated update script
-    # NOTE: Fake the merge
-    git rev-parse pristine > .git/MERGE_HEAD
-[IF THE PATCH HASN'T CHANGED]
-    git checkout master
-    git merge --no-commit pristine
-git commit -as -m "App 1.2.3-scripts"
-git tag v1.2.3-scripts
+   12. Commit your changes, and tag as v1.2.3-scripts (or scripts2, if
+       you are amending an install without any upstream changes)
  
+      NOTE: These steps should be run on a scripts server
  
-* The repository for a given application will contain the following files:
+   13. Test the new update procedure using our test scripts.  See integration
+       tests for more information on how to do this.
  
-    - The actual application's files, as from the official tarball
+        http://scripts.mit.edu/wizard/testing.html#acceptance-tests
+
+      GET APPROVAL BEFORE PROCEEDING ANY FURTHER
  
-    - A .scripts directory, which contains the following information:
+      NOTE: The following commands are to be run on not-backward.mit.edu.
+      You'll need to add daemon.scripts-security-upd to
+      scripts-security-upd to get bits to do this.  Make sure you remove
+      these bits when you're done.
  
-        [IF THIS IS THE FIRST UPDATE]
-            mkdir .scripts
-            echo "Deny from all" > .scripts/.htaccess
-            touch .scripts/update
-            chmod a+x .scripts/update
-            # OPERATION: create the update script
+A  14. Run `wizard research appname`
+       which uses Git commands to check how many
+       working copies apply the change cleanly, and writes out a logfile
+       with the working copies that don't apply cleanly.  It also tells
+       us about "corrupt" working copies, i.e. working copies that
+       have over a certain threshold of changes.
  
-        * .scripts/update shell script (with the +x bit set appropriately),
-          which performs the commands necessary to update a script.  This can
-          be in any language.
+A  15. Run `wizard mass-upgrade appname`, which applies the update to all working
+       copies possible, and sends mail to users to whom the working copy
+       did not apply cleanly.
  
-        * .scripts/.htaccess to prevent this directory from being accessed
-          from the web.
+   16. Run parallel-find.pl to update our inventory
  
-        * .scripts/database (generated) contains the database the
-          user installed the script to, so scripts-remove can clean it
+* For mass importing into the repository, there are a few extra things:
  
-            XXX: Could cause problems if a user copies the autoinstall,
-            fiddles with the DB credentials, and then scripts-remove's
-            the autoinstall.  Possible fix is to add the original
-            directory as a sanity check.  Additionally, we could have
-            the application read out of this file.
+    * Many applications had patches associated with them.  Be sure to
+      apply them, so later merges work better.
  
-        * .scripts/version (generated) which contains the version
-          last autoinstalled (as distinct from the actual version
-          the script is) (This is the same as .scripts-version right
-          now; probably want to keep that for now)
+        # the following operation might require -p1
+        patch -p0 < ../app-1.2.3/app-1.2.3.patch  # [FIDDLY BIT]
  
-            XXX: It's unclear if we want to move to this wholesale, or
-            delay this indefinitely.
+    * When running updates, if the patch has changed you will have to
+      do a special procedure for your merge:
  
-* The migration process has been implemented, see 'wizard migrate'.
+        git checkout pristine
+        # NOTE: Now, the tricky part (this is different from a real update)
+        git symbolic-ref HEAD refs/heads/master
+        # NOTE: Now, we think we're on the master branch, but we have
+        # pristine copy checked out
+        # NOTE: -p0 might need to be twiddled
+        patch -p0 < ../app-1.2.3/app-1.2.3.patch
+        git add .
+        # reconstitute .scripts directory
+        git checkout v1.2.2-scripts -- .scripts
+        git add .scripts
+        # NOTE: Fake the merge
+        git rev-parse pristine > .git/MERGE_HEAD
  
-    XXX: We have not decided what migration should do to .scripts-version;
-    if it does move it to .scripts, repositories should have a .gitignore
-    in those directories
+      You could also just try your luck with a manual merge using the patch
+      as your guide.
  
-* The autoupgrade shall be the process of:
+* The repository for a given application will contain the following files:
  
-    # Make the directory not accessible by the outside world (htaccess, but be careful!)
-    git add -u .
-    git commit -m 'automatically generated backup'
-    git pull origin master
-    if [ $? ne 0 ]; then git reset --hard; echo 'conflicts during upgrade'; fi
-    ./.scripts/update
-    # Make it accessible
+    - The actual application's files, as from the official tarball
  
-  (with some more robust error checking)
+    - A .scripts directory, with the intent of holding Scripts specific files
+      if they become necessary.
  
-* All code that operates on an untrusted Git repository, or runs
-  executable code, should be done on NOT-BACKWARD.mit.edu.  Pending
-  accounts confirmation, it will also get a principal
-  daemon.scripts-security-upd, which is what we'll actually put
-  in the scripts-security-upd group.
+        * .scripts/lock (generated) which locks an autoinstall during upgrade
  
-* Make 'wizard summary' generate nice pretty graphs of installs by date
-  (more histograms, will need to check actual .scripts-version files.)