Opened 14 years ago

Last modified 13 years ago

#138 new enhancement

autoinstallers shouldn't fail if primary is down

Reported by: geofft
Owned by:
Priority: normal
Milestone:
Component: autoinstallers
Keywords:


For a couple of reasons, including being able to get to the server where your cron job runs, "ssh scripts" always reaches a single machine (the "primary"), which is also where user-run services on high ports run. However, none of the autoinstallers that ssh to scripts particularly care about reaching this single server; they're just setting up a webapp that you're going to access over load-balanced HTTP anyway. This creates an unneeded single point of failure in the autoinstall process. The same goes for the signup process -- any server can add you and your vhost to LDAP. The intent of multimaster replication is, again, to avoid a single point of failure.

It would be cool if by some mechanism the autoinstallers were robust to the primary failing and not accepting SSH connections, and they could simply log in to any running scripts server to complete the install. The most obvious solution that occurs to me is to add another SSH port that's in the same load-balancing pool as HTTP(S), but you could also do evil things with client-side overcleverness, etc.
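
As a rough illustration of the client-side option, here is a minimal Python sketch of an autoinstaller picking the first scripts server that accepts SSH connections before running its install command. It is only a sketch under assumptions: the function names, the candidate host list, and the idea of probing port 22 with a plain TCP connect are all hypothetical, not anything the existing autoinstallers do.

```python
import socket
from typing import Callable, Iterable, List, Optional


def ssh_port_open(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def pick_server(
    candidates: Iterable[str],
    probe: Callable[[str], bool] = ssh_port_open,
) -> Optional[str]:
    """Return the first candidate that passes the probe, or None if none do.

    Candidates are tried in order, so listing the primary first keeps the
    current behaviour whenever the primary is up.
    """
    for host in candidates:
        if probe(host):
            return host
    return None


def build_ssh_command(host: str, user: str, remote_command: str) -> List[str]:
    """Build the argv for running remote_command on host as user via ssh."""
    return ["ssh", f"{user}@{host}", remote_command]
```

An autoinstaller could call pick_server() with the primary first and the other servers after it, then hand the result to build_ssh_command(). Note that a successful TCP connect only shows that something is listening on the port; it says nothing about whether that server is otherwise healthy, and it does nothing for a user who later runs "ssh scripts" by hand.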

Change History (5)

comment:1 Changed 14 years ago by andersk

Can we just deploy hacron and then have LVS automatically select the primary?

comment:2 Changed 14 years ago by mitchb

Well, yes, that is the idea, ultimately. But there are a couple of things other than the code review that need to be figured out first. The most notable of these is how STONITHing is going to work.

comment:3 Changed 14 years ago by adehnert

Is there a problem with adding another ssh port to load-balancing? Is that just a matter of adding a firewall rule to the directors to set FWM2 for another port? We could also potentially get another hostname for load-balanced ssh? I don't know that that's better at all.

comment:4 Changed 14 years ago by geofft

Hm. I do kind of like the option of adding another ssh port and load-balancing it.

comment:5 Changed 13 years ago by adehnert

There's some dispute over whether fixing this is desirable. Several people argue that failing early and obviously is better than having the autoinstall succeed, since "ssh scripts" still won't work while the primary is down, and that could be confusing. (Note that a moving primary doesn't really "fix" this, per se; it just makes it invalid...)
