Opened 14 years ago

Last modified 14 years ago

#150 new enhancement

load-balance SQL reads where possible

Reported by: geofft Owned by:
Priority: normal Milestone:
Component: sql Keywords:
Cc:

Description

Applications that are read-only (for instance, most blogs and many wikis and Tracs before login) don't need to make those reads from the primary, and it will probably greatly improve perceived performance/responsiveness to direct some or all of those requests at the secondary based on the servers' relative load. This is a tricky problem, though, and largely reduces to the "you can't properly DBA sql.mit.edu" problem: we run user-controlled webapps, so we can't optimize queries or client behavior, and all cleverness needs to be server-side.

In particular, the first problem is that we can't tell if a connection is going to be reads or writes, so we'll need to add a new name/IP (e.g., "sql-ro") for users to opt-in connections that they know will only be issuing read queries.

This means that for this to be truly productive, we'll need to do a lot of that opting in in autoinstallers, for instance, patching WordPress? and MediaWiki? to hit sql-ro instead of sql for non-logged-in browsing of pages. We'll especially need to do so in a way that can be backported to existing apps as part of our autoupgrade infrastructure, but without running the risk of user plugins that do need to write to SQL on each page access getting into trouble. Note that even without plugins, anonymous users can generally at least request account creation, and comment on WordPress? blogs or use the state-modifying MediaWiki? "random page" functionality.

Note that you'll need to be a little clever about handling those cases as well as cases of users logging in. Because the secondary can lag behind the primary in terms of query updates, once you switch from sql-ro to sql for a user, you generally need to stick with sql for their session, defining "session" in some useful way for non-logged-in users (e.g., 10 minutes of inactivity, which is what LVS uses). This is trivially true for logins and certainly true for comments and anonymous edits, since users want to see those, but you could likely get away not "condemning" the IP of a user who visits Special:Randompage to not using sql-ro until their session times out.

Finally, we'll need to be all sorts of paranoid about the fact that we've been doing query-based replication instead of row-based replication and see if the major auto-installed apps have user-visible nondeterministic queries. Updating random page scores is nondeterministic, but it should also, I hope, not be user-visible in a way we care about. But there may be other cases where, say, a random subset of rows was removed, and we need to be careful about those.

Change History (2)

comment:1 in reply to: ↑ description Changed 14 years ago by geofft

Replying to geofft:

Finally, we'll need to be all sorts of paranoid about the fact that we've been doing query-based replication instead of row-based replication and see if the major auto-installed apps have user-visible nondeterministic queries. Updating random page scores is nondeterministic

So apparently I was wrong and RAND() is in fact statement-based-replicated correctly. It's listed explicitly as one of the things you might think that wouldn't hold true for on the MySQL docs about SBR. There are still nondeterministic statements, like "DELETE FROM table LIMIT 3", but those are much odder and I'm much more okay with those breaking, since sql-ro would be opt in anyway and presumably you wouldn't opt in if you knew you were doing weird stuff like that.

comment:2 in reply to: ↑ description Changed 14 years ago by presbrey

Replying to geofft:

In particular, the first problem is that we can't tell if a connection is going to be reads or writes, so we'll need to add a new name/IP (e.g., "sql-ro") for users to opt-in connections that they know will only be issuing read queries.

This means that for this to be truly productive, we'll need to do a lot

MySQL proxy might be a good alternative.

https://launchpad.net/mysql-proxy http://forge.mysql.com/wiki/MySQL_Proxy_RW_Splitting

I looked at the known problem list a while back. I'm pretty sure a LUA script could parse straight-SELECTs (excepting the known issues) to the secondary.

Note: See TracTickets for help on using tickets.