Back to Top

Tuesday, April 08, 2008

Circumventing the need for transactions in MySQL

While reading the excellent series on "Web 2.0" and databases on the O'Reilly radar blog it occurred to me that there is a nice trick with MySQL for making it semi-transactional (as a side-note: these days I have work with MySQL less and less and am fully enjoying the goodness that is PostgreSQL and pgAdmin).

Lets say that you have the following situation:

  • MySQL with MyISAM tables
  • A process which does a SELECT and depending on the result (for example if a given field has a certain value) issues an UPDATE

It is quite obvious that this method is not "thread safe", meaning that if you have multiple clients operating on the same records, you can very easily get into the following situation:

  • Client A does the SELECT and decides that it needs to update
  • Client B does the SELECT and it to decides to update
  • Client A does the update
  • Client B does the update

As the number of clients grows, the possibility of this situation appearing tends very fast to 100%. Your options to eliminate these situations are the following (again, assuming MyISAM tables with no transaction support):

Method 1

Simulate transactions by locking the table - this can reduce the system to a crawl, since it effectively serializes all updates. The problems gets worse and worse as the delay between the SELECT and UPDATE increases (if you have to perform complex calculations to decide if an update is needed for example).

Method 2

After doing the update, do an additional select to make sure that we were the last to update the field.

A slightly more elegant solution would be the following: when doing the UPDATE, put in the WHERE part the expected value for the fields which may change. Probably a little example is in order to make this clear. Lets suppose that we have the following table:

column_a column_b
1 aaa
... ...

And we want to do something like this:

SELECT * FROM table WHERE column_a = 1
...check if column_b equal "aaa"...
UPDATE table SET column_b = "bbb" WHERE column_a = 1

To avoid the race condition, modify the last update as follows:

UPDATE table SET column_b = "bbb" WHERE column_a = 1 AND column_b = "aaa"

Now we don't need to perform an additional select, we can directly check the "number of affected columns" (which is returned in most - if not all - client libraries) and if it's one, we succeeded, otherwise someone else "stole our thunder".

Method 3

The one I actually wanted to talk about: express the whole procedure as a single SQL statement. Following the previous example, we could again write:

UPDATE table SET column_b = "bbb" WHERE column_a = 1 AND column_b = "aaa"

The difference compared to the previous method is that we don't need the SELECT because we included the verification step in the WHERE clause. Again, we check the number of affected rows to find out if we succeeded. The basis of this method is that although MySQL doesn't guarantee the serializability of multiple queries (on MyISAM tables unless you lock the table), it does guarantee the serializability of individual queries. In fact it has to because otherwise simple queries like UPDATE table SET a = a + 1 could not be guaranteed to produce a correct results in all circumstance. So, as long as you can express the operations which are prone to produce incorrect results when multiple copies are executed, you are fine. There is almost no limit to the conditions you can express in SQL. If your expression becomes too complicated or requires complicated controls structures (branches, loops, etc), you can hide it away in a stored procedure since version 5.0. However you should not access the database from these stored procedures, because this would break the "query serialization" process (queries from a stored procedure, unless explicitly part of a transaction, is subject to synchronization problems!). It should strictly operate on its input parameters.

1 comment:

  1. Anonymous4:13 PM

    method 2 is great for me!!! thanks dude you rule.

    ReplyDelete