Thu, 19 Nov 2009
Using the SQL database for keeping one's data gives an excellent environment, maintaining the data integrity, providing the transactional behaviour, providing the remote access to the data, and so on. Even the locking properties can be something which one can get used to. That is, in the ideal world.
However, our world is not ideal. The huge problem of SQL databases is their implementation. For example, after rewriting the IS MU mailserver back-end to do a parallel delivery, it started to generate big load spikes on the Oracle DB server. The problem turned out to be the cache of SQL queries: when several processes tried to do exactly the same query in parallel, the DB server locked up on the access to the SQL cache, and a simple "select row by its primary key" query took as long as three minutes to handle.
Another example is the Oracle problem with foreign key locking which I have
recently ran into: I have a long transactions running in parallel,
modifying various rows of a single table (but each session touches
a different set of rows, so the access should be deadlock-free).
After creating another table with foreign key to the original one,
I started to get "deadlock detected" errors in DELETE commands.
Apparently Oracle locks not only the appropriate
row in the foreign-key table, but the whole block in this table.
So I have been getting the deadlocks when trying to delete the row
with primary key N, where another session added a row to the table
with foreign key referencing the primary key N+1 or N-1.
UPDATE ... SET status='deleted'
and deleting afterwards from the single session fixed the problem for me.
The SQL databases are pile of rubbish, which can always surprise you not only with their by-definition properties, but often also with their implementation-dependent behaviour. Oracle is an excellent example of this.
5 replies for this story:
Milan Zamazal wrote:
SQL databases are not inherently bad. It's just that no complex software is without problems. I can't comment on Oracle but e.g. PostgreSQL, despite its own problems (the most notable one being that SELECT COUNT(*) requires sequential scan), can be a reliable and very useful tool for some purposes. When using any complex software one sometimes falls into a rage but frankly how about bugs and design flaws in our own software? Nothing is perfect and still not everything is a pile of rubbish.
I have probably been misunderstood: the article was not about every software being crap, but about SQL databases specifically containing _lots_ of implementation-dependent behaviour (especially in locking), which is impossible to predict when designing an application, especially a high-performance one (oh look, this idiot has said "SQL" and "high-performance" in one sentence ;-).
Milan Zamazal wrote:
Do you mean Oracle locking behavior is not documented in its manuals? If so (really?), you shouldn't generalize your experience to all SQL engines without reason.
Yenya wrote: Re: Oracle
The point is that it is not a general property of SQL locking, but a (default-only?) property of the Oracle implementation. Which is impossible to anticipate until you actually try to use it and run into the problem. (As for PG, the missing packages and missing subtransactions/checkpoints, as well as the count(*) problem you describe are sufficient examples).
Milan Zamazal wrote:
You can't expect to find an SQL engine without implementation limitations, for clear reasons. I don't know what you mean by missing packages etc. in PostgreSQL, but AFAIK PostgreSQL manual is pretty complete, including documentation of locking behavior, subtransactions, checkpoints, extension packages and the COUNT(*) problem. So you can anticipate it all, just read the manual, especially if you intend to base a performance critical application on the engine. If you don't read the manual then complaining about unanticipated behavior and labeling SQL engines as piles of rubbish for this reason is demonstration of a completely different problem...